Re: Best practices on monitoring Solr

2015-12-23 Thread Emir Arnautovic

Hi Shail,
As William mentioned, our SPM allows you to monitor all the main
Solr/JVM/host metrics and also set up alerts on values or use anomaly
detection to be notified when something is about to go wrong. You can
test all features for free for 30 days (no credit card required). There
is an embedded chat if you have any questions.


HTH,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On 23.12.2015 07:38, William Bell wrote:

Sematext.com has a service for this...

Or just curl "http://localhost:8983/solr/<collection>/select?q=*:*" to see
if it returns results?

On Tue, Dec 22, 2015 at 12:15 PM, Tiwari, Shailendra <
shailendra.tiw...@macmillan.com> wrote:


Hi,

Last week our Solr search was unresponsive and we needed to reboot the
server, but we only found out after a customer complained about it.
What's the best way to monitor that search is working?
We can always add Gomez alerts from the UI.
What are the best practices?

Thanks

Shail








Solr Timeouts during query aggregation

2015-12-23 Thread Peter Lee
Greetings,

I'm having a hard time locating information about how Solr handles timeouts (if
it DOES handle them) during a search across multiple shards in a SolrCloud
configuration.

I have found information and have confirmed through empirical testing how
Solr handles timeouts on each shard when a query is made to a collection. What
I have NOT been able to find is information or settings related to the time it 
takes Solr to aggregate the results returned from multiple shards before 
returning the response to the user. Does Solr not have any sort of timeout on 
this operation?

For clarity, I'll repeat my question and try to explain it in more detail.

If I send a query to a SolrCloud setup that has 6 shards, the query will be
sent to each of the 6 shards, and each will return some number of hits. The
answers from each of the shards are sent back to the server that originally
received the query, and that server must then aggregate the data from
all of the different shards to produce a single set of hits to return to the 
user. I see how to use "timeAllowed" to limit the time of the search on each 
shard...but I was wondering if there was a separate timeout for the 
"aggregation" step just before the response is returned.

I am asking this question because our existing search technology has this
behavior and setting, and I am trying to determine if there is a related
feature in Solr. At this point, since I have not seen any
documentation nor configuration settings for this feature, I am ready to take 
it as truth that Solr does NOT include this functionality. However, I thought I 
should ask the mailing list to see if I've missed something.
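
For illustration, the per-shard limit mentioned earlier is set with a plain
request parameter (the collection name here is just an example):

curl "http://localhost:8983/solr/mycollection/select?q=*:*&timeAllowed=5000"

This bounds the search phase on each shard to roughly five seconds, but says
nothing about the aggregation step I am asking about.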

Thank you!

Peter S. Lee, Software Engineer Lead
ProQuest | 789 E. Eisenhower Parkway | Ann Arbor, MI, 48106-1346 USA
www.proquest.com



Re: Best practices on monitoring Solr

2015-12-23 Thread Florian Gleixner
On 12/22/2015 08:15 PM, Tiwari, Shailendra wrote:
> Hi,
> 
> Last week our Solr search was unresponsive and we needed to reboot the
> server, but we only found out after a customer complained about it.
> What's the best way to monitor that search is working?
> We can always add Gomez alerts from the UI.
> What are the best practices?
> 
> Thanks
> 
> Shail
> 


Hi,

we are using check_mk to monitor our systems. You can monitor http
requests with a search query very easily. We also use jolokia (which
uses jmx) together with the check_mk jolokia plugin to monitor tomcats
and other java processes - also our zookeeper instances are monitored
this way. Next on my list is monitoring the solr instances with jolokia.
I can report here as soon as i have results.

check_mk: http://mathias-kettner.com/check_mk_download.php?LANG=en
Jolokia: https://jolokia.org/
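
As a baseline, a plain HTTP probe against Solr's ping handler is already
enough to catch a hung instance (port and core name below are just examples):

curl -sf "http://localhost:8983/solr/collection1/admin/ping?wt=json"

check_mk (or any monitor) can then alert on a non-zero curl exit code or on a
status other than "OK" in the response.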



signature.asc
Description: OpenPGP digital signature


Re: API accessible without authentication even though Basic Auth Plugin is enabled

2015-12-23 Thread Shawn Heisey
On 12/22/2015 11:45 PM, William Bell wrote:
> Why would someone stay on 5.3.x instead of upgrading to 5.4? Why backport
> when you can just upgrade?

We just had this same discussion on the dev list.  Anshum wants to cut a
5.3.2 release, somebody asked the same thing you did, and this is what I
wrote in reply:


I am using a third-party custom Solr plugin.  The latest version of that
plugin (which I have on my dev server) has only been certified to work
with Solr 5.3.x.  There's a chance that it won't work with 5.4, so I
cannot use that version yet.  If I happen to need any of the fixes that
are being backported, an official 5.3.2 release would allow me to use
official binaries, which will make my managers much more comfortable
than a version that I compile myself.

Additionally, the IT change policies in place for many businesses
require a huge amount of QA work for software upgrades, but those
policies may be relaxed for hotfixes and upgrades that are *only*
bugfixes.  For users operating under those policies, a bugfix release
will allow them to fix bugs immediately, rather than spend several weeks
validating a new minor release.

There is a huge amount of interest in the new security features in
5.3.x, functionality that has a number of critical problems.  Lots of
users who need those features have already deployed 5.3.1.  Many of the
critical problems are fixed in 5.4, and these are the fixes that Anshum
wants to make available in 5.3.2.  If a user is in either of the
situations that I outlined above, upgrading to 5.4 may be unrealistic.


I am already using a homegrown version (5.3.2-SNAPSHOT) for my dev
server, because I require a fix that's only in 5.4:  SOLR-6188.  Unless
the release manager objects, I plan to add this fix to 5.3.2.

Thanks,
Shawn



Re: Best practices on monitoring Solr

2015-12-23 Thread Jack Krupansky
Solr does have a monitoring wiki page, but it is fairly weak and could use
more serious contribution, including suggestions from this email thread.

This is also a good example of where the wiki still has value relative to
the formal Solr Reference Guide. E.g., third parties can add tool and
service descriptions and users can add their experiences and own tips. The
Reference Guide probably should have at least a cursory summary of Solr
monitoring (beyond just documenting JMX), probably simply referring users
to the wiki. IOW, details on monitoring are beyond the scope of the
Reference Guide itself (other than raw JMX and ping.)

-- Jack Krupansky

On Wed, Dec 23, 2015 at 6:27 AM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:

> Hi Shail,
> As William mentioned, our SPM allows you to monitor all the main
> Solr/JVM/host metrics and also set up alerts on values or use anomaly
> detection to be notified when something is about to go wrong. You can test
> all features for free for 30 days (no credit card required). There is an
> embedded chat if you have any questions.
>
> HTH,
> Emir
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>
>
> On 23.12.2015 07:38, William Bell wrote:
>
>> Sematext.com has a service for this...
>>
>> Or just curl "http://localhost:8983/solr/<collection>/select?q=*:*" to see
>> if it returns results?
>>
>> On Tue, Dec 22, 2015 at 12:15 PM, Tiwari, Shailendra <
>> shailendra.tiw...@macmillan.com> wrote:
>>
>> Hi,
>>>
>>> Last week our Solr search was unresponsive and we needed to reboot the
>>> server, but we only found out after a customer complained about it.
>>> What's the best way to monitor that search is working?
>>> We can always add Gomez alerts from the UI.
>>> What are the best practices?
>>>
>>> Thanks
>>>
>>> Shail
>>>
>>
>>
>>
>>
>


Re: Increasing Solr5 time out from 30 seconds while starting solr

2015-12-23 Thread Debraj Manna
Please see the logs below that I am seeing:

jabong@jabong1143:~/Downloads/software/dev/solr5$ sudo bin/solr start -p

Waiting to see Solr listening on port  [-]  Still not seeing Solr
listening on  after 30 seconds!
INFO  - 2015-12-23 16:23:46.006; [   ]
org.apache.solr.core.SolrResourceLoader; Adding
'file:/home/jabong/Downloads/software/dev/solr5/contrib/clustering/lib/carrot2-mini-3.9.0.jar'
to classloader
INFO  - 2015-12-23 16:23:46.006; [   ]
org.apache.solr.core.SolrResourceLoader; Adding
'file:/home/jabong/Downloads/software/dev/solr5/contrib/clustering/lib/simple-xml-2.7.jar'
to classloader
INFO  - 2015-12-23 16:23:46.006; [   ]
org.apache.solr.core.SolrResourceLoader; Adding
'file:/home/jabong/Downloads/software/dev/solr5/contrib/clustering/lib/hppc-0.5.2.jar'
to classloader
INFO  - 2015-12-23 16:23:46.006; [   ]
org.apache.solr.core.SolrResourceLoader; Adding
'file:/home/jabong/Downloads/software/dev/solr5/contrib/clustering/lib/mahout-math-0.6.jar'
to classloader
INFO  - 2015-12-23 16:23:46.007; [   ]
org.apache.solr.core.SolrResourceLoader; Adding
'file:/home/jabong/Downloads/software/dev/solr5/contrib/clustering/lib/simple-xml-2.7.jar'
to classloader
INFO  - 2015-12-23 16:23:46.007; [   ]
org.apache.solr.core.SolrResourceLoader; Adding
'file:/home/jabong/Downloads/software/dev/solr5/contrib/clustering/lib/jackson-core-asl-1.9.13.jar'
to classloader
INFO  - 2015-12-23 16:23:46.007; [   ]
org.apache.solr.core.SolrResourceLoader; Adding
'file:/home/jabong/Downloads/software/dev/solr5/contrib/clustering/lib/attributes-binder-1.2.1.jar'
to classloader
INFO  - 2015-12-23 16:23:46.008; [   ]
org.apache.solr.core.SolrResourceLoader; Adding
'file:/home/jabong/Downloads/software/dev/solr5/contrib/clustering/lib/jackson-mapper-asl-1.9.13.jar'
to classloader
WARN  - 2015-12-23 16:23:46.008; [   ]
org.apache.solr.core.SolrResourceLoader; Can't find (or read) directory to
add to classloader: /home/jabong/Downloads/software/dev/solr5/dist/
(resolved as: /home/jabong/Downloads/software/dev/solr5/dist).
INFO  - 2015-12-23 16:23:46.007; [   ]
org.apache.solr.core.SolrResourceLoader; Adding
'file:/home/jabong/Downloads/software/dev/solr5/contrib/clustering/lib/mahout-math-0.6.jar'
to classloader
INFO  - 2015-12-23 16:23:46.010; [   ]
org.apache.solr.core.SolrResourceLoader; Adding
'file:/home/jabong/Downloads/software/dev/solr5/contrib/clustering/lib/jackson-core-asl-1.9.13.jar'
to classloader
INFO  - 2015-12-23 16:23:46.010; [   ]
org.apache.solr.core.SolrResourceLoader; Adding
'file:/home/jabong/Downloads/software/dev/solr5/contrib/clustering/lib/attributes-binder-1.2.1.jar'
to classloader
INFO  - 2015-12-23 16:23:46.010; [   ]
org.apache.solr.core.SolrResourceLoader; Adding
'file:/home/jabong/Downloads/software/dev/solr5/contrib/clustering/lib/jackson-mapper-asl-1.9.13.jar'
to classloader
WARN  - 2015-12-23 16:23:46.011; [   ]
org.apache.solr.core.SolrResourceLoader; Can't find (or read) directory to
add to classloader: /home/jabong/Downloads/software/dev/solr5/dist/
(resolved as: /home/jabong/Downloads/software/dev/solr5/dist).
INFO  - 2015-12-23 16:23:46.065; [   ]
org.apache.solr.core.SolrResourceLoader; Adding
'file:/home/jabong/Downloads/software/dev/solr5/contrib/langid/lib/langdetect-1.1-20120112.jar'
to classloader
INFO  - 2015-12-23 16:23:46.065; [   ]
org.apache.solr.core.SolrResourceLoader; Adding
'file:/home/jabong/Downloads/software/dev/solr5/contrib/langid/lib/langdetect-1.1-20120112.jar'
to classloader
INFO  - 2015-12-23 16:23:46.065; [   ]
org.apache.solr.core.SolrResourceLoader; Adding
'file:/home/jabong/Downloads/software/dev/solr5/contrib/langid/lib/jsonic-1.2.7.jar'
to classloader
INFO  - 2015-12-23 16:23:46.065; [   ]
org.apache.solr.core.SolrResourceLoader; Adding
'file:/home/jabong/Downloads/software/dev/solr5/contrib/langid/lib/jsonic-1.2.7.jar'
to classloader
WARN  - 2015-12-23 16:23:46.065; [   ]
org.apache.solr.core.SolrResourceLoader; Can't find (or read) directory to
add to classloader: /home/jabong/Downloads/software/dev/solr5/dist/
(resolved as: /home/jabong/Downloads/software/dev/solr5/dist).
WARN  - 2015-12-23 16:23:46.066; [   ]
org.apache.solr.core.SolrResourceLoader; Can't find (or read) directory to
add to classloader: /home/jabong/Downloads/software/dev/solr5/dist/
(resolved as: /home/jabong/Downloads/software/dev/solr5/dist).
INFO  - 2015-12-23 16:23:46.274; [   ]
org.apache.solr.core.SolrResourceLoader; Adding
'file:/home/jabong/Downloads/software/dev/solr5/contrib/velocity/lib/commons-beanutils-1.8.3.jar'
to classloader
INFO  - 2015-12-23 16:23:46.274; [   ]
org.apache.solr.core.SolrResourceLoader; Adding
'file:/home/jabong/Downloads/software/dev/solr5/contrib/velocity/lib/commons-beanutils-1.8.3.jar'
to classloader
INFO  - 2015-12-23 16:23:46.274; [   ]
org.apache.solr.core.SolrResourceLoader; Adding
'file:/home/jabong/Downloads/software/dev/solr5/contrib/velocity/lib/velocity-1.7.jar'
to classloader
INFO  - 2015-12-23 16:23:46.274; [   ]

Re: Streaming Expressions (/stream) NPE

2015-12-23 Thread Jason Gerlowski
Thanks for the heads up Joel.  Glad this was just user error, and not an
actual problem.

Though it is interesting that Solr's response didn't contain any
information about what was wrong.  I probably would've expected a message
to the effect of: "the required parameter 'expr' was not found".

Also, it was a little disappointing that when the thrown exception has no
message, ExceptionStream puts 'null' in the EXCEPTION Tuple (i.e.
{"EXCEPTION":null,"EOF":true}).  It might be nice if the name/type of the
exception was used when no message can be found.

I'd be happy to create JIRAs and push up a patch for one/both of those
behaviors if people agree that this would make the API a little nicer.

Thanks again Joel.

Best,

Jason

On Tue, Dec 22, 2015 at 10:06 PM, Joel Bernstein  wrote:

> The http parameter "stream" was recently changed to "expr" in SOLR-8443.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, Dec 22, 2015 at 8:45 PM, Jason Gerlowski 
> wrote:
>
> > I'll preface this email by saying that I wasn't sure which mailing list
> it
> > belonged on.  It might fit on the dev list (since it involves a potential
> > Solr bug), but maybe the solr-users list is a better choice (since I'm
> > probably just misusing Solr).  I settled on the solr-users list.  Sorry
> if
> > I chose incorrectly.
> >
> > Moving on...
> >
> > I've run into a NullPointerException when trying to use the /stream
> > handler.  I'm not sure whether I'm doing something wrong with the
> commands
> > I'm sending to Solr via curl, or if there's an underlying bug causing
> this
> > behavior.
> >
> > I'm making the stream request:
> >
> > curl --data-urlencode 'stream=search(gettingstarted, q="*:*",
> > fl="title,url", sort="_version_ asc", rows="10")'
> > "localhost:8983/solr/gettingstarted/stream"
> >
> > Solr responds with:
> >
> > {"result-set":{"docs":[
> > {"EXCEPTION":null,"EOF":true}]}}
> >
> > At this point, I assumed that something was wrong with my command, so I
> > checked the solr-logs for a hint at the problem.  I found:
> >
> > ERROR - 2015-12-23 01:32:32.535; [c:gettingstarted s:shard2 r:core_node2
> > x:gettingstarted_shard2_replica2] org.apache.solr.common.SolrException;
> > java.lang.NullPointerException
> >   at
> >
> >
> org.apache.solr.client.solrj.io.stream.expr.StreamExpressionParser.generateStreamExpression(StreamExpressionParser.java:47)
> >   at
> >
> >
> org.apache.solr.client.solrj.io.stream.expr.StreamExpressionParser.parse(StreamExpressionParser.java:38)
> >   at
> >
> >
> org.apache.solr.client.solrj.io.stream.expr.StreamFactory.constructStream(StreamFactory.java:168)
> >   at
> >
> >
> org.apache.solr.handler.StreamHandler.handleRequestBody(StreamHandler.java:155)
> >   at
> >
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
> >
> > Has anyone seen this behavior before?  Is Solr reacting to something
> amiss
> > in my request, or is there maybe a bug here?  I'll admit this is my first
> > attempt at using the /stream API, so I might be getting something wrong
> > here.  I consulted the reference guide's examples on using the streaming
> > API (
> > https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions)
> > when coming up with my curl command, but I might've missed something.
> >
> > Anyways, I'd appreciate any insight that anyone can offer on this.  If it
> > helps, I've included reproduction steps below.
> >
> > 1.) Download and compile Solr trunk.
> > 2.) Start Solr using one of the examples (bin/solr start -e cloud).
> Accept
> > default values.
> > 3.) Index some docs (bin/post -c gettingstarted
> > http://lucene.apache.org/solr -recursive 1 -delay 1)
> > 4.) Do a search to sanity check the ingestion (curl
> > "localhost:8983/solr/gettingstarted/select?q=*:*&wt=json")
> > 5.) Make a /stream request for some docs (curl --data-urlencode
> > 'stream=search(gettingstarted, q="*:*", fl="title,url", sort="_version_
> > asc", rows="10")' "localhost:8983/solr/gettingstarted/stream")
> >
> > Thanks again for any ideas/help anyone can give.
> >
> > Best,
> >
> > Jason
> >
>
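
(For reference, on a build that already contains SOLR-8443, the request from
the quoted thread would be spelled with the renamed parameter:

curl --data-urlencode 'expr=search(gettingstarted, q="*:*", fl="title,url", sort="_version_ asc", rows="10")' "localhost:8983/solr/gettingstarted/stream")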


Re: Best practices on monitoring Solr

2015-12-23 Thread Ahmet Arslan


Hi,

http://newrelic.com is another option.

Ahmet

On Wednesday, December 23, 2015 4:26 PM, Florian Gleixner  
wrote:



On 12/22/2015 08:15 PM, Tiwari, Shailendra wrote:
> Hi,
> 
> Last week our Solr search was unresponsive and we needed to reboot the
> server, but we only found out after a customer complained about it.
> What's the best way to monitor that search is working?
> We can always add Gomez alerts from the UI.
> What are the best practices?
> 
> Thanks
> 
> Shail

> 


Hi,

we are using check_mk to monitor our systems. You can monitor http
requests with a search query very easily. We also use jolokia (which
uses jmx) together with the check_mk jolokia plugin to monitor tomcats
and other java processes - also our zookeeper instances are monitored
this way. Next on my list is monitoring the solr instances with jolokia.
I can report here as soon as i have results.

check_mk: http://mathias-kettner.com/check_mk_download.php?LANG=en
Jolokia: https://jolokia.org/


ToParentBlockJoinQuery.java

2015-12-23 Thread Rick Leir
Hi all,

This is working fine for me, searching for 'charlie':
$ curl http://localhost:8983/solr/dorsetdata/query -d '
q={!parent which="content_type:parentDocument" score=total} type:page AND
charlie
&wt=json
&rows=2
&debug=true
&fl=score,[child parentFilter=content_type:parentDocument
childFilter=charlie],*,[docid]'

I would like to put conditions on the parent document, so I tried this,
adding ' AND lang:eng':

$ curl http://localhost:8983/solr/dorsetdata/query -d '
q={!parent which="content_type:parentDocument AND lang:eng" score=total}
type:page AND charlie
&wt=json
&rows=2
&debug=true
&fl=score,[child parentFilter=content_type:parentDocument
childFilter=charlie],*,[docid]'

I got a Java exception. Maybe there is a syntax problem, or maybe it is not
possible?

Solr 5.4, schemaless configuration.

Thanks
Rick

"msg":"child query must only match non-parent docs, but parent
docID=2147483647 matched childScorer=class
org.apache.lucene.search.ConjunctionScorer",

"java.lang.IllegalStateException: child query must only match non-parent
docs, but parent docID=2147483647 matched childScorer=class
org.apache.lucene.search.ConjunctionScorer
at
org.apache.lucene.search.join.ToParentBlockJoinQuery$BlockJoinScorer.nextDoc(ToParentBlockJoinQuery.java:311)
at
org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:216)
at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:169)
at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:821)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:535)
at
org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:202)
at
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1672)

For the working query, the debug output is:
  "debug":{
"rawquerystring":"{!parent which=\"content_type:parentDocument\"
score=total} type:page AND charlie",
"querystring":"{!parent which=\"content_type:parentDocument\"
score=total} type:page AND charlie",
"parsedquery":"ToParentBlockJoinQuery(ToParentBlockJoinQuery
(+type:page +_text_:charlie))",
"parsedquery_toString":"ToParentBlockJoinQuery (+type:page
+_text_:charlie)",
"explain":{
  "76039":"\n6.8171363 = Score based on child doc range from 3299331 to
3299911\n",
  "78579":"\n6.613722 = Score based on child doc range from 3356914 to
3357359\n"},
"QParser":"BlockJoinParentQParser",


Re: How to check when a search exceeds the threshold of timeAllowed parameter

2015-12-23 Thread Jeff Wartes
Looks like it’ll set partialResults=true on your results if you hit the 
timeout. 

https://issues.apache.org/jira/browse/SOLR-502

https://issues.apache.org/jira/browse/SOLR-5986
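
i.e. the JSON response header should contain something like this (values
illustrative):

"responseHeader":{"status":0,"QTime":15003,"partialResults":true}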






On 12/22/15, 5:43 PM, "Vincenzo D'Amore"  wrote:

>Well... I can write everything, but really all this just to understand
>when the timeAllowed parameter triggers a partial answer? I mean, isn't
>there anything set in the response when it is partial?
>
>On Wed, Dec 23, 2015 at 2:38 AM, Walter Underwood 
>wrote:
>
>> We need to know a LOT more about your site. Number of documents, size of
>> index, frequency of updates, length of queries, approximate size of server
>> (CPUs, RAM, type of disk), version of Solr, version of Java, and features
>> you are using (faceting, highlighting, etc.).
>>
>> After that, we’ll have more questions.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>>
>> > On Dec 22, 2015, at 4:58 PM, Vincenzo D'Amore 
>> wrote:
>> >
>> > Hi All,
>> >
>> > my website is under pressure; there is a large number of concurrent
>> > searches. When the connected users are too many, the searches become so
>> > slow that in some cases users have to wait many seconds.
>> > The queue of searches becomes so long that, in some cases, servers are
>> > blocked trying to serve all these requests.
>> > As far as I know this is because some searches are very expensive, and
>> > when many expensive searches clog the queue the server becomes
>> > unresponsive.
>> >
>> > In order to quickly work around this herd effect, I have added a
>> > default timeAllowed of 15 seconds, and this seems to help a lot.
>> >
>> > But during stress tests I'm unable to understand when and what requests
>> > are affected by the timeAllowed parameter.
>> >
>> > Just to be clear, I have configured the timeAllowed parameter in a
>> > SolrCloud environment. Given that partial results may be returned (if
>> > there are any), how can I know when this happens? When does the
>> > timeAllowed parameter trigger a partial answer?
>> >
>> > Best regards,
>> > Vincenzo
>> >
>> >
>> >
>> > --
>> > Vincenzo D'Amore
>> > email: v.dam...@gmail.com
>> > skype: free.dev
>> > mobile: +39 349 8513251
>>
>>
>
>
>-- 
>Vincenzo D'Amore
>email: v.dam...@gmail.com
>skype: free.dev
>mobile: +39 349 8513251


Re: ToParentBlockJoinQuery.java

2015-12-23 Thread Yonik Seeley
On Wed, Dec 23, 2015 at 11:50 AM, Rick Leir  wrote:
> I would like to put conditions on the parent document, so I tried this,
> adding ' AND lang:eng':
>
> $ curl http://localhost:8983/solr/dorsetdata/query -d '
> q={!parent which="content_type:parentDocument AND lang:eng" score=total}
> type:page AND charlie

"which" and "of" should always identify the complete set of parent
documents, not any desired subset.

If you want conditions/filters on the parent document, that's easy...
your query is already mapping to parents, so simply add another "fq"
param.

q={!parent...}   // this puts us in the "parent" domain
fq=lang:eng
fq=another_filter_on_parent_documents

-Yonik
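
Putting that together with the request from the quoted message, a sketch
(reusing the same collection and fields) would be:

$ curl http://localhost:8983/solr/dorsetdata/query -d '
q={!parent which="content_type:parentDocument" score=total} type:page AND
charlie
&fq=lang:eng
&wt=json'

Here fq=lang:eng is applied in the parent domain, after the block join.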


Re: Streaming Expressions (/stream) NPE

2015-12-23 Thread Joel Bernstein
Thanks for the feedback. Yes, please create the ticket. I believe all that
needs to be done is to check for null and then throw a better exception.

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Dec 23, 2015 at 9:15 AM, Jason Gerlowski 
wrote:

> Thanks for the heads up Joel.  Glad this was just user error, and not an
> actual problem.
>
> Though it is interesting that Solr's response didn't contain any
> information about what was wrong.  I probably would've expected a message
> to the effect of: "the required parameter 'expr' was not found".
>
> Also, it was a little disappointing that when the thrown exception has no
> message, ExceptionStream puts 'null' in the EXCEPTION Tuple (i.e.
> {"EXCEPTION":null,"EOF":true}).  It might be nice if the name/type of the
> exception was used when no message can be found.
>
> I'd be happy to create JIRAs and push up a patch for one/both of those
> behaviors if people agree that this would make the API a little nicer.
>
> Thanks again Joel.
>
> Best,
>
> Jason
>
> On Tue, Dec 22, 2015 at 10:06 PM, Joel Bernstein 
> wrote:
>
> > The http parameter "stream" was recently changed to "expr" in SOLR-8443.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Tue, Dec 22, 2015 at 8:45 PM, Jason Gerlowski 
> > wrote:
> >
> > > I'll preface this email by saying that I wasn't sure which mailing list
> > it
> > > belonged on.  It might fit on the dev list (since it involves a
> potential
> > > Solr bug), but maybe the solr-users list is a better choice (since I'm
> > > probably just misusing Solr).  I settled on the solr-users list.  Sorry
> > if
> > > I chose incorrectly.
> > >
> > > Moving on...
> > >
> > > I've run into a NullPointerException when trying to use the /stream
> > > handler.  I'm not sure whether I'm doing something wrong with the
> > commands
> > > I'm sending to Solr via curl, or if there's an underlying bug causing
> > this
> > > behavior.
> > >
> > > I'm making the stream request:
> > >
> > > curl --data-urlencode 'stream=search(gettingstarted, q="*:*",
> > > fl="title,url", sort="_version_ asc", rows="10")'
> > > "localhost:8983/solr/gettingstarted/stream"
> > >
> > > Solr responds with:
> > >
> > > {"result-set":{"docs":[
> > > {"EXCEPTION":null,"EOF":true}]}}
> > >
> > > At this point, I assumed that something was wrong with my command, so I
> > > checked the solr-logs for a hint at the problem.  I found:
> > >
> > > ERROR - 2015-12-23 01:32:32.535; [c:gettingstarted s:shard2
> r:core_node2
> > > x:gettingstarted_shard2_replica2] org.apache.solr.common.SolrException;
> > > java.lang.NullPointerException
> > >   at
> > >
> > >
> >
> org.apache.solr.client.solrj.io.stream.expr.StreamExpressionParser.generateStreamExpression(StreamExpressionParser.java:47)
> > >   at
> > >
> > >
> >
> org.apache.solr.client.solrj.io.stream.expr.StreamExpressionParser.parse(StreamExpressionParser.java:38)
> > >   at
> > >
> > >
> >
> org.apache.solr.client.solrj.io.stream.expr.StreamFactory.constructStream(StreamFactory.java:168)
> > >   at
> > >
> > >
> >
> org.apache.solr.handler.StreamHandler.handleRequestBody(StreamHandler.java:155)
> > >   at
> > >
> > >
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
> > >
> > > Has anyone seen this behavior before?  Is Solr reacting to something
> > amiss
> > > in my request, or is there maybe a bug here?  I'll admit this is my
> first
> > > attempt at using the /stream API, so I might be getting something wrong
> > > here.  I consulted the reference guide's examples on using the
> streaming
> > > API (
> > > https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
> )
> > > when coming up with my curl command, but I might've missed something.
> > >
> > > Anyways, I'd appreciate any insight that anyone can offer on this.  If
> it
> > > helps, I've included reproduction steps below.
> > >
> > > 1.) Download and compile Solr trunk.
> > > 2.) Start Solr using one of the examples (bin/solr start -e cloud).
> > Accept
> > > default values.
> > > 3.) Index some docs (bin/post -c gettingstarted
> > > http://lucene.apache.org/solr -recursive 1 -delay 1)
> > > 4.) Do a search to sanity check the ingestion (curl
> > > "localhost:8983/solr/gettingstarted/select?q=*:*&wt=json")
> > > 5.) Make a /stream request for some docs (curl --data-urlencode
> > > 'stream=search(gettingstarted, q="*:*", fl="title,url", sort="_version_
> > > asc", rows="10")' "localhost:8983/solr/gettingstarted/stream")
> > >
> > > Thanks again for any ideas/help anyone can give.
> > >
> > > Best,
> > >
> > > Jason
> > >
> >
>


Re: Best practices on monitoring Solr

2015-12-23 Thread Debraj Manna
We use Datadog (https://www.datadoghq.com/).

On Thu, Dec 24, 2015 at 12:44 AM, Ahmet Arslan 
wrote:

>
>
> Hi,
>
> http://newrelic.com is another option.
>
> Ahmet
>
> On Wednesday, December 23, 2015 4:26 PM, Florian Gleixner 
> wrote:
>
>
>
> On 12/22/2015 08:15 PM, Tiwari, Shailendra wrote:
> > Hi,
> >
> > Last week our Solr search was unresponsive and we needed to reboot the
> > server, but we only found out after a customer complained about it.
> > What's the best way to monitor that search is working?
> > We can always add Gomez alerts from the UI.
> > What are the best practices?
> >
> > Thanks
> >
> > Shail
>
> >
>
>
> Hi,
>
> we are using check_mk to monitor our systems. You can monitor http
> requests with a search query very easily. We also use jolokia (which
> uses jmx) together with the check_mk jolokia plugin to monitor tomcats
> and other java processes - also our zookeeper instances are monitored
> this way. Next on my list is monitoring the solr instances with jolokia.
> I can report here as soon as i have results.
>
> check_mk: http://mathias-kettner.com/check_mk_download.php?LANG=en
> Jolokia: https://jolokia.org/
>


Re: Providing own _version field in solr doc

2015-12-23 Thread Debraj Manna
For my use case I tried document centric versioning as mentioned here:
https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-DocumentCentricVersioningConstraints
But in my case this is not working: I am seeing documents with an older
version overwriting newer ones. I have attached my solrconfig.xml. I
have also added my version field in schema.xml as shown below:-

<field name="doc_version" type="long" indexed="true" stored="true"/>
I am updating the doc with solrJ as below:-
solrClient.add(doc);
solrClient.commit();

I am using solr 5.2.1

Can someone let me know what I am doing wrong?
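
For reference, the full update is built roughly like this (a minimal sketch;
where the version value comes from is application-specific):

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "some-id");
doc.addField("doc_version", 12345L); // external version supplied by the client
solrClient.add(doc);
solrClient.commit();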

On Tue, Dec 22, 2015 at 9:29 PM, Debraj Manna 
wrote:

> Hi Alex,
>
> Can you let us know what do you mean by
>
> *"timestamps" are truly atomic and not local clock-based." ?*
>
> *Thanks,*
>
> On Mon, Dec 14, 2015 at 10:53 PM, Alexandre Rafalovitch <
> arafa...@gmail.com> wrote:
>
>> At the first glance, this sounds like a perfect match to
>>
>> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-DocumentCentricVersioningConstraints
>>
>> Just make sure your "timestamps" are truly atomic and not local
>> clock-based. The drift could cause interesting problems.
>>
>> Regards,
>>Alex.
>> 
>> Newsletter and resources for Solr beginners and intermediates:
>> http://www.solr-start.com/
>>
>>
>> On 14 December 2015 at 12:17, Debraj Manna 
>> wrote:
>> > We have a use case in which there are multiple clients writing
>> concurrently
>> > to solr. Each of the doc is having an 'timestamp' field which indicates
>> > when these docs were generated.
>> >
>> > We also have to ensure that any old doc doesn't overwrite any new doc in
>> > solr. So to achieve this we were thinking if we can make use of the
>> > _version field in solr doc and set the _version field equal to the
>> > 'timestamp' field that is present in each doc.
>> >
>> > Can someone let me know if the approach that we thought can be done? If
>> not
>> > can someone suggest some other approach of achieving the same with
>> minimum
>> > calls to solr?
>>
>
>




[Attachment: solrconfig.xml — the XML markup was stripped by the list
archive. The surviving fragments indicate luceneMatchVersion 5.0.0, an
autoCommit maxTime of 15000 with openSearcher false, edismax request
handlers with boosts such as "name^2.0 brand^2.0 category^2.0 fulltext^1.0",
and two suggester components based on
com.jabong.plugin.JBAnalyzingInfixLookupFactory.]

Re: Providing own _version field in solr doc

2015-12-23 Thread Shawn Heisey
On 12/23/2015 10:30 AM, Debraj Manna wrote:
> For my use case I tried document centric versioning as mentioned here:
> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-DocumentCentricVersioningConstraints
> But in my case this is not working: I am seeing documents with an older
> version overwriting newer ones. I have attached my
> solrconfig.xml. I have also added my version field in schema.xml as
> shown below:-
> <field name="doc_version" type="long" indexed="true" stored="true"/>
>
> I am updating the doc with solrJ as below:-
> solrClient.add(doc);
> solrClient.commit();
>
> I am using solr 5.2.1
>
> Can someone let me know what I am doing wrong?

You have added the doc-based version processor to a processor chain
named "add-unknown-fields-to-the-schema" ... but I do not see anywhere
in your configuration where you have opted to *use* this processor
chain, and that chain has not been designated as the default chain.  I
don't think that the processor chain is actually being used, so nothing
related to your doc_version field is being honored.

In the data-driven example config, the processor chain is activated with
the following config:

  <initParams path="/update/**">
    <lst name="defaults">
      <str name="update.chain">add-unknown-fields-to-the-schema</str>
    </lst>
  </initParams>

Thanks,
Shawn



Re: Providing own _version field in solr doc

2015-12-23 Thread Debraj Manna
Thanks Shawn. But after making the below changes (making versionable_chain
the default), all my updates, inserts & deleteByQuery calls are failing with
the error "Error from server at http://localhost:/solr/discovery:
DistributedUpdateProcessor must follow DocBasedVersionConstraintsProcessor"

<updateRequestProcessorChain name="versionable_chain" default="true">
  <processor class="solr.DocBasedVersionConstraintsProcessorFactory">
    <str name="versionField">doc_version</str>
    <bool name="ignoreOldUpdates">false</bool>
  </processor>
</updateRequestProcessorChain>

On Thu, Dec 24, 2015 at 12:06 AM, Shawn Heisey  wrote:

> On 12/23/2015 10:30 AM, Debraj Manna wrote:
> > For my use case I tried document centric versioning as mentioned here
> > <
> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-DocumentCentricVersioningConstraints
> >.
> > But in my case this is not working: I am seeing documents with an older
> > version overwriting newer ones. I have attached my
> > solrconfig.xml. I have also added my version field in schema.xml as
> > shown below:-
> > <field name="doc_version" type="long" indexed="true" stored="true"/>
> >
> > I am updating the doc with solrJ as below:-
> > solrClient.add(doc);
> > solrClient.commit();
> >
> > I am using solr 5.2.1
> >
> > Can someone let me know what I am doing wrong?
>
> You have added the doc-based version processor to a processor chain
> named "add-unknown-fields-to-the-schema" ... but I do not see anywhere
> in your configuration where you have opted to *use* this processor
> chain, and that chain has not been designated as the default chain.  I
> don't think that the processor chain is actually being used, so nothing
> related to your doc_version field is being honored.
>
> In the data-driven example config, the processor chain is activated with
> the following config:
>
>   <initParams path="/update/**">
>     <lst name="defaults">
>       <str name="update.chain">add-unknown-fields-to-the-schema</str>
>     </lst>
>   </initParams>
>
> Thanks,
> Shawn
>
>


Re: ToParentBlockJoinQuery.java

2015-12-23 Thread Rick Leir
> If you want conditions/filters on the parent document, that's easy...
> your query is already mapping to parents, so simply add another "fq"
> param.

That is perfect. Thanks!!
Cheers -- Rick


Re: Providing own _version field in solr doc

2015-12-23 Thread Debraj Manna
Thanks Shawn :) .
On Dec 24, 2015 1:12 AM, "Shawn Heisey"  wrote:

> On 12/23/2015 12:25 PM, Debraj Manna wrote:
> > Thanks Shawn. But after making the below changes (making
> > versionable_chain the default), all my updates, inserts & deleteByQuery
> > calls are failing with the error "Error from server at
> > http://localhost:/solr/discovery: DistributedUpdateProcessor must
> > follow DocBasedVersionConstraintsProcessor"
> >
> > <updateRequestProcessorChain name="versionable_chain" default="true">
> >   <processor class="solr.DocBasedVersionConstraintsProcessorFactory">
> >     <str name="versionField">doc_version</str>
> >     <bool name="ignoreOldUpdates">false</bool>
> >   </processor>
> > </updateRequestProcessorChain>
>
> I believe that the standard update processor chain looks like this:
>
>   <processor class="solr.LogUpdateProcessorFactory" />
>   <processor class="solr.DistributedUpdateProcessorFactory" />
>   <processor class="solr.RunUpdateProcessorFactory" />
>
> I think that when you define a processor chain but do not include these
> default processors, then they are automatically added in a predetermined
> order and what you end up with is this:
>
>   <processor class="solr.LogUpdateProcessorFactory" />
>   <processor class="solr.DistributedUpdateProcessorFactory" />
>   <processor class="solr.RunUpdateProcessorFactory" />
>   <processor class="solr.DocBasedVersionConstraintsProcessorFactory" />
>
> The error message you are getting suggests that this order will not work
> and what you actually need to do is the following:
>
> <updateRequestProcessorChain name="versionable_chain" default="true">
>   <processor class="solr.DocBasedVersionConstraintsProcessorFactory">
>     <str name="versionField">doc_version</str>
>     <bool name="ignoreOldUpdates">false</bool>
>   </processor>
>   <processor class="solr.LogUpdateProcessorFactory" />
>   <processor class="solr.DistributedUpdateProcessorFactory" />
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
>
> Thanks,
> Shawn
>
>


Solr 6 - Relational Index querying

2015-12-23 Thread Troy Edwards
In Solr 5.1.0 we had to flatten out two collections into one

Item - about 1.5 million items with primary key - ItemId (this mainly
contains item description)

FacilityItem - about 10,000 facilities - primary key - FacilityItemId
(pricing information for each facility) - ItemId points to Item

We are currently using this index for only about 200 facilities. We are
using edismax parser to query and boost results

I am hoping that in Solr 6, with Parallel SQL or a streaming innerJoin, we can
use two collections, which would make updates easier.

But so far I have not seen something that will exactly fit what we need.

Any thoughts/suggestions on what documentation to read or any samples on
how to approach what we are trying to achieve?

Thanks


Re: mlt and document boost

2015-12-23 Thread CrazyDiamond
So there is no way to apply a boost to MLT, or any other way to change the
order of documents in the MLT result? Also, maybe there is a way to make two
MLT queries at once and merge them?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/mlt-and-document-boost-tp4246522p4247154.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Providing own _version field in solr doc

2015-12-23 Thread Shawn Heisey
On 12/23/2015 12:25 PM, Debraj Manna wrote:
> Thanks Shawn. But after making the below changes (making versionable_chain
> the default), all my updates, inserts & deleteByQuery calls are failing
> with the error "Error from server at http://localhost:/solr/discovery:
> DistributedUpdateProcessor must follow DocBasedVersionConstraintsProcessor"
>
> <updateRequestProcessorChain name="versionable_chain" default="true">
>   <processor class="solr.DocBasedVersionConstraintsProcessorFactory">
>     <str name="versionField">doc_version</str>
>     <bool name="ignoreOldUpdates">false</bool>
>   </processor>
> </updateRequestProcessorChain>

I believe that the standard update processor chain looks like this:

  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.DistributedUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />

I think that when you define a processor chain but do not include these
default processors, then they are automatically added in a predetermined
order and what you end up with is this:

  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.DistributedUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
  <processor class="solr.DocBasedVersionConstraintsProcessorFactory" />

The error message you are getting suggests that this order will not work
and what you actually need to do is the following:

<updateRequestProcessorChain name="versionable_chain" default="true">
  <processor class="solr.DocBasedVersionConstraintsProcessorFactory">
    <str name="versionField">doc_version</str>
    <bool name="ignoreOldUpdates">false</bool>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.DistributedUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

Thanks,
Shawn
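
P.S. Once that chain is the default, a quick sanity check (illustrative
values) is to add the same id twice with a lower doc_version the second time:

curl "http://localhost:8983/solr/discovery/update?commit=true" -H 'Content-Type: application/json' -d '[{"id":"1","doc_version":2}]'
curl "http://localhost:8983/solr/discovery/update?commit=true" -H 'Content-Type: application/json' -d '[{"id":"1","doc_version":1}]'

With ignoreOldUpdates=false, the second add should fail with a version
conflict instead of overwriting the newer document.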



Re: 5.4 facet performance thumbs-up

2015-12-23 Thread Yonik Seeley
Awesome, thanks for the feedback!

-Yonik

On Tue, Dec 22, 2015 at 5:36 PM, Aigner, Max  wrote:
> I'm happy to report that we are seeing significant speed-ups in our queries 
> with Json facets on 5.4 vs regular facets on 5.1. Our queries contain mostly 
> terms facets, many of them with exclusion tags and prefix filtering.
> Nice work!


Re: mlt and document boost

2015-12-23 Thread Binoy Dalal
Have you tried applying the boosts to individual fields with mlt.qf?
Optionally, you could get the patch that is on jira and integrate it into
your code if you're so inclined.
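Something like this, for example (handler path and field names illustrative;
mlt.qf fields should also appear in mlt.fl):

/mlt?q=id:123&mlt.fl=title,description&mlt.qf=title^2.0 description^0.5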

On Thu, 24 Dec 2015, 03:17 CrazyDiamond  wrote:

> So there is no way to apply a boost to MLT, or any other way to change the
> order of documents in the MLT result? Also, maybe there is a way to make
> two MLT queries at once and merge them?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/mlt-and-document-boost-tp4246522p4247154.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
Regards,
Binoy Dalal


Data import issue

2015-12-23 Thread Midas A
Hi,


Please provide the steps to resolve the issue.


com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException:
Communications link failure during rollback(). Transaction resolution
unknown.


DIH errors

2015-12-23 Thread Midas A
Please help us

a)

java.sql.SQLException: Streaming result set
com.mysql.jdbc.RowDataDynamic@755ea675 is still active. No statements
may be issued when any streaming result sets are open and in use on a
given connection. Ensure that you have called .close() on any active
streaming result sets before attempting more queries.

b) java.lang.RuntimeException: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable
to execute query: SELECT pf.feature_id,
REPLACE(REPLACE(REPLACE(pfd.filter, '\n', ''), '', ''), '\r','') as
COLcolor_col, pfv.variant_id,
CONCAT(pfv.variant_id,'_',(REPLACE(REPLACE(REPLACE(fvd.variant, '\n',
''), '', ''), '\r',''))) as COLcolor_col_val  FROM
cscart_product_filters pf  inner join
cscart_product_filter_descriptions pfd on pfd.filter_id = pf.filter_id
inner join cscart_categories c on find_in_set (c.category_id,
pf.categories_path) inner join cscart_products_categories pc on
pc.category_id = c.category_id left join
cscart_product_features_values pfv on pfv.feature_id = pf.feature_id
and pfv.product_id = pc.product_id left join
cscart_product_feature_variant_descriptions fvd on fvd.variant_id =
pfv.variant_id where pf.feature_id != 53 and pc.product_id =
'72664486' and pf.status = 'A' order by pf.position asc Processing
Document # 517710

at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:266)
at 
org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:451)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:489)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)
Caused by: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable
to execute query: SELECT pf.feature_id,
REPLACE(REPLACE(REPLACE(pfd.filter, '\n', ''), '', ''), '\r','') as
COLcolor_col, pfv.variant_id,
CONCAT(pfv.variant_id,'_',(REPLACE(REPLACE(REPLACE(fvd.variant, '\n',
''), '', ''), '\r',''))) as COLcolor_col_val  FROM
cscart_product_filters pf  inner join
cscart_product_filter_descriptions pfd on pfd.filter_id = pf.filter_id
inner join cscart_categories c on find_in_set (c.category_id,
pf.categories_path) inner join cscart_products_categories pc on
pc.category_id = c.category_id left join
cscart_product_features_values pfv on pfv.feature_id = pf.feature_id
and pfv.product_id = pc.product_id left join
cscart_product_feature_variant_descriptions fvd on fvd.variant_id =
pfv.variant_id where pf.feature_id != 53 and pc.product_id =
'72664486' and pf.status = 'A' order by pf.position asc Processing
Document # 517710
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:406)
at 
org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:353)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:219)
... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
Unable to execute query: SELECT pf.feature_id,
REPLACE(REPLACE(REPLACE(pfd.filter, '\n', ''), '', ''), '\r','') as
COLcolor_col, pfv.variant_id,
CONCAT(pfv.variant_id,'_',(REPLACE(REPLACE(REPLACE(fvd.variant, '\n',
''), '', ''), '\r',''))) as COLcolor_col_val  FROM
cscart_product_filters pf  inner join
cscart_product_filter_descriptions pfd on pfd.filter_id = pf.filter_id
inner join cscart_categories c on find_in_set (c.category_id,
pf.categories_path) inner join cscart_products_categories pc on
pc.category_id = c.category_id left join
cscart_product_features_values pfv on pfv.feature_id = pf.feature_id
and pfv.product_id = pc.product_id left join
cscart_product_feature_variant_descriptions fvd on fvd.variant_id =
pfv.variant_id where pf.feature_id != 53 and pc.product_id =
'72664486' and pf.status = 'A' order by pf.position asc Processing
Document # 517710
at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
at 
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:253)
at 
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
at 
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38)
at 
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
at 
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:465)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:491)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
... 5 more


Re: How to check when a search exceeds the threshold of timeAllowed parameter

2015-12-23 Thread David Santamauro



On 12/23/2015 01:42 AM, William Bell wrote:

I agree that when using timeAllowed in the header info there should be an
entry that indicates timeAllowed triggered.


If I'm not mistaken, there is
 => partialResults:true

  "responseHeader":{ "partialResults":true }

//



This is the only reason why we have not used timeAllowed. So this is a
great suggestion. Something like: <partialResults>1</partialResults> ??
That would be great.


<lst name="responseHeader">
  <int name="status">0</int>
  <bool name="partialResults">1</bool>
  <int name="QTime">107</int>
  <lst name="params">
    <str name="q">*:*</str>
    <str name="timeAllowed">1000</str>
  </lst>
</lst>


On Tue, Dec 22, 2015 at 6:43 PM, Vincenzo D'Amore 
wrote:


Well... I can write everything, but really all this just to understand
when the timeAllowed parameter triggers a partial answer? I mean, isn't
there anything set in the response when it is partial?

On Wed, Dec 23, 2015 at 2:38 AM, Walter Underwood 
wrote:


We need to know a LOT more about your site. Number of documents, size of
index, frequency of updates, length of queries, approximate size of server
(CPUs, RAM, type of disk), version of Solr, version of Java, and features
you are using (faceting, highlighting, etc.).

After that, we’ll have more questions.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



On Dec 22, 2015, at 4:58 PM, Vincenzo D'Amore 

wrote:


Hi All,

my website is under pressure; there is a large number of concurrent
searches. When the connected users are too many, the searches become so
slow that in some cases users have to wait many seconds.
The queue of searches becomes so long that, in some cases, servers are
blocked trying to serve all these requests.
As far as I know this is because some searches are very expensive, and
when many expensive searches clog the queue the server becomes
unresponsive.

In order to quickly work around this herd effect, I have added a
default timeAllowed of 15 seconds, and this seems to help a lot.

But during stress tests I'm unable to understand when and what requests
are affected by the timeAllowed parameter.

Just to be clear, I have configured the timeAllowed parameter in a
SolrCloud environment. Given that partial results may be returned (if
there are any), how can I know when this happens? When does the
timeAllowed parameter trigger a partial answer?

Best regards,
Vincenzo



--
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251






--
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251







how to use distribute facet.pivot

2015-12-23 Thread soledede_w...@ehsy.com
I have two shards.

facet.pivot=categoryId1,categoryId2,categoryId3,categoryId4

When I execute facet.pivot, CPU and memory both fill up.

How can I solve this problem?

Who can help me?

Thanks




soledede_w...@ehsy.com


Re: Unable to extract images content (OCR) from PDF files using Solr

2015-12-23 Thread Upayavira
If your needs of Tika fall outside of those provided by the embedded
Tika, I would suggest you include Tika in your own ingestion pipeline,
and just post raw content to Solr. This will probably perform better
anyway, as you are otherwise using up valuable Solr resources to do your
extraction work, and, as you are seeing, have far less control over what
happens inside than you would if Tika was consumed by your own
application.

Upayavira
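
A minimal sketch of that pipeline with Tika and SolrJ (file name, core name,
and field names are illustrative; error handling omitted):

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

public class TikaIngest {
  public static void main(String[] args) throws Exception {
    AutoDetectParser parser = new AutoDetectParser();
    BodyContentHandler text = new BodyContentHandler(-1); // -1 = no output size limit
    Metadata metadata = new Metadata();
    try (InputStream in = Files.newInputStream(Paths.get("imagepdf.pdf"))) {
      parser.parse(in, text, metadata); // extraction happens in your app, not inside Solr
    }
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "ocrpdf8");
    doc.addField("attr_content", text.toString()); // post only the raw extracted text
    try (HttpSolrClient solr = new HttpSolrClient("http://localhost:8983/solr/mycore")) {
      solr.add(doc);
      solr.commit();
    }
  }
}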

On Wed, Dec 23, 2015, at 03:11 AM, Zheng Lin Edwin Yeo wrote:
> Hi,
> 
> I'm also facing the same issue as what you faced 2 months back: I am able
> to extract the image content if the images are in .jpg or .png format, but
> not able to extract the images in PDF, even after setting
> "extractInlineImages
> true" in the PDFParser.properties.
> 
> Have you managed to find alternative solutions to this problem?
> 
> Regards,
> Edwin
> 
> On 22 October 2015 at 18:05, Damien Picard 
> wrote:
> 
> > Hi,
> >
> > I'm using Solr 5.3.0 on a Red Hat EL 7 and I try to extract content from
> > PDF, Word, LibreOffice, etc. docs using the ExtractingRequestHandler.
> >
> > Everything works fine, except when I want to extract content from embedded
> > images in PDF/Word etc. documents:
> >
> > I send an extract request like this :
> > POST /update/extract?literal.id
> > =ocrpdf8=attr_content=attr_
> >
> > In attr_content, I get :
> > \n \n date 2015-08-28T13:23:03Z \n
> > pdf:PDFVersion 1.4 \n
> > xmp:CreatorTool PDFCreator Version 1.2.3 \n
> >  stream_content_type application/pdf \n
> >  Keywords \n
> >  subject \n
> >  dc:creator S050735 \n
> >  dcterms:created 2015-08-28T13:23:03Z \n
> >  Last-Modified 2015-08-28T13:23:03Z \n
> >  dcterms:modified 2015-08-28T13:23:03Z \n
> >  dc:format application/pdf; version=1.4 \n
> >  Last-Save-Date 2015-08-28T13:23:03Z \n
> >  stream_name imagepdf.pdf \n
> >  meta:save-date 2015-08-28T13:23:03Z \n
> >  pdf:encrypted false \n
> >  dc:title imagepdf \n
> >  modified 2015-08-28T13:23:03Z \n
> >  cp:subject \n
> >  Content-Type application/pdf \n
> >  stream_size 423660 \n
> >  X-Parsed-By org.apache.tika.parser.DefaultParser \n
> >  X-Parsed-By org.apache.tika.parser.pdf.PDFParser \n
> >  creator S050735 \n
> >  meta:author S050735 \n
> >  dc:subject \n
> >  meta:creation-date 2015-08-28T13:23:03Z \n
> >  stream_source_info the-file \n
> >  created Fri Aug 28 13:23:03 UTC 2015 \n
> >  xmpTPg:NPages 1 \n
> >  Creation-Date 2015-08-28T13:23:03Z \n
> >  meta:keyword \n
> >  Author S050735 \n
> >  producer GPL Ghostscript 9.04 \n
> >  imagepdf \n
> >  \n
> >  page \n
> >  Page 1 sur 1\n \n
> >  28/08/2015
> > http://confluence/download/attachments/158471300/image2015-3-3+18%3A10%3A4.
> > ..
> > \n \n embedded:image0.jpg image0.jpg embedded:image1.jpg image1.jpg
> > embedded:image2.jpg image2.jpg \n
> >
> > So, tika works fine, but it doesn't apply OCR content extraction on the
> > embedded images.
> >
> > When I post an image (JPG) on /update/extract, I get its content indexed
> > throught Tesseract OCR (attr_content) field :
> > \n \n stream_size 55422 \n
> >  X-Parsed-By org.apache.tika.parser.DefaultParser \n
> >  X-Parsed-By org.apache.tika.parser.ocr.TesseractOCRParser \n
> >  stream_content_type image/jpeg \n
> >  stream_name OM_1.jpg \n
> >  stream_source_info the-file \n
> >  Content-Type image/jpeg \n \n \n
> >  ‘ '\"I“ \" \"' ./\nlrast. Shortly before the classes started I was
> > visiting a.\ncertain public school, a school set in a typically
> > English\ncountryside, which on the June clay of my visit was wonder-\nfully
> > beauliful. The Head Master—-no less typical than his\nschool and the
> > country-side—pointed out the charms of\nboth, and his pride came out in the
> > final remark which he made\nbeforehe left me. He explained that he had a
> > class to take\nin'I'heocritus. Then (with a. buoyant gesture); “ Can
> > you\n\n, conceive anything more delightful than a class in
> > Theocritus,\n\non such a day and in such a place?\"\n\n \n \n \n
> > stream_size 55422 \n X-Parsed-By org.apache.tika.parser.DefaultParser \n
> > X-Parsed-By org.apache.tika.parser.ocr.TesseractOCRParser \n X-Parsed-By
> > org.apache.tika.parser.jpeg.JpegParser \n stream_content_type image/jpeg \n
> > Resolution Units inch \n stream_source_info the-file \n Compression Type
> > Progressive, Huffman \n Data Precision 8 bits \n Number of Components 3 \n
> > tiff:ImageLength 286 \n Component 2 Cb component: Quantization table 1,
> > Sampling factors 1 horiz/1 vert \n Component 1 Y component: Quantization
> > table 0, Sampling factors 2 horiz/2 vert \n Image Height 286 pixels \n X
> > Resolution 72 dots \n Image Width 690 pixels \n stream_name OM_1.jpg \n
> > Component 3 Cr component: Quantization table 1, Sampling factors 1 horiz/1
> > vert \n tiff:BitsPerSample 8 \n tiff:ImageWidth 690 \n Content-Type
> > image/jpeg \n Y Resolution 72 dots
> >
> > I see on Tika JIRA that I have to enable extractInlineImages in
> > 

Multiple Unique Keys

2015-12-23 Thread Salman Ansari
Hi,

I am wondering if I can specify multiple unique keys in the same document
in Solr. My scenario is that I want to integrate with another system that
has an ID, and our system has a reference number (auto-generated on the fly
for each document) that is also unique.

What I am trying to achieve is to have uniqueness applied on both "ID" and
"Reference Number" so if I get a duplicate document from the source (which
will have the same ID) I want to override my existing document. What I am
not sure about is

1) Does Solr support multiple unique keys for a document?
2) What if the "ID" was the same but we generated a different "Reference
Number", will that override the existing document? (I mean one of the unique
fields is the same but the other is not)

Appreciate your feedback and comments.

Regards,
Salman


Re: Multiple Unique Keys

2015-12-23 Thread Alexandre Rafalovitch
No.

Whichever one triggers the document override should be your primary key.
The rest is application logic.

You can make the field required, but that's about it.

Regards,
   Alex
On 23 Dec 2015 3:32 pm, "Salman Ansari"  wrote:

> Hi,
>
> I am wondering if I can specify multiple unique keys in the same document
> in Solr. My scenario is that I want to integrate with another system that
> has an ID, and our system has a reference number (auto-generated on the fly
> for each document) that is also unique.
>
> What I am trying to achieve is to have uniqueness applied on both "ID" and
> "Reference Number" so if I get a duplicate document from the source (which
> will have the same ID) I want to override my existing document. What I am
> not sure about is
>
> 1) Does Solr support multiple unique keys for a document?
> 2) What if the "ID" was the same but we generated a different "Reference
> Number", will that override the existing document? (I mean one of the
> unique fields is the same but the other is not)
>
> Appreciate your feedback and comments.
>
> Regards,
> Salman
>
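
(In schema terms, the answer boils down to a sketch like this; field names
are taken from the question, types assumed:

<uniqueKey>id</uniqueKey>
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="reference_number" type="string" indexed="true" stored="true" required="true"/>

Only the id controls overwrites; uniqueness of reference_number has to be
enforced by the application.)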