Re: Solr Atomic Updates

2015-06-03 Thread Erick Erickson
Basically, I think about using SolrCloud whenever you have to split your corpus into more than one core (shard in SolrCloud terms). Or when you require fault tolerance in terms of machines going up and down. Despite the name, it does _not_ require AWS or similar, and you can run SolrCloud on a

Re: retrieving large number of docs

2015-06-03 Thread Robust Links
Hi Erick they are on the same JVM. I had already tried the core join strategy but that doesn't solve the faceting problem... i.e. if I have 2 cores, core0 and core1, and I run this query on core0 /select?q=QUERY&fq={!join from=id1 to=id2 fromIndex=core1}&facet=true&facet.field=tag it has 2 problems 1)

Re: retrieving large number of docs

2015-06-03 Thread Jack Krupansky
Specify the join query parser for the main query. See: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-JoinQueryParser -- Jack Krupansky On Wed, Jun 3, 2015 at 3:32 PM, Robust Links pey...@robustlinks.com wrote: Hi Erick they are on the same JVM. I had already
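Jack's suggestion of putting the join in the main query might look like the following sketch (field and core names are copied from the thread; the query text is a placeholder to adapt):

```
/select?q={!join from=id1 to=id2 fromIndex=core1}text:QUERY&facet=true&facet.field=tag
```

With the join as the main query rather than a filter, faceting then runs over the joined result set.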

Re: Solr Atomic Updates

2015-06-03 Thread Jack Krupansky
BTW, does anybody know how SolrCloud got that name? I mean, SolrCluster would make a lot more sense since a cloud is typically a very large collection of machines and more of a place than a specific configuration, while a Solr deployment is more typically a more modest number of machines, a

Re: SolrCloud 5.1 startup looking for standalone config

2015-06-03 Thread tuxedomoon
Yes adding _solr worked, thx. But I also had to populate the SOLR_HOST param for each of the 4 hosts, as in SOLR_HOST=ec2-52-4-232-216.compute-1.amazonaws.com. I'm in an EC2 VPN environment which might be the problem. This command now works (leaving off port)

Re: Solr Atomic Updates

2015-06-03 Thread Shawn Heisey
On 6/3/2015 2:19 PM, Jack Krupansky wrote: BTW, does anybody know how SolrCloud got that name? I mean, SolrCluster would make a lot more sense since a cloud is typically a very large collection of machines and more of a place than a specific configuration, while a Solr deployment is more

Re: retrieving large number of docs

2015-06-03 Thread Robust Links
that doesn't work either, and even if it did, joining is not going to be a solution since I can't query 1 core and facet on the result of the other. To sum up, my problem is core0 field:id field:text core1 field:id field:tag I want to 1) query text field of core0, 2) use the

Re: BoolField fieldType

2015-06-03 Thread Erick Erickson
I took a quick look at the code and it _looks_ like any string starting with t, T or 1 is evaluated as true and everything else as false. sortMissingLast determines sort order if you're sorting on this field and the document doesn't have a value. Should they be sorted after or before docs that
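Erick's reading of the code can be sketched as a one-line parse rule (an illustrative sketch, not Solr's actual Java implementation):

```python
def parse_bool(value: str) -> bool:
    # Illustrative sketch of the rule Erick describes: any string starting
    # with 't', 'T' or '1' parses as true; everything else (including the
    # empty string) parses as false.
    return value[:1] in ('t', 'T', '1')
```

So "true", "TRUE", "t" and "1" would index as true, while "yes", "0" and "FALSE" would all be false under this rule.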

BoolField fieldType

2015-06-03 Thread Steven White
Hi everyone, This is a two part question: 1) I see the following: <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/> a) what does sortMissingLast do? b) what kind of data is considered Boolean? TRUE, True, true, 1, yes, Yes, FALSE, etc. 2) When searching, what do I search on:

Re: SolrCloud 5.1 startup looking for standalone config

2015-06-03 Thread Shawn Heisey
On 6/3/2015 2:48 PM, tuxedomoon wrote: Yes adding _solr worked, thx. But I also had to populate the SOLR_HOST param for each of the 4 hosts, as in SOLR_HOST=ec2-52-4-232-216.compute-1.amazonaws.com. I'm in an EC2 VPN environment which might be the problem. This command now works (leaving

Lost connection to Zookeeper

2015-06-03 Thread Joseph Obernberger
Hi All - I've run into a problem where every once in a while one or more of the shards (27 shard cluster) will lose connection to zookeeper and report updates are disabled. In addition to the CLUSTERSTATUS timeout errors, which don't seem to cause any issue, this one certainly does as that

How http connections are handled in Solr?

2015-06-03 Thread Manohar Sripada
Hi, I wanted to know in detail how HTTP connections are handled in Solr. 1. From my code, I am using CloudSolrServer of the solrj client library to get the connection. From one of my previous discussions in this forum, I understood that Solr uses Apache's HttpClient for connections and the

Re: Solr OutOfMemory but no heap and dump and oo_solr.sh is not triggered

2015-06-03 Thread Shawn Heisey
On 6/3/2015 12:20 AM, Clemens Wyss DEV wrote: Context: Lucene 5.1, Java 8 on debian. 24G of RAM whereof 16G available for Solr. I am seeing the following OOMs: ERROR - 2015-06-03 05:17:13.317; [ customer-1-de_CH_1] org.apache.solr.common.SolrException; null:java.lang.RuntimeException:

Re: Derive suggestions across multiple fields

2015-06-03 Thread Alessandro Benedetti
Can you share your suggester configuration? Have you read the guide I linked? Has the suggestion index/FST been built? (you need to build the suggester) Cheers 2015-06-03 4:07 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com: Thank you for your explanation. I'll not need to care

Solr OutOfMemory but no heap and dump and oo_solr.sh is not triggered

2015-06-03 Thread Clemens Wyss DEV
Context: Lucene 5.1, Java 8 on debian. 24G of RAM whereof 16G available for Solr. I am seeing the following OOMs: ERROR - 2015-06-03 05:17:13.317; [ customer-1-de_CH_1] org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space

Re: Number of clustering labels to show

2015-06-03 Thread Zheng Lin Edwin Yeo
Thank you so much for your explanation. On 2 June 2015 at 17:31, Alessandro Benedetti benedetti.ale...@gmail.com wrote: The scope in there is to try to make clustering lighter and more related to the query. The summary produced is a fragment that is surrounding the query terms in the

AW: Solr OutOfMemory but no heap and dump and oo_solr.sh is not triggered

2015-06-03 Thread Clemens Wyss DEV
Ciao Shawn, thanks for your reply. The oom script just kills Solr with the KILL signal (-9) and logs the kill. I know. But my feeling is that not even this happens, i.e. the script is not being executed. At least I see no solr_oom_killer-$SOLR_PORT-$NOW.log file ... Btw: Who re-starts solr

Re: Derive suggestions across multiple fields

2015-06-03 Thread Zheng Lin Edwin Yeo
This is my suggester configuration: <searchComponent class="solr.SpellCheckComponent" name="suggest"> <lst name="spellchecker"> <str name="name">suggest</str> <str name="classname">org.apache.solr.spelling.suggest.Suggester</str> <str

Re: Solr Atomic Updates

2015-06-03 Thread Jack Krupansky
Explain a little about why you have separate cores, and how you decide which core a new document should reside in. Your scenario still seems a bit odd, so help us understand. -- Jack Krupansky On Wed, Jun 3, 2015 at 3:15 AM, Ксения Баталова batalova...@gmail.com wrote: Hi! Thanks for your

Re: Solr Atomic Updates

2015-06-03 Thread Upayavira
If you are using stand-alone Solr instances, then it is your responsibility to decide which node a document resides in, and thus to which core you will send your update request. If, however, you used SolrCloud, it would handle that for you - deciding which node should contain a document, and
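In the SolrCloud setup Upayavira describes, an atomic update can be sent to any node and is routed to the correct shard automatically, so the client never needs to know which core holds the document. A minimal sketch (collection name and field name are placeholders):

```
curl -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/mycollection/update?commit=true' \
  -d '[{"id":"doc1","field1":{"set":"new value"}}]'
```

The `{"set": ...}` wrapper is what makes this an atomic update of one field rather than a full-document replace.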

Re: How to tell when Collector finishes collect loop?

2015-06-03 Thread Joel Bernstein
I think there are easier ways to do what you are trying to do. Take a look at the Function query parser. It will allow you to control the score for each document from within a function query. The basic use case is this: q={!func}myFunc()&fq=my+query In this scenario the func qparser plugin
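As a concrete (hypothetical) instance of Joel's pattern, scoring every match by a stored numeric field would look like:

```
q={!func}field(popularity)&fq=text:solr
```

The fq restricts the result set without contributing to the score, so the function query alone determines each document's score.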

Re: How to tell when Collector finishes collect loop?

2015-06-03 Thread Joel Bernstein
The finish method would still be a problem using the func qparser. Out of curiosity, why do you need to call close on the scorer? Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Jun 3, 2015 at 10:53 AM, Joel Bernstein joels...@gmail.com wrote: I think there are easier ways to do what you

Re: How http connections are handled in Solr?

2015-06-03 Thread Shawn Heisey
On 6/3/2015 4:12 AM, Manohar Sripada wrote: 1. From my code, I am using CloudSolrServer of solrj client library to get the connection. From one of my previous discussion in this forum, I understood that Solr uses Apache's HttpClient for connections and the default maxConnections per host is 32

AW: Solr OutOfMemory but no heap and dump and oo_solr.sh is not triggered

2015-06-03 Thread Clemens Wyss DEV
Hi Mark, what exactly should I file? What needs to be added/appended to the issue? Regards Clemens -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Wednesday, 3 June 2015 14:23 To: solr-user@lucene.apache.org Subject: Re: Solr OutOfMemory but no heap

Re: Derive suggestions across multiple fields

2015-06-03 Thread Alessandro Benedetti
I can see a lot of confusion in the configuration! A few suggestions: - read the documentation carefully and try to apply the suggester guidance - currently there is no need to use spellcheck for suggestions, they are now separate things - I see text used to derive suggestions, I would prefer there
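The non-spellcheck suggester Alessandro refers to is configured via solr.SuggestComponent in solrconfig.xml. A minimal sketch for Solr 5.x (the suggester name, source field, and analyzer field type are placeholders to adapt to your schema):

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>
```

Note buildOnStartup=false, one of the fixes Erick mentions elsewhere in this digest; the suggester index then has to be built explicitly with suggest.build=true.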

Re: Solr OutOfMemory but no heap and dump and oo_solr.sh is not triggered

2015-06-03 Thread Mark Miller
File a JIRA issue please. That OOM Exception is getting wrapped in a RuntimeException it looks. Bug. - Mark On Wed, Jun 3, 2015 at 2:20 AM Clemens Wyss DEV clemens...@mysign.ch wrote: Context: Lucene 5.1, Java 8 on debian. 24G of RAM whereof 16G available for Solr. I am seeing the following

How to tell when Collector finishes collect loop?

2015-06-03 Thread adfel70
Hi guys, need your help (again): I have a search handler which needs to override Solr's scoring. I chose to implement it with the RankQuery API, so when getTopDocsCollector() gets called it instantiates my TopDocsCollector instance, and every docId gets its own score: public class MyScorerrankQuet

Re: Solr OutOfMemory but no heap and dump and oo_solr.sh is not triggered

2015-06-03 Thread Mark Miller
We will have to find a way to deal with this long term. Browsing the code I can see a variety of places where problem exception handling has been introduced since this was all fixed. - Mark On Wed, Jun 3, 2015 at 8:19 AM Mark Miller markrmil...@gmail.com wrote: File a JIRA issue please. That

Re: Solr Atomic Updates

2015-06-03 Thread Ксения Баталова
Hi! Thanks for your quick reply. The problem is that my index consists of several parts (several cores) and while updating I don't know in advance in which part the updated id lies (in which core the document with the specified id lies). For example, I have two cores (*Core1 *and *Core2*)

Could not find configName for collection client_active found:null

2015-06-03 Thread David McReynolds
I’m helping someone with this but my zookeeper experience is limited (as in none). They have purportedly followed the instructions from the wiki. https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble Jun 02, 2015 2:40:37 PM

Re: AW: Solr OutOfMemory but no heap and dump and oo_solr.sh is not triggered

2015-06-03 Thread Shawn Heisey
On 6/3/2015 1:41 AM, Clemens Wyss DEV wrote: The oom script just kills Solr with the KILL signal (-9) and logs the kill. I know. But my feeling is, that not even this happens, i.e. the script is not being executed. At least I see no solr_oom_killer-$SOLR_PORT-$NOW.log file ... Btw: Who

Re: Solr OutOfMemory but no heap and dump and oo_solr.sh is not triggered

2015-06-03 Thread Erick Erickson
bq: what exactly should I file? What needs to be added/appended to the issue? Just what Mark said, title it something like "OOM exception wrapped in runtime exception". Include your original post and note that you were asked to open the JIRA after discussion on the user's list. Don't worry too much, the

Re: Could not find configName for collection client_active found:null

2015-06-03 Thread Erick Erickson
It's not entirely clear what you're trying to do when this is pushed out, but I'm guessing it's create a collection. If that's so, then this is your problem: Could not find configName for collection client_active You've set up Zookeeper correctly. But _before_ you create a collection, you have
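The sequence Erick is pointing at, uploading a config set before creating the collection, looks roughly like this in Solr 5.x (zkhost string, config directory, and shard counts are placeholders):

```
# Upload the config set to Zookeeper first:
server/scripts/cloud-scripts/zkcli.sh -zkhost zk1:2181,zk2:2181,zk3:2181 \
  -cmd upconfig -confdir /path/to/conf -confname client_active

# Then create the collection referencing that config name:
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=client_active&numShards=1&collection.configName=client_active"
```

If the collection is created before the config set exists under that name, the "Could not find configName" error in the subject is exactly what shows up.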

Re: Derive suggestions across multiple fields

2015-06-03 Thread Zheng Lin Edwin Yeo
Thank you for your suggestions. Will try that out and update on the results again. Regards, Edwin On 3 June 2015 at 21:13, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I can see a lot of confusion in the configuration! Few suggestions : - read carefully the document and try to

Re: Sorting in Solr

2015-06-03 Thread Chris Hostetter
: https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThesortParameter : : I think we may have an omission from the docs -- docValues can also be : used for sorting, and may also offer a performance advantage. I added a note about that. -Hoss
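Hoss's note about docValues sorting translates to a schema entry like this (the field name and type are illustrative):

```xml
<field name="price" type="float" indexed="true" stored="true" docValues="true"/>
```

With `sort=price asc` on the request, sorting can then read the column-oriented docValues instead of uninverting the indexed terms.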

retrieving large number of docs

2015-06-03 Thread Robust Links
Hi I have a set of document IDs from one core and I want to query another core using the ids retrieved from the first core... the constraint is that the size of the doc ID set can be very large. I want to: 1) retrieve these docs from the 2nd index 2) facet on the results I can think of 3 solutions:

Re: How to identify field names from the suggested values in multiple fields

2015-06-03 Thread Walter Underwood
Configure two suggesters, one based on each field. Use both of them and you’ll get separate suggestions from each. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Jun 3, 2015, at 10:03 PM, Dhanesh Radhakrishnan dhan...@hifx.co.in wrote: Hi Anyone

Re: Solr Atomic Updates

2015-06-03 Thread Ксения Баталова
Upayavira, I'm using stand-alone Solr instances. I've not learnt SolrCloud yet. Please give me some advice on when SolrCloud is better than stand-alone Solr instances, or when it is worth choosing SolrCloud. _ _ _ Batalova Kseniya If you are using stand-alone Solr instances, then it is your

Re: retrieving large number of docs

2015-06-03 Thread Robust Links
what would be a custom solution? On Wed, Jun 3, 2015 at 1:58 PM, Joel Bernstein joels...@gmail.com wrote: You may have to do something custom to meet your needs. 10,000 DocID's is not huge but you're latency requirement are pretty low. Are your DocID's by any chance integers? This can make

Re: retrieving large number of docs

2015-06-03 Thread Joel Bernstein
Erick makes a great point, if they are in the same VM try the cross-core join first. It might be fast enough for you. A custom solution would be to build a custom query or post filter that works with your specific scenario. For example if the docID's are integers you could build a fast PostFilter

Re: Derive suggestions across multiple fields

2015-06-03 Thread Zheng Lin Edwin Yeo
My previous suggester configuration is derived from this page: https://wiki.apache.org/solr/Suggester Does it mean that what is written there is outdated? Regards, Edwin On 3 June 2015 at 23:44, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Thank you for your suggestions. Will try that

How to identify field names from the suggested values in multiple fields

2015-06-03 Thread Dhanesh Radhakrishnan
Hi Anyone help me to build a suggester auto complete based on multiple fields? There are two fields in my schema, Category and Subcategory, and I'm trying to build a suggester based on these 2 fields. When the suggestions come back, how can I distinguish which field each one came from? I used a

Re: Derive suggestions across multiple fields

2015-06-03 Thread Erick Erickson
This may be helpful: http://lucidworks.com/blog/solr-suggester/ Note that there are a series of fixes in various versions of Solr, particularly buildOnStartup=false and working on multivalued fields. Best, Erick On Wed, Jun 3, 2015 at 8:04 PM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: My

Re: How to identify field names from the suggested values in multiple fields

2015-06-03 Thread Dhanesh Radhakrishnan
Thank you for the quick response. If I use 2 suggesters, can I get the results in a single request? http://192.17.80.99:8983/solr/core1/suggest?suggest=true&suggest.dictionary=mySuggester&wt=xml&suggest.q=school Is there any helpful document on building multiple suggesters? On Thu, Jun 4, 2015 at
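Walter's two-suggester approach can be queried in one request, assuming SuggestComponent's support for repeating the suggest.dictionary parameter (dictionary names below are placeholders matching a hypothetical two-suggester config; host and core copied from above):

```
http://192.17.80.99:8983/solr/core1/suggest?suggest=true&suggest.dictionary=categorySuggester&suggest.dictionary=subcategorySuggester&wt=xml&suggest.q=school
```

The response contains a separate block per dictionary, which is also what lets you tell which field a suggestion came from.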

Re: retrieving large number of docs

2015-06-03 Thread Erick Erickson
Are these indexes on different machines? Because if they're in the same JVM, you might be able to use cross-core joins. Be aware, though, that joining on high-cardinality fields (which, by definition, docID probably is) is where pseudo joins perform worst. Have you considered flattening the data

Re: Solr Atomic Updates

2015-06-03 Thread Erick Erickson
I have to ask then why you're not using SolrCloud with multiple shards? It seems to me that that gives you the indexing throughput you need (be sure to use CloudSolrServer from your client). At 300M complex documents, you pretty much certainly will need to shard anyway so in some sense you're

Re: retrieving large number of docs

2015-06-03 Thread Joel Bernstein
A few questions for you: How large can the list of filtering ID's be? What's your expectation on latency? What version of Solr are you using? SolrCloud or not? Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Jun 3, 2015 at 1:23 PM, Robust Links pey...@robustlinks.com wrote: Hi I

Re: retrieving large number of docs

2015-06-03 Thread Robust Links
Hey Joel see below On Wed, Jun 3, 2015 at 1:43 PM, Joel Bernstein joels...@gmail.com wrote: A few questions for you: How large can the list of filtering ID's be? 10k What's your expectation on latency? 10 < latency < 100 What version of Solr are you using? 5.0.0 SolrCloud or

Re: retrieving large number of docs

2015-06-03 Thread Joel Bernstein
You may have to do something custom to meet your needs. 10,000 DocID's is not huge but your latency requirements are pretty low. Are your DocID's by any chance integers? This can make custom PostFilters run much faster. You should also be aware of the Streaming API in Solr 5.1 which will give
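Before going fully custom, one baseline for the 10,000-ID case is the terms query parser (available since Solr 4.10), sending the IDs as one or a few filter queries. A sketch of building those filter strings client-side (the field name and batch size are arbitrary assumptions):

```python
def batch_ids(ids, batch_size=2500):
    """Split a large ID list into chunks, one {!terms} filter query each."""
    for i in range(0, len(ids), batch_size):
        yield ids[i:i + batch_size]

def terms_filters(ids, field="id", batch_size=2500):
    # Each resulting filter looks like: {!terms f=id}1,2,3,...
    return ["{{!terms f={}}}{}".format(field, ",".join(map(str, chunk)))
            for chunk in batch_ids(ids, batch_size)]
```

Each string goes into its own fq parameter; the terms parser skips normal query parsing, which is what makes it viable for ID lists this size.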

Re: Solr Atomic Updates

2015-06-03 Thread Ксения Баталова
Jack, The decision to use several cores was made to increase indexing and searching performance (experimentally). In my project the index is about 300-500 million documents (each document has a rather complex structure) and it may grow larger. So, while indexing, the documents are being added in