Re: How large is your solr index?

2014-12-30 Thread Bram Van Dam
On 12/29/2014 08:08 PM, ralph tice wrote: Like all things it really depends on your use case. We have 160B documents in our largest SolrCloud and doing a *:* to get that count takes ~13-14 seconds. Doing a text:happy query only takes ~3.5-3.6 seconds cold, subsequent queries for the same terms

Re: How large is your solr index?

2014-12-30 Thread Bram Van Dam
On 12/29/2014 09:53 PM, Jack Krupansky wrote: And that Lucene index document limit includes deleted and updated documents, so even if your actual document count stays under 2^31-1, deleting and updating documents can push the apparent document count over the limit unless you very aggressively
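
The arithmetic behind this point can be sketched roughly as follows. This is only an illustration, assuming the 2^31-1 per-index limit quoted in the message and that deleted documents keep counting until their segments are merged; the function name and the sample numbers are made up for the example.

```python
# Per-index document limit as quoted in the thread (2^31 - 1).
LUCENE_DOC_LIMIT = 2**31 - 1


def remaining_headroom(live_docs: int, deleted_docs: int) -> int:
    """Return how many more documents fit before the limit is reached.

    Deleted (and superseded) documents still occupy document slots until
    their segments are merged away, so they count against the limit.
    """
    occupied = live_docs + deleted_docs
    return LUCENE_DOC_LIMIT - occupied


# Hypothetical index: 1.8B live documents plus 300M not-yet-merged deletes.
headroom = remaining_headroom(1_800_000_000, 300_000_000)
print(headroom)
```

A negative result would mean the index is already past the limit, even though the live document count alone is below it.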

Re: How large is your solr index?

2014-12-30 Thread Bram Van Dam
On 12/29/2014 10:30 PM, Toke Eskildsen wrote: That being said, I acknowledge that it helps with stories to get a feel of what can be done. That's pretty much what I'm after, mostly to reassure myself that it can be done. Even if it does require a lot of hardware (which is fine). At

Index process migration from 3.6.x to 4.x

2014-12-30 Thread Samuel García Martínez
Hi folks! I'm studying the migration process from our current Solr 3.6 multitenant cluster (single master, multiple slaves) setup to SolrCloud 4.10.3, but I have a question about the tlog. First of all, I will try to give some context: - 1 single master and N slaves. - around 300

Re: Solr server becomes non-responsive.

2014-12-30 Thread Modassar Ather
Hi, In a query with lots of wildcards, can we put a limit on the number of term expansions done for a wildcard token, something like maxBooleanClauses? Thanks, Modassar On Mon, Dec 29, 2014 at 11:15 AM, Modassar Ather modather1...@gmail.com wrote: Thanks Jack for your suggestions.

RE: How large is your solr index?

2014-12-30 Thread Toke Eskildsen
Shawn Heisey [apa...@elyograg.org] wrote: I believe it would be useful to organize a session at Lucene Revolution, possibly more interactive than a straight presentation, where users with very large indexes are encouraged to attend. The point of this session would be to exchange war stories,

Re: Index process migration from 3.6.x to 4.x

2014-12-30 Thread Shawn Heisey
On 12/30/2014 2:16 AM, Samuel García Martínez wrote: I'm studying the migration process from our current Solr 3.6 multitenant cluster (single master, multiple slaves) setup to SolrCloud 4.10.3, but I have a question about the tlog. First of all, I will try to give some context: -

Re: Solr server becomes non-responsive.

2014-12-30 Thread Shawn Heisey
On 12/30/2014 4:16 AM, Modassar Ather wrote: In a query with lots of wildcards, can we put a limit on the number of term expansions done for a wildcard token, something like maxBooleanClauses? I'm not aware of anything for limiting wildcard terms, but I'm willing to be surprised. As

Re: How large is your solr index?

2014-12-30 Thread Norgorn
Please tell us a bit more about how you run your Solr instances. When we try to run Solr with 5 shards, 50GB per shard, we often get OutOfMemory errors (especially for group queries). And while indexing, Solr often crashes (without exceptions - some JVM issue). We are using Heliosearch.

Re: How large is your solr index?

2014-12-30 Thread Shawn Heisey
On 12/30/2014 5:43 AM, Toke Eskildsen wrote: Shawn Heisey [apa...@elyograg.org] wrote: I believe it would be useful to organize a session at Lucene Revolution, possibly more interactive than a straight presentation, where users with very large indexes are encouraged to attend. The point of

Re: Solr server becomes non-responsive.

2014-12-30 Thread Jack Krupansky
I actually did that once as a test years ago, as well as support for paging through the wildcard terms with a starting offset, and it worked great. One way to think of the feature is as the ability to sample the values of the wildcard. I mean, not all queries require absolute precision. Sometimes

Re: How large is your solr index?

2014-12-30 Thread Erick Erickson
bq: I did at some point try to write a long blog entry on Solr hardware and setup for non-small corpuses, but had to give up: Man, this makes me laugh! Oh the memories! A common question from sales, quite a reasonable one at that: can we have a checklist that we can use to give clients an idea

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Jonathan Rochkind
Thanks Erick! Yes, if I set splitOnCaseChange=0, then of course it'll work -- but then a query for "mixedCase" will no longer also match "mixed Case". I think I want WDF to... kind of do all of the above. Specifically, I had thought that it would allow a query for "mixedCase" to match both/either

Re: How large is your solr index?

2014-12-30 Thread Jack Krupansky
If people are so gung-ho to go down the "lots of endless pain" rabbit-hole route by heavily under-configuring their clusters, I guess that's their choice, but I would strongly advise against it. Sure, a small group of "the few and the proud" warhorses can proudly proclaim how they did it, and a small number of

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Jack Krupansky
Right, that's what I meant by WDF not being magic - you can configure it to match any three out of four use cases as you choose, but there is no choice that matches all of the use cases. To be clear, this is not a bug in WDF, but simply a limitation. -- Jack Krupansky On Tue, Dec 30, 2014 at

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Jonathan Rochkind
I guess I don't understand what the four use cases are, or the three out of four use cases, or whatever. What the intended uses of the WDF are. Can you explain the intended use of setting: generateWordParts=1 catenateWords=1 splitOnCaseChange=1? Is that supposed to do something useful (at

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Alexandre Rafalovitch
On 30 December 2014 at 11:12, Jonathan Rochkind rochk...@jhu.edu wrote: I'm a bit confused about what splitOnCaseChange combined with catenateWords is meant to do at all. It _is_ generating both the split and single-word tokens at query time Have you tried only having WDF during indexing with

Re: How large is your solr index?

2014-12-30 Thread Alexandre Rafalovitch
I bet that while there are no specific numbers, there are indicators that everybody who knows what they are doing looks at to decide which particular aspect of configuration is hurting most. So perhaps a good article would be not so much the concrete numbers but the indicators to check. I

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Jonathan Rochkind
On 12/30/14 11:45 AM, Alexandre Rafalovitch wrote: On 30 December 2014 at 11:12, Jonathan Rochkind rochk...@jhu.edu wrote: I'm a bit confused about what splitOnCaseChange combined with catenateWords is meant to do at all. It _is_ generating both the split and single-word tokens at query time

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Jack Krupansky
I do have a more thorough discussion of WDF in my Solr Deep Dive e-book: http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html You're not wrong about anything here... you just need to accept that WDF is not magic and can't handle every

Re: Suggester: weight (term frequency) and 'mm' feasibility (allTermsRequired)

2014-12-30 Thread Boon Low
Hi, Re. AND/OR boolean lookup for ‘infix’ suggestion. I checked that Lucene does have underlying support for this via the “allTermsRequired” boolean. However, this feature, along with highlighting (on/off), is currently hardwired in Lucene and hidden in Solr. This issue has previously been

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Jonathan Rochkind
Okay, thanks. I'm not sure if it's my lack of understanding, but I feel like I'm having a very hard time getting straight answers out of you all, here. I want the query "mixedCase" to match both/either "mixed Case" and "mixedCase" in the index. What configuration of WDF at index/query time would

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Walter Underwood
You want preserveOriginal="1". You should only do this processing at index time. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Dec 30, 2014, at 9:33 AM, Jonathan Rochkind rochk...@jhu.edu wrote: Okay, thanks. I'm not sure if it's my lack of understanding,

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Jonathan Rochkind
On 12/30/14 12:35 PM, Walter Underwood wrote: You want preserveOriginal="1". You should only do this processing at index time. If I only do this processing at index time, then "mixedCase" at query time will no longer match "mixed Case" in the index/source material. I think I'm having trouble

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Michael Sokolov
On 12/30/14 12:42 PM, Jonathan Rochkind wrote: On 12/30/14 12:35 PM, Walter Underwood wrote: You want preserveOriginal="1". You should only do this processing at index time. If I only do this processing at index time, then "mixedCase" at query time will no longer match "mixed Case" in the

Re: How large is your solr index?

2014-12-30 Thread Shawn Heisey
On 12/30/2014 1:19 AM, Bram Van Dam wrote: We had a look at Heliosearch a while ago and found it unsuitable. Seems like they're trying to make use of some native x86_64 code and HotSpot JVM specific features which we can't use. Some of our clients use IBM's JVM so we're pretty much limited to

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Walter Underwood
There are two approaches for the query “mixedCase” to match “mixed Case” in the original document. 1. Add an index time synonym. 2. Add a ShingleFilterFactory to the index analysis chain. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Dec 30, 2014, at 9:50
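
A rough illustration of why the second approach can work: if adjacent index-time tokens are also emitted as one joined token, a single lowercased query token such as mixedcase has something to match. This is a plain-Python sketch of the idea only, not Solr's ShingleFilterFactory; the function name, the lowercasing step, and the empty separator are assumptions made for the example.

```python
def tokens_with_joined_pairs(text: str) -> list:
    """Return lowercased words plus each adjacent pair joined without a space."""
    words = text.lower().split()
    # Join every word with its right-hand neighbour: "mixed", "case" -> "mixedcase".
    joined = [a + b for a, b in zip(words, words[1:])]
    return words + joined


# Hypothetical source text containing the two-word form.
indexed = tokens_with_joined_pairs("a mixed Case example")
print("mixedcase" in indexed)
```

The trade-off is a larger index, since every adjacent pair produces an extra term.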

Re: Index process migration from 3.6.x to 4.x

2014-12-30 Thread Samuel García Martínez
Thanks for the quick reply! We just want to use SolrCloud because it simplifies the operations process and the cluster management, like centralized configurations, replica management and so on. I've been playing with a 4 node cluster and watching the tlog and possible issues and it seems too

SpellCheck (AutoComplete) Not Working In Distributed Environment

2014-12-30 Thread Charles Sanders
I'm running Solr 4.8 in a distributed environment (2 shards). I have added the spellcheck component to my request handler. In my test system, which is not distributed, it works. But when I move it to the Dev box, which is distributed, 2 shards, it is not working. Is there something additional I

Re: SpellCheck (AutoComplete) Not Working In Distributed Environment

2014-12-30 Thread Erick Erickson
Did you try the shards parameter? See: https://cwiki.apache.org/confluence/display/solr/Spell+Checking#SpellChecking-DistributedSpellCheck On Tue, Dec 30, 2014 at 2:20 PM, Charles Sanders csand...@redhat.com wrote: I'm running Solr 4.8 in a distributed environment (2 shards). I have added the
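
For readers following along, a request with an explicit shards list might be built like this. It is only a sketch: the host names, core name, and misspelled query are hypothetical, and whether further parameters are needed depends on which request handler the spellcheck component is attached to.

```python
from urllib.parse import urlencode

# Hypothetical shard addresses; replace with the real host:port/core values.
SHARDS = [
    "solr1.example.com:8983/solr/collection1",
    "solr2.example.com:8983/solr/collection1",
]


def spellcheck_url(base_url: str, query: str, shards: list) -> str:
    """Build a /select URL that requests spellcheck results across the given shards."""
    params = {
        "q": query,
        "spellcheck": "true",
        # Solr expects the shard list as one comma-separated value.
        "shards": ",".join(shards),
    }
    return f"{base_url}/select?{urlencode(params)}"


print(spellcheck_url("http://solr1.example.com:8983/solr/collection1", "solr serch", SHARDS))
```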

Re: no replication using commitWithin via curl?

2014-12-30 Thread Brendan Humphreys
I've raised https://issues.apache.org/jira/browse/SOLR-6903 for this, as I consider it a bug. Attached to the JIRA is a modified test demonstrating the failure. The test fails on 5.x and 4.x. Cheers, -Brendan On 30 December 2014 at 13:53, Brendan Humphreys bren...@canva.com wrote: Thanks for

Re: SpellCheck (AutoComplete) Not Working In Distributed Environment

2014-12-30 Thread Charles Sanders
Thanks for the suggestion. I did not do that originally because the documentation states: "This parameter is not required for the /select request handler." Which is what I am using. But I gave it a go, even though I'm not certain of the shard names. Now I have a NPE.

Re: SpellCheck (AutoComplete) Not Working In Distributed Environment

2014-12-30 Thread Shawn Heisey
On 12/30/2014 5:03 PM, Charles Sanders wrote: Thanks for the suggestion. I did not do that originally because the documentation states: "This parameter is not required for the /select request handler." Which is what I am using. But I gave it a go, even though I'm not certain of the

Re: solr export get wrong results

2014-12-30 Thread Joel Bernstein
For the initial release only JSON output format is supported with the /export feature. Also there is no built-in distributed support yet. Both of these features are likely to follow in future releases. For the initial release you'll need a client that can handle the JSON format and distributed

RE: Join in SOLR

2014-12-30 Thread Rajesh
Mikhail, How can I get a nightly build with the fix for SOLR-5147 included? I've searched and found that nightly builds will not be available to the general public. Is there any URL where they post their nightly build? Thanks in advance Rajesh Panneerselvam From: Mikhail Khludnev [via Lucene]

Re: Join in SOLR

2014-12-30 Thread Mikhail Khludnev
Rajesh, Nohow. Jira is still open, the patch wasn't committed anywhere. On Wed, Dec 31, 2014 at 8:27 AM, Rajesh rajesh.panneersel...@aspiresys.com wrote: Mikhail, How can I get a nightly build with the fix for SOLR-5147 included? I've searched and found that nightly builds will not be available

Pointing-solr-cloud-to-multiple-index-directories

2014-12-30 Thread Nishanth S
Thanks Eric and Shawn. Here is why I am trying to do so. I may be missing something here since this is relatively new to me. Appreciate your help and time. I will elaborate on what I am trying to achieve here. I am trying to install SolrCloud and my machines typically have 5 drives which are

RE: Join in SOLR

2014-12-30 Thread Rajesh
Oh! Thanks Mikhail. But I could see a comment in that JIRA, above your comment, from Thomas Champagne saying that the patch was committed to current trunk. Is it not for this issue, Mikhail? Thanks in advance Rajesh Panneerselvam From: Mikhail Khludnev [via Lucene]

Re: Join in SOLR

2014-12-30 Thread Shawn Heisey
On 12/30/2014 11:44 PM, Rajesh wrote: Oh! Thanks Mikhail. But I could see a comment in that JIRA, above your comment, from Thomas Champagne saying that the patch was committed to current trunk. Is it not for this issue, Mikhail? The message from Thomas Champagne indicates that he updated the

Re: Join in SOLR

2014-12-30 Thread Rajesh
Is there a way to get the trunk so that I can apply the same patch to check this functionality? If so, where can I get the trunk build?

Re: Join in SOLR

2014-12-30 Thread Mikhail Khludnev
Rajesh, it seems you need the trunk to apply the patch to. My favorite way to do this is https://github.com/apache/lucene-solr/ Have a good hack! On Wed, Dec 31, 2014 at 10:19 AM, Rajesh rajesh.panneersel...@aspiresys.com wrote: Is there a way to get the trunk so that I can apply the same patch to

Re: Join in SOLR

2014-12-30 Thread Shawn Heisey
On 12/31/2014 12:19 AM, Rajesh wrote: Is there a way to get the trunk so that I can apply the same patch to check this functionality? If so, where can I get the trunk build? http://wiki.apache.org/solr/HowToContribute#Getting_the_source_code You will need a number of software components,