bug in search with sloppy queries

2015-06-14 Thread Dmitry Kan
Hi guys, We observe a strange bug in Solr 4.10.2, whereby a sloppy query hits words it should not: <lst name="debug"><str name="rawquerystring">the e commerce</str><str name="querystring">the e commerce</str><str name="parsedquery">SpanNearQuery(spanNear([Contents:the, spanNear([Contents:eä,

Phrase query gets converted to SpanNear with slop 1 instead of 0

2015-06-14 Thread ariya bala
Hi, I encounter this peculiar case with Solr 4.10.2 where the parsed query doesn't seem to be logical: PHRASE23(reduce workforce) == SpanNearQuery(spanNear([spanNear([Contents:reduceä, Contents:workforceä], 1, true)], 23, true)). The question is why the phrase (quoted string) gets converted

Re: What's wrong

2015-06-14 Thread Test Test
Re, Thanks for your reply. I mock my parser like this: @Override public Query parse() { SpanQuery[] clauses = new SpanQuery[2]; clauses[0] = new SpanTermQuery(new Term("details", "london")); clauses[1] = new SpanTermQuery(new Term("details", "city")); return new

Integrating Solr 5.2.0 with nutch 1.10

2015-06-14 Thread kunal chakma
Hi, I am very new to the Nutch and Solr platform. I have been trying a lot to integrate Solr 5.2.0 with Nutch 1.10 but am not able to do so. I have followed all the steps mentioned on the Nutch 1.x tutorial page, but when I execute the following command, bin/nutch solrindex

Re: Limitation on Collections Number

2015-06-14 Thread Jack Krupansky
As a general rule, there are only two ways that Solr scales to large numbers: a large number of documents and a moderate number of nodes (shards and replicas). All other parameters should be kept relatively small, like dozens or low hundreds. Even shards and replicas should probably be kept down to that

Limitation on Collections Number

2015-06-14 Thread Arnon Yogev
We're running some tests on Solr and would like to have a deeper understanding of its limitations. Specifically, we have tens of millions of documents (say 50M) and are comparing several #collections x #docs_per_collection configurations. For example, we could have a single collection with 50M

Re: Issues with using Paoding to index Chinese characters

2015-06-14 Thread Upayavira
When in 2012? I'd give it a go with Solr 3.6 if you don't want to modify the library. Upayavira On Sun, Jun 14, 2015, at 04:14 AM, Zheng Lin Edwin Yeo wrote: I'm still trying to find out which version it is compatible for, but the document which I've followed is written in 2012.

Re: Limitation on Collections Number

2015-06-14 Thread Shai Erera
My answer remains the same - a large number of collections (cores) in a single Solr instance is not one of the ways in which Solr is designed to scale. To repeat, there are only two ways to scale Solr, number of documents and number of nodes. Jack, I understand that, but I still feel you're

Re: Integrating Solr 5.2.0 with nutch 1.10

2015-06-14 Thread Erick Erickson
No clue, you'd probably have better luck on the Nutch user's list unless there are _Solr_ errors. Does your Solr log show any errors? Best, Erick On Sun, Jun 14, 2015 at 6:49 AM, kunal chakma kchax4...@gmail.com wrote: Hi, I am very new to the Nutch and Solr platform. I have been trying

Re: Limitation on Collections Number

2015-06-14 Thread Erick Erickson
To my knowledge there's nothing built in to Solr to limit the number of collections. There's nothing explicitly in place to handle many hundreds of collections either so you're really in uncharted, certainly untested waters. Anecdotally we've heard of the problem you're describing. You say you

Solrj Tika/Cell not using defaultField

2015-06-14 Thread Charlie Hubbard
I'm having trouble getting Solr to pay attention to the defaultField value when I send a document to Solr Cell or Tika. Here is the post I'm sending using SolrJ: POST /solr/collection1/update/extract?extractOnly=true&defaultField=text&wt=javabin&version=2 HTTP/1.1 When I get the response back the
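The same request can be reproduced outside SolrJ with curl against the extract handler (a sketch, assuming a local Solr with a collection1 core and a file called doc.pdf; wt=json is used here instead of javabin for readability):

```shell
curl "http://localhost:8983/solr/collection1/update/extract?extractOnly=true&defaultField=text&wt=json" \
  -F "myfile=@doc.pdf"
```

Comparing the extractOnly output from curl with the SolrJ response can help isolate whether the defaultField parameter is being dropped client-side or ignored server-side.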

Re: Limitation on Collections Number

2015-06-14 Thread Erick Erickson
re: hybrid approach. Hmmm, _assuming_ that no single user has a really huge number of documents you might be able to use a single collection (or much smaller group of collections), by using custom routing. That allows you to send all the docs for a particular user to a particular shard. There are
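The custom routing Erick describes is, with the default compositeId router, just a matter of prefixing document IDs (a sketch; the id values and fields are invented for illustration):

```json
[
  {"id": "alice!doc-1", "owner": "alice", "body": "..."},
  {"id": "alice!doc-2", "owner": "alice", "body": "..."},
  {"id": "bob!doc-1",   "owner": "bob",   "body": "..."}
]
```

Everything sharing the "alice!" prefix hashes to the same shard, and queries for that user can then be restricted to that shard with the `_route_=alice!` request parameter.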

Re: What's wrong

2015-06-14 Thread Jack Krupansky
Why don't you take a step back and tell us what you are really trying to do. Try using a normal Solr query parser first, to verify that the data is analyzed as expected. Did you try using the surround query parser? It supports span queries. Your span query appears to require that the two terms

Re: Limitation on Collections Number

2015-06-14 Thread Shai Erera
Thanks Jack for your response. But I think Arnon's question was different. If you need to index 10,000 different collections of documents in Solr (say a collection denotes someone's Dropbox files), then you have two options: index all collections in one Solr collection, and add a field like

Re: file index format

2015-06-14 Thread Frank Ralf
Looks like this has been solved recently in the current dev branch: SimplePostTool (and thus bin/post) cannot index files with unknown extensions https://issues.apache.org/jira/browse/SOLR-7546

Re: bug in search with sloppy queries

2015-06-14 Thread Erick Erickson
My guess is that you have WordDelimiterFilterFactory in your analysis chain with parameters that break up E-Tail into both e and tail _and_ put them in the same position. This assumes that the result fragment you pasted is incomplete and commerce is in it. From <em>E</em>-Tail <em>commerce</em> or some
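For reference, a WordDelimiterFilterFactory configuration like the following would produce the behavior Erick describes (a sketch, not the poster's actual schema; the fieldType name and tokenizer choice are assumptions):

```xml
<fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- generateWordParts="1" splits "E-Tail" into "E" and "Tail";
         preserveOriginal="1" additionally keeps "E-Tail" itself,
         stacked at the same position as the first part -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            catenateWords="1"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The Analysis screen in the admin UI shows the position of each emitted token, which is the quickest way to confirm whether split parts share a position.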

Re: Limitation on Collections Number

2015-06-14 Thread Jack Krupansky
My answer remains the same - a large number of collections (cores) in a single Solr instance is not one of the ways in which Solr is designed to scale. To repeat, there are only two ways to scale Solr, number of documents and number of nodes. -- Jack Krupansky On Sun, Jun 14, 2015 at 11:00 AM,

Re: Limitation on Collections Number

2015-06-14 Thread Shalin Shekhar Mangar
Yes, there are some known problems while scaling to large number of collections, say 1000 or above. See https://issues.apache.org/jira/browse/SOLR-7191 On Sun, Jun 14, 2015 at 8:30 PM, Shai Erera ser...@gmail.com wrote: Thanks Jack for your response. But I think Arnon's question was different.

Re: file index format

2015-06-14 Thread Frank Ralf
Hi, I face the same problem when trying to index DITA XML files. These are XML files but have the file extension .dita, which Solr ignores. According to java -jar post.jar -h, only the following file extensions are supported: -Dfiletypes=type[,type,...]
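The extension whitelist shown by post.jar -h can be widened on the command line, e.g. (a sketch, assuming the .dita files live in the current directory; the exact flag comes from the -h output quoted above):

```shell
java -Dfiletypes=dita,xml -Dauto=yes -jar post.jar *.dita
```

Since Solr may not know a content type for the unknown extension, it may still be necessary to force one with -Dtype=application/xml.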

Re: Division with Stats Component when Grouping in Solr

2015-06-14 Thread kingofhypocrites
I think I have this just about working with the analytics component. It seems to fill in all the gaps that the stats component and the JSON facet API don't support. It solved the following problems for me: - I am able to perform math on stats to form other stats, then I can sort on those as needed. -
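The "math on stats" the analytics component provides is the kind of per-group derived metric that plain stats faceting cannot express. A toy Python illustration of the idea (all field and group names are invented for the example): sum two fields per group, divide, and sort on the ratio.

```python
# Compute a derived stat (revenue per visit) from per-group sums,
# then rank groups by it -- what the analytics component does
# server-side and the stats component cannot.
from collections import defaultdict

docs = [
    {"page": "/home",  "visits": 10, "revenue": 40.0},
    {"page": "/home",  "visits": 30, "revenue": 80.0},
    {"page": "/about", "visits": 20, "revenue": 10.0},
]

sums = defaultdict(lambda: {"visits": 0, "revenue": 0.0})
for d in docs:
    g = sums[d["page"]]
    g["visits"] += d["visits"]
    g["revenue"] += d["revenue"]

# Derived metric per group, then sort on it.
ratios = {p: s["revenue"] / s["visits"] for p, s in sums.items()}
ranked = sorted(ratios, key=ratios.get, reverse=True)
```

Doing this client-side means pulling every group's aggregates back first, which is why having the division and the sort happen inside Solr matters at scale.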

Please help test the new Angular JS Admin UI

2015-06-14 Thread Erick Erickson
And anyone who, you know, really likes working with UI code please help making it better! As of Solr 5.2, there is a new version of the Admin UI available, and several improvements are already in 5.2.1 (release imminent). The old admin UI is still the default, the new one is available at

Re: Division with Stats Component when Grouping in Solr

2015-06-14 Thread Erick Erickson
Why isn't it in core Solr? Because it doesn't (and probably can't) support distributed mode. The Streaming Aggregation stuff, and the (in trunk Real Soon Now) Parallel SQL support, are where the effort is going to support this kind of stuff. https://issues.apache.org/jira/browse/SOLR-7560

Re: Issues with using Paoding to index Chinese characters

2015-06-14 Thread Zheng Lin Edwin Yeo
But I think Solr 3.6 is too far back to fall back to as I'm already using Solr 5.1. Regards, Edwin On 14 June 2015 at 14:49, Upayavira u...@odoko.co.uk wrote: When in 2012? I'd give it a go with Solr 3.6 if you don't want to modify the library. Upayavira On Sun, Jun 14, 2015, at 04:14 AM,

invalid index version and generation

2015-06-14 Thread Summer Shire
Hi all, Every time I optimize my index with maxSegments=2, after some time the replication fails to get the file list for a given generation. It looks like the index version and generation count get messed up. (If maxSegments=1, this never happens.) I am able to successfully reproduce this by

RE: Solr Exact match boost Reduce the results

2015-06-14 Thread JACK
Hi chillra, I have changed the index and query field configuration to <tokenizer class="solr.KeywordTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> but my problem is still not solved.
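For exact-match boosting, the usual pattern is not to change the main field's analysis but to add a separate exact-match field populated via copyField and boost it at query time. A sketch (field and type names are assumptions, not the poster's schema):

```xml
<fieldType name="text_exact" class="solr.TextField">
  <analyzer>
    <!-- KeywordTokenizer keeps the whole value as one token,
         so only whole-value matches hit this field -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="title_exact" type="text_exact" indexed="true" stored="false"/>
<copyField source="title" dest="title_exact"/>
```

With edismax, something like qf=title title_exact^10 then keeps normal tokenized matching while ranking exact matches first; replacing the main field's tokenizer with KeywordTokenizerFactory instead would break partial matching entirely.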

Re: file index format

2015-06-14 Thread Frank Ralf
This issue has also been discussed in the Tika issue queue: Add method get file extension from MimeTypes https://issues.apache.org/jira/browse/TIKA-538 And http://svn.apache.org/repos/asf/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml does support DITA