Re: Question on Lucene hot-backup functionality.

2013-07-24 Thread Shai Erera
Hi, in Lucene 4.4 we've improved the snapshotting process so that you don't need to specify an ID. Also, there's a new Replicator module which can be used for just that purpose: taking hot backups of the index. It pretty much hides most of the snapshotting from you. You can read about it

Grouping field using pruned terms?

2013-07-24 Thread Ravikumar Govindarajan
TermFirstPassGroupingCollector loads all terms for a given group-by field, through FieldCache. Is it possible to instruct the class to group only pruned terms of a field, based on a user-supplied query [RangeQuery, TermQuery etc...]? This way, only pruned terms are grouped and all others are

Matched words from document - Stemmed and Synonyms

2013-07-24 Thread venkatesham.gu...@igate.com
I am looking for a feature in Solr that will give me all matched words in the document when I search with a word. My field uses stemming as well as synonym filters. For example, I have documents and part of the text goes like below: 1. We were very careful about my surgery 2. are still needing

Search a Part of the Sentence/Complete sentence in lucene 4.3

2013-07-24 Thread Ankit Murarka
Dear All, Suppose I have 3 documents. The sample text is /*File 1 : */ Mr X David is a manager of the company. He is the senior most manager. I also want to become manager of the company. /*File 2 :*/ Mr X David manager of the company is also very senior. He happens to be the senior

Re: Search a Part of the Sentence/Complete sentence in lucene 4.3

2013-07-24 Thread Michael McCandless
PhraseQuery? You can skip the holes created by stopwords ... e.g. QueryParser does this. Ie, the PhraseQuery becomes X David _ _ manager _ _ company if is/a/of/the are stop words, which isn't perfect (could return false matches) but should work well in practice ... Mike McCandless
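
To make the "holes" idea concrete, here is a minimal sketch in plain Java. It is not Lucene code and not how PhraseQuery is implemented; the class name PhraseHoles, its methods, and the tiny tokenizer are made up for illustration. It assumes is/a/of/the are the only stop words and that dropped stop words still consume a position, so the phrase becomes X David _ _ manager _ _ company.

```java
import java.util.*;

public class PhraseHoles {
    // Assumed stop word list, taken from the example in the message above.
    static final Set<String> STOP = new HashSet<>(Arrays.asList("is", "a", "of", "the"));

    // Tokenize on non-alphanumerics; each map entry is (term -> positions).
    // Stop words are dropped but still advance the position counter (the "holes").
    static Map<String, Set<Integer>> positions(String text, List<String> order) {
        Map<String, Set<Integer>> out = new HashMap<>();
        int pos = 0;
        for (String tok : text.toLowerCase().split("[^a-z0-9]+")) {
            if (tok.isEmpty()) continue;
            if (!STOP.contains(tok)) {
                out.computeIfAbsent(tok, k -> new TreeSet<>()).add(pos);
                if (order != null) order.add(tok + "@" + pos);
            }
            pos++;
        }
        return out;
    }

    // True if every phrase term occurs in the doc at the same relative position.
    static boolean matches(String doc, String phrase) {
        Map<String, Set<Integer>> docPos = positions(doc, null);
        List<String> phraseTerms = new ArrayList<>();
        positions(phrase, phraseTerms);
        if (phraseTerms.isEmpty()) return false;

        String[] first = phraseTerms.get(0).split("@");
        int firstOffset = Integer.parseInt(first[1]);
        Set<Integer> none = Collections.emptySet();
        for (int start : docPos.getOrDefault(first[0], none)) {
            boolean ok = true;
            for (int i = 1; i < phraseTerms.size() && ok; i++) {
                String[] t = phraseTerms.get(i).split("@");
                int want = start + Integer.parseInt(t[1]) - firstOffset;
                ok = docPos.getOrDefault(t[0], none).contains(want);
            }
            if (ok) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String phrase = "X David is a manager of the company";
        System.out.println(matches("Mr X David is a manager of the company.", phrase));
        System.out.println(matches("Mr X David manager of the company is also very senior.", phrase));
        // Different words fill the holes, yet the positions still line up.
        System.out.println(matches("X David has no manager in any company", phrase));
    }
}
```

The third call illustrates the "false matches" caveat: any words can sit in the holes as long as the remaining terms land at the right positions.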

Re: Search a Part of the Sentence/Complete sentence in lucene 4.3

2013-07-24 Thread Ankit Murarka
I tried using Phrase Query with slops. Now since I am specifying the slop I also need to specify the 2nd term. In my case the 2nd term is not present. The whole string to be searched is still 1 single term. How do I skip the holes created by stopwords? I do not know beforehand how many

Re: Question on Lucene hot-backup functionality.

2013-07-24 Thread Michael McCandless
This is unfortunately very trappy ... this happened with LUCENE-4876, where we added cloning of IndexDeletionPolicy on IW construction. It's very confusing that the IDP you set on your IWC is not in fact the one that IW uses... Mike McCandless http://blog.mikemccandless.com On Wed, Jul 24,

Re: Search a Part of the Sentence/Complete sentence in lucene 4.3

2013-07-24 Thread Dawn Zoë Raison
Did you consider using shingles? It solves the to be or not to be problem quite nicely. Dawn On 24/07/2013 12:34, Ankit Murarka wrote: I tried using Phrase Query with slops. Now since I am specifying the slop I also need to specify the 2nd term. In my case the 2nd term is not present. The
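
For readers unfamiliar with shingles: they are word n-grams indexed as single terms, so a phrase made up entirely of stop words still leaves something searchable. The sketch below is plain Java rather than Lucene's own shingle filter; the class and method names are made up for illustration, and whitespace tokenization is an assumption.

```java
import java.util.*;

public class Shingles {
    // Return every run of `size` adjacent words, joined by a single space.
    static List<String> shingles(String text, int size) {
        String[] words = text.toLowerCase().trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= words.length; i++) {
            out.add(String.join(" ", Arrays.copyOfRange(words, i, i + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Each two-word shingle would be indexed as one term.
        System.out.println(shingles("to be or not to be", 2));
    }
}
```

Matching a sentence then reduces to matching its sequence of shingles, at the cost of a larger term dictionary.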

Re: Search a Part of the Sentence/Complete sentence in lucene 4.3

2013-07-24 Thread Michael McCandless
With PhraseQuery you can specify where each term must occur in the phrase. So X must occur in position 0, David in position 1, and then manager in position 4 (skipping 2 holes). QueryParser does this for you: when it analyzes the user's phrase, if the resulting tokens have holes, then it sets the

Performance measurements

2013-07-24 Thread Sriram Sankar
I did some performance tests on a real index using a query having the following pattern: termA AND (termB1 OR termB2 OR ... OR termBn) The results were not good and I was wondering if I may be doing something wrong (and what I would need to do to improve performance), or is it just that the OR

Re: Performance measurements

2013-07-24 Thread Sriram Sankar
Clarification - I used an MMap'd index and warmed it up with similar queries, as well as running the identical query many times before starting measurements. I had ample heap space. Sriram. On Wed, Jul 24, 2013 at 9:11 AM, Sriram Sankar san...@gmail.com wrote: I did some performance tests on

Re: Performance measurements

2013-07-24 Thread Jack Krupansky
Thanks for the detailed numbers. Nothing seems unexpected to me. Increasing query complexity or term count is simply going to increase query execution time. I think I'll add a new rule to my informal performance guidance - Query complexity of no more than ten to twenty terms is a slam dunk,

Re: Performance measurements

2013-07-24 Thread Adrien Grand
Hi, On Wed, Jul 24, 2013 at 6:11 PM, Sriram Sankar san...@gmail.com wrote: termA AND (termB1 OR termB2 OR ... OR termBn) Maybe this comment is not appropriate for your use-case, but if you don't actually need scoring from the disjunction on the right of the query, a TermsFilter will be faster
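
A rough sketch of why dropping scoring can help for termA AND (termB1 OR ... OR termBn): the disjunction can be collapsed once into a bit set of matching documents, and termA's postings are then checked against it. This is plain Java, not Lucene's TermsFilter; the class name, the int[] postings, and maxDoc are assumptions for illustration.

```java
import java.util.*;

public class DisjunctionFilter {
    // OR together the doc IDs of all termB postings; no scores are tracked.
    static BitSet union(int maxDoc, int[]... postings) {
        BitSet bits = new BitSet(maxDoc);
        for (int[] list : postings) {
            for (int doc : list) bits.set(doc);
        }
        return bits;
    }

    // Keep only the termA documents that pass the filter.
    static List<Integer> intersect(int[] termA, BitSet filter) {
        List<Integer> hits = new ArrayList<>();
        for (int doc : termA) {
            if (filter.get(doc)) hits.add(doc);
        }
        return hits;
    }

    public static void main(String[] args) {
        BitSet friends = union(10, new int[]{2, 4}, new int[]{9}, new int[]{3, 4});
        System.out.println(intersect(new int[]{1, 4, 7, 9}, friends));
    }
}
```

Whether this beats a scored BooleanQuery in practice depends on the index and on how many terms the disjunction has, so it is worth measuring.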

Re: Performance measurements

2013-07-24 Thread Sriram Sankar
No I do not need scoring. This is a pure retrieval query - which matches what we used to do with Unicorn in Facebook - something like: (name:sriram AND (friend:1 OR friend:2 ...)) This automatically gives us second degree. With Unicorn, we would always get sub-millisecond performance even for

Re: Performance measurements

2013-07-24 Thread Jack Krupansky
Unicorn sounds like it was optimized for graph search. Specialized search engines can in fact beat out generalized search engines for specific use cases. Scoring has been a major focus of Lucene. Non-scored filters are also available, but the query parsers are focused (exclusively) on

Re: Performance measurements

2013-07-24 Thread Sriram Sankar
On Wed, Jul 24, 2013 at 10:24 AM, Jack Krupansky j...@basetechnology.com wrote: Unicorn sounds like it was optimized for graph search. Specialized search engines can in fact beat out generalized search engines for specific use cases. Yes and no (I worked on it). Yes, there are many aspects of

Re: Performance measurements

2013-07-24 Thread Jack Krupansky
I think I've exhausted my expertise in Lucene filters, but I think you can wrap a query with a filter and also wrap a filter with a query. So, for IndexSearcher.search, you could take a filter and wrap it with ConstantScoreQuery. So, if a BooleanQuery got wrapped as a filter, it could be

lucene indexwriter crash

2013-07-24 Thread ash nix
Hi, I am using Lucene 4 to index very big data. The indexer crashed after three days (147 GB of current index size). I find the stack trace weird. Any ideas on this will be helpful. Exception in thread main java.io.FileNotFoundException:

Re: Tokenize String using Operators(Logical Operator, : operator etc)

2013-07-24 Thread dheerajjoshim
Greetings, I have written a custom tokenizer class which extends the Lucene Tokenizer class. Thanks for all replies. Regards, DJ -- View this message in context: http://lucene.472066.n3.nabble.com/Tokenize-String-using-Operators-Logical-Operator-operator-etc-tp4079673p4080225.html Sent from the

2 exceptions in IndexWriter

2013-07-24 Thread Yonghui Zhao
Recently I found that my unit test fails sometimes, but not always. I use Lucene 4.3.0. After investigation, I found that when I try to open an IndexWriter for a disk directory, it sometimes throws this exception: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: