WildCardQuery: TooManyClauses Exception
Hi Guys,

I am using the following queries:
1> WildCardQuery
2> BooleanQuery containing a WildCardQuery and a TermQuery.
The WildCardQuery is field:* or, say, field:ab*.

From the Lucene FAQ and earlier discussions about TooManyClausesException, I see that a WildCardQuery gets expanded before the search runs. So I tried to reproduce this exception with Lucene 3.0.2, but I don't get one for the WildCardQuery. For a BooleanQuery with many TermQuery clauses I could reproduce it, but for the BooleanQuery (with a WildCardQuery and a TermQuery) and for the WildCardQuery alone I couldn't. I have 1 tokens matching the search for field:* or field:ab* in the index.

It seems that expansion doesn't happen for the WildCardQuery, and that a BooleanQuery containing it treats it as a single clause.

Has some implementation changed in 3.0.2? Can anyone explain the query expansion here?

Arun
RE: WildCardQuery: TooManyClauses Exception
Lucene 2.9+ expands wildcards differently: it rewrites to a BooleanQuery only when a few terms match, and otherwise uses a filter-based approach. The same applies to range queries and prefix queries.

--
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Arun Kumar K [mailto:arunk...@gmail.com]
> Sent: Thursday, April 18, 2013 12:41 PM
> To: java-user
> Subject: WildCardQuery: TooManyClauses Exception

--
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
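To see why the exception does not show up, here is a simplified model (plain Java, not Lucene's actual source) of the default "constant score auto" rewrite that Lucene 2.9/3.x applies to wildcard, prefix and range queries. The cutoff constants are assumptions chosen to mirror the 3.x defaults of MultiTermQuery.ConstantScoreAutoRewrite (350 terms, 0.1% of maxDoc), alongside BooleanQuery's default limit of 1024 clauses. Because the term limit is the smaller of the cutoff and the clause limit, this path switches to the filter before a BooleanQuery could exceed the clause limit. To get the pre-2.9 behaviour (and the TooManyClauses exception) back, a query can be forced onto the scoring BooleanQuery rewrite with setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE).

```java
import java.util.Arrays;

/**
 * Simplified model (NOT Lucene source) of the "constant score auto" rewrite that
 * Lucene 2.9/3.x uses by default for wildcard, prefix and range queries.
 * The cutoff values below are assumptions mirroring the 3.x defaults.
 */
public class AutoRewriteModel {
    static final int MAX_CLAUSE_COUNT = 1024;     // BooleanQuery's default clause limit
    static final int TERM_COUNT_CUTOFF = 350;     // assumed term-count cutoff
    static final double DOC_COUNT_PERCENT = 0.1;  // assumed doc-count cutoff, in % of maxDoc

    enum Rewrite { BOOLEAN_QUERY, CONSTANT_SCORE_FILTER }

    /**
     * @param docFreqs docFreq of each term the wildcard expands to, in term order
     * @param maxDoc   number of documents in the index
     */
    static Rewrite choose(int[] docFreqs, int maxDoc) {
        int docCountCutoff = (int) ((DOC_COUNT_PERCENT / 100.0) * maxDoc);
        // The term limit never exceeds the clause limit, so this path cannot throw TooManyClauses.
        int termLimit = Math.min(MAX_CLAUSE_COUNT, TERM_COUNT_CUTOFF);
        int termCount = 0;
        long docVisitCount = 0;
        for (int df : docFreqs) {
            termCount++;
            docVisitCount += df;
            if (termCount >= termLimit || docVisitCount >= docCountCutoff) {
                return Rewrite.CONSTANT_SCORE_FILTER;  // too many terms or docs: use a filter
            }
        }
        return Rewrite.BOOLEAN_QUERY;  // few terms: expand into a BooleanQuery of term clauses
    }

    public static void main(String[] args) {
        int[] manyRareTerms = new int[10_000];
        Arrays.fill(manyRareTerms, 1);                            // 10,000 terms, each in 1 doc
        System.out.println(choose(manyRareTerms, 1_000_000));     // CONSTANT_SCORE_FILTER
        System.out.println(choose(new int[] {3, 2}, 1_000_000));  // BOOLEAN_QUERY
    }
}
```

In this model, a wildcard matching 10,000 terms never produces a 10,000-clause BooleanQuery; it falls back to the filter after 350 terms, which matches what Arun observed.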
Re: WildCardQuery: TooManyClauses Exception
Thanks Uwe for the clarification!

On Thu, Apr 18, 2013 at 4:37 PM, Uwe Schindler wrote:
> Lucene 2.9+ has a different Wildcard Expansion using BooleanQuery only for
> few terms, otherwise it uses a filter-based approach. Same applies for
> range queries and prefix queries.
Why doesn't this code run - Adding synonyms from Wordnet to Lucene Index
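I am writing this code as part of my CustomAnalyzer:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.Reader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.PorterStemFilter;
    import org.apache.lucene.analysis.StopAnalyzer;
    import org.apache.lucene.analysis.StopFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardFilter;
    import org.apache.lucene.analysis.standard.StandardTokenizer;
    import org.apache.lucene.analysis.synonym.SynonymFilter;
    import org.apache.lucene.analysis.synonym.SynonymMap;
    import org.apache.lucene.util.CharsRef;
    import org.apache.lucene.util.Version;

    public class CustomAnalyzer extends Analyzer {
        SynonymMap mySynonymMap = null;

        CustomAnalyzer() throws IOException {
            // dedup = true: identical rules are only added once
            SynonymMap.Builder builder = new SynonymMap.Builder(true);
            FileReader fr = new FileReader("/home/watsonuser/Downloads/wordnetSynonyms.txt");
            BufferedReader br = new BufferedReader(fr);
            String line = "";
            while ((line = br.readLine()) != null) {
                // each line is a comma-separated synset; map its first word to every word in it
                String[] synset = line.split(",");
                for (String syn : synset)
                    builder.add(new CharsRef(synset[0]), new CharsRef(syn), true);
            }
            br.close();
            fr.close();
            try {
                mySynonymMap = builder.build();
            } catch (IOException e) {
                System.out.println("Unable to build synonymMap");
                e.printStackTrace();
            }
        }

        @Override
        public TokenStream tokenStream(String fieldName, Reader reader) {
            // tokenize -> standard filter -> lowercase -> stop words -> synonyms -> Porter stemming
            TokenStream result = new PorterStemFilter(
                new SynonymFilter(
                    new StopFilter(true,
                        new LowerCaseFilter(
                            new StandardFilter(
                                new StandardTokenizer(Version.LUCENE_36, reader))),
                        StopAnalyzer.ENGLISH_STOP_WORDS_SET),
                    mySynonymMap, true));
            return result;
        }
    }

Now, if I use the same CustomAnalyzer at query time and enter the query myFieldName:manager, the query is expanded with the synonyms of manager. But I want the synonyms to be part of my index only; I don't want my query to be expanded with synonyms. So I removed the SynonymFilter from my CustomAnalyzer for querying only. The query then stays as myFieldName:manager, but it fails to retrieve documents that contain the synonyms of manager. How do we solve this problem?

Thanks
Abhishek S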
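One thing worth checking in the posted loop: SynonymMap.Builder.add(input, output, includeOrig) adds a mapping from input to output only, and the loop uses synset[0] as the input for every word on the line. At index time, then, only documents containing the first word of a line get that line's synonyms injected; a document containing only another word of the line never gets the first word added, so an unexpanded query for it cannot find that document. The toy model below (plain Java, not Lucene's SynonymFilter; the synonym line and document text are hypothetical) contrasts that one-directional mapping with mapping every word of a line to every word of the same line, which is what index-time-only expansion needs. In the real analyzer the equivalent change would be calling builder.add for each (from, to) pair within a synset.

```java
import java.util.*;

/** Toy model (not Lucene code) of index-time-only synonym expansion. */
public class SynonymDirectionDemo {

    /** Mirrors the posted loop: only the first word of each synset gets mappings. */
    static Map<String, Set<String>> firstWordOnly(List<String> lines) {
        Map<String, Set<String>> map = new HashMap<>();
        for (String line : lines) {
            String[] synset = line.split(",");
            for (String syn : synset) {
                map.computeIfAbsent(synset[0].trim(), k -> new LinkedHashSet<>()).add(syn.trim());
            }
        }
        return map;
    }

    /** Every word of a synset maps to every word of that synset (including itself). */
    static Map<String, Set<String>> allWords(List<String> lines) {
        Map<String, Set<String>> map = new HashMap<>();
        for (String line : lines) {
            String[] synset = line.split(",");
            for (String from : synset) {
                for (String to : synset) {
                    map.computeIfAbsent(from.trim(), k -> new LinkedHashSet<>()).add(to.trim());
                }
            }
        }
        return map;
    }

    /** Index-time expansion: keep each token and add its synonyms (like includeOrig = true). */
    static Set<String> indexTerms(String text, Map<String, Set<String>> synonyms) {
        Set<String> terms = new HashSet<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            terms.add(token);
            terms.addAll(synonyms.getOrDefault(token, Collections.emptySet()));
        }
        return terms;
    }

    public static void main(String[] args) {
        List<String> synsets = List.of("manager,director,boss");  // hypothetical file line
        String doc = "the director approved it";
        // The query side is NOT expanded: it just looks up the single term "manager".
        System.out.println(indexTerms(doc, firstWordOnly(synsets)).contains("manager")); // false
        System.out.println(indexTerms(doc, allWords(synsets)).contains("manager"));      // true
    }
}
```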
Please explain the example
I am a student studying the functionality of Lucene for my project work. The DocDelta example on this page is not clear to me: http://lucene.apache.org/core/4_2_0/core/org/apache/lucene/codecs/lucene40/Lucene40PostingsFormat.html?is-external=true

Please explain the first part: how do we get 15, 8, 3 as the TermFreqs for the example? Thanks.
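The arithmetic behind 15, 8, 3, following how the Lucene40PostingsFormat javadoc describes DocDelta: with frequencies indexed, DocDelta/2 is the gap from the previous document number (the document number itself for the first document in the list), an odd DocDelta means the frequency is 1, and an even DocDelta is followed by the frequency as another VInt. Without frequencies, DocDelta is just the gap. So document 7 with freq 1 gives 7*2+1 = 15; document 11 is a gap of 4, giving 4*2 = 8, followed by its freq 3. A small plain-Java sketch of that rule (class and method names are made up for illustration; it produces the integer values only and ignores VInt byte encoding and skip data):

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the TermFreqs values for one term, per the DocDelta rule described above. */
public class TermFreqsExample {

    static List<Integer> encode(int[] docIds, int[] freqs, boolean withFreqs) {
        List<Integer> out = new ArrayList<>();
        int lastDoc = 0;
        for (int i = 0; i < docIds.length; i++) {
            int delta = docIds[i] - lastDoc;   // gap from the previous document number
            lastDoc = docIds[i];
            if (!withFreqs) {
                out.add(delta);                // DOCS_ONLY: just the gap
            } else if (freqs[i] == 1) {
                out.add((delta << 1) | 1);     // odd value: freq is implicitly 1
            } else {
                out.add(delta << 1);           // even value: the freq follows
                out.add(freqs[i]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] docs = {7, 11};
        int[] freqs = {1, 3};
        System.out.println(encode(docs, freqs, true));   // [15, 8, 3]
        System.out.println(encode(docs, freqs, false));  // [7, 4]
    }
}
```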
Re: Taking backup of a Lucene index
On Wed, Apr 17, 2013 at 8:10 AM, Ashish Sarna wrote:
> The external backup utility would be used by some other person and it would
> simply copy the index directory to take its backup. I have no control over
> this utility.

OK.

> I have ensured that nothing would be written to the index before the backup
> utility is executed, and now just need to ensure that it does not get changed
> due to searches and/or Lucene housekeeping activities.
>
> Is there a way to ensure this?

Safest is to close the IndexWriter. But you could probably get away with:
1) stopping all indexing actions,
2) commit,
3) IndexWriter.waitForMerges,
and only once that returns, do the full backup.

> Would using the IndexReader.open method with the 'readOnly' flag set to 'true'
> help keep the index from being modified when a search is performed?

No, how an IndexReader is opened on the index won't alter what IndexWriter is doing to it.

Mike McCandless

http://blog.mikemccandless.com
Re: Taking backup of a Lucene index
On Thu, Apr 18, 2013 at 12:32 AM, Hien Luu wrote:
> It is difficult to associate a class named SnapshotDeletionPolicy with taking
> backup of Lucene index.

Naming is the hardest part :)

It's a snapshot in the same sense as snapshots in the ZFS file system or on a Network Appliance file server.

What's hard here is that this deletion policy can be used for things other than hot backups, e.g. protecting commit points so you don't hit Stale NFS File Handle when searching over NFS, keeping a point-in-time searchable commit point alive with your index at different stages, etc.

http://blog.mikemccandless.com/2012/03/transactional-lucene.html goes into more detail.

Mike McCandless

http://blog.mikemccandless.com
Complete re-indexing using lucene
Hi,

I am using Lucene in my project, which is built in Java. I am writing the index to disk using FSDirectory.open("c:\\temp").

At every hour boundary I need to re-index the complete system. But if I use the same directory "c:\\temp" for re-indexing, the directory size will keep growing, because new segments are created every hour.

Does Lucene provide a clean way of handling this, or would I have to handle it in my application by using a new index location?

Please let me know, and thanks in advance for the help!
Re: Complete re-indexing using lucene
Just pass IndexWriterConfig.OpenMode.CREATE when you open the index on the same location ... this will make IndexWriter remove the existing index.

Mike McCandless

http://blog.mikemccandless.com

On Thu, Apr 18, 2013 at 3:33 PM, Sandeep Jangra wrote:
> Does lucene provide a clean way for handling this or would I have to
> handle it in my application by having new index location.
How is the term frequency calculated if I have to add a user-generated document.
I am a student studying the functionality of Lucene for my project work. If I have to add a new user-generated document to Lucene with a term having a particular frequency, just like any text file, how do I do it?

For example, say I have to add the following documents, analyzed from an image:
doc1 = { contents field: {"red (X15 times) blue(X10 times)"}, name field: {"doc1"} }
doc2 = { contents field: {"red (X10 times) blue(X18 times)"}, name field: {"doc2"} }

Now, when indexing, I should have a term freq for "red" of 15 for doc1 and 10 for doc2. The documents doc1 and doc2 could be indexed along with the normal text files if only we could set the frequencies manually. I need frequencies indexed as well (FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS).

The DocDelta example on this page (http://lucene.apache.org/core/4_2_0/core/org/apache/lucene/codecs/lucene40/Lucene40PostingsFormat.html?is-external=true) says:

FreqFile (.frq) --> Header, <TermFreqs, SkipData?>^TermCount
Header --> CodecHeader
TermFreqs --> <TermFreq>^DocFreq
TermFreq --> DocDelta[, Freq?]
SkipData --> <<SkipLevelLength, SkipLevel>^(NumSkipLevels-1), SkipLevel> <SkipDatum>
SkipLevel --> <SkipDatum>^(DocFreq/(SkipInterval^(Level + 1)))
SkipDatum --> DocSkip, PayloadLength?, OffsetLength?, FreqSkip, ProxSkip, SkipChildLevelPointer?
DocDelta, Freq, DocSkip, PayloadLength, OffsetLength, FreqSkip, ProxSkip --> VInt
SkipChildLevelPointer --> VLong

"For example, the TermFreqs for a term which occurs once in document seven and three times in document eleven, with frequencies indexed, would be the following sequence of VInts: 15, 8, 3. If frequencies were omitted (FieldInfo.IndexOptions.DOCS_ONLY) it would be this sequence of VInts instead: 7, 4."

So what should the DocDelta values be for doc1 and doc2, and how are they derived? Please provide any other useful links. Thanks.
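DocDelta is computed by the codec from Lucene's internal document numbers, so it is not something set directly; what the application controls is the field text. One simple approach, assuming the contents field uses an analyzer that splits on whitespace and neither stems nor drops these colour words, is to repeat each term as many times as its count, so the indexed frequency comes out as that count. A minimal plain-Java sketch (the class and helper names are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: turns per-image term counts into field text in which
 *  each term appears exactly as many times as its count. */
public class RepeatedTermText {

    static String toFieldText(Map<String, Integer> counts) {
        StringJoiner text = new StringJoiner(" ");
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                text.add(e.getKey());  // one token occurrence per unit of count
            }
        }
        return text.toString();
    }

    /** Counts whitespace-separated occurrences, i.e. the freq a whitespace analyzer would see. */
    static int freq(String text, String term) {
        int n = 0;
        for (String tok : text.split("\\s+")) {
            if (tok.equals(term)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Map<String, Integer> doc1 = new LinkedHashMap<>();
        doc1.put("red", 15);
        doc1.put("blue", 10);
        String contents = toFieldText(doc1);
        System.out.println(freq(contents, "red"));   // 15
        System.out.println(freq(contents, "blue"));  // 10
    }
}
```

If doc1 and doc2 happened to receive document numbers 0 and 1, the TermFreqs for "red" would then be the VInts 0, 15, 2, 10 under the same encoding as the 15, 8, 3 example: a gap of 0 doubled is 0 (even, so the freq 15 follows), then a gap of 1 doubled is 2, followed by the freq 10.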