WildCardQuery: TooManyClauses Exception

2013-04-18 Thread Arun Kumar K
Hi Guys,

I am using the following queries:
1> WildcardQuery
2> BooleanQuery containing a WildcardQuery and a TermQuery.
The WildcardQuery is field:* or, say, field:ab*

From the Lucene FAQ and earlier discussions about the TooManyClauses
exception, I see that a WildcardQuery gets expanded before the search runs.

I tried to reproduce this exception with Lucene 3.0.2, but I don't get
one for WildcardQuery.
With a BooleanQuery holding many TermQuery clauses I could reproduce it, but
with a BooleanQuery (a WildcardQuery plus a TermQuery) and with a plain
WildcardQuery I couldn't.
I have 1 tokens matching the search for a field:* or field:ab* in the
index.

It seems that no expansion happens for WildcardQuery, and that a BooleanQuery
containing one counts it as a single clause.

Has the implementation changed in 3.0.2?
Can anyone explain the query expansion here?

Arun


RE: WildCardQuery: TooManyClauses Exception

2013-04-18 Thread Uwe Schindler
Lucene 2.9+ uses a different wildcard expansion: it rewrites to a BooleanQuery
only when few terms match; otherwise it uses a filter-based approach. The same
applies to range queries and prefix queries.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
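
To make the reply above concrete, here is a toy model in plain Java of the decision it describes. It is not Lucene code: the class, the method names, and the cutoff of 350 terms are assumptions for illustration, and only the clause limit of 1024 is BooleanQuery's documented default. The sketch suggests why an automatically rewritten wildcard does not trip the limit, while one clause per matching term would.

```java
final class WildcardRewriteSketch {
    /** BooleanQuery's default clause limit. */
    static final int MAX_CLAUSE_COUNT = 1024;
    /** Assumed cutoff below which expansion into clauses is still used. */
    static final int TERM_COUNT_CUTOFF = 350;

    enum Strategy { BOOLEAN_CLAUSES, FILTER }

    /** Models the 2.9+ behaviour: clauses for few terms, a filter otherwise. */
    static Strategy autoRewrite(int matchingTerms) {
        int limit = Math.min(MAX_CLAUSE_COUNT, TERM_COUNT_CUTOFF);
        return matchingTerms < limit ? Strategy.BOOLEAN_CLAUSES : Strategy.FILTER;
    }

    /** Models unconditional expansion: always one clause per matching term. */
    static int expandToClauses(int matchingTerms) {
        if (matchingTerms > MAX_CLAUSE_COUNT) {
            throw new IllegalStateException(
                "TooManyClauses: " + matchingTerms + " > " + MAX_CLAUSE_COUNT);
        }
        return matchingTerms;
    }

    public static void main(String[] args) {
        System.out.println(autoRewrite(10));     // BOOLEAN_CLAUSES
        System.out.println(autoRewrite(10000));  // FILTER
        try {
            expandToClauses(10000);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In Lucene 2.9/3.0 itself the strategy is chosen per query with MultiTermQuery.setRewriteMethod(...). Selecting MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE is presumably the way to get the per-term expansion back, and with it the exception.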



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: WildCardQuery: TooManyClauses Exception

2013-04-18 Thread Arun Kumar K
Thanks, Uwe, for the clarification!




Why doesn't this code run - Adding synonyms from Wordnet to Lucene Index

2013-04-18 Thread Abhishek Shivkumar
I am writing this code as part of my CustomAnalyzer:

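import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.PorterStemFilter;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.synonym.SynonymFilter;
import org.apache.lucene.analysis.synonym.SynonymMap;
import org.apache.lucene.util.CharsRef;
import org.apache.lucene.util.Version;

// Declared final: Lucene 3.x expects Analyzer subclasses (or their
// tokenStream methods) to be final.
public final class CustomAnalyzer extends Analyzer {

    private SynonymMap mySynonymMap = null;

    CustomAnalyzer() throws IOException {
        SynonymMap.Builder builder = new SynonymMap.Builder(true);

        BufferedReader br = new BufferedReader(
                new FileReader("/home/watsonuser/Downloads/wordnetSynonyms.txt"));
        try {
            String line;
            while ((line = br.readLine()) != null) {
                String[] synset = line.split(",");
                // Maps the first word on each line to every word on that line.
                for (String syn : synset) {
                    builder.add(new CharsRef(synset[0]), new CharsRef(syn), true);
                }
            }
        } finally {
            br.close(); // also closes the underlying FileReader
        }

        try {
            mySynonymMap = builder.build();
        } catch (IOException e) {
            System.out.println("Unable to build synonymMap");
            e.printStackTrace();
        }
    }

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // Chain: tokenize -> standard filter -> lower-case -> stop words
        //        -> synonyms -> Porter stemming.
        TokenStream result = new StandardTokenizer(Version.LUCENE_36, reader);
        result = new StandardFilter(result);
        result = new LowerCaseFilter(result);
        result = new StopFilter(true, result, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
        result = new SynonymFilter(result, mySynonymMap, true);
        result = new PorterStemFilter(result);
        return result; // the original method was missing this return
    }
}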

Now, if I use the same CustomAnalyzer at query time and enter the query

myFieldName: manager

it expands the query with synonyms for manager.

But I want the synonyms to be part of my index only; I don't want my
query to be expanded with synonyms.

So I removed the SynonymFilter from my CustomAnalyzer at query time only.
The query then remains

myFieldName: manager

but it fails to retrieve documents that contain synonyms of manager.

How do we solve this problem?

Thanks
Abhishek S
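
One possible cause, assuming each line of the file lists one synonym set such as "manager,director,boss" (a hypothetical example): the loop in the constructor maps only the first word of each line to the others, so a document containing "director" is never indexed under "manager". The plain-Java sketch below models the mapping with a Map rather than Lucene's SynonymMap, and all names in it are illustrative. It contrasts the first-word-only mapping with a symmetric one in which every member of a set expands to all members.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class SynonymExpansionSketch {
    /** Builds the expansion table from lines like "manager,director,boss". */
    static Map<String, Set<String>> build(String[] lines, boolean symmetric) {
        Map<String, Set<String>> table = new HashMap<>();
        for (String line : lines) {
            String[] synset = line.split(",");
            // First-word-only: just synset[0] expands. Symmetric: every member expands.
            int inputs = symmetric ? synset.length : 1;
            for (int i = 0; i < inputs; i++) {
                table.computeIfAbsent(synset[i], k -> new LinkedHashSet<>())
                     .addAll(Arrays.asList(synset));
            }
        }
        return table;
    }

    /** Index-time expansion: each token is kept and its synonyms are added. */
    static Set<String> indexTokens(String text, Map<String, Set<String>> table) {
        Set<String> out = new LinkedHashSet<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            out.add(token);
            Set<String> syns = table.get(token);
            if (syns != null) {
                out.addAll(syns);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] lines = { "manager,director,boss" };  // hypothetical file content
        String doc = "the director resigned";
        // Would the unexpanded query term "manager" find this document?
        System.out.println(indexTokens(doc, build(lines, false)).contains("manager")); // false
        System.out.println(indexTokens(doc, build(lines, true)).contains("manager"));  // true
    }
}
```

With Lucene's builder, the symmetric variant would correspond to calling builder.add once for every ordered pair in a set, while keeping the SynonymFilter in the index-time analyzer only.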

Please explain the example

2013-04-18 Thread Gaurav Ranjan
I am a student studying the functionality of Lucene for my project work.
The DocDelta example at this link is not clear:
http://lucene.apache.org/core/4_2_0/core/org/apache/lucene/codecs/lucene40/Lucene40PostingsFormat.html?is-external=true

Please explain the first part: how do we get 15, 8, 3 as the TermFreqs
for the example?

Thanks.
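
The linked format documentation defines DocDelta as follows when frequencies are indexed: DocDelta/2 is the gap to the previous document number (or the document number itself for the first entry), an odd DocDelta means the frequency is one, and an even DocDelta is followed by the frequency as a separate VInt. A small decoder, written here as an illustrative sketch with hypothetical names rather than Lucene's own reader, turns 15, 8, 3 back into the documents and frequencies of the example.

```java
import java.util.ArrayList;
import java.util.List;

final class TermFreqsDecoder {
    /** Decodes a TermFreqs VInt sequence into rows of {docId, freq}. */
    static List<int[]> decode(int[] vints) {
        List<int[]> postings = new ArrayList<>();
        int doc = 0;
        int i = 0;
        while (i < vints.length) {
            int docDelta = vints[i++];
            doc += docDelta >>> 1;               // DocDelta/2 is the gap to the previous doc
            int freq = (docDelta & 1) == 1
                    ? 1                          // odd: frequency is one
                    : vints[i++];                // even: frequency follows as the next VInt
            postings.add(new int[] { doc, freq });
        }
        return postings;
    }

    public static void main(String[] args) {
        for (int[] p : decode(new int[] { 15, 8, 3 })) {
            System.out.println("doc " + p[0] + ", freq " + p[1]);
        }
        // doc 7, freq 1
        // doc 11, freq 3
    }
}
```

Read step by step: 15 is odd, so the gap is 7 and the frequency is 1 (document seven, once); 8 is even, so the gap is 4 (7 + 4 = 11) and the next VInt, 3, is the frequency.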


Re: Taking backup of a Lucene index

2013-04-18 Thread Michael McCandless
On Wed, Apr 17, 2013 at 8:10 AM, Ashish Sarna  wrote:
> The external backup utility would be used by some other person and it would
> simply copy the index directory to take its backup. I have no control over
> this utility.

OK.

> I have ensured that nothing will be written to the index before the backup
> utility runs, and now just need to ensure that the index is not changed
> by searches and/or Lucene housekeeping activities.
>
> Is there a way to ensure this?

Safest is to close the IndexWriter.

But you could probably get away with 1) stopping all indexing actions,
2) commit, 3) IndexWriter.waitForMerges and only once that returns, do
the full backup.

> Would using the IndexReader.open method with the 'readOnly' flag set to 'true'
> help keep the index from being modified when a search is performed?

No, how an IndexReader is opened on the index won't alter what
IndexWriter is doing to it.

Mike McCandless

http://blog.mikemccandless.com
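
The sequence suggested above (stop indexing, commit, wait for merges, then copy) can be sketched as an ordering. QuiescableWriter is a hypothetical stand-in interface, not a Lucene type; it mirrors the two IndexWriter methods named in the reply so that the block compiles without Lucene on the classpath. The Runnable is a placeholder for the external copy utility.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the two IndexWriter calls named in the reply. */
interface QuiescableWriter {
    void commit();          // the real IndexWriter.commit() throws IOException; omitted here
    void waitForMerges();
}

final class BackupSequence {
    /**
     * Runs steps 2 and 3, then the copy. The caller must already have
     * stopped all indexing actions (step 1).
     */
    static void quiesceThenBackup(QuiescableWriter writer, Runnable copyIndexDir) {
        writer.commit();         // step 2: make pending changes durable
        writer.waitForMerges();  // step 3: block until running merges finish
        copyIndexDir.run();      // only now copy the index directory
    }

    public static void main(String[] args) {
        final List<String> log = new ArrayList<>();
        QuiescableWriter fake = new QuiescableWriter() {
            public void commit() { log.add("commit"); }
            public void waitForMerges() { log.add("waitForMerges"); }
        };
        quiesceThenBackup(fake, () -> log.add("copy"));
        System.out.println(log); // [commit, waitForMerges, copy]
    }
}
```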




Re: Taking backup of a Lucene index

2013-04-18 Thread Michael McCandless
On Thu, Apr 18, 2013 at 12:32 AM, Hien Luu wrote:
> It is difficult to associate a class named SnapshotDeletionPolicy with taking
> a backup of a Lucene index.

Naming is the hardest part :)

It's a snapshot in the same sense as the ZFS file system, or a Network
Appliance file server.

What's hard here is that this deletion policy can be used for things other
than hot backups, e.g. protecting commit points so you don't hit Stale
NFS File Handle when searching over NFS, keeping a point-in-time
searchable commit point alive with your index at different stages,
etc.  http://blog.mikemccandless.com/2012/03/transactional-lucene.html
goes into more detail.

Mike McCandless

http://blog.mikemccandless.com




Complete re-indexing using lucene

2013-04-18 Thread Sandeep Jangra
Hi,

  I am using Lucene in my Java project.
  I am writing the index to disk using FSDirectory.open("c:\\temp").

  Every hour I need to re-index the complete system.
  But if I reuse the same directory "c:\\temp" for re-indexing, the directory
size will keep growing because new segments are created every hour.

  Does Lucene provide a clean way of handling this, or do I have to
handle it in my application by using a new index location?

  Please let me know, and thanks in advance for the help!


Re: Complete re-indexing using lucene

2013-04-18 Thread Michael McCandless
Just pass IndexWriterConfig.OpenMode.CREATE when you open the index on
the same location ... this will make IndexWriter remove the existing
index.

Mike McCandless

http://blog.mikemccandless.com




How is the term frequency calculated if I have to add a user-generated document.

2013-04-18 Thread Gaurav Ranjan
I am a student studying the functionality of Lucene for my project work.

If I have to add a new user-generated document to Lucene with a term having
a particular frequency, just like any text file, how do I do it?
For example, say I have to add the following documents, analyzed from an image:

doc1 =
{ contents field:
{"red (X15 times) blue(X10 times)"} ,
  name field:
{"doc1"}
}

doc2 =
{ contents field:
{"red (X10 times) blue(X18 times)"} ,
  name field:
{"doc2"}
}

Now, when indexing, I should get a term frequency for "red" of 15 for doc1
and 10 for doc2, correct?
doc1 and doc2 could be indexed along with the normal text files, if only
we could set the frequencies manually. Here I need the frequencies to be
indexed as well
(FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS).
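
One simple way to get a chosen term frequency without touching the postings format, offered as a sketch rather than as the recommended Lucene approach: the frequency recorded for a term is the number of times the analyzer emits that term for the field, so the contents text can be generated with each term repeated the desired number of times. RepeatedTermText and its methods are hypothetical helper names. The generated string would then be added as the contents field like any other text, with the caveat that the positions in such a field are synthetic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RepeatedTermText {
    /** Builds field text in which each term appears exactly as often as its count. */
    static String build(Map<String, Integer> termCounts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : termCounts.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(e.getKey());
            }
        }
        return sb.toString();
    }

    /** Counts whitespace-separated occurrences of a term in the text. */
    static int count(String text, String term) {
        int n = 0;
        for (String token : text.split("\\s+")) {
            if (token.equals(term)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        Map<String, Integer> doc1 = new LinkedHashMap<>();
        doc1.put("red", 15);
        doc1.put("blue", 10);
        String contents = build(doc1);
        System.out.println(count(contents, "red"));   // 15
        System.out.println(count(contents, "blue"));  // 10
    }
}
```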


The DocDelta example provided at this link
(http://lucene.apache.org/core/4_2_0/core/org/apache/lucene/codecs/lucene40/Lucene40PostingsFormat.html?is-external=true)
says:

FreqFile (.frq) --> Header, <TermFreqs, SkipData?>^TermCount
Header --> CodecHeader
TermFreqs --> <TermFreq>^DocFreq
TermFreq --> DocDelta[, Freq?]
SkipData --> <<SkipLevelLength, SkipLevel>^(NumSkipLevels-1), SkipLevel> <SkipDatum>
SkipLevel --> <SkipDatum>^(DocFreq/(SkipInterval^(Level + 1)))
SkipDatum --> DocSkip,PayloadLength?,OffsetLength?,FreqSkip,ProxSkip,SkipChildLevelPointer?
DocDelta,Freq,DocSkip,PayloadLength,OffsetLength,FreqSkip,ProxSkip --> VInt
SkipChildLevelPointer --> VLong


"For example, the TermFreqs for a term which occurs once in document seven
and three times in document eleven, with frequencies indexed, would be the
following sequence of VInts:

15, 8, 3

If frequencies were omitted (FieldInfo.IndexOptions.DOCS_ONLY) it would be
this sequence of VInts instead:

7,4"

So what should the DocDelta values be for doc1 and doc2, and how are they
derived? Please provide any other useful links.
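
As a sketch of how such values come about: per the linked format documentation, with frequencies indexed, each posting is written as the gap to the previous document number times two, plus one if the frequency is one; otherwise the frequency follows as its own VInt. Without frequencies, only the gap is written. The encoder below uses hypothetical names and is not Lucene's writer. The document numbers 0 and 1 for doc1 and doc2 are an assumption, since Lucene assigns the real numbers at indexing time.

```java
import java.util.ArrayList;
import java.util.List;

final class TermFreqsEncoder {
    /**
     * Encodes postings for one term.
     * postings: rows of {docId, freq}, sorted by ascending docId.
     */
    static List<Integer> encode(int[][] postings, boolean withFreqs) {
        List<Integer> vints = new ArrayList<>();
        int previousDoc = 0;
        for (int[] p : postings) {
            int gap = p[0] - previousDoc;
            previousDoc = p[0];
            if (!withFreqs) {
                vints.add(gap);              // DOCS_ONLY: the gap alone
            } else if (p[1] == 1) {
                vints.add(gap * 2 + 1);      // odd DocDelta: frequency is one
            } else {
                vints.add(gap * 2);          // even DocDelta: frequency follows
                vints.add(p[1]);
            }
        }
        return vints;
    }

    public static void main(String[] args) {
        int[][] docExample = { { 7, 1 }, { 11, 3 } };
        System.out.println(encode(docExample, true));   // [15, 8, 3]
        System.out.println(encode(docExample, false));  // [7, 4]

        // "red": 15 times in doc1, 10 times in doc2, assuming doc numbers 0 and 1.
        int[][] red = { { 0, 15 }, { 1, 10 } };
        System.out.println(encode(red, true));          // [0, 15, 2, 10]
    }
}
```

Under that assumption, the DocDelta values for "red" would be 0 (followed by the frequency 15) and 2 (followed by the frequency 10).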

Thanks.