constructing smaller phrase queries given a multi-word query

2006-10-16 Thread Mek
Has anyone dealt with the problem of constructing sub-queries given a multi-word query? Here is an example to illustrate what I mean: the user queries for A B C D. Right now I change that query to A B C D A B C D to give phrase matches a higher weight. What might happen though, is that the user
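
One possible reading of the sub-query question above is to enumerate every contiguous sub-phrase of the query, longest first, so that each can then be added as a phrase clause. The sketch below is plain Java with no Lucene calls; the class name SubPhrases, the method subPhrases, and the minWords cutoff are hypothetical names chosen for illustration and do not come from the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SubPhrases {
    /**
     * Returns every contiguous run of at least minWords words from the query,
     * ordered from the longest phrase to the shortest.
     */
    static List<String> subPhrases(String query, int minWords) {
        String[] words = query.trim().split("\\s+");
        List<String> phrases = new ArrayList<>();
        // Walk phrase lengths from the full query down to the cutoff.
        for (int len = words.length; len >= Math.max(minWords, 1); len--) {
            // Slide a window of that length across the words.
            for (int start = 0; start + len <= words.length; start++) {
                phrases.add(String.join(" ", Arrays.copyOfRange(words, start, start + len)));
            }
        }
        return phrases;
    }
}
```

For the query A B C D with minWords set to 2 this yields six phrases: A B C D, A B C, B C D, A B, B C, C D. How each phrase is weighted in the final query is left to the application.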

Re: Query not finding indexed data

2006-10-16 Thread Antony Bowesman
Doron Cohen wrote: Hi Antony, you cannot instruct the query parser to do that. Note that an Thanks, I suspected as much. I've changed it to make the field tokenized. field name. This is an application logic to know that a certain query is not to be tokenized. In this case you could create

Re: Query not finding indexed data

2006-10-16 Thread Erik Hatcher
On Oct 16, 2006, at 2:44 AM, Antony Bowesman wrote: Doron Cohen wrote: Hi Antony, you cannot instruct the query parser to do that. Note that an Thanks, I suspected as much. I've changed it to make the field tokenized. field name. This is an application logic to know that a certain

Searching pdf, getting page number

2006-10-16 Thread Christoph Pächter
Hi, I know that I can index pdf-files (using a third-party library). Is it possible to search the index for a phrase, getting not only the document, but also the page number in the (pdf-)document? Or is it even possible to get a bookmark, leading to this page? I am thankful for any information

Parallel Index Search

2006-10-16 Thread Supriya Kumar Shyamal
Hello All, If I am not mistaken, because the index is locked by objects like IndexReader or IndexWriter, theoretically only one thread can access the index at a time. When we do a search on the index it creates a commit lock so that other threads do not modify the index, so

Re: Parallel Index Search

2006-10-16 Thread Supriya Kumar Shyamal
Michael McCandless wrote: Supriya Kumar Shyamal wrote: If I am not mistaken, because the index is locked by objects like IndexReader or IndexWriter, theoretically only one thread can access the index at a time. Actually, only one writer can write to the index at once.

Re: Looking for a stemmer that can return all inflected forms

2006-10-16 Thread Steven Rowe
Hi Jong, Jong Kim wrote: I'm looking for a stemmer that is capable of returning all morphological variants of a query term (to be used for high-recall search). For example, given a query term of 'cares', I would like to be able to generate 'cares', 'care', 'cared', and 'caring'. To

Re: Searching pdf, getting page number

2006-10-16 Thread Steven Rowe
Hi Bill, Bill Taylor wrote: On Oct 16, 2006, at 5:44 AM, Christoph Pächter wrote: I know that I can index pdf-files (using a third-party library). Could you please tell me where to find this library? There are several PDF extraction packages listed here (look under the Lucene Document

BooleanQuery.TooManyClauses exception

2006-10-16 Thread Bushey, John
Hi - Can someone explain the reason why I'm getting the TooManyClauses exception? I have a general understanding of the issue based on my reading, but I don't understand the mechanics of it. Specifically, how is my query being expanded to cause this problem? How am I exceeding the default

Re: BooleanQuery.TooManyClauses exception

2006-10-16 Thread Chris Hostetter
RangeQueries expand to a boolean query containing all terms in the range, so it doesn't matter if you search on a coarse-grained range: if you store the dates with high granularity, the number of terms will be high. This wiki page discusses some of the merits of using multiple date fields with
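
To make the expansion described above concrete, the sketch below counts how many distinct terms a date range could cover at a given storage granularity and compares that with 1024, the default BooleanQuery clause limit. It does not call Lucene, and it assumes the worst case in which every time step inside the range actually occurs in the index. The class and method names (RangeTermCount, worstCaseTerms, exceedsDefaultLimit) are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

class RangeTermCount {
    // Default maximum number of clauses in a Lucene BooleanQuery.
    static final int DEFAULT_MAX_CLAUSES = 1024;

    /**
     * Worst-case number of distinct terms between from and to (inclusive)
     * when dates are indexed at the given granularity. Assumes every step
     * in the range is present in the index.
     */
    static long worstCaseTerms(LocalDateTime from, LocalDateTime to, ChronoUnit granularity) {
        return granularity.between(from, to) + 1;
    }

    /** True if a range over these terms would expand past the default limit. */
    static boolean exceedsDefaultLimit(LocalDateTime from, LocalDateTime to, ChronoUnit granularity) {
        return worstCaseTerms(from, to, granularity) > DEFAULT_MAX_CLAUSES;
    }
}
```

Under these assumptions, a range from 1 January 2006 to 31 December 2006 covers at most 365 terms when dates are stored per day, but more than half a million when they are stored per minute, which is far beyond the default limit even though the query itself looks the same.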

Re: java.io.IOException: read past EOF

2006-10-16 Thread John Gilbert
Turns out I needed a seek method. I ended up modeling it after the RAMDirectory. I turned the RAMFile into an @Entity. The directory accesses the EntityManager, and I am using JBossCache. Preliminary testing shows comparable response times.

Help with Custom Analyzer

2006-10-16 Thread Ryan O'Hara
I have a few questions regarding writing a custom analyzer. My situation is that I would like to use the StandardAnalyzer but with some data-specific rules. I was wondering if there was a way of telling the StandardAnalyzer to treat a string of text, that would normally be tokenized into

Re: Help with Custom Analyzer

2006-10-16 Thread Otis Gospodnetic
Hi Ryan, StandardAnalyzer should already be smart about keeping email addresses as a single token: // email addresses | EMAIL: ALPHANUM ((.|-|_) ALPHANUM)* @ ALPHANUM ((.|-) ALPHANUM)+ (this is from StandardAnalyzer.jj) As for changing the text you feed to Lucene, that's all up to you.
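
As a rough illustration of what the EMAIL rule quoted above accepts, the sketch below restates it as a java.util.regex pattern. This is an approximation and not the tokenizer itself: it assumes the separators in the rule are the literal characters '.', '-', '_' and '@', and it narrows ALPHANUM to ASCII letters and digits, which is likely tighter than the real grammar. The class name EmailTokenSketch and the method looksLikeEmailToken are hypothetical.

```java
import java.util.regex.Pattern;

class EmailTokenSketch {
    // ALPHANUM ((.|-|_) ALPHANUM)* @ ALPHANUM ((.|-) ALPHANUM)+
    // with ALPHANUM approximated as one or more ASCII letters or digits.
    static final Pattern EMAIL = Pattern.compile(
        "[A-Za-z0-9]+(?:[._-][A-Za-z0-9]+)*@[A-Za-z0-9]+(?:[.-][A-Za-z0-9]+)+");

    /** True if the whole string matches the approximated EMAIL rule. */
    static boolean looksLikeEmailToken(String s) {
        return EMAIL.matcher(s).matches();
    }
}
```

Read this way, the rule requires at least one '.' or '-' separated part after the '@', and it allows '_' only before the '@'.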

Re: Help with Custom Analyzer

2006-10-16 Thread Bill Taylor
It is not THAT hard to write a custom analyzer; that is what I did. I found that there is a bug in the setup, however, in that there are two incompatible definitions of Token. The generated file xxTokenizer.java refers to the wrong definition of Token so I have to patch it before it will

Re: Help with Custom Analyzer

2006-10-16 Thread Ryan O'Hara
Sorry, I wasn't really concerned with email addresses - I was just using that as an example. How would I tell the StandardAnalyzer that I want a certain phrase to be tokenized as a token? Surround by quotes or ..? Also, how would you recommend manipulating the Reader object? You said

Re: Help with Custom Analyzer

2006-10-16 Thread Doron Cohen
Otis Gospodnetic [EMAIL PROTECTED] wrote on 16/10/2006 14:32:13: Hi Ryan, StandardAnalyzer should already be smart about keeping email addresses as a single token: // email addresses | EMAIL: ALPHANUM ((.|-|_) ALPHANUM)* @ ALPHANUM ((.|-) ALPHANUM)+ (this is from StandardAnalyzer.jj)

PrefixFilter and WildcardQuery

2006-10-16 Thread vasu shah
Hi, I have multiple fields that I need to search on. All these fields need to support wildcard search. I am ANDing these search fields using BooleanQuery. There is no need for scoring in my search. How do I implement this? I have seen PrefixFilter and it sounds promising. But then how do

Re: PrefixFilter and WildcardQuery

2006-10-16 Thread Doron Cohen
Hi Vasu, how about using ChainedFilter(yourPrefixFilters[], ChainedFilter.AND)? vasu shah [EMAIL PROTECTED] wrote on 16/10/2006 17:50:27: Hi, I have multiple fields that I need to search on. All these fields need to support wildcard search. I am ANDing these search fields using

Re: PrefixFilter and WildcardQuery

2006-10-16 Thread Erick Erickson
Well, depending on what you mean by wildcard, a PrefixFilter isn't necessarily what you want. If wildcard means abc*, then PrefixFilter is right. If it means ab*cd?fg, a PrefixFilter isn't useful unless you want to do some fancy indexing. Think about writing your own filter. Wrap it in a
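
The distinction drawn above, between abc* and patterns such as ab*cd?fg, can be checked mechanically before choosing a filter. The sketch below is plain Java and makes no Lucene calls; the class WildcardKind and the method isPurePrefix are hypothetical names, and it assumes '*' and '?' are the only wildcard characters in use.

```java
class WildcardKind {
    /**
     * True when the pattern is a non-empty literal prefix followed by a
     * single trailing '*', with no '?' and no other '*' anywhere.
     */
    static boolean isPurePrefix(String pattern) {
        int star = pattern.indexOf('*');
        return star > 0                          // at least one literal character first
            && star == pattern.length() - 1      // the first '*' is also the last character
            && pattern.indexOf('?') < 0;         // no single-character wildcards
    }
}
```

Patterns for which isPurePrefix returns true are candidates for a prefix-based filter; anything else would need a different approach, such as the custom filter suggested in the message.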

reloading index

2006-10-16 Thread EDMOND KEMOKAI
Hi guys, how do you reload an index? I have a webapp which might need to be redeployed, but whenever I test FSDirectory.list(), nothing is returned. The segments and .cfs files are in the directory but those aren't recognized either. -- talk trash and carry a small stick. PAUL KRUGMAN (NYT)

reading indice

2006-10-16 Thread EDMOND KEMOKAI
Can someone tell me how to read an index into memory, or how to open an existing index for reading? -- talk trash and carry a small stick. PAUL KRUGMAN (NYT)

Re: reading indice

2006-10-16 Thread heritrix . lucene
Read org.apache.lucene.index.IndexReader and org.apache.lucene.search.IndexSearcher. There are descriptions available in these docs. On 10/17/06, EDMOND KEMOKAI [EMAIL PROTECTED] wrote: Can someone tell me how to read an index into memory, or how to open an existing index for reading?