Does the Lucene search engine work with PDF's?

2003-10-20 Thread Konrad Kolosowski

Return Receipt

Your document: Does the Lucene search engine work with PDF's?
was received by: Konrad Kolosowski/Toronto/IBM
at: 10/20/2003 12:15:25





-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: OutOfMemoryErrors searching with WildCardQueries

2003-06-12 Thread Konrad Kolosowski
After Dave Kor put me on the right track, I thought I would need to dive into
hacking Lucene on my own, so having the fix already in the repository is great.
Thank you, Doug.
I assume the fix will be picked up by the 1.3 release.  Is there an expected
time frame for the 1.3 Final build?
Thanks.

Konrad Kolosowski



   

Doug Cutting [EMAIL PROTECTED]
06/12/2003 02:28 PM
Please respond to Lucene Users List

To:       Lucene Users List [EMAIL PROTECTED]
cc:
Subject:  Re: OutOfMemoryErrors searching with WildCardQueries

   




Konrad Kolosowski wrote:
> If the index grows to a hundred thousand documents, with users
> simultaneously searching indexes for different locales, what is the best
> way to cap the memory requirement?  Limiting the number of terms, or the
> number of terms containing wildcards, or eliminating wildcard searches
> altogether?

This was discussed recently on [EMAIL PROTECTED] in a thread
whose subject contains "too many hits - OutOfMemoryError".

I checked in a patch which limits the number of terms that a wildcard is
permitted to expand into.  The default is 1000.  If a term expands to
more than that, an exception is thrown.  Each term that a wildcard
expands into requires around 2kB, so this limits each wildcarded query
term to about 2MB.  If you have queries with large numbers of wildcarded
terms, you might consider limiting that number as well.
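
To make the arithmetic above concrete, here is a minimal, self-contained sketch of capped expansion. It does not use Lucene's API: the class name, the exception, and the in-memory term list are all hypothetical stand-ins, and only the figures (a limit of 1000 terms, roughly 2kB per term) come from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CappedWildcardExpansion {
    // Assumed limit, mirroring the default of 1000 described above.
    static final int MAX_EXPANDED_TERMS = 1000;
    // Rough per-term cost described above (about 2kB).
    static final int BYTES_PER_TERM = 2 * 1024;

    /** Hypothetical exception: a prefix matched more terms than allowed. */
    static class TooManyTermsException extends RuntimeException {
        TooManyTermsException(String message) {
            super(message);
        }
    }

    /**
     * Expands a trailing-wildcard query such as "ap*" (passed as the
     * prefix "ap") against a list of index terms, failing fast once the
     * number of matches would exceed maxTerms.
     */
    static List<String> expand(String prefix, List<String> indexTerms, int maxTerms) {
        List<String> matches = new ArrayList<>();
        for (String term : indexTerms) {
            if (term.startsWith(prefix)) {
                if (matches.size() >= maxTerms) {
                    throw new TooManyTermsException(
                        "'" + prefix + "*' expands to more than " + maxTerms + " terms");
                }
                matches.add(term);
            }
        }
        return matches;
    }

    /** Worst-case memory for one wildcarded term under the assumed per-term cost. */
    static long worstCaseBytes(int maxTerms) {
        return (long) maxTerms * BYTES_PER_TERM;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("apple", "apply", "apt", "bar");
        System.out.println(expand("ap", terms, MAX_EXPANDED_TERMS));
        System.out.println(worstCaseBytes(MAX_EXPANDED_TERMS) + " bytes per wildcarded term");
    }
}
```

With these assumptions the bound per wildcarded term is 1000 x 2kB, or about 2MB; a query containing several wildcarded terms multiplies that bound accordingly.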

This patch is in the latest version of Lucene in CVS, but not yet in a
release.

Doug





OutOfMemoryErrors searching with WildCardQueries

2003-06-11 Thread Konrad Kolosowski
I need to protect an online system against OutOfMemoryErrors, which
sometimes crash it.  The system allows boolean searches with wildcards.

Using WildcardQuery with a wildcard in the first position is not
recommended.  A leading wildcard works for an index with a small number of
documents but results in errors for a larger index (about 3,000 documents of
1-2 pages each).  If someone types a query with many wildcards close to the
beginning of terms, e.g. a* OR b* OR ... OR z*, is that not going to lead to
the same problem?

Suppose I require that none of the first 3 letters of a word in a query
(rather than just the first letter) can be a wildcard.  Will that provide
additional safety and reduce memory consumption during search?  Even if it
does, I think it probably would not help when the index contains a large
number of terms with a common prefix anyway.
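
For illustration, the prefix rule described above could be checked before a query ever reaches the index. This is only a sketch: the class and method names are hypothetical, and it treats `*` and `?` as the wildcard characters. As noted above, it bounds where a wildcard may appear, not how many terms the wildcard expands into.

```java
public class WildcardPrefixCheck {
    /**
     * Returns true if the term has no wildcard ('*' or '?') within its
     * first minPrefix characters. Terms without wildcards always pass.
     */
    static boolean hasLiteralPrefix(String term, int minPrefix) {
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '*' || c == '?') {
                // The first wildcard decides: it must come after the prefix.
                return i >= minPrefix;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"luc*", "lu*", "*ene", "lucene"};
        for (String term : samples) {
            System.out.println(term + " -> " + hasLiteralPrefix(term, 3));
        }
    }
}
```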

If the index grows to a hundred thousand documents, with users simultaneously
searching indexes for different locales, what is the best way to cap the
memory requirement?  Limiting the number of terms, or the number of terms
containing wildcards, or eliminating wildcard searches altogether?
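
One of the options above, limiting the number of terms that contain wildcards, might be sketched as a simple pre-check on the raw query string. The names here are hypothetical, and the whitespace tokenization is a simplification that ignores quoting, grouping, and field prefixes.

```java
public class WildcardTermCount {
    /** Counts whitespace-separated tokens that contain '*' or '?'. */
    static int countWildcardTerms(String query) {
        int count = 0;
        for (String token : query.trim().split("\\s+")) {
            if (token.indexOf('*') >= 0 || token.indexOf('?') >= 0) {
                count++;
            }
        }
        return count;
    }

    /** Returns true if the query stays within the allowed number of wildcard terms. */
    static boolean withinLimit(String query, int maxWildcardTerms) {
        return countWildcardTerms(query) <= maxWildcardTerms;
    }

    public static void main(String[] args) {
        String query = "a* OR b* OR lucene";
        System.out.println(countWildcardTerms(query));
        System.out.println(withinLimit(query, 1));
    }
}
```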

Thanks for any explanation or pointers.

Konrad Kolosowski

