Does the Lucene search engine work with PDF's?
Return Receipt. Your document: Does the Lucene search engine work with PDF's? was received by: Konrad Kolosowski/Toronto/IBM at: 10/20/2003 12:15:25 - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: OutOfMemoryErrors searching with WildCardQueries
After Dave Kor put me on track, I thought I would need to dive into hacking Lucene on my own, but having the fix already in the repository is great. Thank you, Doug. I assume the fix will be picked up by the 1.3 release. Is there an expected time frame for the 1.3 Final build?

Thanks.
Konrad Kolosowski

Doug Cutting [EMAIL PROTECTED]
06/12/2003 02:28 PM
To: Lucene Users List [EMAIL PROTECTED]
Subject: Re: OutOfMemoryErrors searching with WildCardQueries
Please respond to Lucene Users List

Konrad Kolosowski wrote:
If the index grows to a hundred thousand documents, with users simultaneously searching indexes for different locales, what is the best way to cap the memory requirement? Limiting the number of terms, or the number of terms containing wild cards, or eliminating wild card searches altogether?

This was discussed recently on [EMAIL PROTECTED] in a thread whose subject contains "too many hits - OutOfMemoryError".

I checked in a patch which limits the number of terms that a wildcard is permitted to expand into. The default is 1000. If a term expands to more than that, an exception is thrown. Each term that a wildcard expands into requires around 2kB, so this limits each wildcarded query term to about 2MB. If you have queries with large numbers of wildcarded terms, you might consider limiting that as well.

This patch is in the latest version of Lucene in CVS, but not yet in a release.

Doug
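The expansion limit Doug describes can be illustrated with a small standalone sketch. This is not Lucene's code or API: the class WildcardExpansionLimit, its exception, and its method names are hypothetical, and the index is stood in for by a sorted set of strings. Only the default of 1000 terms and the figure of roughly 2kB per expanded term come from the message above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical illustration of a capped wildcard expansion; not Lucene's classes. */
final class WildcardExpansionLimit {

    /** Thrown when a prefix matches more terms than the cap allows. */
    static final class TooManyTermsException extends RuntimeException {
        TooManyTermsException(String message) {
            super(message);
        }
    }

    static final int DEFAULT_MAX_TERMS = 1000;          // default quoted in the thread
    static final int APPROX_BYTES_PER_TERM = 2 * 1024;  // "around 2kB", per the thread

    /** Collects the terms starting with prefix, failing once the cap is exceeded. */
    static List<String> expandPrefix(SortedSet<String> terms, String prefix, int maxTerms) {
        List<String> matches = new ArrayList<>();
        // In a sorted set, all terms sharing a prefix are contiguous from the prefix onward.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            if (matches.size() == maxTerms) {
                throw new TooManyTermsException(
                        "'" + prefix + "*' expands to more than " + maxTerms + " terms");
            }
            matches.add(term);
        }
        return matches;
    }

    /** Rough upper bound on memory for a query with several wildcarded terms. */
    static long worstCaseBytes(int wildcardTerms, int maxTerms) {
        return (long) wildcardTerms * maxTerms * APPROX_BYTES_PER_TERM;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(List.of("apple", "apply", "apt", "banana"));
        System.out.println(expandPrefix(terms, "app", DEFAULT_MAX_TERMS));
        // Bound for a query such as a* OR b* OR ... OR z* (26 wildcarded terms).
        System.out.println(worstCaseBytes(26, DEFAULT_MAX_TERMS) + " bytes");
    }
}
```

Under these assumptions the cap bounds each wildcarded term, not the whole query, which is why the message also suggests limiting the number of wildcarded terms per query.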
OutOfMemoryErrors searching with WildCardQueries
I need to protect an on-line system against OutOfMemoryErrors, which sometimes crash our system. The system allows boolean searches with wild cards.

It is not recommended to use WildCardQuery with a wild card at the first position. Having a wild card at the first position works for a small number of documents in the index but results in errors for a larger index (containing 3k docs of 1-2 pages each). If one types a query with many wild cards close to the beginning of terms, e.g. a* OR b* OR ... OR z*, won't that lead to the same problem?

Suppose I impose a requirement that none of the first 3 letters of a word in a query, rather than just the first one, can be a wild card. Will that provide additional safety and reduce memory consumption during search? Even if it does, I think it probably would not help when the index contains a large number of terms with a common prefix.

If the index grows to a hundred thousand documents, with users simultaneously searching indexes for different locales, what is the best way to cap the memory requirement? Limiting the number of terms, or the number of terms containing wild cards, or eliminating wild card searches altogether?

Thanks for any explanation or pointers.
Konrad Kolosowski
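The three-letter rule proposed in this message could be checked before a query ever reaches the search engine. The sketch below is only an illustration under stated assumptions: the class WildcardPrefixCheck and its methods are hypothetical, the query is split naively on whitespace instead of going through Lucene's query parser, and '*' and '?' are treated as the wildcard characters.

```java
/** Hypothetical pre-check for the "no wildcard in the first N letters" rule. */
final class WildcardPrefixCheck {

    static final int MIN_LITERAL_PREFIX = 3; // the three-letter rule proposed above

    /** True if the term has no wildcard, or its first wildcard comes after minPrefix literal characters. */
    static boolean hasSafePrefix(String term, int minPrefix) {
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '*' || c == '?') {
                return i >= minPrefix;
            }
        }
        return true;
    }

    /** Naive check of a whole boolean query; operators are skipped, every other token is tested. */
    static boolean queryIsAcceptable(String query, int minPrefix) {
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty() || token.equals("AND") || token.equals("OR") || token.equals("NOT")) {
                continue;
            }
            if (!hasSafePrefix(token, minPrefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(queryIsAcceptable("a* OR b* OR z*", MIN_LITERAL_PREFIX));
        System.out.println(queryIsAcceptable("luce* AND index", MIN_LITERAL_PREFIX));
    }
}
```

As the message itself suspects, a check like this only rejects short prefixes. It says nothing about how many indexed terms share an accepted prefix, so it does not bound memory use on its own.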