Re: JDBCDirectory to prevent optimize()?

2004-11-23 Thread Daniel Naber
On Tuesday 23 November 2004 00:06, Kevin A. Burton wrote: I'm wondering about the potential for a generic JDBCDirectory for keeping the Lucene index within a database. Such a thing already exists: http://ppinew.mnis.com/jdbcdirectory/, but I don't know about its scalability. Regards Daniel

Re: Numeric Range Restrictions: Queries vs Filters

2004-11-23 Thread Paul Elschot
Chris, On Tuesday 23 November 2004 03:25, Hoss wrote: (NOTE: numbers in [] indicate Footnotes) I'm rather new to Lucene (and this list), so if I'm grossly misunderstanding things, forgive me. One of my main needs as I investigate Search technologies is to restrict results based on Ranges

Re: Numeric Range Restrictions: Queries vs Filters

2004-11-23 Thread Morus Walter
Hoss writes: (c) Filtering. Filters in general make a lot of sense to me. They are a way to specify (at query time) that only a certain subset of the index should be considered for results. The Filter class has a very straightforward API that seems very easy to subclass to get the
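
As a rough illustration of the idea described above, a filter can be modelled as a bit set with one bit per document, where only documents whose bit is set are considered for results. The sketch below is plain Java built on java.util.BitSet and does not call Lucene; the class RangeBitsSketch, its methods and the sample values are hypothetical and only meant to show the principle.

```java
import java.util.BitSet;

/** Toy model of a filter: one bit per document, set when the document is allowed. */
final class RangeBitsSketch {

    /** Marks every document whose value lies in [lower, upper], inclusive. */
    static BitSet rangeBits(int[] valueByDoc, int lower, int upper) {
        BitSet bits = new BitSet(valueByDoc.length);
        for (int doc = 0; doc < valueByDoc.length; doc++) {
            if (valueByDoc[doc] >= lower && valueByDoc[doc] <= upper) {
                bits.set(doc);
            }
        }
        return bits;
    }

    /** Keeps only the query hits that the filter allows. */
    static BitSet applyFilter(BitSet queryHits, BitSet filterBits) {
        BitSet result = (BitSet) queryHits.clone();
        result.and(filterBits);
        return result;
    }

    public static void main(String[] args) {
        int[] prices = {5, 12, 30, 18, 7};          // hypothetical per-document values
        BitSet hits = new BitSet();
        hits.set(0);
        hits.set(1);
        hits.set(2);                                // documents matched by some query
        BitSet allowed = rangeBits(prices, 10, 20); // documents 1 and 3
        System.out.println(applyFilter(hits, allowed)); // prints {1}
    }
}
```

The range is evaluated once into the bit set and can then be intersected with the hits of any query, which is the "subset of the index" view of filtering.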

Re: JDBCDirectory to prevent optimize()?

2004-11-23 Thread Erik Hatcher
Also, there is a DBDirectory in the sandbox to store a Lucene index inside Berkeley DB. Erik On Nov 22, 2004, at 6:06 PM, Kevin A. Burton wrote: It seems that, compared to other datastores, Lucene starts to fall down. For example, Lucene doesn't perform online index

Re: too many files open issue

2004-11-23 Thread Neelam Bhatnagar
Hi Dmitry, Thank you so much for your reply. I'd like to answer your specific questions. It also depends on whether you are using compound files or not (this is a flag on the IndexWriter). With the compound files flag on, segments have a fixed number of files, regardless of how many fields you

Re: Numeric Range Restrictions: Queries vs Filters

2004-11-23 Thread Doug Cutting
Hoss wrote: The attachment contains my RangeFilter, a unit test that demonstrates it, and a Benchmarking unit test that does a side-by-side comparison with RangeQuery [6]. If developers feel that this class is useful, then by all means roll it into the code base. (90% of it is cut/pasted from

Re: Numeric Range Restrictions: Queries vs Filters

2004-11-23 Thread Erik Hatcher
On Nov 22, 2004, at 9:25 PM, Hoss wrote: I'm rather new to Lucene (and this list), so if I'm grossly misunderstanding things, forgive me. You're spot on! But I was surprised then to see the following quote from Erik Hatcher in the archives: In fact, DateFilter by itself is practically of no

Re: Numeric Range Restrictions: Queries vs Filters

2004-11-23 Thread Erik Hatcher
On Nov 23, 2004, at 4:18 AM, Doug Cutting wrote: Hoss wrote: The attachment contains my RangeFilter, a unit test that demonstrates it, and a Benchmarking unit test that does a side-by-side comparison with RangeQuery [6]. If developers feel that this class is useful, then by all means roll it

Re: Numeric Range Restrictions: Queries vs Filters

2004-11-23 Thread Chris Hostetter
: Done. I deprecated DateField and DateFilter, and added the RangeFilter : class contributed by Chris. : : I did a little code cleanup, Chris, renaming some RangeFilter variables : and correcting typos in the Javadocs. Let me know if everything looks : ok. Wow ... that was fast. Things look

modifying existing index

2004-11-23 Thread Santosh
I am using Lucene for indexing. When I create the index, the documents are added, but when I want to modify a single existing document and reindex it, it is treated as a new document and added one more time, so that I get the same document twice in the results. To overcome this I am

Re: modifying existing index

2004-11-23 Thread Luke Francl
On Tue, 2004-11-23 at 13:59, Santosh wrote: I am using Lucene for indexing. When I create the index, the documents are added, but when I want to modify a single existing document and reindex it, it is treated as a new document and added one more time, so that I get the same document

RE: modifying existing index

2004-11-23 Thread Will Allen
To update a document you need to insert the modified document, then delete the old one. Here is some code that I use to get you going in the right direction (it won't compile, but if you follow it closely you will see how I take an array of Lucene documents with new properties and add them,

experiences with PDF files

2004-11-23 Thread Paul
Hi, I read a lot of mails about time-consuming PDF parsing and tried some solutions myself. My example PDF file has 181 pages in 1.5 MB (mostly text, nearly no graphics). -with pdfbox.org's toolkit it took 17m32s to parse/read its content -after installing ghostscript and ps2text / ps2ascii my

Re: Numeric Range Restrictions: Queries vs Filters

2004-11-23 Thread Erik Hatcher
On Nov 23, 2004, at 10:01 AM, Praveen Peddi wrote: Chris's RangeFilter does not cache anything whereas QueryFilter does caching. Is it better to add the caching functionality to RangeFilter also? Or does it not make any difference? Caching is a different _aspect_. Filtering and caching are not

Re: Numeric Range Restrictions: Queries vs Filters

2004-11-23 Thread Yonik Seeley
I think it depends on the query. If the query (q1) covers a large number of documents and the filter covers a very small number, then using a RangeFilter will probably be slower than a RangeQuery. -Yonik See, this is what I'm not getting: what is the advantage of the second world? :) ... in

Re: Numeric Range Restrictions: Queries vs Filters

2004-11-23 Thread Yonik Seeley
Hmmm, scratch that. I explained the tradeoff of a filter vs a range query - not between the different types of filters you talk about. --- Yonik Seeley [EMAIL PROTECTED] wrote: I think it depends on the query. If the query (q1) covers a large number of documents and the filter covers a very

Re: Numeric Range Restrictions: Queries vs Filters

2004-11-23 Thread Erik Hatcher
On Nov 23, 2004, at 2:16 PM, Chris Hostetter wrote: : I did a little code cleanup, Chris, renaming some RangeFilter variables : and correcting typos in the Javadocs. Let me know if everything looks : ok. Wow ... that was fast. Things look fine to me (typo's in javadocs are my specialty) but

Re: Numeric Range Restrictions: Queries vs Filters

2004-11-23 Thread Erik Hatcher
On Nov 23, 2004, at 3:41 PM, Erik Hatcher wrote: On Nov 23, 2004, at 2:16 PM, Chris Hostetter wrote: First: Is there any reason Matt Quail's LongField class hasn't been added to CVS (or has it and I'm just not seeing it?) Laziness is the only reason, at least on my part. I think adding it is a

retrieving added document

2004-11-23 Thread Paul
Hi, I'm creating a document and adding it with a writer to the index. For some reason I need to add data to this specific document later on (minutes, not hours or days). Is it possible to retrieve it and add additional data? I found the document(int n) - method within the IndexReader (btw: the

Re: Numeric Range Restrictions: Queries vs Filters

2004-11-23 Thread Chris Hostetter
: Note that I said FilteredQuery, not QueryFilter. Doh .. right sorry, I confused myself by thinking you were still referring to your comments of 2004-03-29 comparing DateFilter with RangeQuery wrapped in a QueryFilter. : I debate (with myself) on whether add-ons that can be done with other : code

Re: JDBCDirectory to prevent optimize()?

2004-11-23 Thread Kevin A. Burton
Erik Hatcher wrote: Also, there is a DBDirectory in the sandbox to store a Lucene index inside Berkeley DB. I assume this would prevent prefix queries from working... Kevin

URGENT: Help indexing large document set

2004-11-23 Thread John Wang
Hi: I am trying to index 1M documents, with batches of 500 documents. Each document has a unique text key, which is added as a Field.Keyword(name,value). For each batch of 500, I need to make sure I am not adding a document with a key that is already in the current index. To do

RE: URGENT: Help indexing large document set

2004-11-23 Thread Chuck Williams
Are you sure you have a performance problem with TermInfosReader.get(Term)? It looks to me like it scans sequentially only within a small buffer window (of size SegmentTermEnum.indexInterval) and that it uses binary search otherwise. See TermInfosReader.getIndexOffset(Term). Chuck
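
The lookup strategy described above (binary search over a sparse index of every N-th term, followed by a short sequential scan inside one interval) can be sketched in plain Java. This is not the code of TermInfosReader; the class SparseTermIndexSketch, its method names and the interval of 4 are hypothetical and chosen only for illustration.

```java
/** Toy term dictionary: a sparse index of every interval-th term over a sorted term list. */
final class SparseTermIndexSketch {
    private final String[] terms;      // all terms, sorted ascending
    private final String[] indexTerms; // every interval-th term, also sorted
    private final int interval;

    SparseTermIndexSketch(String[] sortedTerms, int interval) {
        this.terms = sortedTerms;
        this.interval = interval;
        int n = (sortedTerms.length + interval - 1) / interval;
        this.indexTerms = new String[n];
        for (int i = 0; i < n; i++) {
            indexTerms[i] = sortedTerms[i * interval];
        }
    }

    /** Binary search for the last index entry <= target; -1 if target sorts before all of them. */
    int indexOffset(String target) {
        int lo = 0;
        int hi = indexTerms.length - 1;
        while (hi >= lo) {
            int mid = (lo + hi) >>> 1;
            int cmp = target.compareTo(indexTerms[mid]);
            if (cmp < 0) {
                hi = mid - 1;
            } else if (cmp > 0) {
                lo = mid + 1;
            } else {
                return mid;
            }
        }
        return hi;
    }

    /** Returns the position of target in the full term list, or -1 if it is absent. */
    int find(String target) {
        int block = indexOffset(target);
        if (block < 0) {
            return -1;
        }
        int start = block * interval;
        int end = Math.min(start + interval, terms.length);
        for (int i = start; i < end; i++) { // sequential scan within one interval only
            if (terms[i].equals(target)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] sorted = {"apple", "banana", "cherry", "date", "fig",
                           "grape", "kiwi", "lemon", "mango"};
        SparseTermIndexSketch index = new SparseTermIndexSketch(sorted, 4);
        System.out.println(index.find("kiwi"));    // prints 6
        System.out.println(index.find("coconut")); // prints -1
    }
}
```

With this layout a lookup costs a logarithmic number of comparisons against the sparse index plus at most one interval of sequential comparisons, regardless of how many terms the dictionary holds.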

Re: URGENT: Help indexing large document set

2004-11-23 Thread John Wang
Thanks Chuck! I missed the call: getIndexOffset. I am profiling it again to pin-point where the performance problem is. -John On Tue, 23 Nov 2004 16:13:22 -0800, Chuck Williams [EMAIL PROTECTED] wrote: Are you sure you have a performance problem with TermInfosReader.get(Term)? It looks to me

Re: JDBCDirectory to prevent optimize()?

2004-11-23 Thread Erik Hatcher
On Nov 23, 2004, at 6:02 PM, Kevin A. Burton wrote: Erik Hatcher wrote: Also, there is a DBDirectory in the sandbox to store a Lucene index inside Berkeley DB. I assume this would prevent prefix queries from working... Huh? Why would you assume that? As far as I know, and I've tested this

Re: lucene Scorers

2004-11-23 Thread Ken McCracken
Hi, Thanks for the pointers in your replies. Would it be possible to include some sort of accrual scorer interface somewhere in the Lucene Query APIs? This could be passed into a query similar to MaxDisjunctionQuery; and combine the sum, max, tieBreaker, etc., according to the implementor's

Re: retrieving added document

2004-11-23 Thread Cheolgoo Kang
On Tue, 23 Nov 2004 22:47:21 +0100, Paul [EMAIL PROTECTED] wrote: Hi, I'm creating a document and adding it with a writer to the index. For some reason I need to add data to this specific document later on (minutes, not hours or days). Is it possible to retrieve it and add additional data?

RE: lucene Scorers

2004-11-23 Thread Chuck Williams
Hi Ken, I'm glad our replies were helpful. It sounds like you looked at the code in MaxDisjunctionQuery, so you probably noticed that it also implements skipTo(). Your suggestion sounds like a good thing to do. I thought about that when writing MaxDisjunctionQuery, but didn't need the

Help on the Query Parser

2004-11-23 Thread Terence Lai
Hi all, I am trying to use QueryParser.parse() to parse a query string like "java* developer". Note that I want the wildcard string, java*, followed by the word developer. The following is the code: String qryStr = "\"java* developer\""; String fieldname = "text"; StandardAnalyzer

MERGERINDEX + SOLUTION

2004-11-23 Thread Karthik N S
Hi guys, apologies. I have a MERGERINDEX [merged from 1000 subindexes]. The question is: does somebody have a solution for repairing the merged index [in case of corruption]? If so, please let the forum know about it, so developers like us could use the same. Thx in

fetching similar wordlist as given word

2004-11-23 Thread Santosh
Is Lucene able to do stemming? If I am searching for roam, then I know that it can give results for foam using a fuzzy query. But my requirement is: if I search for roam, can I get the list of similar words as output, so that I can show it to the end user in the column --- do you mean

Re: modifying existing index

2004-11-23 Thread Santosh
I have gone through IndexReader, and I found the method delete(int docNum), but from where will I get the document number? Is this predefined? Or do we have to give a number prior to indexing? - Original Message - From: Luke Francl [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent:

Re: Help on the Query Parser

2004-11-23 Thread Morus Walter
Terence Lai writes: Looks like the wildcard query disappeared. In fact, I am expecting text:java* developer to be returned. It seems to me that the QueryParser cannot handle the wildcard within a quoted String. That's not just QueryParser. Lucene itself doesn't handle wildcards

RE: modifying existing index

2004-11-23 Thread Chuck Williams
A good way to do this is to add a keyword field with whatever unique id you have for the document. Then you can delete the term containing a unique id to delete the document from the index (look at IndexReader.delete(Term)). You can look at the demo class IndexHTML to see how it does incremental
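
The pattern suggested above (give every document a unique id, delete by that id, then add the new version) can be illustrated with a small in-memory model. This is a sketch in plain Java, not Lucene code: the class UpdateByKeySketch and its methods are hypothetical stand-ins, with deleteById playing the role that the message assigns to IndexReader.delete(Term).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy index: a list of documents, each carrying a unique "id" field. */
final class UpdateByKeySketch {
    private final List<Map<String, String>> docs = new ArrayList<>();

    /** Blindly appends a document; adding the same id twice creates a duplicate. */
    void add(String id, String body) {
        Map<String, String> doc = new HashMap<>();
        doc.put("id", id);
        doc.put("body", body);
        docs.add(doc);
    }

    /** Removes every document with the given id and returns how many were removed. */
    int deleteById(String id) {
        int before = docs.size();
        docs.removeIf(doc -> id.equals(doc.get("id")));
        return before - docs.size();
    }

    /** Update = delete all old versions by id, then add the new version. */
    void update(String id, String body) {
        deleteById(id);
        add(id, body);
    }

    /** Counts the documents stored under the given id. */
    int count(String id) {
        int n = 0;
        for (Map<String, String> doc : docs) {
            if (id.equals(doc.get("id"))) {
                n++;
            }
        }
        return n;
    }

    int size() {
        return docs.size();
    }

    public static void main(String[] args) {
        UpdateByKeySketch index = new UpdateByKeySketch();
        index.add("doc-1", "first version");
        index.add("doc-1", "second version");      // plain re-add duplicates the document
        System.out.println(index.count("doc-1"));  // prints 2
        index.update("doc-1", "third version");    // delete by id, then add
        System.out.println(index.count("doc-1"));  // prints 1
    }
}
```

The duplicate results reported earlier in the thread correspond to the plain add path; routing every change through update keeps exactly one document per id.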

Re: modifying existing index

2004-11-23 Thread Cheolgoo Kang
On Wed, 24 Nov 2004 13:04:20 +0530, Santosh [EMAIL PROTECTED] wrote: I have gone through IndexReader, and I found the method delete(int docNum), but from where will I get the document number? Is this predefined? Or do we have to give a number prior to indexing? The number (aka doc-id) is given by

RE: fetching similar wordlist as given word

2004-11-23 Thread Chuck Williams
Lucene does support stemming, but that is not what your example requires (stemming equates roaming, roam, roamed, etc.). For stemming, look at PorterStemFilter or better, the Snowball stemmers in the sandbox. For your similar word list, I think you are looking for the class FuzzyTermEnum. This
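
To make the distinction above concrete: a similar-word list such as roam/foam is a matter of edit distance between terms, not of stemming. The following plain-Java sketch lists vocabulary words within a given Levenshtein distance of a query word. It is not the implementation of FuzzyTermEnum; the class SimilarWordsSketch, the sample vocabulary and the threshold of 1 are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy similar-word lookup based on Levenshtein edit distance. */
final class SimilarWordsSketch {

    /** Classic two-row dynamic-programming Levenshtein distance. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the vocabulary words (other than word itself) within maxDistance edits. */
    static List<String> similar(String word, List<String> vocabulary, int maxDistance) {
        List<String> result = new ArrayList<>();
        for (String candidate : vocabulary) {
            if (!candidate.equals(word) && editDistance(word, candidate) <= maxDistance) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> vocabulary = Arrays.asList("foam", "road", "roaming", "table", "room");
        System.out.println(similar("roam", vocabulary, 1)); // prints [foam, road, room]
    }
}
```

Note that roaming, which a stemmer would equate with roam, is not returned here because it is three edits away; the two techniques answer different questions.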