Re: Reopen IndexWriter after delete?

2003-11-12 Thread Morus Walter
Otis Gospodnetic writes: "No, it is not safe. You should close the IndexWriter, then delete the document and close IndexReader, and then get a new IndexWriter and continue writing." IIRC Lucene takes care that you do so. Locking prevents you from having an open IndexWriter and modify the

Re: Document Clustering

2003-11-12 Thread Eric Jain
"I was basically thinking of using lucene to generate document vectors, and writing my custom similarity algorithms for measuring distance. I could then run this data through k-means or SOM algorithms for calculating clusters." First of all, I think it would already be great if there was some
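A minimal sketch of the "document vectors plus custom similarity" idea described above, assuming plain whitespace tokenization and raw term counts. It does not use Lucene's API; the class name CosineSketch and its methods are hypothetical and only illustrate how a distance measure for k-means or SOM clustering might be computed.

```java
import java.util.HashMap;
import java.util.Map;

public class CosineSketch {

    // Build a raw term-frequency vector from whitespace-separated text.
    // (Assumption: no stemming, stop words, or IDF weighting.)
    public static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Cosine similarity between two sparse term vectors; 0.0 if either is empty.
    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    // Euclidean length of a term vector.
    public static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int x : v.values()) {
            sum += (double) x * x;
        }
        return Math.sqrt(sum);
    }
}
```

A clustering algorithm would typically use 1 - cosine(a, b) as the distance between two documents.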

Re: Can use Lucene be used for this

2003-11-12 Thread Eric Jain
I need to retrieve the value with simple queries on the data like: col1 like %ab, What does the ampersand mean? col2 like %aa% Lucene doesn't handle queries where the start of the term is not known very efficiently. and col3 sounds like ; No experience with this, but you could

Wildcard search and HOST tokens

2003-11-12 Thread Pascal Nadal
My Lucene indexes contain fields with values like this www.xxx.yyy.zzz which are treated as HOST tokens. My problem is the following: search results never contain documents with such fields when doing a wildcard query or a fuzzy query. Only searches on full field values work. example queries:

Re: Wildcard search and HOST tokens

2003-11-12 Thread Erik Hatcher
On Wednesday, November 12, 2003, at 05:55 AM, Pascal Nadal wrote: My lucene indexes contain fields with values like this www.xxx.yyy.zzz which are treated as HOST tokens. My problem is the following : search results never contain documents with such fields when doing a wildcard query or a fuzzy

Re: Reopen IndexWriter after delete?

2003-11-12 Thread Otis Gospodnetic
Correct. write.lock is used for that. Otis --- Morus Walter [EMAIL PROTECTED] wrote: Otis Gospodnetic writes: No, it is not safe. You should close the IndexWriter, then delete the document and close IndexReader, and then get a new IndexWriter and continue writing. IIRC lucene

Re: Can use Lucene be used for this

2003-11-12 Thread Hackl, Rene
col2 like %aa% Lucene doesn't handle queries where the start of the term is not known very efficiently. Is it really able to handle them at all? I thought *foo-type queries were not supported. That's because I build two indexes for the purpose of simultaneous left and right truncation. One

Overview to Lucene

2003-11-12 Thread ambiesense
Hello group, can somebody give me an overview of Lucene? What high level components does it include? Particularly I want to answer the following questions regarding available functionality: 1) Does Lucene provide a Vector Space IR Model (with TF/IDF and Cosine Similarity)? 2) Does Lucene provide

Re: Overview to Lucene

2003-11-12 Thread petite_abeille
Hi Ralf, On Nov 12, 2003, at 14:06, [EMAIL PROTECTED] wrote: Does anybody know good articles which demonstrate parts of that or give a good start into Lucene? Otis Gospodnetic's articles are a good starting point: Introduction to Text Indexing with Apache Jakarta Lucene

Re: Re: Wildcard search and HOST tokens

2003-11-12 Thread Pascal Nadal
when I do a query.toString(my default field), it prints exactly my query. example: title:FE.MENU* gives title:FE.MENU* FE.MENU* when I search in the default field and the field 'title'. the HostFilter I wrote (that tokenizes again HOST tokens) works wonderfully. PS: thanks Erik

Boost in Query Parser

2003-11-12 Thread MOYSE Gilles (Cetelem)
Hello. I've made a Filter which recognizes special words and returns them in a boosted form, in a QueryParser sense. For instance, when the filter receives special_word, it returns special_word^3, so as to boost it. The problem is that the QueryParser understands the boost syntax when the string

Re: Wildcard search and HOST tokens

2003-11-12 Thread Erik Hatcher
On Wednesday, November 12, 2003, at 10:43 AM, Pascal Nadal wrote: the HostFilter I wrote (that tokenizes again HOST tokens) works wonderfully. I wonder if this has been fixed since Lucene 1.2. Could you try the latest 1.3RC build available and see if it works without your HostFilter? Erik

Re: Boost in Query Parser

2003-11-12 Thread Erik Hatcher
On Wednesday, November 12, 2003, at 10:53 AM, MOYSE Gilles (Cetelem) wrote: Hello. I've made a Filter which recognizes special words and return them in a boosted form, in a QueryParser sense. For instance, when the filter receives special_word, it returns special_word^3, so as to boost it. The

Re: Index pdf files with your content in lucene.

2003-11-12 Thread Ernesto De Santis
Hello. Well, zipping the files does not work. I can send the files by personal email if somebody wants them. And if somebody can post them on a web site, very cool. I don't post on a web site. Ernesto.

Connection Pooling

2003-11-12 Thread Elsa Hernandez
Hi! Does anyone have the code of a Connection Pooling? I am using JDK 1.3.1. Thank you!

Vector Space Model in Lucene?

2003-11-12 Thread ambiesense
Hi, does Lucene implement a Vector Space Model? If yes, does anybody have an example of how to use it? Cheers, Ralf

Latent Semantic Indexing

2003-11-12 Thread Ralf Bierig
Does Lucene implement Latent Semantic Indexing? Examples? Ralf

Re: Reopen IndexWriter after delete?

2003-11-12 Thread Dror Matalon
Which begs the question: why do you need to use an IndexReader rather than an IndexWriter to delete an item? On Tue, Nov 11, 2003 at 02:46:37PM -0800, Otis Gospodnetic wrote: 1). If I delete a term using an IndexReader, can I use an existing IndexWriter to write to the index? Or do I need

Poor Performance when searching for 500+ terms

2003-11-12 Thread Jie Yang
I know this is rare, but I am building an application that submits searches having 500+ search terms. A general example would be field1:w1 OR field1:w2 OR ... OR field1:w500 For 1 million documents, the performance is OK if field1 in each document has fewer than 50 terms, I can get result 1

RE: Can use Lucene be used for this

2003-11-12 Thread Majerus, John P.
Hello, This has probably been put forth on the list before, but how about the following approach for leftmost wildcard searches, at least for single term searches? Reverse the character order of all words after they're stemmed and before they're added to a special reverse-character-order index.
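A rough sketch of the reversed-term idea proposed above, assuming a simple in-memory sorted set stands in for the special reverse-character-order index. It does not use Lucene's API; ReversedTermIndex and its methods are hypothetical names. The point is that a leading-wildcard query such as *ing turns into an ordinary prefix lookup once every term is stored reversed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class ReversedTermIndex {

    // Terms are stored with their characters reversed, in sorted order.
    private final TreeSet<String> reversedTerms = new TreeSet<>();

    // Add an (already stemmed) term to the reversed index.
    public void add(String term) {
        reversedTerms.add(reverse(term));
    }

    // Answer the leading-wildcard query "*suffix" as a prefix scan
    // over the reversed terms, then reverse each hit back.
    public List<String> endingWith(String suffix) {
        String prefix = reverse(suffix);
        List<String> matches = new ArrayList<>();
        for (String candidate : reversedTerms.tailSet(prefix, true)) {
            if (!candidate.startsWith(prefix)) {
                break; // sorted order: no later entry can share the prefix
            }
            matches.add(reverse(candidate));
        }
        return matches;
    }

    public static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }
}
```

As the message notes, this covers single-term leftmost wildcards; a query truncated on both sides (such as *aa*) would still need another approach.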

RE: Reopen IndexWriter after delete?

2003-11-12 Thread Wilton, Reece
I agree it's a bit of a strange design. It seems that there should be one class that handles all modifications of the index. Usually you'd only have one instance of this so you wouldn't need to open and close it all the time (I'm basically writing one of these classes myself to simplify my code.

QueryParser Rules article (Erik Hatcher)

2003-11-12 Thread Tomcat Programmer
I thought Erik's article was great. There was one unanswered brainbender I had which I was hoping was in there, but... Maybe you can add this topic to the next one, Erik? Here is my issue: When using the QueryParser class, the parse method will throw a TokenMgrError when there is a syntax

Re: QueryParser Rules article (Erik Hatcher)

2003-11-12 Thread Erik Hatcher
On Wednesday, November 12, 2003, at 11:52 PM, Tomcat Programmer wrote: I thought Erik's article was great. There was one unanswered brainbender I had which I was hoping was in there, but... Maybe you can add this topic to the next one, Erik? Well, I'm not sure another article on QueryParser is