Re: FSDIrectory.create doesn't tolerate subdirectories

2003-12-08 Thread Morus Walter
Erik Hatcher writes: On Sunday, December 7, 2003, at 08:21 PM, Esmond Pitt wrote: I'm not clear whether this is a 'yes' or a 'no'. I think other committers would need to weigh in on it. I'm fine with making a change to check isDirectory as well and not deleting them since Lucene
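A minimal sketch of the kind of isDirectory check discussed here. It is not Lucene's actual FSDirectory.create code: the class name IndexDirCleaner and the method clearFiles are hypothetical, and the assumption is that the proposed behavior is "delete plain files, leave subdirectories untouched".

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class IndexDirCleaner {
    /**
     * Deletes the regular files directly inside dir and skips subdirectories.
     * Hypothetical helper for illustration; returns the number of files deleted.
     */
    public static int clearFiles(File dir) throws IOException {
        File[] entries = dir.listFiles();
        if (entries == null) {
            throw new IOException("Not a readable directory: " + dir);
        }
        int deleted = 0;
        for (File entry : entries) {
            if (entry.isDirectory()) {
                continue; // tolerate subdirectories instead of failing on them
            }
            if (!entry.delete()) {
                throw new IOException("Cannot delete " + entry);
            }
            deleted++;
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("index-demo").toFile();
        new File(dir, "segments").createNewFile();
        new File(dir, "sub").mkdir();
        System.out.println("deleted " + clearFiles(dir) + " file(s)");
        System.out.println("sub still exists: " + new File(dir, "sub").isDirectory());
    }
}
```

Skipping rather than deleting subdirectories is only one possible design, and as the replies in this thread show, it does not settle whether such a check belongs in Lucene at all.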

Re: OR query return fewer result than AND query

2003-12-08 Thread Erik Hatcher
On Sunday, December 7, 2003, at 09:50 PM, Fitrio Pakana wrote: I have a similar problem to his, with queries using multiple terms, and to make things worse, the hits returned are quite absurd. The score of hits using an 'OR' (any words) query is lower than when using an 'AND' (all words) query, thus

Re: FSDIrectory.create doesn't tolerate subdirectories

2003-12-08 Thread Otis Gospodnetic
I am against making the suggested Lucene modification. Lucene index structure may change in the future. It is possible that one day Lucene developers will need to use a hierarchy of directories to implement some feature. Therefore, Lucene users should be discouraged from creating sub-directories

term vector (Damian patch)

2003-12-08 Thread Stefan Groschupf
Hi there, is Damian's patch in CVS or in the latest Lucene release? Does this patch make it possible to retrieve a term vector for a document? Thanks! Stefan -- open technology: www.media-style.com open source: www.weta-group.net open discussion: www.text-mining.org

Unindexed fields

2003-12-08 Thread Chong, Herb
is there a limit to the size of an UnIndexed field? i changed my code to increase the maximum string size per document from 300 bytes to 10,000 and although the index run completes without errors, i never find any documents while searching. Herb

Re: term vector (Damian patch)

2003-12-08 Thread Otis Gospodnetic
Stefan, which patch are you referring to? I looked at the following, but did not find it:

RE: Unindexed fields

2003-12-08 Thread Pleasant, Tracy
If you don't index something then it's not going to be searched. -Original Message- From: Chong, Herb [mailto:[EMAIL PROTECTED] Sent: Monday, December 08, 2003 11:14 AM To: Lucene Users List Subject: Unindexed fields is there a limit to the size of an UnIndexed field? i changed my code

Re: term vector (Damian patch)

2003-12-08 Thread Stefan Groschupf
Otis, based on this discussion: http://www.mail-archive.com/[EMAIL PROTECTED]/msg03350.html Stefan - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

RE: Unindexed fields

2003-12-08 Thread Chong, Herb
my indexed fields haven't changed. Herb...

Document Similarity

2003-12-08 Thread Stefan Groschupf
Hi Jing, are you working on document similarity? I see nobody answered your question. Creating a query out of a document would be very easy, but would it give good results? Document term vectors would provide more possibilities to use different data mining algorithms for

Re: term vector (Damian patch)

2003-12-08 Thread Otis Gospodnetic
I think this never resulted in a patch. A few days after that thread another person expressed interest in implementing the same thing, but I am not sure what the status of that idea is now. Otis --- Stefan Groschupf [EMAIL PROTECTED] wrote: Otis, based on this discussion:

Re: term vector or document vector

2003-12-08 Thread Stefan Groschupf
Just to be sure, since there was a lot of discussion on the lists: there is actually no solution available to get a term vector for a document or a TF/IDF feature vector for a document, is there? Has anyone worked on such things? Does anyone wish to work on such things? Stefan

Re: term vector or document vector

2003-12-08 Thread Otis Gospodnetic
--- Stefan Groschupf [EMAIL PROTECTED] wrote: Just to be sure, since there was a lot of discussion on the lists: there is actually no solution available to get a term vector for a document or a TF/IDF feature vector for a document, is there? Correct :( Has anyone worked on such things?

Re: term vector or document vector

2003-12-08 Thread Stefan Groschupf
A few people have asked, a few people have expressed interest. I have to do some work for nutch, but since I need the feature vector stuff for a commercial project I will try to implement it. Does anyone wish to join me? ;) Stefan -- open technology: www.media-style.com open source:

Re: FSDIrectory.create doesn't tolerate subdirectories

2003-12-08 Thread Doug Cutting
I agree. One should provide Lucene with a unique path in the filesystem, one that is not intended to be used for any other purpose. All access to that path should be through Lucene's API. The fact that Lucene decides to create a directory there rather than a single file is an implementation

Re: term vector or document vector

2003-12-08 Thread Damian Gajda
I have to do some work for nutch, but since I need the feature vector stuff for a commercial project I will try to implement it. Does anyone wish to join me? ;) Stefan Hello, I already have some experience with Dmitry's implementation. Feel free to contact me. -- Damian

Re: term vector or document vector

2003-12-08 Thread Stefan Groschupf
Damian Gajda wrote: Hello, I already have some experience with Dmitry's implementation. Can you point me to Dmitry's code so that I can take a look? I have only read about it. Feel free to contact me. I will do! ;) Thanks! Stefan -- open technology: www.media-style.com open source:

Re: term vector or document vector

2003-12-08 Thread Damian Gajda
In a message of Mon, 08-12-2003, at 19:21, Stefan Groschupf writes: Damian Gajda wrote: Hello, I already have some experience with Dmitry's implementation. Can you point me to Dmitry's code so that I can take a look? I have only read about it. Here are some links for your consideration:

boosting StandardAnalyzer

2003-12-08 Thread Stefan Groschupf
Hi, I noticed something really strange. I just tried the document-to-query approach with term frequencies and term boosting based on the term frequency. The code itself took maybe 3 minutes, but I spent around 2 hours hunting down a NullPointerException I got in this line. query =
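A minimal sketch of the "document to query" idea described here, assuming the boost is simply the raw term frequency. It is not Stefan's code: the class TermFrequencyQuery, the method buildQuery, and the small stop-word list are hypothetical, and the output only relies on the `term^boost` form of the Lucene query syntax.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class TermFrequencyQuery {
    // Illustrative stop-word list; a real analyzer has its own.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "an", "of", "and"));

    /** Builds a query string in which each term is boosted by its frequency in text. */
    public static String buildQuery(String text) {
        Map<String, Integer> freq = new TreeMap<>(); // sorted for stable output
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue; // stop words never become query terms
            }
            freq.merge(token, 1, Integer::sum);
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.getKey()).append('^').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: index^1 indexes^1 lucene^2 searches^1 text^1
        System.out.println(buildQuery("Lucene indexes the text and Lucene searches the index"));
    }
}
```

Filtering stop words before building the query string is one way to keep terms that an analyzer would discard from ever reaching the parser.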

Re: term vector or document vector

2003-12-08 Thread Damian Gajda
BTW, I can send you the partly working Lucene with Dmitry's code patched in. -- Damian

Re: boosting StandardAnalyzer, stop words

2003-12-08 Thread Ype Kingma
Stefan, It's a bug, and there is a fix for this in the latest CVS near the end of the QueryParser.jj file:

    // avoid boosting null queries, such as those caused by stop words
    if (q != null) {
      q.setBoost(f);
    }

Kind regards, Ype On Monday 08 December 2003 20:20, Stefan
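To see why the guard matters, here is a small self-contained sketch of the failure mode, under the assumption that a term removed as a stop word leaves the parser with no query object at all. SimpleQuery, parseTerm, and parseBoosted are hypothetical stand-ins, not Lucene classes or the QueryParser.jj code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class BoostGuardDemo {
    /** Hypothetical stand-in for a query object; not a Lucene class. */
    public static class SimpleQuery {
        public final String term;
        public float boost = 1.0f;

        SimpleQuery(String term) {
            this.term = term;
        }

        void setBoost(float b) {
            boost = b;
        }
    }

    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("the", "a", "of"));

    /** Returns null when the term would be dropped as a stop word. */
    static SimpleQuery parseTerm(String term) {
        return STOP_WORDS.contains(term.toLowerCase()) ? null : new SimpleQuery(term);
    }

    /** Applies the boost only if a query survived; without the check a stop word would cause an NPE. */
    public static SimpleQuery parseBoosted(String term, float boost) {
        SimpleQuery q = parseTerm(term);
        if (q != null) {
            q.setBoost(boost);
        }
        return q;
    }

    public static void main(String[] args) {
        System.out.println(parseBoosted("glucose", 2.0f).boost);
        System.out.println(parseBoosted("the", 2.0f)); // null, but no exception
    }
}
```

Callers still have to handle the null result themselves; the guard only stops the boost call from throwing.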

Re: term vector or document vector

2003-12-08 Thread Otis Gospodnetic
Nice. Please send the cvs diff, as I mentioned in that thread where you sent inlined diffs. Thanks, Otis --- Damian Gajda [EMAIL PROTECTED] wrote: BTW. i may send You the partly working Lucene with Dmitrys code patched in. -- Damian

TooManyBooleanClauses exception

2003-12-08 Thread DMGoodstein
If I generate a query using QueryParser and a standard analyzer, in some cases I'm getting a TooManyBooleanClauses exception, e.g.: [2003-12-08 14:39:23] [ debug1 ] query is +glucose -kog* always:1 [2003-12-08 14:39:23] [--ERROR--] Exception in searchAnnotations:

Re: term vector or document vector

2003-12-08 Thread Stefan Groschupf
Damian Gajda wrote: BTW. i may send You the partly working Lucene with Dmitrys code patched in. Yeah that would be very helpful. Thanks! -- open technology: http://www.media-style.com open source: http://www.weta-group.net open discussion: http://www.text-mining.org

Re: TooManyBooleanClauses exception

2003-12-08 Thread Erik Hatcher
On Monday, December 8, 2003, at 05:47 PM, [EMAIL PROTECTED] wrote: If I generate a query using QueryParser and a standard analyzer, in some cases I'm getting a TooManyBooleanClauses exception, e.g.: [2003-12-08 14:39:23] [ debug1 ] query is +glucose -kog* always:1 [2003-12-08 14:39:23]
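A toy model of how a wildcard term such as kog* can run into a clause limit, offered as an assumption about the cause since the reply above is cut off. The sketch does not use Lucene: PrefixExpansionDemo, expandPrefix, and TooManyClausesException are hypothetical names. In the Lucene releases that have such a limit, it defaults to 1024 clauses and can be raised with BooleanQuery.setMaxClauseCount(int); whether that applies to the version used in this thread is not confirmed here.

```java
import java.util.ArrayList;
import java.util.List;

public class PrefixExpansionDemo {
    /** Hypothetical exception for this sketch; not Lucene's own class. */
    public static class TooManyClausesException extends RuntimeException {
        TooManyClausesException(String message) {
            super(message);
        }
    }

    /**
     * Expands a prefix into one clause per matching index term and
     * fails once more than maxClauses terms match.
     */
    public static List<String> expandPrefix(List<String> indexTerms, String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        for (String term : indexTerms) {
            if (!term.startsWith(prefix)) {
                continue;
            }
            if (clauses.size() >= maxClauses) {
                throw new TooManyClausesException(
                        "prefix '" + prefix + "' matches more than " + maxClauses + " terms");
            }
            clauses.add(term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < 2000; i++) {
            terms.add("kog" + i); // many distinct terms sharing one prefix
        }
        System.out.println(expandPrefix(terms, "kog", 5000).size());
        try {
            expandPrefix(terms, "kog", 1024);
        } catch (TooManyClausesException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

In this model the failure depends only on how many distinct index terms share the prefix, not on how many documents match, so a more specific prefix or a higher limit are the two obvious ways out.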