Erik Hatcher writes:
On Sunday, December 7, 2003, at 08:21 PM, Esmond Pitt wrote:
I'm not clear whether this is a 'yes' or a 'no'.
I think other committers would need to weigh in on it. I'm fine with
making a change to check isDirectory as well and not deleting them
since Lucene
On Sunday, December 7, 2003, at 09:50 PM, Fitrio Pakana wrote:
I have a similar problem to his, namely querying with
multiple terms, and to make things worse, the hits
returned are quite absurd. The score of hits for an 'OR'
(any words) query is lower than for an 'AND' (all
words) query, thus
I am against making the suggested Lucene modification.
Lucene index structure may change in the future. It is possible that
one day Lucene developers will need to use a hierarchy of directories
to implement some feature.
Therefore, Lucene users should be discouraged from creating
sub-directories
Hi there,
Is Damian's patch in CVS or in the latest Lucene release?
Does this patch allow retrieving a term vector of a document?
Thanks!
Stefan
--
open technology: www.media-style.com
open source: www.weta-group.net
open discussion: www.text-mining.org
is there a limit to the size of an UnIndexed field? i changed my code to increase the
maximum string size per document from 300 bytes to 10,000 and although the index run
completes without errors, i never find any documents while searching.
Herb
Stefan, which patch are you referring to?
I looked at the following, but did not find it:
If you don't index something then it's not going to be searched.
-----Original Message-----
From: Chong, Herb [mailto:[EMAIL PROTECTED]
Sent: Monday, December 08, 2003 11:14 AM
To: Lucene Users List
Subject: Unindexed fields
is there a limit to the size of an UnIndexed field? i changed my code
Otis,
based on this discussion:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg03350.html
Stefan
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
my indexed fields haven't changed.
Herb...
Hi Jing,
are you working on document similarity?
I see nobody answered your question.
Creating a query out of a document would be very easy, but would it
provide good results?
Document term vectors would provide more possibilities to use different
data mining algorithms for
I think this never resulted in a patch. A few days after that thread
another person expressed interest in implementing the same thing, but I
am not sure what the status of that idea is now.
Otis
--- Stefan Groschupf [EMAIL PROTECTED] wrote:
Otis,
based on this discussion:
Just to be sure, since there was a lot of discussion on the lists:
there is actually no solution available to get a term vector for a
document or a TF/IDF feature vector for a document, is there?
Has anyone worked on such things?
Does anyone wish to work on such things?
Stefan
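
For readers unsure what a TF/IDF feature vector would contain, here is a minimal sketch in plain Java. It does not use Lucene at all; the class name TfIdfSketch, the method names, and the weighting tf * ln(N / df) are assumptions chosen for illustration, and the documents are assumed to be already tokenized.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Lucene or of any patch in this thread.
public class TfIdfSketch {

    /** Counts how often each term occurs in one tokenized document. */
    static Map<String, Integer> termFrequencies(List<String> tokens) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : tokens) {
            tf.merge(token, 1, Integer::sum);
        }
        return tf;
    }

    /** Weights each term of docs.get(index) by tf * ln(N / df). */
    static Map<String, Double> tfIdf(List<List<String>> docs, int index) {
        // Document frequency: in how many documents does each term appear?
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        Map<String, Double> vector = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFrequencies(docs.get(index)).entrySet()) {
            double idf = Math.log((double) docs.size() / df.get(e.getKey()));
            vector.put(e.getKey(), e.getValue() * idf);
        }
        return vector;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("lucene", "index", "index"),
                Arrays.asList("lucene", "query"));
        // A term that occurs in every document gets weight 0 under this weighting.
        System.out.println(tfIdf(docs, 0));
    }
}
```

Computing this outside the index means re-tokenizing every document, which is presumably why people in this thread are asking for term vectors to be stored by Lucene itself.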
--- Stefan Groschupf [EMAIL PROTECTED] wrote:
Just to be sure, since there was a lot of discussion on the lists:
there is actually no solution available to get a term vector for a
document or a TF/IDF feature vector for a document, is there?
Correct :(
Has anyone worked on such things?
A few people have asked, a few people have expressed interest.
I have to do some work for Nutch, but since I need the feature vector
stuff for a commercial project I will try to implement it.
Does anyone wish to join me? ;)
Stefan
--
open technology: www.media-style.com
open source:
I agree. One should provide Lucene with a unique path in the
filesystem, one that is not intended to be used for any other purpose.
All access to that path should be through Lucene's API. The fact that
Lucene decides to create a directory there rather than a single file is
an implementation
I have to do some work for Nutch, but since I need the feature vector
stuff for a commercial project I will try to implement it.
Does anyone wish to join me? ;)
Stefan
Hello, I already have some experience with Dmitry's implementation.
Feel free to contact me.
--
Damian
Damian Gajda wrote:
Hello, I already have some experience with Dmitry's implementation.
Can you point me to Dmitry's code so that I can take a look? I have only
read about it.
Feel free to contact me.
I will do! ;)
Thanks!
Stefan
--
open technology: www.media-style.com
open source:
In a message of Mon, 08-12-2003, at 19:21, Stefan Groschupf writes:
Damian Gajda wrote:
Hello, I already have some experience with Dmitry's implementation.
Can you point me to Dmitry's code so that I can take a look? I have only
read about it.
Here are some links for your consideration:
Hi,
I noticed something really strange.
I just tried the document-to-query approach with term frequencies and
term boosting based on the term frequency.
Writing the code itself took maybe 3 minutes, but I spent around 2 hours
hunting a NullPointerException I got in this line.
query =
BTW, I can send you the partly working Lucene with Dmitry's code patched
in.
--
Damian
Stefan,
It's a bug, and there is a fix for this in the latest CVS
near the end of the QueryParser.jj file:
    // avoid boosting null queries, such as those caused by stop words
    if (q != null) {
        q.setBoost(f);
    }
Kind regards,
Ype
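
The same guard is useful in calling code that boosts clauses one term at a time, as in the document-to-query experiment described in this thread. Below is a minimal, self-contained sketch. The TermQuery class here is a hypothetical stand-in and not Lucene's class of that name, and parseTerm and the stop-word list are assumptions that only mimic a parser returning null for stop words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class BoostGuardSketch {

    // Hypothetical stand-in for a query object; this is NOT Lucene's TermQuery.
    static final class TermQuery {
        final String term;
        float boost = 1.0f;

        TermQuery(String term) {
            this.term = term;
        }

        void setBoost(float boost) {
            this.boost = boost;
        }

        @Override
        public String toString() {
            return term + "^" + boost;
        }
    }

    // Assumed stop-word list, for illustration only.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("the", "a", "of"));

    /** Returns null for stop words, mimicking a parser that drops them. */
    static TermQuery parseTerm(String term) {
        return STOP_WORDS.contains(term) ? null : new TermQuery(term);
    }

    /** Builds one boosted clause per term, skipping terms that parse to null. */
    static List<TermQuery> buildClauses(Map<String, Integer> termFrequencies) {
        List<TermQuery> clauses = new ArrayList<>();
        for (Map.Entry<String, Integer> e : termFrequencies.entrySet()) {
            TermQuery q = parseTerm(e.getKey());
            if (q != null) { // never call setBoost on a null query
                q.setBoost(e.getValue().floatValue());
                clauses.add(q);
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        Map<String, Integer> tf = new LinkedHashMap<>();
        tf.put("the", 7);    // stop word: parses to null and is skipped
        tf.put("lucene", 3);
        tf.put("index", 1);
        System.out.println(buildClauses(tf));
    }
}
```

Without the null check, the first stop word in the document would trigger the kind of NullPointerException reported above.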
On Monday 08 December 2003 20:20, Stefan
Nice.
Please send the cvs diff, as I mentioned in that thread where you sent
inlined diffs.
Thanks,
Otis
--- Damian Gajda [EMAIL PROTECTED] wrote:
BTW, I can send you the partly working Lucene with Dmitry's code
patched
in.
--
Damian
If I generate a query using QueryParser and a
standard analyzer, in some cases I'm getting a
TooManyBooleanClauses exception, e.g.:
[2003-12-08 14:39:23] [ debug1 ] query is +glucose
-kog* always:1
[2003-12-08 14:39:23] [--ERROR--] Exception in
searchAnnotations:
Damian Gajda wrote:
BTW, I can send you the partly working Lucene with Dmitry's code patched
in.
Yeah that would be very helpful.
Thanks!
--
open technology: http://www.media-style.com
open source: http://www.weta-group.net
open discussion: http://www.text-mining.org
On Monday, December 8, 2003, at 05:47 PM, [EMAIL PROTECTED] wrote:
If I generate a query using QueryParser and a
standard analyzer, in some cases I'm getting a
TooManyBooleanClauses exception, e.g.:
[2003-12-08 14:39:23] [ debug1 ] query is +glucose
-kog* always:1
[2003-12-08 14:39:23]