Hello John,
Once you make your change locally, use 'cvs diff -u IndexWriter.java >
indexwriter.patch' to make a patch.
Then open a new Bugzilla entry.
Finally, attach your patch to that entry.
Note that Document deletion is actually done from IndexReader, so your
patch may have to be on
That sounds very interesting, but how do you handle queries like
select * from MY_TABLE where MY_NUMERIC_FIELD > 80
As far as I know you have only the range query, so you would have to say
my_numeric_field:[80 TO ??]
but this would not work in the above-mentioned example, or am I missing something?
regards
Hmm. So far all our fields are just strings. But I would guess you should be
able to use Integer.MAX_VALUE or something on the upper bound. Or there
might be a better way of doing it.
Praveen
- Original Message -
From: Akmal Sarhan [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
On Dec 14, 2004, at 15:40, Kevin L. Cobb wrote:
Was wondering if anyone out there was doing the same, or if
there are any dissenting opinions on using Lucene for this purpose.
ZOE [1] [2] takes the same approach and uses Lucene as a relational
engine of sorts.
However, for both practical and
Bruce Ritchie wrote:
Christoph,
I'm not entirely certain if this is what you want, but a while back David Spencer did code up a 'More Like This' class which can be used for generating similarities between documents. I can't seem to find this class in the sandbox
Uh oh, sorry, I'll try to get this
petite_abeille wrote:
Well, the subject says it all...
If there is one thing which is overly cumbersome in Lucene, it's
updating documents, therefore this Request For Enhancement:
Please consider enhancing the IndexWriter API to include an
updateDocument(...) method to take care of all the gory
My concern is that this just shifts the scaling issue to
Lucene, and I haven't found much info on how to scale Lucene
vertically.
By vertically, of course, I meant horizontally. Basically scaling
it across servers as one might do with a relational database.
: select * from MY_TABLE where MY_NUMERIC_FIELD > 80
:
: as far as I know you have only the range query so you will have to say
:
: my_numeric_field:[80 TO ??]
: but this would not work in the above-mentioned example or am I missing something?
RangeQuery allows you an open-ended range -- you can tell the
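One wrinkle behind the open-ended-range answer: Lucene compares terms as strings, so numeric fields must be zero-padded to a fixed width at index time or the range will follow lexicographic rather than numeric order. A minimal sketch of why the padding matters (the class name and the width of 12 are invented for illustration):

```java
// Sketch: Lucene compares terms lexicographically, so numbers must be
// zero-padded to a fixed width for range queries to respect numeric order.
public class NumericPadding {
    static final int WIDTH = 12; // assumed width; must cover the largest value

    static String pad(long n) {
        StringBuilder sb = new StringBuilder(Long.toString(n));
        while (sb.length() < WIDTH) {
            sb.insert(0, '0'); // left-pad with zeros up to WIDTH
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unpadded: "9" sorts after "80", which breaks numeric ranges.
        System.out.println("9".compareTo("80") > 0);
        // Padded: lexicographic order now matches numeric order.
        System.out.println(NumericPadding.pad(9).compareTo(NumericPadding.pad(80)) < 0);
    }
}
```

With padded terms, an open-ended query like my_numeric_field:[000000000080 TO null] then behaves like "> 80".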
Otis Gospodnetic wrote:
You can also see 'Books like this' example from here
https://secure.manning.com/catalog/view.php?book=hatcher2&item=source
Well done, uses a term vector, instead of reparsing the orig doc, to
form the similarity query. Also I like the way you exclude the source
doc in the
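The term-vector technique praised above can be sketched roughly like this (a hypothetical illustration assuming the Lucene 1.4 API; the field name "contents" and the class name are made up, and this is not David Spencer's actual code):

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermFreqVector;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class MoreLikeThisSketch {
    // Build an OR query from a stored term vector instead of
    // re-parsing the source document's text.
    // Assumes the index stores term vectors for the "contents" field.
    public static BooleanQuery docsLike(IndexReader reader, int docId)
            throws Exception {
        TermFreqVector vector = reader.getTermFreqVector(docId, "contents");
        BooleanQuery query = new BooleanQuery();
        for (String termText : vector.getTerms()) {
            // Optional clauses: documents sharing more terms score higher.
            query.add(new TermQuery(new Term("contents", termText)), false, false);
        }
        return query;
    }
}
```

A real implementation would also skip stop words, cap the number of clauses, and exclude the source document from the results, as discussed in the thread.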
Well, the subject says it all...
If there is one thing which is overly cumbersome in Lucene, it's
updating documents, therefore this Request For Enhancement:
Please consider enhancing the IndexWriter API to include an
updateDocument(...) method to take care of all the gory details
involved in
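Until such a method exists, the workaround usually discussed on this list is delete-then-add. A sketch assuming the Lucene 1.4 API (the "id" field and the helper class are hypothetical, not part of Lucene):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class UpdateHelper {
    // Hypothetical helper: deletes any existing document matching the
    // "id" term, then adds the replacement document.
    public static void updateDocument(String indexDir, String id, Document doc)
            throws Exception {
        IndexReader reader = IndexReader.open(indexDir);
        reader.delete(new Term("id", id)); // deletion lives on IndexReader in 1.4
        reader.close();                    // must close before opening a writer

        IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), false);
        writer.addDocument(doc);
        writer.close();
    }
}
```

For bulk updates, batching all the deletes through one IndexReader and all the adds through one IndexWriter is much cheaper than alternating open/close per document.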
You can also see 'Books like this' example from here
https://secure.manning.com/catalog/view.php?book=hatcher2&item=source
Otis
--- Bruce Ritchie [EMAIL PROTECTED] wrote:
Christoph,
I'm not entirely certain if this is what you want, but a while back
David Spencer did code up a 'More Like
You can also see 'Books like this' example from here
https://secure.manning.com/catalog/view.php?book=hatcher2&item=source
Well done, uses a term vector, instead of reparsing the orig
doc, to form the similarity query. Also I like the way you
exclude the source doc in the query, I
Well, one could always partition an index, distribute the pieces
horizontally across multiple 'search servers', and use the built-in
RMI-based and parallel search features. Nutch uses something similar
for search scaling.
Otis
--- Monsur Hossain [EMAIL PROTECTED] wrote:
My concern is that
From the code I looked at, those calls don't recalculate on
every call.
I was referring to this fragment below from BooksLikeThis.docsLike(),
and was mentioning it as the javadoc
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/TermFreqVector.html
does not say that
Bruce Ritchie wrote:
From the code I looked at, those calls don't recalculate on
every call.
I was referring to this fragment below from BooksLikeThis.docsLike(),
and was mentioning it as the javadoc
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/TermFreqVector.html
does
Bruce Ritchie wrote:
You can also see 'Books like this' example from here
https://secure.manning.com/catalog/view.php?book=hatcher2&item=source
Well done, uses a term vector, instead of reparsing the orig
doc, to form the similarity query. Also I like the way you
exclude the source doc in
I'm trying to index a large number of records from the
DB (a few million). Each record will be stored as a
document with about 30 fields; most of them are
UnStored and represent small strings or numbers. No
huge DB text fields.
But I'm running out of memory very fast, and the
indexing is slowing
Hi all,
Lucene scores a document based on the correlation between
the query q and a document d:
(this is the raw function; I don't pay attention to the
boost_t, coord_q_d factors)
score_d = sum_t( tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t )   (*)
Could anybody explain it in detail ? Or are there any
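For readability, the raw function (*) can be set in conventional notation. This is a transcription of the same terms, just grouped into a query side and a document side, not a different formula:

```latex
\mathrm{score}_d \;=\; \sum_{t}
  \underbrace{\frac{tf_{t,q}\cdot idf_t}{norm_q}}_{\text{query side}}
  \cdot
  \underbrace{\frac{tf_{t,d}\cdot idf_t}{norm_{d,t}}}_{\text{document side}}
```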
Hi Guys
Can somebody tell me why I am getting this exception, please?
Sys Specifications
OS: Linux Gentoo
Appserver: Apache Tomcat/4.1.24
JDK: build 1.4.2_03-b02
Lucene: 1.4.1, 1.4.2, 1.4.3
Note: this exception is displayed on every 2nd query after Tomcat is
started
java.io.IOException: Stale NFS
Nhan,
Re. your two differences:
1 is not a difference. Norm_d and Norm_q are both independent of t, so summing
over t has no effect on them. I.e., Norm_d * Norm_q is constant wrt the
summation, so it doesn't matter if the sum is over just the numerator or over
the entire fraction, the
Hello John,
I believe you didn't get any replies to this. What you are describing
cannot be done using the public API, but may (no source code on this
machine, so I can't double-check that) be doable if you use some of the
'internal' methods.
I don't have the need for this, but others might, so
Hi Otis:
Thanks for your reply.
I am looking for more of an API call than a tool. e.g.
IndexWriter.finalizeDelete()
If I implement this, how would I go about submitting a patch?
thanks
-John
On Mon, 13 Dec 2004 22:24:12 -0800 (PST), Otis Gospodnetic
[EMAIL PROTECTED] wrote:
On Dec 14, 2004, at 4:53 AM, Vikas Gupta wrote:
I have come across a scenario where the hits returned are not sorted.
Or maybe they are sorted but the explanation is not correct.
Take a look at
http://cofferdam.cs.utexas.edu:8080/search.jsp?query=space+odyssey&hitsPerPage=10&hitsPerSite=0
This
We too use Lucene for a similar purpose, except that we index and store quite
a few fields. In fact I also update partial documents as people suggested. I
store all the indexed fields so I don't have to rebuild the whole document
when updating a partial document. The reason we do this is due to
How big do you expect it to get, and how often do you expect to update
it? We've been using Lucene for about 1M records (19 fields each) with
incremental updates every 10 minutes. The performance during updates
wasn't wonderful, so it took some seriously intense code to sort that
out, as you
Christoph,
I'm not entirely certain if this is what you want, but a while back David
Spencer did code up a 'More Like This' class which can be used for generating
similarities between documents. I can't seem to find this class in the sandbox
so I've attached it here. Just repackage and test.
Hi,
My current task/problem is the following: I need to implement TFIDF
document term ranking using Jakarta Lucene to compute a similarity rank
between arbitrary documents in the constructed index.
I saw from the API that there are similar functions already implemented
in the class Similarity and
You can see Flickr-like tag (lookup) system at my Simpy site (
http://www.simpy.com ). It uses Lucene as the backend for lookups, but
still uses a RDBMS as the primary storage.
I find that keeping the RDBMS and Lucene indices in sync is a bit of a
pain and error-prone, so a _thin_ storage layer with
On Tuesday 14 December 2004 20:13, Monsur Hossain wrote:
My concern is that this just shifts the scaling issue to Lucene, and I
haven't found much info on how to scale Lucene vertically.
You can easily use MultiSearcher to search over several indices. If you
want the distribution to be more
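A minimal sketch of the MultiSearcher approach, assuming the Lucene 1.4 API (the index paths and query field are invented for illustration):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.search.TermQuery;

public class ShardedSearch {
    public static void main(String[] args) throws Exception {
        // One searcher per index partition; paths are hypothetical.
        Searchable[] shards = {
            new IndexSearcher("/indexes/part1"),
            new IndexSearcher("/indexes/part2"),
        };
        // MultiSearcher merges hits (and remaps doc ids) across the shards.
        MultiSearcher searcher = new MultiSearcher(shards);
        Hits hits = searcher.search(new TermQuery(new Term("contents", "lucene")));
        System.out.println(hits.length() + " hits");
        searcher.close();
    }
}
```

Lucene 1.4 also ships ParallelMultiSearcher with the same constructor, which queries the shards in parallel threads, and RemoteSearchable for putting shards on other machines over RMI.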
Hello,
There are a few things you can do:
1) Don't just pull all rows from the DB at once. Do that in batches.
2) If you can get a Reader from your SqlDataReader, consider this:
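The code that followed point 2 is lost in the archive. A generic sketch in the same spirit (Lucene 1.4 plus JDBC; the connection string, table, and column names are all invented):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class BatchIndexer {
    // Hypothetical sketch: stream rows from the DB one at a time so
    // Document objects can be garbage-collected between adds, and tune
    // the writer's buffering instead of holding everything in memory.
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection("jdbc:mydb://localhost/mydb");
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery("SELECT id, title FROM my_table");

        IndexWriter writer = new IndexWriter("/tmp/index", new StandardAnalyzer(), true);
        writer.mergeFactor = 10;     // 1.4 exposes these as public fields
        writer.minMergeDocs = 1000;  // buffer this many docs in RAM before flushing

        while (rs.next()) {
            Document doc = new Document();
            doc.add(Field.Keyword("id", rs.getString("id")));
            doc.add(Field.UnStored("title", rs.getString("title")));
            writer.addDocument(doc);
        }
        writer.optimize(); // once, at the very end
        writer.close();
        conn.close();
    }
}
```

The key points are streaming the rows rather than materializing them all, and calling optimize() only once after the loop.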
Lucene uses the vector space model. To understand that:
- Read section 2.1 of the Space Optimizations for Total Ranking paper (linked
  at http://lucene.sourceforge.net/publications.html)
- Read sections 6 to 6.4 of
  http://www.csee.umbc.edu/cadip/readings/IR.report.120600.book.pdf
- Read section 1 of
Thanks Otis!
What do you mean by building it in batches? Does it
mean I should close the IndexWriter every 1000 rows
and reopen it? Does that release references to the
document objects so that they can be
garbage-collected?
I'm calling optimize() only at the end.
I agree that 1500 documents is