Erick, Otis,
Thank you for your help. I will work with a single index and parent fields.
It's hard to say exactly how much raw data I will index, as this differs per
client. But I'd guess right now I'm looking at about 1 GB (the contents of a
non-CLOB/BLOB DB). But one client is thinking of throwing their e
Hi,
I was trying to store the document IDs externally.
I have found that Lucene assigns document IDs sequentially, starting
from 0, and that they do not change until a document is deleted.
Still, it did work for me.
How could I store the document IDs externally?
100TB? Ouch. Yes, most certainly very different. Again, how to split the
index and design the whole system depends on how this is going to be used, how
it's going to be changed, if it's going to be changed, how it's going to grow,
etc.
I'd love to hear from you once you start working with 100
I believe you are correct about when document IDs change. That said, I'd
strongly recommend you spend some time trying to think of a way to avoid
doing this, since it may lead to endless synchronization issues.
But if you must, you can retrieve a document with IndexReader.document(id);
On 1/27/
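If you do end up keeping an external mapping, here is a minimal pure-Java sketch of the bookkeeping involved (the class and method names are illustrative, not a Lucene API): the map has to be thrown away and rebuilt from a stored key field whenever deletes or merges renumber the internal IDs.

```java
import java.util.HashMap;
import java.util.Map;

public class ExternalIdMap {
    // external primary key -> current internal Lucene document number
    private final Map<String, Integer> byKey = new HashMap<>();

    void put(String externalKey, int luceneDocId) {
        byKey.put(externalKey, luceneDocId);
    }

    Integer lookup(String externalKey) {
        return byKey.get(externalKey);
    }

    // Internal IDs are only stable until a delete plus merge; after that the
    // whole mapping must be rebuilt from a stored key field in the index.
    void invalidate() {
        byKey.clear();
    }

    public static void main(String[] args) {
        ExternalIdMap map = new ExternalIdMap();
        map.put("order-42", 0);
        System.out.println(map.lookup("order-42")); // 0
        map.invalidate();                           // e.g. after deletes + optimize
        System.out.println(map.lookup("order-42")); // null
    }
}
```

This is exactly the synchronization burden warned about above: every delete or optimize invalidates the map, which is why a stored key field queried at search time is usually the safer design.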
I put in 1TB as a number because I thought it would surely be bigger than
anything you intended to put in your database. And you reply with 100 times
that size.
The index I'm working with now is 5GB, so I have no wisdom to offer you at
all about how to scale to 100TB. You should probably inf
Hi all,
Is there any limitation on the number of files that Lucene can handle?
I indexed a total of 3 XML documents; however, it stops at the 1th
document.
No warning, no error, no exception either.
Indexing C:\sweetpea\wikipedia_xmlfiles\part-18\491876.xml
Indexing C:\sweetp
Isn't it semi-trivial if you are not interested in the fragments (I
swear it seems that most people are not)? Wasn't it you who suggested
turning the query into a SpanQuery, extracting the spans, and then doing
the highlighting after a rewrite? This seems somewhat trivial, so what am
I missing? I
Did you try triggering a thread dump to see what it was doing at that
point?
Depending on your merge factor and other IndexWriter settings, it could
just be doing a really big merge.
: Date: Sat, 27 Jan 2007 09:40:47 -0800 (PST)
: From: maureen tanuwidjaja <[EMAIL PROTECTED]>
: Reply-To: java-us
/** Returns the number of documents currently in this index. */
public synchronized int docCount() {
  int count = ramSegmentInfos.size();
  for (int i = 0; i < segmentInfos.size(); i++) {
    SegmentInfo si = segmentInfos.info(i);
    count += si.docCount;
  }
  return count;
}
I
I'll keep you posted ;-)
Joost Schouten
Director
JS Portal
Dasstraat 21
2623CB Delft
the Netherlands
P: +31 6 160 160 14
W: www.jsportal.com
-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Sunday, January 28, 2007 4:34 AM
To: java-user@lucene.apache.org
Subje
Hi Karl,
karl wettin <[EMAIL PROTECTED]> wrote on 27/01/2007 11:54:18:
> /** Returns the number of documents currently in this index. */
>public synchronized int docCount() {
> int count = ramSegmentInfos.size();
> for (int i = 0; i < segmentInfos.size(); i++) {
>SegmentInfo
Isn't it extremely inefficient to compute the Euclidean distance twice?
Perhaps not a huge deal with a small search result set, but I at times have
13,000 results that match my search terms in an index with 1.2 million docs.
Can't you do some simple radian math first to ensure it's way out of bounds,
the
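The cheap-rejection idea suggested above can be sketched in plain Java (method and parameter names are illustrative, not from any Lucene distance filter): reject anything outside the enclosing square before paying for the exact distance, and compare squared distances so the square root is never needed.

```java
public class DistanceFilter {
    // Cheap bounding-box rejection before the exact Euclidean test.
    // Returns true only if (x, y) lies within 'radius' of (cx, cy).
    static boolean withinRadius(double cx, double cy,
                                double x, double y, double radius) {
        // Fast reject: outside the enclosing square means it cannot
        // possibly be inside the circle, so skip the distance math.
        if (Math.abs(x - cx) > radius || Math.abs(y - cy) > radius) {
            return false;
        }
        double dx = x - cx;
        double dy = y - cy;
        // Compare squared distances; avoids Math.sqrt entirely.
        return dx * dx + dy * dy <= radius * radius;
    }

    public static void main(String[] args) {
        System.out.println(withinRadius(0, 0, 3, 4, 5));  // true: distance is exactly 5
        System.out.println(withinRadius(0, 0, 5, 5, 5));  // false: inside the box, outside the circle
        System.out.println(withinRadius(0, 0, 50, 0, 5)); // false: rejected by the box test alone
    }
}
```

With 13,000 candidate hits, the box test discards most far-away points with two comparisons each, and the multiply-only distance check handles the rest.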
27 jan 2007 kl. 21.19 skrev Doron Cohen:
karl wettin <[EMAIL PROTECTED]> wrote on 27/01/2007 11:54:18:
/** Returns the number of documents currently in this index. */
public synchronized int docCount() {
I don't understand: what is it that this method returns?
"Something else" - it is the
markharw00d wrote:
>>Isn't it semi trivial if you are not interested in the fragments (I
swear it seems that most people are not)? I
I haven't conducted a survey but it's the typical web search engine
scenario - select only a small subset of the matching document content
for display in SERPs. I would expect that
Maybe a new highlighter with no attempt at summarising could more
easily address phrase support for small pieces of content. It will
always be hard to faithfully represent all possible query match logic
- especially if there are NOTs, ANDs and ORs mixed in with all the
term proximity logic
Hi!
I promised karl that I'd share something on this topic, so here it
goes. It fits the subject, too ;)
On Jan 27, 2007, at 6:14 PM, Erick Erickson wrote:
I believe you are correct about when document IDs change. That
said, I'd
strongly recommend you spend some time trying to think of a way
karl wettin <[EMAIL PROTECTED]> wrote on 27/01/2007 13:49:24:
> Deleted as in still available in the segment and noted in the deleted
> file, but not optimized, and IllegalArgumentException thrown in case
> of IndexReader.document(n)? At least I think that is the way a
> Directory works?
Yes.. so i
For what it's worth Mark (Miller), there *is* a need for "just highlight the
query terms without trying to get excerpts" functionality - something a la
Google cache (different colours...mmm, nice). I've had people ask me for this
before, and I know I could use this functionality, too. Please c
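As a rough illustration of that "just highlight the query terms, no excerpting" idea, here is a pure-Java sketch (not the Lucene Highlighter API; class and method names are illustrative) that wraps whole-word matches of each term in `<b>` tags, case-insensitively:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

public class TermHighlighter {
    // Wrap each query term in <b>...</b>, case-insensitively, whole words
    // only. Assumes terms do not overlap each other or the tag markup.
    static String highlight(String text, Set<String> terms) {
        for (String term : terms) {
            // Pattern.quote escapes the term; (?i) makes the match
            // case-insensitive; $1 preserves the original casing.
            text = text.replaceAll(
                "(?i)\\b(" + Pattern.quote(term) + ")\\b", "<b>$1</b>");
        }
        return text;
    }

    public static void main(String[] args) {
        Set<String> terms = new HashSet<>(Arrays.asList("lucene", "index"));
        System.out.println(highlight("Lucene builds an inverted index.", terms));
        // <b>Lucene</b> builds an inverted <b>index</b>.
    }
}
```

This deliberately skips all fragmenting and scoring: the whole document is returned with terms marked, which is the Google-cache-style behaviour being asked for. Phrase and proximity queries would still need span-based positions rather than per-term matching.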