date:20070127

RE: lucense index/document architecture

2007-01-27 Thread Joost Schouten

Erick, Otis, Thank you for your help. I will work with a single index and parent fields. It's hard to say exactly how much raw data I will index as this differs per client. But I guess right now I'm more looking at 1G (contents of a non-CLOB/BLOB DB). But one client is thinking of throwing their e

Re : lucene doc id's

2007-01-27 Thread saikrishna venkata pendyala

Hi, I was trying to store document IDs externally. I have found that Lucene generates document IDs linearly starting from 0, and that they do not change until a document is deleted. But it did work for me. How could I store document IDs externally?

Re: lucense index/document architecture

2007-01-27 Thread Otis Gospodnetic

100TB? Ouch. Yes, most certainly very different. Again, how to split the index and design the whole system depends on how this is going to be used, how it's going to be changed, if it's going to be changed, how it's going to grow, etc. I'd love to hear from you once you start working with 100

Re: Re : lucene document id's

2007-01-27 Thread Erick Erickson

I believe you are correct about when document IDs change. That said, I'd strongly recommend you spend some time trying to think of a way to keep from doing this, since it may lead to endless synchronization issues. But if you must, you can retrieve a document with IndexReader.document(id); On 1/27/
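The synchronization concern raised above can be illustrated with a toy model. This is a minimal sketch and not Lucene code: the class `DocIdDriftSketch` and its methods `add`, `delete`, `compact`, and `idOf` are hypothetical names invented for the illustration. It assumes ids are simply positions in a list and that an explicit compaction step drops deleted entries; when the real library renumbers documents is not something the sketch asserts.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (NOT Lucene): positional ids shift once deleted entries are compacted away. */
final class DocIdDriftSketch {
    private final List<String> keys = new ArrayList<>();     // list position == internal id
    private final List<Boolean> deleted = new ArrayList<>(); // deletion marks, same positions

    /** Appends a document identified by an application-level key; returns its current id. */
    int add(String key) {
        keys.add(key);
        deleted.add(Boolean.FALSE);
        return keys.size() - 1;
    }

    /** Marks a document as deleted; ids do not move yet. */
    void delete(int id) {
        deleted.set(id, Boolean.TRUE);
    }

    /** Drops deleted entries; every survivor after a dropped entry gets a smaller id. */
    void compact() {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            if (!deleted.get(i)) {
                kept.add(keys.get(i));
            }
        }
        keys.clear();
        keys.addAll(kept);
        deleted.clear();
        for (int i = 0; i < kept.size(); i++) {
            deleted.add(Boolean.FALSE);
        }
    }

    /** Looks a document up by its stable key; returns -1 if absent or deleted. */
    int idOf(String key) {
        for (int i = 0; i < keys.size(); i++) {
            if (!deleted.get(i) && keys.get(i).equals(key)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        DocIdDriftSketch index = new DocIdDriftSketch();
        index.add("a");
        index.add("b");
        int remembered = index.add("c"); // id stored "externally"
        index.delete(0);
        index.compact();
        System.out.println("remembered id: " + remembered);
        System.out.println("current id of c: " + index.idOf("c"));
    }
}
```

In this model the externally remembered id goes stale after compaction, while a lookup by the application's own key still finds the document. That is the usual argument for keeping a stable key of your own rather than depending on positional ids.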

Re: lucense index/document architecture

2007-01-27 Thread Erick Erickson

I put in 1TB as a number because I thought it would surely be bigger than anything you intended to put in your database. And you reply with 100 times that size. The index I'm working with now is 5GB, so I have no wisdom to offer you at all about how to scale to 100TB. You should probably inf

My program stops indexing after 10000th documents is indexed

2007-01-27 Thread maureen tanuwidjaja

Hi all, Is there any limit on the number of files that Lucene can handle? I indexed a total of 3 XML documents, however it stops at 1th documents. No warning, no error, no exception as well. Indexing C:\sweetpea\wikipedia_xmlfiles\part-18\491876.xml Indexing C:\sweetp

Re: Multiword Highlighting

2007-01-27 Thread Mark Miller

Isn't it semi trivial if you are not interested in the fragments (I swear it seems that most people are not)? Isn't it you that suggested turning the query into a SpanQuery, extracting the spans and then doing the highlighting after a rewrite? This seems somewhat trivial so what am I missing? I

Re: My program stops indexing after 10000th documents is indexed

2007-01-27 Thread Chris Hostetter

did you try triggering a thread dump to see what it was doing at that point? depending on your merge factors and other IndexWriter settings it could just be doing a really big merge. : Date: Sat, 27 Jan 2007 09:40:47 -0800 (PST) : From: maureen tanuwidjaja <[EMAIL PROTECTED]> : Reply-To: java-us
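For readers unfamiliar with the thread dump suggested above: one way to get equivalent information from inside the JVM is `Thread.getAllStackTraces()`, a standard JDK method. The sketch below is only an illustration; the class name `ThreadDumpSketch` and the idea of calling it from a watchdog thread in the indexing program are assumptions, not something the thread describes.

```java
import java.util.Map;

/** Minimal in-process thread dump using only the standard library. */
final class ThreadDumpSketch {

    /** Formats the name, state, and stack frames of every live thread. */
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            sb.append('"').append(thread.getName()).append('"')
              .append(" state=").append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // In a stalled indexer this would be called from a separate thread (hypothetical usage).
        System.out.println(dump());
    }
}
```

If the indexing thread's frames show it inside merge-related code, that would be consistent with the "big merge" explanation. A dump can also be requested from outside the process, for example with Ctrl+Break in a Windows console or `kill -QUIT` on Unix.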

IndexWriter.docCount

2007-01-27 Thread karl wettin

/** Returns the number of documents currently in this index. */ public synchronized int docCount() { int count = ramSegmentInfos.size(); for (int i = 0; i < segmentInfos.size(); i++) { SegmentInfo si = segmentInfos.info(i); count += si.docCount; } return count; } I

RE: lucense index/document architecture

2007-01-27 Thread Joost Schouten

I'll keep you posted ;-) Joost Schouten Director JS Portal Dasstraat 21 2623CB Delft the Netherlands P: +31 6 160 160 14 W: www.jsportal.com -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Sunday, January 28, 2007 4:34 AM To: java-user@lucene.apache.org Subje

Re: IndexWriter.docCount

2007-01-27 Thread Doron Cohen

Hi Karl, karl wettin <[EMAIL PROTECTED]> wrote on 27/01/2007 11:54:18: > /** Returns the number of documents currently in this index. */ >public synchronized int docCount() { > int count = ramSegmentInfos.size(); > for (int i = 0; i < segmentInfos.size(); i++) { >SegmentInfo

Re: Announcement: Lucene powering Monster job search index (Beta)

2007-01-27 Thread no spam

Isn't this extremely inefficient to do the Euclidean distance twice? Perhaps not a huge deal if a small search result set. I at times have 13,000 results that match my search terms of an index with 1.2 million docs. Can't you do some simple radian math first to ensure it's way out of bounds, the
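The idea of a cheap bounds check before the full distance computation can be sketched as follows. This is a hedged illustration only: it assumes plain planar x/y coordinates rather than latitude/longitude, and the class `DistancePrefilterSketch` and method `withinRadius` are hypothetical names, not part of the system being discussed.

```java
/** Cheap per-axis rejection before an exact (squared) Euclidean distance test. */
final class DistancePrefilterSketch {

    /** Returns true if point (x2, y2) lies within radius of point (x1, y1). */
    static boolean withinRadius(double x1, double y1, double x2, double y2, double radius) {
        double dx = Math.abs(x1 - x2);
        double dy = Math.abs(y1 - y2);
        // Cheap reject: a point outside the bounding square cannot be inside the circle.
        if (dx > radius || dy > radius) {
            return false;
        }
        // Exact test on squared values, which avoids the square root entirely.
        return dx * dx + dy * dy <= radius * radius;
    }

    public static void main(String[] args) {
        System.out.println(withinRadius(0, 0, 3, 4, 5));
        System.out.println(withinRadius(0, 0, 6, 0, 5));
        System.out.println(withinRadius(0, 0, 4, 4, 5));
    }
}
```

With thousands of candidate hits, most far-away points fail the per-axis comparison and never reach the multiplication step. For geographic coordinates the bounds would have to be derived differently, which this sketch does not attempt.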

Re: IndexWriter.docCount

2007-01-27 Thread karl wettin

27 jan 2007 kl. 21.19 skrev Doron Cohen: karl wettin <[EMAIL PROTECTED]> wrote on 27/01/2007 11:54:18: /** Returns the number of documents currently in this index. */ public synchronized int docCount() { I don't understand, what is it this method returns? "Something else" - it is the

Re: Multiword Highlighting

2007-01-27 Thread markharw00d

>>Isn't it semi trivial if you are not interested in the fragments (I swear it seems that most people are not)? I haven't conducted a survey but it's the typical web search engine scenario - select only a small subset of the matching document content for display in SERPs. I would expect that

Re: Multiword Highlighting

2007-01-27 Thread Mark Miller

markharw00d wrote: >>Isn't it semi trivial if you are not interested in the fragments (I swear it seems that most people are not)? I haven't conducted a survey but it's the typical web search engine scenario - select only a small subset of the matching document content for display in SERP

Re: Multiword Highlighting

2007-01-27 Thread Mark Miller

Maybe a new highlighter with no attempt at summarising could more easily address phrase support for small pieces of content. It will always be hard to faithfully represent all possible query match logic - especially if there are NOTs, ANDs and ORs mixed in with all the term proximity logic

Re: Re : lucene document id's

2007-01-27 Thread Kay Roepke

Hi! I promised karl that I'd share something on this topic, so here it goes. It fits the subject, too ;) On Jan 27, 2007, at 6:14 PM, Erick Erickson wrote: I believe you are correct about when document IDs change. That said, I'd strongly recommend you spend some time trying to think of a way

Re: IndexWriter.docCount

2007-01-27 Thread Doron Cohen

karl wettin <[EMAIL PROTECTED]> wrote on 27/01/2007 13:49:24: > Deleted as in still available in the segment and noted in the deleted > file, but not optimized and IllegalArgumentException thrown in case > of IndexReader.document(n)? At least I think that is the way a > Directory works? Yes.. so i

Re: Multiword Highlighting

2007-01-27 Thread Otis Gospodnetic

For what it's worth Mark (Miller), there *is* a need for "just highlight the query terms without trying to get excerpts" functionality - something a la Google cache (different colours...mmm, nice). I've had people ask me for this before, and I know I could use this functionality, too. Please c
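The "highlight the query terms without excerpts" behaviour requested above can be sketched with the standard `java.util.regex` classes. This is a minimal illustration and not the Lucene highlighter: the class `WholeTextHighlightSketch`, the method `highlight`, and the `<b>` markers are assumptions, and the sketch treats the query as a flat list of terms, ignoring phrase, proximity, and boolean logic.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Marks every whole-word occurrence of the given terms in the full text, with no fragmenting. */
final class WholeTextHighlightSketch {

    static String highlight(String text, List<String> terms) {
        if (terms.isEmpty()) {
            return text;
        }
        // Quote each term so that regex metacharacters in a term are matched literally.
        String alternation = terms.stream().map(Pattern::quote).collect(Collectors.joining("|"));
        Pattern pattern = Pattern.compile("\\b(?:" + alternation + ")\\b", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // Keep the original casing of the matched text inside the markers.
            matcher.appendReplacement(out, Matcher.quoteReplacement("<b>" + matcher.group() + "</b>"));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("Lucene is a search library", Arrays.asList("lucene", "search")));
    }
}
```

Because the whole text is returned, there is no summarising step to get wrong. The trade-off is the one raised elsewhere in the thread: a term is marked wherever it appears, even where the actual query would not have matched at that position.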

19 matches
