Document on Indexing in Lucene

2006-10-12 Thread sachin
Hello, I have received a lot of personal emails asking me to share the Lucene Investigation document. It is not possible to reply to each of the emails, so I am putting this document in my briefcase. Anyone interested, please go to the following site and get the document.

Re: Document on Indexing in Lucene

2006-10-12 Thread Prasenjit Mukherjee
Did someone delete the shared doc? [EMAIL PROTECTED] wrote: Hello, I have received a lot of personal emails asking me to share the Lucene Investigation document. It is not possible to reply to each of the emails, so I am putting this document in my briefcase. Anyone interested, please go to the following site

IOException and index corruption

2006-10-12 Thread Apache Lucene
If adding a document to the Lucene index throws an IOException and I ignore the exception and continue adding other documents, will the index be corrupted? What happens to the fields that were already written to the index?

Re: Document on Indexing in Lucene

2006-10-12 Thread Bill Taylor
When I went there, I got a message that there were no shared folders in the briefcase. It never gave me an opportunity to enter the password. Thanks. Bill Taylor On Oct 12, 2006, at 6:34 AM, sachin wrote: Hello, I have received a lot of personal emails asking me to share the Lucene Investigation

a design question

2006-10-12 Thread Chenini, Mohamed
Hello, This is a design question: for Lucene to process a million documents, and for the search application to be scalable and still have a good response time, do we need to use an EJB container such as WebLogic, or is a servlet container such as Tomcat sufficient to do the

Re: Document on Indexing in Lucene

2006-10-12 Thread Tom Bouctou
Go to http://briefcase.yahoo.com/pickupartistmistry, click on login, and enter user pickupartistmistry, password: chotachetan. The document should be there. -tom Bill Taylor wrote: When I went there, I got a message that there were no shared folders in the briefcase. It never gave me an opportunity

Re: a design question

2006-10-12 Thread mark harwood
EJB explicitly precludes you from accessing files, including via third party libraries such as Lucene. http://java.sun.com/blueprints/qanda/ejb_tier/restrictions.html In practice you may be able to get away with it but I see no particular reasons why using an EJB server should offer any

Re: IOException and index corruption

2006-10-12 Thread Erik Hatcher
On Oct 12, 2006, at 10:17 AM, Apache Lucene wrote: If adding a document to the Lucene index throws an IOException and I ignore the exception and continue adding other documents, will the index be corrupted? What happens to the fields that were already written to

Re: a design question

2006-10-12 Thread Bill Taylor
IN THEORY, EJB containers are better able than Tomcat to spread incoming requests over a multitude of servers. There was considerable discussion some time ago about index search speed on a single processor. I do not remember the details, but there was some information about how fast a

Re: IOException and index corruption

2006-10-12 Thread Apache Lucene
For example, in the following statement: doc.add(new Field(contents, parser.getReader(), Field.TermVector.YES)); the reader causes the IOException when the invertDocument() method is called internally, where the token stream is generated from the reader. I am not worried if the document info is

Avoiding sort by date

2006-10-12 Thread rayvittal-lists
Hi folks, I am using Lucene 2.0. In our application, I am indexing a stream of documents. Each document is fairly small ( 1 KB), but there can be tens of millions of documents. Each document has a Timestamp field. Users can enter free-form searches and a date/time range. They are most

Re: Avoiding sort by date

2006-10-12 Thread Erik Hatcher
You really should be using the same IndexSearcher for successive searches. Sorting works best when done with a warm searcher. Have a look at Solr's warming strategy, and consider adopting that in some way. Erik On Oct 12, 2006, at 3:04 PM, [EMAIL PROTECTED] wrote: Hi folks,

Large index question

2006-10-12 Thread Scott Smith
Suppose I want to index 500,000 documents (average document size is 4 KB). Let's assume I create a single index and that the index is static (I'm not going to add any new documents to it). I would guess the index would be around 2 GB. Now, I do searches against this on a somewhat beefy

Re: Large index question

2006-10-12 Thread Doron Cohen
Scott Smith [EMAIL PROTECTED] wrote on 12/10/2006 14:14:57: Suppose I want to index 500,000 documents (average document size is 4 KB). Let's assume I create a single index and that the index is static (I'm not going to add any new documents to it). I would guess the index would be around

QueryParser Is Badly Broken

2006-10-12 Thread Renaud Waldura
I'm developing an application used by scientists -- people who have a pretty good idea of what logic is -- and they were shocked to find out that none of these queries return the same results: 1- banana AND apple OR orange 2- banana AND (apple OR orange) 3- (banana AND apple) OR orange I'd
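
For readers following this thread, here is a minimal sketch in plain Java (no Lucene involved) of why the two parenthesised queries, 2 and 3, are expected to differ under ordinary Boolean logic. The three-document corpus and all class and method names are hypothetical, chosen only for illustration. The sketch says nothing about how QueryParser itself interprets the unparenthesised query 1.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class BooleanQueryDemo {
    // Hypothetical toy corpus: document id -> terms the document contains.
    static final Map<Integer, Set<String>> DOCS = new TreeMap<>();
    static {
        DOCS.put(1, new HashSet<>(Arrays.asList("banana", "apple")));
        DOCS.put(2, new HashSet<>(Arrays.asList("orange")));
        DOCS.put(3, new HashSet<>(Arrays.asList("banana", "orange")));
    }

    // Ids of all documents that contain the given term.
    static Set<Integer> matching(String term) {
        Set<Integer> ids = new TreeSet<>();
        for (Map.Entry<Integer, Set<String>> e : DOCS.entrySet()) {
            if (e.getValue().contains(term)) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    // Set intersection models AND.
    static Set<Integer> and(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    // Set union models OR.
    static Set<Integer> or(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    // Query 2: banana AND (apple OR orange)
    static Set<Integer> query2() {
        return and(matching("banana"), or(matching("apple"), matching("orange")));
    }

    // Query 3: (banana AND apple) OR orange
    static Set<Integer> query3() {
        return or(and(matching("banana"), matching("apple")), matching("orange"));
    }

    public static void main(String[] args) {
        System.out.println("query 2 matches: " + query2());
        System.out.println("query 3 matches: " + query3());
    }
}
```

On this corpus, document 2 contains only "orange", so it satisfies query 3 but not query 2. That difference between 2 and 3 is expected; the open question in the thread is which of the two groupings query 1 should match.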

Re: QueryParser Is Badly Broken

2006-10-12 Thread Mark Miller
There is also the Surround Query Parser in contrib by the way...I would bet that Paul will tell you that it does not have these issues. I can't wait to see the replies on this one...I didn't realize that the QueryParser had these problems and am a bit skeptical...unfortunately I am away from home

Re: QueryParser Is Badly Broken

2006-10-12 Thread Daniel Noll
Renaud Waldura wrote: While we are also developing a query-building UI, users must be able to enter text queries as well. What do other folks do? I mean, this is pretty bad. I can hardly go back to my scientists and tell them Lucene is unable to handle 2 boolean operators, that they should

Re: QueryParser Is Badly Broken

2006-10-12 Thread Erik Hatcher
On Oct 12, 2006, at 7:11 PM, Renaud Waldura wrote: I'm developing an application used by scientists -- people who have a pretty good idea of what logic is -- and they were shocked to find out that neither of these queries return the same results: 1- banana AND apple OR orange 2- banana AND

Re: Avoiding sort by date

2006-10-12 Thread rayvittal-lists
Thanks, Erik, for the pointer to Solr. Since the document index is added to frequently, creating new IndexSearchers is required anyway. We plan to 'age' out already created IndexSearchers and create new ones every few minutes. Solr's cache regeneration would be useful in this scenario. Does the
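
The "reuse one searcher, but age it out every few minutes" plan described above could be sketched roughly as follows. This is a generic holder in plain Java, not Lucene or Solr code: `AgingHolder`, its constructor parameters, and the injected clock are all hypothetical names. In a real application the factory would open a new IndexSearcher (and could run warm-up queries before returning it). The sketch deliberately does not close the replaced instance, which a real implementation would need to handle once in-flight searches finish.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical helper: hands out one shared instance and replaces it
// once it is older than maxAgeMillis.
public class AgingHolder<T> {
    private final Supplier<T> factory;   // assumed to create (and optionally warm) a new instance
    private final long maxAgeMillis;     // how long an instance may be reused
    private final LongSupplier clock;    // injected time source, in milliseconds
    private T current;
    private long createdAt;

    public AgingHolder(Supplier<T> factory, long maxAgeMillis, LongSupplier clock) {
        this.factory = factory;
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
    }

    // Returns the shared instance, creating a fresh one if none exists
    // or if the current one has reached its maximum age.
    public synchronized T get() {
        long now = clock.getAsLong();
        if (current == null || now - createdAt >= maxAgeMillis) {
            current = factory.get();
            createdAt = now;
        }
        return current;
    }
}
```

With `System::currentTimeMillis` as the clock and a maximum age of a few minutes, successive searches share one instance (and whatever it has cached for sorting) until it expires. Because the replacement is created inside `get()`, the first search after expiry pays the creation cost; building the replacement on a background thread would avoid that, at the price of more code.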

Re: a design question

2006-10-12 Thread Chris Lu
I think a standalone J2EE application will work well and gives looser coupling than EJB. You can separate memory, disk, and CPU resources from your main application. You can send results back in XML, JSON, or other formats. Chris Lu - Instant Full-Text Search On Any

Re: Large index question

2006-10-12 Thread Chris Lu
Lots of memory will help a lot. One DBSight customer is using an Intel Core Duo with everything configured in memory. The index size is about 700M. When I checked his system's average response time, it was 12 ms! I guess you can estimate what you will get from your beefy machine. So it

Re: a design question

2006-10-12 Thread Otis Gospodnetic
Gecko? ;) My advice: stay away from EJBs as much as you can. They are too complicated and too heavy for most systems. Servlet containers like Jetty, Tomcat, or Resin are often perfectly suitable for the job and a lot simpler. Otis - Original Message From: Chenini, Mohamed [EMAIL

Error while closing IndexWriter

2006-10-12 Thread Shivani Sawhney
Hi All, I am facing a peculiar problem. I am trying to index a file. The indexing code executes without any error, but when I try to close the indexer, I get the following error. The error occurs very rarely, but when it does, no document indexing code works and I finally have to delete