...the multi-threading and distribution of the parts of the log to each writer.

Mike
www.ardentia.com - the home of NetSearch

-Original Message-
From: Doron Cohen [mailto:[EMAIL PROTECTED]
Sent: 25 July 2006 22:23
To: java-user@lucene.apache.org
Subject: Re: Index Rows as Documents? Help me design a solution
Few comments -
(from first posting in this thread)
The indexing was taking much more than minutes for a 1 MB log file
It feels to me like your major problem might be file IO with all those
files. There's no need to split the files up first and then index them:
just read through the log and index each row. The code fragment you posted
should allow you to get the line back from the line field of each hit.
A document per row seems correct to me too.
If search is by msisdn / messageid, and if, as it seems, these are
keywords rather than free text that needs to be analyzed, they should both have
Index.UN_TOKENIZED. Also, since no search is to be done by the line content,
the line field should be stored but not indexed (Index.NO).
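Spelled out in code, those field choices might look like this. This is only a sketch against the Lucene 2.0-era API of the time; the field names msisdn, messageid and line are taken from the thread, everything else (method name, parameters) is assumed:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

/** Build one Document per log row (sketch, Lucene 2.0-era API). */
static Document rowToDocument(String msisdn, String messageId, String line) {
    Document doc = new Document();
    // Keyword-style fields: stored and indexed, but not run through the analyzer.
    doc.add(new Field("msisdn", msisdn, Field.Store.YES, Field.Index.UN_TOKENIZED));
    doc.add(new Field("messageid", messageId, Field.Store.YES, Field.Index.UN_TOKENIZED));
    // The raw line is only ever retrieved, never searched: store it, don't index it.
    doc.add(new Field("line", line, Field.Store.YES, Field.Index.NO));
    return doc;
}
```

Untokenized fields match only on the exact stored value, which is what you want for IDs and phone numbers.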
On Tuesday 25 July 2006 04:05, Namit Yadav wrote:
1. List the SMSIDs of all the SMSes that a phone number has sent (each SMS
message will have a globally unique ID).
2. List SomeData1, SomeData2, SomeData3 and SomeData4 for a given SMSID.
How can I do this efficiently?
Short answer: use a ...
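The answer above is cut off in the archive; one plausible reading, given the untokenized keyword fields discussed earlier, is a TermQuery per lookup. A sketch against the Lucene 2.0-era API (field names msisdn and messageid come from the thread; the method, index path and everything else are assumptions):

```java
import java.io.IOException;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

/** Print every SMSID sent from one phone number (sketch, Lucene 2.0-era API). */
static void listSmsIds(String indexPath, String msisdn) throws IOException {
    IndexSearcher searcher = new IndexSearcher(indexPath);
    // Exact match against the untokenized msisdn field; no analysis involved.
    Hits hits = searcher.search(new TermQuery(new Term("msisdn", msisdn)));
    for (int i = 0; i < hits.length(); i++) {
        System.out.println(hits.doc(i).get("messageid"));
    }
    searcher.close();
}
```

The second lookup is the same pattern with `new Term("messageid", smsId)`, reading the stored data fields off the single matching document.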
Indexing 1 MB of logs shouldn't take minutes, so you're probably right.
A problem I've seen is opening/indexing/closing your index writer too often.
You should do something like... (really bad pseudo code here)
IndexWriter writer = new IndexWriter(indexDir, analyzer, true);
for (lots and lots and lots of records) {
    writer.addDocument(documentFor(record));
}
writer.close();
The code looks good, *assuming* that the IndexWriter you pass in isn't
closed/opened between files (this would be a problem if you have lots of
files to index...). I've had the IndexWriter.optimize method take a
long time to complete, so I typically don't call it until I'm entirely
done...
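That pattern, one writer for the whole run and optimize() only once at the very end, might look like this. A sketch against the Lucene 2.0-era API; the index path, buffer size and method shape are assumptions, not anything stated in the thread:

```java
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;

/** Index every row in one pass with a single writer (sketch, Lucene 2.0-era API). */
static void indexAll(Iterable<Document> docs) throws IOException {
    // Open the writer once for the whole run, never once per file.
    IndexWriter writer = new IndexWriter("/path/to/index",   // location assumed
                                         new StandardAnalyzer(),
                                         true /* create a fresh index */);
    writer.setMaxBufferedDocs(1000);  // buffer more docs in RAM before flushing
    for (Document doc : docs) {
        writer.addDocument(doc);
    }
    writer.optimize();  // expensive: do it once, at the very end, if at all
    writer.close();
}
```

Keeping the writer open amortizes the per-open cost across all rows, and deferring optimize() avoids repeatedly rewriting the whole index.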
The indexing was taking much more than minutes for a 1 MB log file. ...
I would expect to be able to index at least a GB of logs within 1 or 2
minutes.
1-2 minutes per GB would be 30-60 GB/hour, which for a single machine/JVM
is a lot -
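The quoted rate is easy to check (plain Java; the figures come straight from the messages above):

```java
public class Throughput {
    public static void main(String[] args) {
        // 1-2 minutes per GB, converted to GB per hour.
        double fast = 60.0 / 1.0;  // 1 min/GB
        double slow = 60.0 / 2.0;  // 2 min/GB
        System.out.println(slow + " to " + fast + " GB/hour");
    }
}
```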
My question might be very easy for you Lucene experts, but after going
through the Lucene documentation and examples, I haven't been able to
figure out how to solve this problem. I'll be really grateful if
someone can help me get a starting point here.
Our application tracks SMSes sent from a