You have to call updateDocument with the unique key of the document to update.
The unique key must be a separate, indexed (but not necessarily stored) field.
addDocument just adds a new instance of the document to the index; it cannot
determine whether it's a duplicate.
-
Uwe Schindler
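As a rough sketch of what that looks like (the `id` field name is a placeholder; this assumes a recent Lucene IndexWriter API):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

class Upserter {
    // "id" is a hypothetical unique-key field: indexed as a single token
    // (StringField) but not stored. updateDocument atomically deletes any
    // existing document matching the Term, then adds the new one;
    // addDocument would simply append a second copy.
    static void upsert(IndexWriter writer, String uniqueId, Document doc)
            throws Exception {
        doc.add(new StringField("id", uniqueId, Field.Store.NO));
        writer.updateDocument(new Term("id", uniqueId), doc);
    }
}
```

Note the field is a StringField, so the id is indexed as one exact token rather than being run through an analyzer.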
I'm not aware of a Lucene-specific (as opposed to Solr or whatever) tutorial. A
search for something like "lucene sharding" will get hits.
Why don't you want to use Solr or Katta or similar? They've already
done much of the hard work.
How much data are you talking about?
What are your master-master
Hi Benson,
On Mon, Sep 30, 2013 at 5:21 PM, Benson Margulies ben...@basistech.com wrote:
The multithreaded index searcher fans out across segments. How aggressively
does 'optimize' reduce the number of segments? If the segment count goes
way down, is there some other way to exploit multiple
Maybe Lucene's new replication module is useful for this?
Mike McCandless
http://blog.mikemccandless.com
On Mon, Sep 30, 2013 at 3:08 PM, Neda Grbic neda.gr...@mangora.org wrote:
Hi all
I'm hoping to use Lucene in my project, but I have two master-master
servers. Is there any good tutorial
You might want to set a smallish maxMergedSegmentMB in
TieredMergePolicy to force enough segments in the index ... sort of
the opposite of optimizing.
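Concretely, that configuration is along these lines (512 MB is an arbitrary illustration; older Lucene versions also pass a Version constant to these constructors):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;

// Cap merged segment size so the index retains enough segments for
// IndexSearcher's per-segment threading; 512 MB is an example value,
// tune it against your index size and core count.
TieredMergePolicy tmp = new TieredMergePolicy();
tmp.setMaxMergedSegmentMB(512.0);
IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
iwc.setMergePolicy(tmp);
```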
Really, IndexSearcher's approach of using one thread per segment is
rather silly, and it's annoying/bad to expose a change in behavior due
to
I am afraid my document in the above code already has a unique key (with
milliseconds; I hope this is enough to differentiate it from other records).
My requirement is simple: I have a folder with a.log, b.log and c.log files
which will be updated every 30 minutes, and I want to update the index of
milliseconds as unique keys are a bad idea unless you are 100% certain
you'll never be creating 2 docs in the same millisecond. And are you
saying the log record A1 from file a.log indexed at 14:00 will have
the same unique id as the same record from the same file indexed at
14:30 or will it be
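One hedged alternative to millisecond timestamps, following the point above: derive the id from the record's content, so the same record re-indexed at 14:30 maps to the same key (a sketch; the method and parameter names are invented for illustration):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class RecordIds {
    // Hash file name + record text into a stable hex id. Re-indexing the
    // same line from the same file yields the same id, so updateDocument
    // replaces rather than duplicates. (SHA-256 collisions between
    // genuinely different records are astronomically unlikely.)
    public static String recordId(String fileName, String line) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update(fileName.getBytes(StandardCharsets.UTF_8));
        md.update((byte) 0); // separator so ("a","bc") != ("ab","c")
        md.update(line.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) hex.append(String.format("%02x", b));
        return hex.toString();
    }
}
```

The trade-off: two byte-identical log lines in the same file collapse to one id, which may or may not be what you want.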
Hi
Basically my log folder consists of four log files like
abc.log, abc1.log, abc2.log and abc3.log, as my log appender is configured to
do. Every 30 minutes the content of all these files changes; for example,
after a 30-minute refresh the content of abc1.log will be replaced with the
existing abc.log content and
I'm still a bit confused about exactly what you're indexing, when, but
if you have a unique id and don't want to add or update a doc that's
already present, add the unique id to the index and search (TermQuery
probably) for each one and skip if already present.
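A minimal sketch of that presence check (the `id` field name is a placeholder; `totalHits` changed type across Lucene versions, so treat this as 4.x-era code):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

class DedupCheck {
    // True if a document with this unique id is already in the index.
    // A TermQuery on an un-analyzed id field is an exact-match lookup.
    static boolean alreadyIndexed(IndexSearcher searcher, String uniqueId)
            throws Exception {
        return searcher.search(new TermQuery(new Term("id", uniqueId)), 1)
                .totalHits > 0;
    }
}
```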
Can't you change the log
I am really sorry if something confused you. As I said, I am indexing a
folder
which contains mylogs.log, mylogs1.log, mylogs2.log etc; I am not indexing
them as flat files.
I have tokenized each line of text with a regex and am storing the pieces as
fields like messageType,
timeStamp and message.
So I
Benson,
Rather than forcing a random number of small segments into the index using
maxMergedSegmentMB, it might be better to split your index into multiple
shards. You can create a specific number of balanced shards to control the
parallelism and then forceMerge each shard down to 1 segment to
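The forceMerge step might look roughly like this (shard directory names and count are invented for illustration; FSDirectory.open takes a File rather than a Path in older Lucene versions):

```java
import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

class ShardMerger {
    // After distributing documents across N shard directories, collapse
    // each shard to a single segment. Search parallelism then comes from
    // querying the shards concurrently, not from the segment count.
    static void mergeShards(int numShards) throws Exception {
        for (int i = 0; i < numShards; i++) {
            try (FSDirectory dir = FSDirectory.open(Paths.get("shard-" + i));
                 IndexWriter w = new IndexWriter(dir,
                         new IndexWriterConfig(new StandardAnalyzer()))) {
                w.forceMerge(1); // one segment per shard
            }
        }
    }
}
```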
On Tue, Oct 1, 2013 at 3:58 PM, Desidero desid...@gmail.com wrote:
Benson,
Rather than forcing a random number of small segments into the index using
maxMergedSegmentMB, it might be better to split your index into multiple
shards. You can create a specific number of balanced shards to control
For anyone who was wondering, this was actually resolved in a different
thread today. I misread the information in the
IndexSearcher(IndexReader,ExecutorService) constructor documentation - I
was under the impression that it was submitting a thread for each index
shard (MultiReader wraps 20
Hi,
use a bounded thread pool.
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
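For instance (pool size and class name are illustrative; this assumes the IndexSearcher(IndexReader, ExecutorService) constructor):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

class BoundedSearch {
    // A fixed-size pool bounds the concurrency: per-segment tasks queue
    // rather than spawning unbounded threads. Shut the pool down when
    // the searcher is retired.
    static final ExecutorService POOL = Executors.newFixedThreadPool(
            Runtime.getRuntime().availableProcessors());

    static IndexSearcher newSearcher(IndexReader reader) {
        return new IndexSearcher(reader, POOL);
    }
}
```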
-----Original Message-----
From: Desidero [mailto:desid...@gmail.com]
Sent: Tuesday, October 01, 2013 11:37 PM
To: java-user@lucene.apache.org
Uwe,
I was using a bounded thread pool.
I don't know if the problem was the task overload or something about the
actual efficiency of searching a single segment rather than iterating over
multiple AtomicReaderContexts, but I'd lean toward task overload. I will do
some testing tonight to find out