RE: Reindexing problem: Indexing folder size keeps growing for same remote folder

2013-10-01 Thread Uwe Schindler
You have to call updateDocument with the unique key of the document to update. The unique key must be a separate, indexed, not necessarily stored key. addDocument just adds a new instance of the document to the index, it cannot determine if it’s a duplicate. - Uwe Schindler
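Uwe's point can be sketched against Lucene 4.x APIs (the field name "id" and the key values are illustrative assumptions, not from the thread). updateDocument first deletes any document matching the key term, then adds the new one, so re-indexing the same record never grows the index:

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class UpdateByKey {
    /** Indexes the "same" record twice and returns the resulting doc count. */
    public static int indexTwice() throws IOException {
        Directory dir = new RAMDirectory();
        IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_44,
                new StandardAnalyzer(Version.LUCENE_44));
        IndexWriter writer = new IndexWriter(dir, cfg);

        Document doc = new Document();
        // The unique key: indexed, not analyzed (StringField); storing it is optional.
        doc.add(new StringField("id", "a.log#42", Field.Store.NO));
        doc.add(new TextField("message", "first version", Field.Store.YES));
        // Deletes any existing doc matching Term("id", "a.log#42"), then adds this one.
        writer.updateDocument(new Term("id", "a.log#42"), doc);

        Document doc2 = new Document();
        doc2.add(new StringField("id", "a.log#42", Field.Store.NO));
        doc2.add(new TextField("message", "second version", Field.Store.YES));
        writer.updateDocument(new Term("id", "a.log#42"), doc2); // replaces, no duplicate
        writer.close();

        DirectoryReader reader = DirectoryReader.open(dir);
        int numDocs = reader.numDocs(); // 1, not 2
        reader.close();
        return numDocs;
    }
}
```

Had both calls been addDocument instead, the reader would report 2 documents.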

Re: Multi server

2013-10-01 Thread Ian Lea
I'm not aware of a Lucene (rather than Solr or whatever) tutorial. A search for something like "lucene sharding" will get hits. Why don't you want to use Solr or Katta or similar? They've already done much of the hard work. How much data are you talking about? What are your master-master

Re: How to make good use of the multithreaded IndexSearcher?

2013-10-01 Thread Adrien Grand
Hi Benson, On Mon, Sep 30, 2013 at 5:21 PM, Benson Margulies ben...@basistech.com wrote: The multithreaded index searcher fans out across segments. How aggressively does 'optimize' reduce the number of segments? If the segment count goes way down, is there some other way to exploit multiple

Re: Multi server

2013-10-01 Thread Michael McCandless
Maybe Lucene's new replication module is useful for this? Mike McCandless http://blog.mikemccandless.com On Mon, Sep 30, 2013 at 3:08 PM, Neda Grbic neda.gr...@mangora.org wrote: Hi all I'm hoping to use Lucene in my project, but I have two master-master servers. Is there any good tutorial

Re: How to make good use of the multithreaded IndexSearcher?

2013-10-01 Thread Michael McCandless
You might want to set a smallish maxMergedSegmentMB in TieredMergePolicy to force enough segments in the index ... sort of the opposite of optimizing. Really, IndexSearcher's approach to using one thread per segment is rather silly, and, it's annoying/bad to expose change in behavior due to
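Mike's tuning suggestion might look like this with Lucene 4.x (the 64 MB cap is an arbitrary illustration, not a recommended value; the default maximum merged segment size is much larger):

```java
import org.apache.lucene.index.TieredMergePolicy;

public class ManySegmentsPolicy {
    /** A merge policy tuned to keep many small segments in the index. */
    public static TieredMergePolicy build() {
        TieredMergePolicy mp = new TieredMergePolicy();
        // Cap merged segments at 64 MB so the index retains many segments,
        // giving IndexSearcher's one-thread-per-segment approach work to fan out over.
        mp.setMaxMergedSegmentMB(64.0);
        return mp;
    }
}
```

Install it with IndexWriterConfig.setMergePolicy(...) before opening the IndexWriter.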

RE: Reindexing problem: Indexing folder size keeps growing for same remote folder

2013-10-01 Thread gudiseashok
I'm afraid my document in the above code already has a unique key (with milliseconds; I hope this is enough to differentiate it from other records). My requirement is simple: I have a folder with a.log, b.log and c.log files which are updated every 30 minutes, and I want to update the index of

Re: Reindexing problem: Indexing folder size keeps growing for same remote folder

2013-10-01 Thread Ian Lea
milliseconds as unique keys are a bad idea unless you are 100% certain you'll never be creating 2 docs in the same millisecond. And are you saying the log record A1 from file a.log indexed at 14:00 will have the same unique id as the same record from the same file indexed at 14:30 or will it be
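Ian's warning is easy to demonstrate with plain Java (a toy simulation, not Lucene code): keys derived from System.currentTimeMillis() collapse every document created within the same millisecond onto one key.

```java
import java.util.HashSet;
import java.util.Set;

public class MillisKeyDemo {
    /**
     * Simulates generating a "unique" key per document from the current
     * millisecond, and returns how many distinct keys actually result.
     */
    public static int distinctKeys(int docs) {
        Set<Long> keys = new HashSet<Long>();
        for (int i = 0; i < docs; i++) {
            keys.add(System.currentTimeMillis());
        }
        // A tight loop creates many docs per millisecond, so keys.size() << docs.
        return keys.size();
    }
}
```

A key that combines the file name and a per-file counter (or line offset) avoids the collision entirely.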

Re: Reindexing problem: Indexing folder size keeps growing for same remote folder

2013-10-01 Thread gudiseashok
Hi, basically my log folder consists of four log files, abc.log, abc1.log, abc2.log, abc3.log, as my log appender rolls them. Every 30 minutes the content of all these files changes; for example, after a 30-minute refresh the content of abc1.log will be replaced with the existing abc.log content and

Re: Reindexing problem: Indexing folder size keeps growing for same remote folder

2013-10-01 Thread Ian Lea
I'm still a bit confused about exactly what you're indexing, when, but if you have a unique id and don't want to add or update a doc that's already present, add the unique id to the index and search (TermQuery probably) for each one and skip if already present. Can't you change the log
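Ian's search-before-add suggestion can be sketched against Lucene 4.x like this (the "id" field name is an assumption carried over from the discussion):

```java
import java.io.IOException;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

public class DedupCheck {
    /** True if a document with this unique id is already in the index. */
    public static boolean exists(IndexSearcher searcher, String id) throws IOException {
        // An exact-match TermQuery on the un-analyzed key field; one hit is enough.
        return searcher.search(new TermQuery(new Term("id", id)), 1).totalHits > 0;
    }
}
```

Before each addDocument, call exists(searcher, id) and skip the add when it returns true; note the reader must be reopened after commits or recently added docs won't be seen.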

Re: Reindexing problem: Indexing folder size keeps growing for same remote folder

2013-10-01 Thread gudiseashok
I am really sorry if something confused you. As I said, I am indexing a folder which contains mylogs.log, mylogs1.log, mylogs2.log etc.; I am not indexing them as flat files. I tokenize each line of text with a regex and store the pieces as fields like messageType, timeStamp, message. So I
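The per-line regex-to-fields approach described here can be sketched in plain Java. The timestamp/level/message layout below is a guess at the log format, since the actual appender pattern isn't shown in the thread:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineParser {
    // Hypothetical layout: "2013-10-01 14:00:01 ERROR something broke"
    private static final Pattern LINE = Pattern.compile(
            "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}) (\\w+) (.*)$");

    /** Returns {timeStamp, messageType, message}, or null if the line doesn't match. */
    public static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }
}
```

Each returned group would then be stored as its own Lucene field (timeStamp, messageType, message).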

Re: How to make good use of the multithreaded IndexSearcher?

2013-10-01 Thread Desidero
Benson, Rather than forcing a random number of small segments into the index using maxMergedSegmentMB, it might be better to split your index into multiple shards. You can create a specific number of balanced shards to control the parallelism and then forceMerge each shard down to 1 segment to

Re: How to make good use of the multithreaded IndexSearcher?

2013-10-01 Thread Benson Margulies
On Tue, Oct 1, 2013 at 3:58 PM, Desidero desid...@gmail.com wrote: Benson, Rather than forcing a random number of small segments into the index using maxMergedSegmentMB, it might be better to split your index into multiple shards. You can create a specific number of balanced shards to control

Re: Query performance in Lucene 4.x

2013-10-01 Thread Desidero
For anyone who was wondering, this was actually resolved in a different thread today. I misread the information in the IndexSearcher(IndexReader,ExecutorService) constructor documentation - I was under the impression that it was submitting a thread for each index shard (MultiReader wraps 20

RE: Query performance in Lucene 4.x

2013-10-01 Thread Uwe Schindler
Hi, use a bounded thread pool. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Desidero [mailto:desid...@gmail.com] Sent: Tuesday, October 01, 2013 11:37 PM To: java-user@lucene.apache.org
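Uwe's advice maps onto Lucene 4.x's IndexSearcher(IndexReader, ExecutorService) constructor; a sketch, with the pool size of 8 an arbitrary illustration:

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;

public class ParallelSearch {
    /** Opens a searcher whose per-segment search tasks run on a bounded pool. */
    public static IndexSearcher open(Directory dir) throws IOException {
        // Bounded pool: at most 8 segment-search tasks run concurrently;
        // further tasks queue instead of spawning unbounded threads.
        ExecutorService pool = Executors.newFixedThreadPool(8);
        return new IndexSearcher(DirectoryReader.open(dir), pool);
    }
}
```

The searcher does not own the pool, so the caller must shut it down when the searcher is retired.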

Re: Query performance in Lucene 4.x

2013-10-01 Thread Desidero
Uwe, I was using a bounded thread pool. I don't know if the problem was the task overload or something about the actual efficiency of searching a single segment rather than iterating over multiple AtomicReaderContexts, but I'd lean toward task overload. I will do some testing tonight to find out