RE: Concurrent Indexing

2014-07-03 Thread Umashanker, Srividhya
... | 0 threads; >500 <1000 ms | 74 threads; >100 <500 ms | 26 threads; >20 <50 ms | 0 threads; >0 <20 ms | 0 threads. -Original Message- From: Vitaly Funstein [mailto:vfunst...@gmail.com] Sent: Saturday, June

Re: Concurrent Indexing

2014-06-20 Thread Vitaly Funstein
Hmm, I'm not sure you want to rely on the presence or absence of a particular document in the index to determine the recovery point. It may work for inserts, but not likely for updates or removes. I would look into driving the version numbers from the commiter to the DB, and record them as commit u

Re: Concurrent Indexing

2014-06-20 Thread Umashanker, Srividhya
We do have a way to recover partially, with a version number for each transaction. The same version is maintained in Lucene as one document. During startup these numbers define what has to be synced up. Unfortunately Lucene is used in a webapp, so this happens "only" during a Jetty restart. - Vidhy
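A minimal sketch of the startup sync described above, in plain Java with no Lucene calls. It assumes the database can list its transactions with their version numbers and that the last version recorded on the index side is known; `TxRecord`, `RecoveryPlanner`, and `toReplay` are hypothetical names used only for illustration, not anything from the posters' code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical record of one database transaction; not a Lucene type. */
final class TxRecord {
    final long version;
    final String payload;

    TxRecord(long version, String payload) {
        this.version = version;
        this.payload = payload;
    }
}

final class RecoveryPlanner {
    /**
     * Returns the transactions that would need re-indexing at startup:
     * every database transaction whose version is greater than the
     * version last recorded for the index.
     */
    static List<TxRecord> toReplay(List<TxRecord> dbLog, long lastIndexedVersion) {
        List<TxRecord> pending = new ArrayList<>();
        for (TxRecord tx : dbLog) {
            if (tx.version > lastIndexedVersion) {
                pending.add(tx);
            }
        }
        return pending;
    }
}
```

The comparison is driven by version numbers rather than by whether a given document is present in the index, so it may also cover updates and removes, provided each of those gets its own version in the database.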

Re: Concurrent Indexing

2014-06-20 Thread Vitaly Funstein
This is a better idea than what you had before, but I don't think there's any point in doing any commits manually at all unless you have a way of detecting and recovering exactly the data that hasn't been committed. In other words, what difference does it make whether you lost 1 index record or 1M,

Re: Concurrent Indexing

2014-06-20 Thread Umashanker, Srividhya
Let me try NRT with a periodic commit, say every 5 minutes, in a committer thread on an as-needed basis. Is there a limit on how long we can go without committing? I think the buffers get flushed to disk, but not in a crash-proof way. So we should be good on memory. I should also verify
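One possible shape for such a committer thread, sketched with the Java standard library only. The commit action is passed in as a `Runnable`; in a real setup it would presumably wrap the writer's commit call and handle its checked exceptions, which is an assumption here. `PeriodicCommitter`, `markDirty`, and `commitIfDirty` are hypothetical names, not Lucene API.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical helper: commits on a timer, but only if something changed. */
final class PeriodicCommitter implements AutoCloseable {
    private final Runnable commitAction; // assumed to wrap the real commit
    private final AtomicBoolean dirty = new AtomicBoolean(false);
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "index-committer");
                t.setDaemon(true); // do not keep the JVM alive just for commits
                return t;
            });

    PeriodicCommitter(Runnable commitAction, long period, TimeUnit unit) {
        this.commitAction = commitAction;
        scheduler.scheduleWithFixedDelay(this::commitQuietly, period, period, unit);
    }

    /** Indexing threads call this after adding, updating, or deleting documents. */
    void markDirty() {
        dirty.set(true);
    }

    /** Runs the commit action only if changes are pending; returns whether it ran. */
    boolean commitIfDirty() {
        if (!dirty.compareAndSet(true, false)) {
            return false;
        }
        try {
            commitAction.run();
            return true;
        } catch (RuntimeException e) {
            dirty.set(true); // keep the work flagged so the next tick retries
            throw e;
        }
    }

    /** An uncaught exception would cancel the schedule, so swallow it here. */
    private void commitQuietly() {
        try {
            commitIfDirty();
        } catch (RuntimeException e) {
            // a real application would log this
        }
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

With a 5-minute period this gives the "on need" behaviour: ticks with no pending changes do nothing, and a failed commit is retried on the next tick.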

Re: Concurrent Indexing

2014-06-20 Thread Vitaly Funstein
Hmm, I might have actually given you a slightly incorrect explanation wrt what happens when internal buffers fill up. There will definitely be a flush of the buffer, and segment files will be written to, but it's not actually considered a full commit, i.e. an external reader will not see these chan

Re: Concurrent Indexing

2014-06-20 Thread Umashanker, Srividhya
It is non-transactional. We first write the same data to the database in a transaction and then call the writer's addDocument. If Lucene fails we still hold the data to recover. I can avoid the commit if we use an NRT reader. We do need this to be searchable immediately. Another question: I did try removi

Re: Concurrent Indexing

2014-06-20 Thread Vitaly Funstein
You could just avoid calling commit() altogether if your application's semantics allow this (i.e. it's non-transactional in nature). This way, Lucene will do commits when appropriate, based on the buffering settings you chose. It's generally unnecessary and undesirable to call commit at the end of

Re: Concurrent indexing performance problem

2013-03-07 Thread Simon Willnauer
On Thu, Mar 7, 2013 at 6:44 PM, Michael McCandless wrote: > This sounds reasonable (500 M docs / 50 GB index), though you'll need > to test resulting search perf for what you want to do with it. > > To reduce merging time, maximize your IndexWriter RAM buffer > (setRAMBufferSizeMB). You could als

Re: Concurrent indexing performance problem

2013-03-07 Thread Simon Willnauer
On Thu, Mar 7, 2013 at 7:06 PM, Jan Stette wrote: > Thanks for your suggestions, Mike, I'll experiment with the RAM buffer size > and segments-per-tier settings and see what that does. > > The time spent merging seems to be so great though, that I'm wondering if > I'm actually better off doing the

Re: Concurrent indexing performance problem

2013-03-07 Thread Jan Stette
Thanks for your suggestions, Mike, I'll experiment with the RAM buffer size and segments-per-tier settings and see what that does. The time spent merging seems to be so great though, that I'm wondering if I'm actually better off doing the indexing single-threaded. Am I right in thinking that no me

Re: Concurrent indexing performance problem

2013-03-07 Thread Michael McCandless
This sounds reasonable (500 M docs / 50 GB index), though you'll need to test resulting search perf for what you want to do with it. To reduce merging time, maximize your IndexWriter RAM buffer (setRAMBufferSizeMB). You could also increase the TieredMergePolicy.setSegmentsPerTier to allow more se

Re: Concurrent Indexing and Searching

2009-09-25 Thread Jake Mannix
Hi Klaus, If you've really still got 500MB of changes to your index since the last time you commit()'ed, then the call to commit() will be costly and take a while to complete. If in another thread, you reopen() an IndexReader pointing to that index, it will only see changes since the most recen

Re: Concurrent Indexing and Searching

2009-09-25 Thread Jason Rutherglen
It depends on whether or not the commit completes before the reopen. Lucene 2.9 adds an IndexWriter.getReader method that will always return with the latest modifications to your index. So if you're adding many documents, you can at anytime, call IW.getReader and you will be able to search the cha

Re: Concurrent Indexing + Searching

2008-02-05 Thread ajay_garg
Thanks a ton, Mark. I am really obliged to interact with you, who is never hesitant to reply to even the slightest of queries. Thanks again. Ajay Garg markrmiller wrote: > > >> Again, if the >> indexerThreads are bombarding the writer continuously, then the moment, >> when >> no indexer is accessi

Re: Concurrent Indexing + Searching

2008-02-05 Thread Mark Miller
Again, if the indexerThreads are bombarding the writer continuously, then the moment, when no indexer is accessing the writer, may never come. Thus, I invested some of my time, and wrote my own code, to control the sleeping of indexerThreads. I don't know how much of a concern this is. All y

Re: Concurrent Indexing + Searching

2008-02-05 Thread ajay_garg
Thanks Mark. Just one last thing: this issue seems similar to the case where the Lucene source code says that if an explicit "flush" method is called on an IndexWriter instance, it will wait for all the indexerThreads to release the writer, and only then will the flush happen

Re: Concurrent Indexing + Searching

2008-02-05 Thread Mark Miller
P.S. About that write bombardment... it's still very difficult for that to be a problem. Take a look at the tests. I start a bunch of threads searching as fast as they can, and a bunch of threads writing as fast as they can - nonstop. And still there are plenty of moments where the references h

Re: Concurrent Indexing + Searching

2008-02-05 Thread Mark Miller
ajay_garg wrote: Thanks Mark. Ok, I got your point. So it happens like this: a) If it is me who is re-opening an IndexReader, at any time, but "manually-programmatically". That is, I don't want a-sort-of-automatic-reopening-of-IndexWriter, then I am fine. Sure... you're kind of doing what In

Re: Concurrent Indexing + Searching

2008-02-05 Thread ajay_garg
Thanks Mark. Ok, I got your point. So it happens like this: a) If it is me who is re-opening an IndexReader, at any time, but "manually-programmatically". That is, I don't want a-sort-of-automatic-reopening-of-IndexWriter, then I am fine. b) If I do wish this automatic-reopening of index (usin

Re: Concurrent Indexing + Searching

2008-02-04 Thread Mark Miller
You are right that if auto-commit=true and a user reopens an IndexReader, the docs will absolutely be visible as they are flushed. I think the part you are missing is that you need to be cooperating with the IndexAccessor: a user should not be reopening an IndexReader. The whole point of IndexA

Re: Concurrent Indexing + Searching

2008-02-03 Thread ajay_garg
@Mark. I am sorry, but I need a bit more explanation. So you mean to say: "If auto-commit is false, then of course, docs will not be visible in the index until all the threads release themselves out of a particular IndexWriter instance, and close() the IndexWriter instance. If auto-commit

Re: Concurrent Indexing + Searching

2008-02-03 Thread Mark Miller
You are correct that autocommit=false means that docs will be in the index before the last thread releases its concurrent hold on a Writer, *but because IndexAccessor controls when the IndexSearchers are reopened*, those docs will still not be visible until the last thread holding a Writer re
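The visibility rule described above can be pictured with a small reference-counting sketch. This is not the IndexAccessor code from the thread and uses no Lucene classes; `RefCountingAccessor` and its methods are hypothetical, and a simple generation counter stands in for reopening the cached searcher.

```java
/**
 * Hypothetical sketch: counts the threads currently holding the writer and
 * refreshes the searcher only when the last holder releases it.
 */
final class RefCountingAccessor {
    private int writersInUse = 0;
    private boolean pendingChanges = false;
    private long searcherGeneration = 0;

    synchronized void acquireWriter() {
        writersInUse++;
    }

    synchronized void releaseWriter(boolean wroteSomething) {
        if (writersInUse == 0) {
            throw new IllegalStateException("release without a matching acquire");
        }
        pendingChanges |= wroteSomething;
        writersInUse--;
        if (writersInUse == 0 && pendingChanges) {
            searcherGeneration++; // stands in for reopening the cached searcher
            pendingChanges = false;
        }
    }

    /** Searches issued now would see the index as of this generation. */
    synchronized long searcherGeneration() {
        return searcherGeneration;
    }
}
```

Under this scheme, documents written while other threads still hold the writer stay invisible to searches until the count drops to zero, which matches the behaviour Mark describes; whether the count ever reaches zero under constant write load is the separate concern raised elsewhere in the thread.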

Re: Concurrent Indexing + Searching

2008-02-03 Thread ajay_garg
Hi. Sorry if I seem a stranger in this thread, but there is something that I can't resist clearing myself on. Mark, you say that the additional documents added to an index won't show up until the number of threads accessing the index hits 0, and subsequently the IndexWriter instance is closed. But I

Re: Concurrent Indexing + Searching

2008-02-01 Thread Mark Miller
1) I should be calling release of writer and searcher after every call. Is it always mandatory in cases like searcher, when I am sure that I haven't written anything since the last search? You have to be careful here. It works like this: a single searcher is cached and returned every time. O

Re: Concurrent Indexing + Searching

2008-02-01 Thread Infinite Tester
Thanks Mark! Option D looks great. Regarding that option, I have a couple of questions based on my first glance at the code (more specifically SimpleSearchServer). 1) I should be calling release of writer and searcher after every call. Is it always mandatory in cases like searcher, when I am sure

Re: Concurrent Indexing + Searching

2008-02-01 Thread Mark Miller
You are not seeing the doc because you need to close the IndexWriter first. To have an interactive index you can: A: roll your own. B: use Solr. C: use the original LuceneIndexAccessor https://issues.apache.org/jira/browse/LUCENE-390 D: use my updated IndexAccessor https://issues.apache.org/ji