0 threads
>500 to <1000 ms | 74 threads
>100 to <500 ms | 26 threads
>20 <50 ms | 0 threads
>0 <20 ms | 0 threads
-----Original Message-----
From: Vitaly Funstein [mailto:vfunst...@gmail.com]
Sent: Saturday, June
Hmm, I'm not sure you want to rely on the presence or absence of a
particular document in the index to determine the recovery point. It may
work for inserts, but likely not for updates or removes. I would look into
driving the version numbers from the committer to the DB, and recording them as
commit u
We do have a way to recover partially, using a version number for each
transaction. The same version is maintained in Lucene as one document. During
startup these numbers define what has to be synced up. Unfortunately Lucene is
used in a webapp, so this happens "only" during a Jetty restart.
- Vidhy
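The startup sync described above (a version number per transaction, with the last indexed version stored in Lucene) can be modeled in a few lines. This is a minimal sketch under stated assumptions, not the poster's code: the `database` map is a hypothetical stand-in for the transactional DB table, `index` stands in for the committed Lucene documents, and `lastIndexedVersion` stands in for the single version document. Because recovery replays logged transactions rather than checking whether a particular document exists, the same idea could extend to updates and removes if the log also records the operation type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model: the database is the source of truth and every
// transaction is stamped with a monotonically increasing version number.
class VersionedRecovery {
    // Stand-in for the database table: version -> document payload.
    final TreeMap<Long, String> database = new TreeMap<>();
    // Stand-in for the documents that survived the last Lucene commit.
    final List<String> index = new ArrayList<>();
    // Stand-in for the single Lucene document holding the last committed version.
    long lastIndexedVersion = 0;

    // Step 1: write to the database inside its transaction.
    void recordInDatabase(long version, String doc) {
        database.put(version, doc);
    }

    // Step 2: index the document and commit it together with its version marker.
    void indexAndCommit(long version, String doc) {
        index.add(doc);
        lastIndexedVersion = version;
    }

    // On startup, replay every transaction newer than the committed version.
    int recoverOnStartup() {
        int replayed = 0;
        for (Map.Entry<Long, String> e : database.tailMap(lastIndexedVersion, false).entrySet()) {
            indexAndCommit(e.getKey(), e.getValue());
            replayed++;
        }
        return replayed;
    }
}
```

In this model a crash between steps 1 and 2 loses nothing: the next startup finds the versions above `lastIndexedVersion` and replays them in order.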
This is a better idea than what you had before, but I don't think there's
any point in doing any commits manually at all unless you have a way of
detecting and recovering exactly the data that hasn't been committed. In
other words, what difference does it make whether you lost 1 index record
or 1M,
Let me try NRT with a periodic commit, say every 5 minutes, in a committer
thread on an as-needed basis.
Is there a limit on how long we can go without committing? I think the
buffers get flushed to disk, just not in a crash-proof way, so we should be
fine on memory.
I should also verify
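The "commit every 5 minutes in a committer thread, only when needed" plan could look roughly like this. It is a sketch under assumptions: `Committable` is a hypothetical interface standing in for `IndexWriter.commit()`, and the dirty flag is a hypothetical way of skipping commits when nothing changed since the last one.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the part of IndexWriter we need.
interface Committable {
    void commit() throws Exception;
}

// Commits on a fixed schedule, but only if something changed since the last commit.
class PeriodicCommitter implements AutoCloseable {
    private final Committable writer;
    private final AtomicBoolean dirty = new AtomicBoolean(false);
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    PeriodicCommitter(Committable writer, long period, TimeUnit unit) {
        this.writer = writer;
        scheduler.scheduleWithFixedDelay(this::commitIfDirty, period, period, unit);
    }

    // Call after every add/update/delete sent to the writer.
    void markDirty() {
        dirty.set(true);
    }

    void commitIfDirty() {
        // Clear the flag first so changes made during the commit trigger the next one.
        if (dirty.getAndSet(false)) {
            try {
                writer.commit();
            } catch (Exception e) {
                // Catching here also keeps the scheduled task alive for later runs.
                dirty.set(true); // retry on the next tick
            }
        }
    }

    @Override
    public void close() throws InterruptedException {
        scheduler.shutdown();
        scheduler.awaitTermination(10, TimeUnit.SECONDS);
        commitIfDirty(); // final commit on shutdown, e.g. when Jetty stops the webapp
    }
}
```

Usage would be something like `new PeriodicCommitter(writer::commit, 5, TimeUnit.MINUTES)`, with `markDirty()` called after each indexing operation; an NRT reader would still provide immediate searchability between commits.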
Hmm, I might have actually given you a slightly incorrect explanation with respect to
what happens when internal buffers fill up. There will definitely be a
flush of the buffer, and segment files will be written to, but it's not
actually considered a full commit, i.e. an external reader will not see
these chan
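To make the flush/commit distinction concrete, here is a toy model. It is not Lucene's implementation, and every name in it is hypothetical: a full buffer is flushed to segment files, but a reader opened on the directory only sees the last commit point. The `nrtReaderView` method mimics the near-real-time idea behind `IndexWriter.getReader()`, which comes up later in this thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model of writer state; not Lucene code.
class ToyIndexWriter {
    private final List<String> ramBuffer = new ArrayList<>();
    private final List<String> flushedSegments = new ArrayList<>();
    private List<String> lastCommit = Collections.emptyList();
    private final int maxBufferedDocs;

    ToyIndexWriter(int maxBufferedDocs) {
        this.maxBufferedDocs = maxBufferedDocs;
    }

    void addDocument(String doc) {
        ramBuffer.add(doc);
        if (ramBuffer.size() >= maxBufferedDocs) {
            flush(); // buffer full: written to segment files, but NOT committed
        }
    }

    void flush() {
        flushedSegments.addAll(ramBuffer);
        ramBuffer.clear();
    }

    void commit() {
        flush();
        // Publish a new commit point (roughly: sync files, write a new segments_N).
        lastCommit = new ArrayList<>(flushedSegments);
    }

    // An external reader opened on the directory sees only the last commit point.
    List<String> externalReaderView() {
        return Collections.unmodifiableList(lastCommit);
    }

    // A near-real-time reader also sees flushed and buffered, uncommitted docs.
    List<String> nrtReaderView() {
        flush();
        return Collections.unmodifiableList(new ArrayList<>(flushedSegments));
    }
}
```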
It is non-transactional. We first write the same data to the database in a
transaction and then call writer.addDocument(). If Lucene fails, we still hold
the data to recover.
I can avoid the commit if we use an NRT reader. We do need this to be searchable
immediately.
Another question. I did try removi
You could just avoid calling commit() altogether if your application's
semantics allow this (i.e. it's non-transactional in nature). This way,
Lucene will do commits when appropriate, based on the buffering settings
you chose. It's generally unnecessary and undesirable to call commit at the
end of
On Thu, Mar 7, 2013 at 6:44 PM, Michael McCandless
wrote:
> This sounds reasonable (500 M docs / 50 GB index), though you'll need
> to test resulting search perf for what you want to do with it.
>
> To reduce merging time, maximize your IndexWriter RAM buffer
> (setRAMBufferSizeMB). You could als
On Thu, Mar 7, 2013 at 7:06 PM, Jan Stette wrote:
> Thanks for your suggestions, Mike, I'll experiment with the RAM buffer size
> and segments-per-tier settings and see what that does.
>
> The time spent merging seems to be so great though, that I'm wondering if
> I'm actually better off doing the
Thanks for your suggestions, Mike, I'll experiment with the RAM buffer size
and segments-per-tier settings and see what that does.
The time spent merging seems to be so great though, that I'm wondering if
I'm actually better off doing the indexing single-threaded. Am I right in
thinking that no me
This sounds reasonable (500 M docs / 50 GB index), though you'll need
to test resulting search perf for what you want to do with it.
To reduce merging time, maximize your IndexWriter RAM buffer
(setRAMBufferSizeMB). You could also increase the
TieredMergePolicy.setSegmentsPerTier to allow more se
Hi Klaus,
If you've really still got 500MB of changes to your index since the last
time you commit()'ed, then the call to commit() will be costly and take a
while to complete. If in another thread, you reopen() an IndexReader
pointing to that index, it will only see changes since the most recen
It depends on whether or not the commit completes before the
reopen. Lucene 2.9 adds an IndexWriter.getReader method that
will always return with the latest modifications to your index.
So if you're adding many documents, you can at any time call
IW.getReader and you will be able to search the cha
Thanks a ton, Mark. I really appreciate being able to interact with you; you never
hesitate to reply to even the slightest of queries.
Thanks again.
Ajay Garg
markrmiller wrote:
>
>
>> Again, if the
>> indexerThreads are bombarding the writer continuously, then the moment,
>> when
>> no indexer is accessi
Again, if the
indexerThreads are bombarding the writer continuously, then the moment when
no indexer is accessing the writer may never come. So I invested some of
my time and wrote my own code to control the sleeping of the indexerThreads.
I don't know how much of a concern this is. All y
Thanks Mark.
Just one last thing: this issue seems similar to the case where the
Lucene source code says that if an explicit "flush" method is called on an
IndexWriter instance, it will again wait for all the indexerThreads to
release the writer, and only then will the flush happen
P.S.
About that write bombardment... it's still very difficult for that to be a
problem. Take a look at the tests. I start a bunch of threads searching
as fast as they can, and a bunch of threads writing as fast as they can
- nonstop. And still there are plenty of moments where the references
h
ajay_garg wrote:
Thanks Mark.
Ok, I got your point. So it happens like this :
a) If it is me, who is re-opening an IndexReader, at any time, but
"manually-programmatically". That is, I don't want
a-sort-of-automatic-reopening-of-IndexWriter, then I am fine.
Sure... you're kind of doing what In
Thanks Mark.
Ok, I got your point. So it happens like this :
a) If it is me, who is re-opening an IndexReader, at any time, but
"manually-programmatically". That is, I don't want
a-sort-of-automatic-reopening-of-IndexWriter, then I am fine.
b) If I do wish this automatic-reopening of index (usin
You are right that if auto-commit=true and a user reopens an
IndexReader, the docs will absolutely be visible as they are flushed. I
think the part you are missing is that you need to be cooperating with
the IndexAccessor: a user should not be reopening an IndexReader. The
whole point of IndexA
@Mark.
I am sorry, but I need a bit more explanation. So you mean to say:
"If auto-commit is false, then of course, docs will not be visible in the
index, until all the threads release themselves out of a particular
IndexWriter instance, and close() the IndexWriter instance.
If auto-commit
You are correct that autocommit=false means that docs will be in the
index before the last thread releases its concurrent hold on a Writer,
*but because IndexAccessor controls* *when the IndexSearchers are
reopened*, those docs will still not be visible until the last thread
holding a Writer re
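The behaviour described here (documents reach the index as soon as threads add them, but the shared searcher is refreshed only when the last thread holding the writer releases it) can be modeled with a simple reference count. This is a hypothetical sketch of the idea, not the API of either IndexAccessor discussed in this thread; `searcherView` stands in for a reopened IndexSearcher.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of an IndexAccessor-style coordinator.
class ToyIndexAccessor {
    private final List<String> index = new ArrayList<>();
    private List<String> searcherView = Collections.emptyList();
    private int writerHolders = 0;

    // Each thread obtains the shared writer; the count tracks concurrent holders.
    synchronized void acquireWriter() {
        writerHolders++;
    }

    // Documents go into the index immediately...
    synchronized void addDocument(String doc) {
        if (writerHolders == 0) {
            throw new IllegalStateException("acquire the writer first");
        }
        index.add(doc);
    }

    // ...but the shared searcher is refreshed only when the last holder releases.
    synchronized void releaseWriter() {
        if (writerHolders == 0) {
            throw new IllegalStateException("writer not held");
        }
        if (--writerHolders == 0) {
            // Stands in for closing the writer and reopening the cached searcher.
            searcherView = new ArrayList<>(index);
        }
    }

    // Searches run against the cached view, not the live index.
    synchronized List<String> search() {
        return Collections.unmodifiableList(searcherView);
    }
}
```

In this model, a thread that reopens its own reader behind the accessor's back would bypass `searcherView` entirely, which is why cooperating with the accessor matters.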
Hi. Sorry to jump into this thread as a stranger, but there is something that I
can't resist getting cleared up.
Mark, you say that the additional documents added to an index won't show up
until the number of threads accessing the index hits 0 and the IndexWriter
instance is subsequently closed.
But I
1) I should be calling release of writer and searcher after every call. Is
it always mandatory for the searcher, even when I am sure that I haven't
written anything since the last search?
You have to be careful here. It works like this: a single searcher is
cached and returned every time. O
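One general reason to release a shared searcher even when nothing was written: when a new searcher is swapped in, the old one can only be closed safely after every borrower has released it. The sketch below shows that reference-counting pattern with hypothetical names; it is not the IndexAccessor code. Later Lucene versions provide `SearcherManager` with `acquire()`/`release()` for the same purpose.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an IndexSearcher that must be closed exactly once.
class RefCountedSearcher {
    private final AtomicInteger refs = new AtomicInteger(1); // 1 = the cache's own reference
    private volatile boolean closed = false;
    final int generation;

    RefCountedSearcher(int generation) {
        this.generation = generation;
    }

    boolean tryIncRef() {
        int n;
        do {
            n = refs.get();
            if (n <= 0) return false; // already closed
        } while (!refs.compareAndSet(n, n + 1));
        return true;
    }

    void decRef() {
        if (refs.decrementAndGet() == 0) {
            closed = true; // real code would close the underlying IndexReader here
        }
    }

    boolean isClosed() {
        return closed;
    }
}

// Hands out one cached searcher; callers must release what they acquire.
class SearcherCache {
    private volatile RefCountedSearcher current = new RefCountedSearcher(0);

    RefCountedSearcher acquire() {
        while (true) {
            RefCountedSearcher s = current;
            if (s.tryIncRef()) return s; // retry if it was closed and swapped out meanwhile
        }
    }

    void release(RefCountedSearcher s) {
        s.decRef();
    }

    // Install a fresh searcher; the old one closes after its last borrower releases it.
    synchronized void refresh() {
        RefCountedSearcher old = current;
        current = new RefCountedSearcher(old.generation + 1);
        old.decRef(); // drop the cache's own reference
    }
}
```

A caller that skips `release()` keeps the old searcher's count above zero, so in this model it is never closed.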
Thanks Mark!
Option D looks great. Regarding that option, I have a couple of questions
based on my first glance at the code (more specifically,
SimpleSearchServer):
1) I should be calling release of writer and searcher after every call. Is
it always mandatory for the searcher, even when I am sure
You are not seeing the doc because you need to close the IndexWriter first.
To have an interactive index you can:
A: roll your own.
B: use Solr.
C: use the original LuceneIndexAccessor
https://issues.apache.org/jira/browse/LUCENE-390
D: use my updated IndexAccessor
https://issues.apache.org/ji