NRT indexing in older versions of Lucene is slow. The newer version
with RT will be as fast as Lucene's batch indexing is today, which I'm
guessing will be fast enough for many/most users.  E.g., it simply
analyzes the data and throws it into a RAM buffer (there's no IO or
segment merging happening).
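
For comparison, here is a rough sketch of the update-behind alternative
Ted describes below, where the row insert only appends to a log and the
indexing side drains whatever has piled up. Everything in it is
hypothetical: the class and method names are made up, the queue stands
in for the write log, and the list stands in for the Lucene index. It
is not HBase or Lucene code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical update-behind indexer: inserts append to a log, indexing drains it in batches. */
class UpdateBehindIndexer {
    // Stand-in for the write log (assumption: a real system would tail a durable log).
    private final BlockingQueue<String> writeLog = new LinkedBlockingQueue<>();
    // Stand-in for the index; only the single indexing thread touches it.
    private final List<String> indexed = new ArrayList<>();

    /** Insert path: cheap, never waits on extraction or analysis. */
    void onRowInsert(String doc) {
        writeLog.add(doc);
    }

    /** Indexing path: takes everything pending, so batches grow when indexing falls behind. */
    int indexPendingBatch() {
        List<String> batch = new ArrayList<>();
        writeLog.drainTo(batch);
        for (String doc : batch) {
            // Placeholder for the expensive work (content extraction, analysis, adding to the index).
            indexed.add(doc.toLowerCase());
        }
        return batch.size();
    }

    int indexedCount() {
        return indexed.size();
    }

    int backlog() {
        return writeLog.size();
    }
}
```

The tradeoff is the one being argued in this thread: the insert returns
right away, but a search can miss rows that are still sitting in the
backlog.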

On Mon, Feb 14, 2011 at 10:57 AM, Ted Dunning <tdunn...@maprtech.com> wrote:
> I would find that unacceptable for many systems I have worked on.  Lucene
> update-behind would be fine, but making the insert wait until all of the
> Lucene stuff has happened would not be acceptable.
>
> I would much rather that Lucene update from the write log in batches that
> are as big as needed to catch/keep up.
>
> On Mon, Feb 14, 2011 at 9:48 AM, Jason Rutherglen <
> jason.rutherg...@gmail.com> wrote:
>
>> > Yes, that should work. But doesn't it assume that the index is updated
>> > synchronously with the HBase row? I can imagine this will sometimes be an
>> > issue, e.g. if it would involve performing expensive content extraction
>> > (tika) or analysis.
>>
>> I don't understand here.  You mean that the delay in indexing a
>> document will adversely affect the HBase row insert because it's all
>> in the same transaction?  I think that's fine, e.g., it's just how the
>> system would work?
>
