> I can't imagine that the speed achieved by using Hbase would be even within
> orders of magnitude of what you can do in Lucene 4 (or even 3).

The indexing speed in Lucene hasn't changed in quite a while.  Are you
saying HBase would somehow be overloaded?  That doesn't seem to jibe
with the sequential writes HBase performs?

On the query side, I think they should be fine as well?  At rock
bottom, all we need to be able to do is sequentially scan the
posting lists?
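
To make the "sequential scan" point concrete, here is a minimal sketch of
an AND query answered by walking two sorted posting lists in a single
forward pass.  The function name and the sample doc ids are made up for
illustration; this is not Lucene's or HBase's code.

```python
def intersect(postings_a, postings_b):
    """Return the doc ids present in both sorted posting lists (AND query).

    Each list is only ever read front to back, so the access pattern is a
    pure sequential scan.
    """
    result = []
    i = j = 0
    while i < len(postings_a) and j < len(postings_b):
        a, b = postings_a[i], postings_b[j]
        if a == b:
            result.append(a)
            i += 1
            j += 1
        elif a < b:
            i += 1  # advance whichever list is behind
        else:
            j += 1
    return result


# Hypothetical posting lists: sorted doc ids for two terms.
hbase_docs = [1, 4, 7, 9, 12]
lucene_docs = [2, 4, 9, 10, 12, 15]
matches = intersect(hbase_docs, lucene_docs)
```

Nothing here needs random access into a list, which is why any store
that can hand back a term's postings in order should be able to serve it.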

The speed of indexing is a function of creating segments.  With
flexible indexing, the underlying segment files (and postings) may be
significantly altered from the default file structures, e.g., placed
into HBase in various ways.  The posting lists could even be split
along with HBase regions?
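
One possible layout, sketched under loud assumptions: each posting list
is cut into fixed-size blocks, and each block is stored as a row keyed
by the term plus the zero-padded first doc id of the block.  In a table
sorted by row key, a prefix scan over "term/" then yields the postings
in order, and a long list naturally spans several rows that could fall
on either side of a split.  SortedTable below is an in-memory stand-in,
not the HBase client API, and BLOCK_SIZE, make_rows and the key format
are all hypothetical.  The sketch also assumes terms never contain "/".

```python
import bisect

BLOCK_SIZE = 3  # hypothetical: number of doc ids stored per row


def make_rows(term, doc_ids):
    """Split a sorted posting list into (row_key, block) pairs."""
    rows = []
    for start in range(0, len(doc_ids), BLOCK_SIZE):
        block = doc_ids[start:start + BLOCK_SIZE]
        # Zero-padding makes lexicographic key order match doc id order.
        key = "%s/%010d" % (term, block[0])
        rows.append((key, block))
    return rows


class SortedTable:
    """In-memory stand-in for a table kept sorted by row key."""

    def __init__(self):
        self.keys = []
        self.values = []

    def put(self, key, value):
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            self.values[pos] = value  # overwrite an existing row
        else:
            self.keys.insert(pos, key)
            self.values.insert(pos, value)

    def scan_prefix(self, prefix):
        """Yield (key, value) for every row whose key starts with prefix."""
        pos = bisect.bisect_left(self.keys, prefix)
        while pos < len(self.keys) and self.keys[pos].startswith(prefix):
            yield self.keys[pos], self.values[pos]
            pos += 1


def postings(table, term):
    """Iterate a term's doc ids in order via one sequential prefix scan."""
    for _key, block in table.scan_prefix(term + "/"):
        for doc_id in block:
            yield doc_id


table = SortedTable()
sample = {"hbase": [1, 4, 7, 9, 12], "lucene": [2, 4, 9]}
for term, docs in sample.items():
    for key, block in make_rows(term, docs):
        table.put(key, block)
```

With this layout, list(postings(table, "hbase")) walks two adjacent rows
and returns the original list.  Whether a real region split should be
allowed to cut through a single term's rows is an open design question.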

> For reference, I think that Michi Busch's search based on flexible indexing

You mean for Twitter?  I can't comment on that; however, as far as I
know the internals don't use Lucene, i.e., it's an entirely new inverted
index structure specifically for Twitter.  I think this is illustrated
in these slides:
http://www.lucenerevolution.org/sites/default/files/Lucene%20Rev%20Preso%20Busch%20Realtime_Search_LR1010.pdf

On Fri, Feb 11, 2011 at 3:27 PM, Ted Dunning <tdunn...@maprtech.com> wrote:
> Jason,
>
> I can't imagine that the speed achieved by using Hbase would be even within
> orders of magnitude of what you can do in Lucene 4 (or even 3).
>
> For reference, I think that Michi Busch's search based on flexible indexing
> is able to handle >10,000 inserts and >40,000 searches per second on a
> laptop.  Each search involves a number of scans of posting vectors so this
> is roughly equivalent to >100,000 scans per second (on a single host).
>
> The rumor is that the insert speed is so high that it is quicker to re-index
> 500 million documents than to load an index.
>
> I don't think that hbase is intended to be anywhere near this kind of speed.
>
>
> On Fri, Feb 11, 2011 at 3:10 PM, Jason Rutherglen <
> jason.rutherg...@gmail.com> wrote:
>
>> Hello,
>>
>> I'm curious as to what a 'good' approach would be for implementing
>> search in HBase (using Lucene) with the end goal being the integration
>> of realtime search into HBase.  I think the use case makes sense as
>> HBase is realtime and has a write-ahead log, performs automatic
>> partitioning, splitting of data, failover, redundancy, etc.  These are
>> all things Lucene does not have out of the box, that we'd essentially
>> get for 'free'.
>>
>> For starters: Where would be the right place to store Lucene segments
>> or postings?  Eg, we need to be able to efficiently perform a linear
>> iteration of the per-term posting list(s).
>>
>> Thanks!
>>
>> Jason Rutherglen
>>
>
