I like the approach of building Lucene indexes for HBase data via a coprocessor. However, the good-performance requirement of mmap'ing HDFS blocks from the local filesystem is at issue here: it presupposes regionserver and datanode colocation, short-circuit local access, and an HDFS API modification (which was vetoed). It seems we have to do something else.

How can HBase provide index data to Lucene in a way that isn't a massive layering violation? Maybe via the Lucene 4 Codec and CodecProvider interfaces? (I'm not all that familiar with Lucene internals, so big caveat there.)

Indeed, Jason put a lot of work into the HBASE-3529 patch, and it is a shame we couldn't commit the result.

On Thu, Sep 20, 2012 at 5:15 PM, Otis Gospodnetic <[email protected]> wrote:
> I agree with Stack. I liked that whole approach and it's a shame it
> didn't get committed after all the work Jason put into it.
>
> Otis
> Search Analytics - http://sematext.com/search-analytics/index.html
> Performance Monitoring - http://sematext.com/spm/index.html
>
>
> On Thu, Sep 20, 2012 at 5:32 PM, Stack <[email protected]> wrote:
>> On Thu, Sep 20, 2012 at 12:43 PM, Andrew Purtell <[email protected]> wrote:
>>> The issue with the patch on HBASE-3529 is it relies on modifications
>>> to HDFS that the author of HBASE-3529 proposed to the HDFS project as
>>> https://issues.apache.org/jira/browse/HDFS-2004. The proposal was
>>> vetoed. Therefore, further progress on HBASE-3529 as currently
>>> implemented is not possible.
>>>
>>
>> Jason's approach had much merit (IMO). It warrants study at least.
>>
>> Though the indices were written to HDFS, Jason had it so Lucene was
>> getting local filesystem access by going via the local read
>> short-circuit facility [1]. Being able to do this made it so he got
>> close to native speeds querying the "HDFS-based" indices. When Jason
>> left it -- he had to get a real job unfortunately -- he was blocked on
>> what to do when a region moved. He wanted to be able to
>> immediately pull the indices local on region reopen. The HDFS fellas
>> who commented in the issue cited by Andrew above thought it a little
>> dodgy adding API for this special case.
>>
>> If you wanted to follow in Jason's footsteps, let's chat.
>> St.Ack
>>
>> 1. http://hbase.apache.org/book.html#perf.hdfs.configs

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
