Yes. St.Ack
On Mon, Jul 25, 2011 at 1:23 PM, Paul Nickerson <[email protected]> wrote:
> We currently run on the Cloudera stack. Would this be something that we can
> pull, compile, and plug right into that stack?
>
> ----- Original Message -----
> From: "Gary Helmling" <[email protected]>
> To: [email protected]
> Sent: Monday, July 25, 2011 2:02:50 PM
> Subject: Re: Fanning out hbase queries in parallel
>
> Coprocessors are currently only in trunk. They will be in the 0.92 release
> once we get that out. There's no set date for that, but personally I'll be
> trying to help get it out sooner rather than later.
>
> On Mon, Jul 25, 2011 at 7:37 AM, Michel Segel <[email protected]> wrote:
>> Which release(s) have coprocessors enabled?
>>
>> Sent from a remote device. Please excuse any typos...
>>
>> Mike Segel
>>
>> On Jul 24, 2011, at 11:03 PM, Sonal Goyal <[email protected]> wrote:
>>
>>> Hi Paul,
>>>
>>> Have you taken a look at HBase coprocessors? I think you will find them
>>> useful.
>>>
>>> Best Regards,
>>> Sonal
>>> Hadoop ETL and Data Integration <https://github.com/sonalgoyal/hiho>
>>> Nube Technologies <http://www.nubetech.co>
>>> <http://in.linkedin.com/in/sonalgoyal>
>>>
>>> On Mon, Jul 25, 2011 at 8:13 AM, Paul Nickerson <[email protected]> wrote:
>>>
>>>> I would like to implement a multidimensional query system that
>>>> aggregates large amounts of data on-the-fly by fanning out queries in
>>>> parallel. It should be fast enough for interactive exploration of the
>>>> data and extensible enough to take sets of hundreds or thousands of
>>>> dimensions with high cardinality, and aggregate them from high
>>>> granularity to low granularity. Dimensions and their values are stored
>>>> in the row key. For instance, row keys look like this:
>>>>
>>>> Foo=bar,blah=123
>>>>
>>>> Each row contains numerical values within its column families, such as
>>>> plays=100, versioned by the date of calculation.
>>>>
>>>> A user wants the top "Foo" values with blah=123, sorted downward by
>>>> total plays in July. My current thinking is that a query would be
>>>> executed by grouping all Foo-prefixed row keys by region server and
>>>> sending the query to each of those. Each region server iterates through
>>>> all of its row keys that start with Foo=something,blah=, and passes the
>>>> query on to all regions containing blahs that equal 123, which then
>>>> contain play counts. Matching row keys, as well as the sum of all their
>>>> play values within July, are passed back up the chain and
>>>> sorted/truncated when possible.
>>>>
>>>> It seems quite complicated and would involve either modifying HBase
>>>> source code or at the very least using the deep internals of the API.
>>>> Does this seem like a practical solution, or could someone offer some
>>>> ideas?
>>>>
>>>> Thank you!
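The client-side half of the fan-out Paul describes (scatter a query across region servers, filter on blah=123, sum plays per Foo value, then merge/sort/truncate the partials) can be sketched in plain Java. This is only an illustration of the merge logic under stated assumptions: the "shards" below are in-memory maps standing in for per-region scans, and the row keys and play counts are made up; no HBase API is used.

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch of parallel fan-out aggregation: each "region" holds rows keyed
// like "Foo=<value>,blah=<n>" mapped to a play count. Shards are scanned
// in parallel; the partial sums are merged and sorted on the client.
public class FanOutSketch {
    // One shard's scan: keep rows matching blah=123, sum plays per Foo value.
    static Map<String, Long> scanShard(Map<String, Long> rows) {
        Map<String, Long> partial = new HashMap<>();
        for (Map.Entry<String, Long> e : rows.entrySet()) {
            String key = e.getKey();                  // e.g. "Foo=bar,blah=123"
            if (key.endsWith(",blah=123")) {
                String foo = key.substring(key.indexOf('=') + 1, key.indexOf(','));
                partial.merge(foo, e.getValue(), Long::sum);
            }
        }
        return partial;
    }

    public static void main(String[] args) throws Exception {
        // Two fake "region servers", each holding a slice of the row-key space.
        List<Map<String, Long>> shards = List.of(
            Map.of("Foo=bar,blah=123", 100L, "Foo=baz,blah=999", 50L),
            Map.of("Foo=bar,blah=123", 25L,  "Foo=qux,blah=123", 80L));

        ExecutorService pool = Executors.newFixedThreadPool(shards.size());
        List<Future<Map<String, Long>>> futures = new ArrayList<>();
        for (Map<String, Long> shard : shards)
            futures.add(pool.submit(() -> scanShard(shard)));

        // Merge the partial sums, then sort descending by total plays.
        Map<String, Long> totals = new HashMap<>();
        for (Future<Map<String, Long>> f : futures)
            f.get().forEach((k, v) -> totals.merge(k, v, Long::sum));
        pool.shutdown();

        totals.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .forEach(e -> System.out.println(e.getKey() + "=" + e.getValue()));
    }
}
```

With a coprocessor endpoint (0.92+), the scanShard step would run server-side next to the data and ship back only the partial sums; the merge/sort/truncate shown in main() stays on the client either way.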
