I don't think Phoenix will solve his problem. He also needs to explain his problem in more detail before we can start thinking about a solution.
On Apr 25, 2013, at 4:54 PM, lars hofhansl <[email protected]> wrote:

> You might want to have a look at Phoenix
> (https://github.com/forcedotcom/phoenix), which does that and more, and gives
> a SQL/JDBC interface.
>
> -- Lars
>
> ________________________________
> From: Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN) <[email protected]>
> To: [email protected]
> Sent: Thursday, April 25, 2013 2:44 PM
> Subject: Coprocessors
>
> Folks:
>
> This is my first post on the HBase user mailing list.
>
> I have the following scenario:
> I have an HBase table of up to a billion keys. I'm looking to support an
> application where, on some user action, I'd need to fetch multiple columns for
> up to 250K keys and do some sort of aggregation on them. Fetching all that data
> and doing the aggregation in my application takes about a minute.
>
> I'm looking to co-locate the aggregation logic with the region servers to
> a. Distribute the aggregation
> b. Avoid having to fetch large amounts of data over the network (this could
> potentially be cross-datacenter)
>
> Neither observers nor aggregation endpoints work for this use case. Observers
> don't return data back to the client, while aggregation endpoints work in the
> context of scans, not a multi-get (are these correct assumptions?).
>
> I'm looking to write a service that runs alongside the region servers and
> acts as a proxy between my application and the region servers.
>
> I plan to use the logic in the HBase client's HConnectionManager to segment my
> request of 1M rowkeys into sub-requests per region server. These are sent
> over to the proxy, which fetches the data from the region server, aggregates
> locally and sends the result back. Does this sound reasonable, or even like a
> useful thing to pursue?
>
> Regards,
> -sudarshan
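For reference, here is a minimal sketch of the scatter-gather aggregation described in the original question, assuming an in-memory stand-in for the cluster. It groups the multi-get's row keys by the server hosting each key's region, computes a mergeable partial aggregate (sum and count) per server, and merges the partials on the client. It uses no HBase classes: the region map, the server names ("rs1", "rs2"), and the helpers `groupByServer`, `aggregateLocally` and `scatterGather` are all hypothetical, and in a real deployment each server would read only its own rows rather than a shared map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical scatter-gather sketch: group row keys by the server hosting
 * each key's region, aggregate per server, then merge the partials.
 * In-memory stand-ins only; no HBase API is used or implied.
 */
public class ScatterGatherSketch {

    /** Partial aggregate of one numeric column: sum and count merge safely. */
    static final class Partial {
        final long sum;
        final long count;

        Partial(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        Partial merge(Partial other) {
            return new Partial(sum + other.sum, count + other.count);
        }
    }

    /**
     * Groups keys by server. regionStarts maps each region's start key to the
     * (hypothetical) server hosting it; the first region must start at "" so
     * every key falls into some region. String order matches HBase's byte
     * order only for ASCII keys, which is all this sketch assumes.
     */
    static Map<String, List<String>> groupByServer(List<String> keys,
                                                   TreeMap<String, String> regionStarts) {
        Map<String, List<String>> byServer = new TreeMap<>();
        for (String key : keys) {
            // floorEntry: the region whose start key is <= key, i.e. the region containing key.
            String server = regionStarts.floorEntry(key).getValue();
            byServer.computeIfAbsent(server, s -> new ArrayList<>()).add(key);
        }
        return byServer;
    }

    /** Stands in for the proxy next to one server: only the small Partial goes back. */
    static Partial aggregateLocally(List<String> keys, Map<String, Long> rows) {
        long sum = 0;
        long count = 0;
        for (String key : keys) {
            Long value = rows.get(key);
            if (value != null) { // missing rows are skipped
                sum += value;
                count++;
            }
        }
        return new Partial(sum, count);
    }

    /** Client side: scatter one sub-request per server, gather and merge the partials. */
    static Partial scatterGather(List<String> keys, TreeMap<String, String> regionStarts,
                                 Map<String, Long> rows) {
        Partial total = new Partial(0, 0);
        for (List<String> serverKeys : groupByServer(keys, regionStarts).values()) {
            total = total.merge(aggregateLocally(serverKeys, rows));
        }
        return total;
    }

    public static void main(String[] args) {
        TreeMap<String, String> regionStarts = new TreeMap<>();
        regionStarts.put("", "rs1");  // region ["", "m")
        regionStarts.put("m", "rs2"); // region ["m", end)
        Map<String, Long> rows = Map.of("apple", 3L, "kiwi", 4L, "mango", 5L, "zucchini", 6L);
        Partial p = scatterGather(List.of("apple", "kiwi", "mango", "zucchini", "absent"),
                regionStarts, rows);
        System.out.println("sum=" + p.sum + " count=" + p.count);
    }
}
```

Run as-is, `main` prints `sum=18 count=4`. Carrying (sum, count) rather than per-server averages is what keeps the partials mergeable; the network saving comes from returning one small partial per server instead of every fetched row.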
