Fei Ding,

I'm a little confused. Are you trying to solve the problem of querying data efficiently from a table, or are you trying to find an example of where and when to use coprocessors?

You actually have an interesting problem that isn't easily solved in relational databases, but I don't think it's an appropriate problem if you want to stress the use of coprocessors.

Yes, with indexes you want to use coprocessors as a way to keep the index in sync with the underlying table. However, beyond that... the solution is really best run as an M/R job.

Consider that HBase has two different access methods: one is as part of M/R jobs, the other is a client/server model. If you wanted to, you could create a service/engine/app that would allow you to efficiently query and return result sets from your database, as well as manage indexes. In part, coprocessors make this a lot easier.

If you consider the general flow of my solution earlier in this thread, you now have a really great way to implement this. Note: we're really talking about allowing someone to query data from a table using multiple indexes and index types. Think alternate table (key/value pair), Lucene/SOLR, and GeoSpatial.

You could even benchmark it against an Oracle implementation, and probably smoke it. You could also do efficient joins between tables.

So yeah, I would encourage you to work on your initial problem... ;-)

Just Saying... ;-)

-Mike

On May 16, 2012, at 8:49 PM, Andrew Purtell wrote:

> On Wed, May 16, 2012 at 6:43 PM, fding hbase <fding.hb...@gmail.com> wrote:
>>> Not coprocessors in general. The client side support for Endpoints
>>> (Exec, etc.) gives the developer the fiction of addressing the cluster
>>> as a range of rows, and will parallelize per-region Endpoint
>>> invocations, and collect the responses, and can return them all to the
>>> caller as "a single call".
>>
>> But on the deadlock problem the Endpoint behaves the same way as Observer.
>> Endpoints are also executed via RPC handlers of RegionServer.
>
> Reread what I wrote. I'm not talking about the server side above.
>
> Regarding the RPC issues, yes the behavior is the same. My other point
> was there is no RPC deadlock if you schedule your additional work
> (which issues RPCs) in some background thread or Executor and return
> to the client immediately. But that is not what you have claimed you
> want to do, you want to do some distributed indexed join if I
> understood it correctly *first* (via RPC) and *then* return to the
> client. That is how you would get deadlocks.
>
>> the coprocessors are written by users and any kind of
>> code may appear on the server side.
>
> You should not let just any user run coprocessors on the server. That's
> madness.
>
> Best regards,
>
> - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet
> Hein (via Tom White)
>
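
For what it's worth, Andy's point about scheduling the extra work in a background thread or Executor and returning to the client immediately could be sketched roughly as below. This is plain Java, not HBase code: the class `DeferredIndexWork`, the hook `onPut`, and the in-memory `indexEntries` list are all hypothetical stand-ins (a real coprocessor would use the actual Observer hooks and issue real RPCs to the index table). It only illustrates the shape of the idea, which is that the handler thread queues the follow-up work instead of waiting on it.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: none of these names are HBase APIs.
class DeferredIndexWork {
    // Dedicated pool, separate from the threads that serve client requests.
    private final ExecutorService background = Executors.newFixedThreadPool(2);

    // Stand-in for the remote index table; a real version would issue RPCs here.
    final List<String> indexEntries = new CopyOnWriteArrayList<>();

    // Stand-in for a coprocessor hook running on a request-handler thread.
    // It only queues the follow-up work and returns right away.
    void onPut(String rowKey, String value) {
        background.submit(() -> updateIndex(rowKey, value));
    }

    // Runs on the background pool, so a slow call never ties up a handler thread.
    private void updateIndex(String rowKey, String value) {
        indexEntries.add(value + "->" + rowKey);
    }

    // Stops accepting new work and waits for queued updates to finish.
    boolean drain(long timeoutSeconds) {
        background.shutdown();
        try {
            return background.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        DeferredIndexWork hook = new DeferredIndexWork();
        hook.onPut("row1", "blue");
        hook.onPut("row2", "green");
        boolean finished = hook.drain(5);
        System.out.println("finished=" + finished + " entries=" + hook.indexEntries.size());
    }
}
```

The trade-off is the one Andy describes: the client gets its answer before the index is updated, so this fits index maintenance but not a join whose result has to come back in the same call.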