> You should not let just any user run coprocessors on the server. That's 
> madness.
> 
> Best regards,
> 
>    - Andy

Fei Ding, 

I'm a little confused.
Are you trying to solve the problem of querying data efficiently from a table,
or are you trying to find an example of where and when to use coprocessors?

You actually have an interesting problem that isn't easily solved in relational
databases, but I don't think it's an appropriate problem if you want to stress
the use of coprocessors.

Yes, with indexes you want to use coprocessors as a way to keep the index in
sync with the underlying table.
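A minimal sketch of the index-maintenance idea in plain Java. It does not use the HBase coprocessor API: `PutObserver`, `ObservedTable`, and `ValueIndex` are hypothetical in-memory stand-ins that mimic the shape of a post-put observer hook. Each write to the data table fires the hook, which removes the row's stale index entry and records the new one, so the index never points at an overwritten value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical hook, loosely shaped like an observer's post-put callback.
// This is NOT the HBase RegionObserver API.
interface PutObserver {
    void postPut(String rowKey, String oldValue, String newValue);
}

// In-memory stand-in for a data table that notifies observers after each write.
class ObservedTable {
    private final Map<String, String> rows = new TreeMap<>();
    private final List<PutObserver> observers = new ArrayList<>();

    void addObserver(PutObserver observer) {
        observers.add(observer);
    }

    void put(String rowKey, String value) {
        String oldValue = rows.put(rowKey, value); // null if the row is new
        for (PutObserver observer : observers) {
            observer.postPut(rowKey, oldValue, value);
        }
    }
}

// Secondary index from a column value to the row keys that currently hold it.
class ValueIndex implements PutObserver {
    private final Map<String, NavigableSet<String>> index = new TreeMap<>();

    @Override
    public void postPut(String rowKey, String oldValue, String newValue) {
        // Remove the stale entry so the index stays in sync with the table.
        if (oldValue != null) {
            NavigableSet<String> keys = index.get(oldValue);
            if (keys != null) {
                keys.remove(rowKey);
                if (keys.isEmpty()) {
                    index.remove(oldValue);
                }
            }
        }
        index.computeIfAbsent(newValue, v -> new TreeSet<>()).add(rowKey);
    }

    NavigableSet<String> lookup(String value) {
        return index.getOrDefault(value, new TreeSet<>());
    }
}
```

After `put("row1", "red")`, `put("row2", "red")`, and `put("row1", "blue")`, a lookup for `red` returns only `row2`.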

Beyond that, however, the solution is really best run as an M/R (MapReduce) job.

Consider that HBase has two different access methods: one is as part of M/R
jobs, the other is a client/server model. If you wanted to, you could create a
service/engine/app that would let you efficiently query and return result
sets from your database, as well as manage indexes.
In part, coprocessors make this a lot easier.
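One reason coprocessors help here is the Endpoint style of call: the client addresses a range of rows, the per-region invocations run in parallel, and the responses come back together. The sketch below models that fan-out/fan-in with a plain `ExecutorService`. `ScatterGather`, `Region`, and `execAll` are hypothetical names, not the HBase Endpoint client API, and keying the results by region start key is an assumption for illustration (Java 16+ for the record).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ScatterGather {
    // Hypothetical region: the row-key range [startKey, endKey); an empty endKey means "to the end".
    record Region(String startKey, String endKey) {}

    // Runs perRegion on every region in parallel and collects the answers, keyed by
    // region start key, so the caller sees what looks like a single call.
    static <R> Map<String, R> execAll(List<Region> regions, Function<Region, R> perRegion) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, regions.size()));
        try {
            List<Callable<R>> calls = new ArrayList<>();
            for (Region region : regions) {
                calls.add(() -> perRegion.apply(region));
            }
            // invokeAll waits for every call and returns futures in submission order.
            List<Future<R>> futures = pool.invokeAll(calls);
            Map<String, R> results = new TreeMap<>();
            for (int i = 0; i < regions.size(); i++) {
                results.put(regions.get(i).startKey(), futures.get(i).get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while gathering results", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a per-region call failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

For example, a per-region function that counts the row keys inside its range yields one partial count per region, which the caller then sums, much like an aggregation pushed down to the servers.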

If you consider the general flow of my solution earlier in this thread, you now 
have a really great way to implement this.

Note: we're really talking about allowing someone to query data from a table
using multiple indexes and index types. Think alternate tables (key/value pairs),
Lucene/Solr, and geospatial indexes.
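A query over several indexes typically ends by combining the candidate row keys each index returns. This sketch assumes each index (an alternate key/value table, a Lucene/Solr query, a geospatial lookup) has already produced a set of matching row keys; `MultiIndexQuery.intersect` is a hypothetical helper that ANDs those sets together, starting from the smallest one. Only the surviving keys would then need to be fetched from the base table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

class MultiIndexQuery {
    // Returns the row keys present in every candidate set (a logical AND of the indexes).
    static SortedSet<String> intersect(List<? extends Set<String>> candidates) {
        if (candidates.isEmpty()) {
            return new TreeSet<>();
        }
        // Start from the most selective index so the working set stays small.
        List<Set<String>> bySize = new ArrayList<>(candidates);
        bySize.sort((a, b) -> Integer.compare(a.size(), b.size()));
        SortedSet<String> result = new TreeSet<>(bySize.get(0));
        for (int i = 1; i < bySize.size() && !result.isEmpty(); i++) {
            result.retainAll(bySize.get(i));
        }
        return result;
    }
}
```

Ordering by selectivity is a design choice: each index may return many candidates while the intersection is small, so starting with the smallest set keeps every `retainAll` pass cheap, and the loop stops early once nothing is left.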

You could even benchmark it against an Oracle implementation, and probably
smoke it.
You could also do efficient joins between tables. 
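A minimal sketch of one way an indexed join could work, assuming the join column of the second table is indexed: an index nested-loop join probes that index once per row of the first table instead of scanning the whole second table. `IndexJoin`, `leftJoinColumn`, and `rightIndex` are hypothetical names over in-memory maps (Java 16+ for the record); in HBase terms the index would be something like an alternate table kept in sync by a coprocessor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class IndexJoin {
    // One output pair: a left-table row key and a matching right-table row key.
    record Joined(String leftRow, String rightRow) {}

    // leftJoinColumn: left row key -> join-column value.
    // rightIndex: join-column value -> right-table row keys holding that value.
    static List<Joined> join(Map<String, String> leftJoinColumn,
                             Map<String, List<String>> rightIndex) {
        List<Joined> out = new ArrayList<>();
        for (Map.Entry<String, String> left : leftJoinColumn.entrySet()) {
            // Probe the index instead of scanning every right row.
            List<String> matches = rightIndex.getOrDefault(left.getValue(), List.of());
            for (String rightRow : matches) {
                out.add(new Joined(left.getKey(), rightRow));
            }
        }
        return out;
    }
}
```

Left rows with no match in the index simply produce no output, which gives inner-join semantics.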

So yeah, I would encourage you to work on your initial problem... ;-)

Just Saying...  ;-)

-Mike

On May 16, 2012, at 8:49 PM, Andrew Purtell wrote:

> On Wed, May 16, 2012 at 6:43 PM, fding hbase <fding.hb...@gmail.com> wrote:
>>> Not coprocessors in general. The client-side support for Endpoints
>>> (Exec, etc.) gives the developer the fiction of addressing the cluster
>>> as a range of rows, and will parallelize per-region Endpoint
>>> invocations, and collect the responses, and can return them all to the
>>> caller as "a single call".
>> 
>> But with respect to the deadlock problem, an Endpoint behaves the same way
>> as an Observer. Endpoints are also executed by the RegionServer's RPC handlers.
> 
> Reread what I wrote. I'm not talking about the server side above.
> 
> Regarding the RPC issues, yes, the behavior is the same. My other point
> was that there is no RPC deadlock if you schedule your additional work
> (which issues RPCs) in some background thread or Executor and return
> to the client immediately. But that is not what you have claimed you
> want to do. If I understood correctly, you want to do some distributed
> indexed join *first* (via RPC) and *then* return to the client. That
> is how you would get deadlocks.
> 
>> the coprocessors are written by users and any kind of
>> code may appear on the server side.
> 
> You should not let just any user run coprocessors on the server. That's 
> madness.
> 
> Best regards,
> 
>    - Andy
> 
> Problems worthy of attack prove their worth by hitting back. - Piet
> Hein (via Tom White)
> 
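Andy's deadlock warning can be reproduced in miniature. In this hypothetical sketch a fixed-size thread pool stands in for a RegionServer's RPC handlers (a modeling assumption, not HBase code). `blockingNestedCallCompletes` has a handler issue a nested call into its own pool and wait for the answer before replying; with one handler and one request, the nested call is queued behind the very handler that is waiting for it, so the request times out. `backgroundFollowUpCompletes` follows the alternative Andy suggests: the handler hands the work to a background executor and returns immediately, so the nested call finds a free handler. The 500 ms timeout is an arbitrary choice.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical model: a fixed-size pool plays the role of a RegionServer's RPC handlers.
class HandlerPoolDeadlock {
    static final long WAIT_MS = 500; // arbitrary timeout used to detect a stuck request

    // Blocking pattern: the handler makes a nested call into its own pool and waits
    // for the answer before replying to the client.
    static boolean blockingNestedCallCompletes(int handlers) {
        ExecutorService rpcHandlers = Executors.newFixedThreadPool(handlers);
        try {
            Future<String> request = rpcHandlers.submit(() -> {
                Future<String> nested = rpcHandlers.submit(() -> "index rows");
                return nested.get(); // occupies a handler thread while waiting
            });
            return "index rows".equals(request.get(WAIT_MS, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            return false; // all handlers are waiting, so nobody can serve the nested call
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            rpcHandlers.shutdownNow(); // interrupts the stuck handler so its thread exits
        }
    }

    // Non-blocking pattern: the handler passes the follow-up work to a background
    // executor and returns at once, so the follow-up's nested call finds a free handler.
    static boolean backgroundFollowUpCompletes(int handlers) {
        ExecutorService rpcHandlers = Executors.newFixedThreadPool(handlers);
        ExecutorService background = Executors.newSingleThreadExecutor();
        try {
            Future<Future<String>> request = rpcHandlers.submit(() ->
                    background.submit(() -> rpcHandlers.submit(() -> "index rows").get()));
            Future<String> followUp = request.get(WAIT_MS, TimeUnit.MILLISECONDS);
            return "index rows".equals(followUp.get(WAIT_MS, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            return false;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            rpcHandlers.shutdownNow();
            background.shutdownNow();
        }
    }
}
```

With a single handler, the blocking version returns `false` and the background version returns `true`. On a busier server the same starvation needs only as many concurrent blocking requests as there are handlers.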
