Re: EndPoint Coprocessor could be deadlocked?

fding hbase Mon, 14 May 2012 06:21:21 -0700

Hi Michel,

I indexed each column within a column family of a table, so we can query a
row by a specific column value.
By multi-index I mean using multiple indexes at the same time in a single
query, like a SQL SELECT whose *where* clause has conditions on two indexed
columns.


The row key of the index table is made up of the column value followed by
the row key of the indexed table. For set intersection I used the utility
method CollectionUtils.intersection() from the Apache Commons Collections
package. There's no assumption about the sort order of the indices. A scan
over the index table with the column value as startKey and column value + 1
as endKey returns the keys of all rows in the indexed table that have that
column value.
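The index key layout and the scan range above can be sketched in plain Java. This is a minimal sketch under assumptions not stated in the thread: it puts a hypothetical 0x00 separator between the column value and the indexed row key (without one, a scan from "ab" to "ac" would also return index rows for the value "abc"), it assumes column values never contain a 0x00 byte, and it computes "column value + 1" as the usual prefix stop key. The class and method names are hypothetical; the byte arrays are what a client would pass as the start and stop rows of an HBase scan.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class IndexKeyLayout {
    // Hypothetical separator; assumes column values never contain 0x00.
    public static final byte SEPARATOR = 0x00;

    /** Index row key = column value + SEPARATOR + row key of the indexed table. */
    public static byte[] indexRowKey(byte[] value, byte[] indexedRowKey) {
        byte[] key = new byte[value.length + 1 + indexedRowKey.length];
        System.arraycopy(value, 0, key, 0, value.length);
        key[value.length] = SEPARATOR;
        System.arraycopy(indexedRowKey, 0, key, value.length + 1, indexedRowKey.length);
        return key;
    }

    /** Inclusive start key: every index row for this value begins with it. */
    public static byte[] startKey(byte[] value) {
        byte[] prefix = Arrays.copyOf(value, value.length + 1);
        prefix[value.length] = SEPARATOR;
        return prefix;
    }

    /**
     * Exclusive stop key for a prefix scan ("prefix + 1"): increment the last
     * byte that is not 0xFF and drop everything after it. Returns an empty
     * array (scan to the end of the table) if every byte is 0xFF.
     */
    public static byte[] stopKey(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                byte[] stop = Arrays.copyOf(prefix, i + 1);
                stop[i]++;
                return stop;
            }
        }
        return new byte[0];
    }

    /** Recovers the indexed table's row key from an index row key. */
    public static byte[] indexedRowKey(byte[] indexRowKey, int valueLength) {
        return Arrays.copyOfRange(indexRowKey, valueLength + 1, indexRowKey.length);
    }

    public static void main(String[] args) {
        byte[] value = "red".getBytes(StandardCharsets.UTF_8);
        byte[] key = indexRowKey(value, "row42".getBytes(StandardCharsets.UTF_8));
        System.out.println(Arrays.toString(startKey(value)));          // [114, 101, 100, 0]
        System.out.println(Arrays.toString(stopKey(startKey(value)))); // [114, 101, 100, 1]
        System.out.println(new String(indexedRowKey(key, value.length), StandardCharsets.UTF_8)); // row42
    }
}
```

With the separator, the stop key for a value is simply the value followed by 0x01, which is exactly the "value + 1" range the paragraph describes, applied to the prefix rather than to the bare value.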

For multi-index queries, I previously tried using one scan per indexed
column and intersecting the result sets to get the rows I want, but the
query time was too long. So I decided to move the intersection to the
server side to reduce the amount of data transferred.
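The intersection step (one result set per indexed column, keep only the row keys present in all of them) can be sketched with java.util collections standing in for CollectionUtils.intersection(); that substitution is mine, not the thread's. The sketch assumes row keys are held as byte[], which the thread does not state. If they are, one detail is worth checking: byte[] inherits identity-based equals() and hashCode() from Object, so an equals-based intersection of raw byte[] keys (CollectionUtils.intersection() included) would generally find no matches across separate scans. Wrapping each key in a ByteBuffer, whose equals() and hashCode() compare contents, avoids that. The class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class RowKeyIntersection {
    /**
     * Intersects the row-key lists returned by several index scans.
     * Keys are wrapped in ByteBuffer so that equality compares contents
     * instead of array identity.
     */
    public static List<byte[]> intersect(List<List<byte[]>> perIndexResults) {
        List<byte[]> out = new ArrayList<>();
        if (perIndexResults.isEmpty()) return out;

        // Start from the smallest result set so the working set stays small.
        List<List<byte[]>> bySize = new ArrayList<>(perIndexResults);
        bySize.sort(Comparator.comparingInt((List<byte[]> l) -> l.size()));

        Set<ByteBuffer> acc = new LinkedHashSet<>();
        for (byte[] k : bySize.get(0)) acc.add(ByteBuffer.wrap(k));

        for (int i = 1; i < bySize.size() && !acc.isEmpty(); i++) {
            Set<ByteBuffer> next = new HashSet<>();
            for (byte[] k : bySize.get(i)) next.add(ByteBuffer.wrap(k));
            acc.retainAll(next); // keep only keys seen in this index too
        }

        // ByteBuffer.wrap(k).array() returns the original array k.
        for (ByteBuffer b : acc) out.add(b.array());
        return out;
    }
}
```

Every per-index set is still fully materialized before intersecting, whether this runs on the client or inside an endpoint; moving it server side cuts network transfer but not the size of each set.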

Do you have any better idea?

On Mon, May 14, 2012 at 8:17 PM, Michel Segel <[email protected]> wrote:

> Need a little clarification...
>
> You said that you need to do multi-index queries.
>
> Did you mean to say multiple people running queries at the same time, or
> did you mean you wanted multi-key indexes, where the key is made up of
> multiple parts?
>
> Or did you mean that you really wanted to use multiple indexes at the same
> time on a single query?
>
> If it's the latter, it's not really a good idea...
> How do you handle the intersection of the two sets? (3 sets or more?)
> Can you assume that the indexes are in sort order?
>
> What happens when the results from the indexes exceed the amount of
> allocated memory?
>
> What I am suggesting is that you set aside the underpinnings of HBase
> and look at the problem you are trying to solve in general terms. It's not
> an easy one...
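The sort-order and memory questions have a concrete answer under the index layout Fei describes: each index row key is the column value followed by the indexed row key, so a scan over one column value returns the indexed row keys in HBase's unsigned lexicographic row order. Assuming that holds for every index in the query, the result sets do not have to be held in memory; they can be intersected with a streaming merge that keeps only one current key per index. This is a minimal sketch with hypothetical names, using plain iterators in place of HBase scanners and Arrays.compareUnsigned (Java 9+) for HBase-style byte ordering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SortedMergeIntersection {
    /**
     * Intersects k iterators of row keys, each already sorted in unsigned
     * lexicographic byte order and free of duplicates. Only the current key
     * of each iterator is held, so input memory is O(k).
     */
    public static List<byte[]> intersect(List<Iterator<byte[]>> sources) {
        List<byte[]> out = new ArrayList<>();
        if (sources.isEmpty()) return out;

        byte[][] current = new byte[sources.size()][];
        for (int i = 0; i < sources.size(); i++) {
            if (!sources.get(i).hasNext()) return out; // an empty index: no matches
            current[i] = sources.get(i).next();
        }

        while (true) {
            // The largest current key is the smallest candidate all sources can share.
            byte[] max = current[0];
            for (byte[] c : current) {
                if (Arrays.compareUnsigned(c, max) > 0) max = c;
            }

            boolean allEqual = true;
            for (int i = 0; i < current.length; i++) {
                // Advance this source until it reaches or passes the candidate.
                while (Arrays.compareUnsigned(current[i], max) < 0) {
                    if (!sources.get(i).hasNext()) return out;
                    current[i] = sources.get(i).next();
                }
                if (Arrays.compareUnsigned(current[i], max) != 0) allEqual = false;
            }

            if (allEqual) {
                out.add(max);
                // Move every source past the matched key.
                for (int i = 0; i < current.length; i++) {
                    if (!sources.get(i).hasNext()) return out;
                    current[i] = sources.get(i).next();
                }
            }
        }
    }
}
```

For inputs r1, r3, r5, r7 / r2, r3, r7 / r3, r4, r7, r9 this returns r3 and r7. Unlike a hash-based intersection it depends on every source being sorted, but it reads each source once, forward only, and can stop as soon as any source is exhausted.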
>
>
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On May 14, 2012, at 4:35 AM, fding hbase <[email protected]> wrote:
>
> > Hi all,
> >
> > Is it possible, in an endpoint coprocessor, to use a scanner on a table
> > other than the host table region, or to execute a coprocessor of another
> > table?
> > It looks like chaining coprocessors. But I found a possible deadlock!
> > Can anyone help me with this?
> >
> > In my testing environment I deployed version 0.92.0 from CDH.
> > I wrote an Endpoint coprocessor to do composite secondary index queries.
> > The index is stored in another table, and index updates are maintained
> > by the client through an extended HTable. A single-index query works
> > fine with scanners on the index table, but we soon realized we need to
> > query multiple indexes at the same time.
> > At first we tried to pull all the row keys matched in each index table
> > and do the merge (just a set intersection) on the client, but that
> > saturated the network bandwidth. So I proposed trying endpoint
> > coprocessors: one on the master table (the indexed table) and another on
> > each index table region. Each master table region's coprocessor instance
> > invokes the index table coprocessor instances with its region info (the
> > startKey and endKey) and the scan; each index table region's coprocessor
> > instance scans and returns the row keys within the range of the startKey
> > and endKey passed in.
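On the index side, returning only the row keys that belong to the calling master-table region reduces to a range check on the indexed row key. This is a minimal sketch with hypothetical names, following HBase's convention that a region covers [startKey, endKey) and that an empty key marks the table's first or last boundary; comparison is unsigned because HBase orders row keys by unsigned bytes (Arrays.compareUnsigned needs Java 9+). Since the indexed row key is the suffix of the index row key, matching keys are not contiguous in the index table, so the index side still reads every index row for the value and filters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RegionRangeFilter {
    /**
     * True if rowKey lies in [startKey, endKey). An empty startKey means
     * "from the first row"; an empty endKey means "through the last row".
     */
    public static boolean inRegion(byte[] rowKey, byte[] startKey, byte[] endKey) {
        boolean afterStart = startKey.length == 0
                || Arrays.compareUnsigned(rowKey, startKey) >= 0;
        boolean beforeEnd = endKey.length == 0
                || Arrays.compareUnsigned(rowKey, endKey) < 0;
        return afterStart && beforeEnd;
    }

    /** Keeps only the indexed row keys owned by the calling master-table region. */
    public static List<byte[]> keepInRegion(List<byte[]> rowKeys, byte[] startKey, byte[] endKey) {
        List<byte[]> out = new ArrayList<>();
        for (byte[] k : rowKeys) {
            if (inRegion(k, startKey, endKey)) out.add(k);
        }
        return out;
    }
}
```

Signed byte comparison would be a subtle bug here: a key starting with 0x80 sorts after 0x10 in HBase, but before it if bytes are compared as Java's signed values.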
> >
> > The cluster sometimes blocks when invoking the index table coprocessor.
> > I traced into the code and found that when HConnection locates regions,
> > it makes an RPC to the same region server.
> >
> > (After a while I found that the index table coprocessor is equivalent to
> > just a plain scan with a filter, so I switched to scanners with filters,
> > but the problem remains.)
>
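One plausible mechanism for the blocking, offered as an assumption the thread does not confirm: a region server serves RPCs with a fixed pool of handler threads (sized by hbase.regionserver.handler.count). If every handler is busy running an endpoint call that issues an RPC back to the same server (a region lookup, a scan, or another coprocessor call) and waits for the reply, no handler is left to serve the nested RPC. That would also explain why switching to plain scans did not help. This small Java model uses a fixed thread pool as the "handlers"; names and timings are hypothetical and nothing here is HBase code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HandlerStarvationDemo {
    /**
     * Models a server whose RPCs run on `handlers` threads. Each of
     * `outerCalls` requests occupies a handler, submits a nested request to
     * the same pool, and blocks on it. Returns how many outer calls gave up
     * after `timeoutMs` because the nested request never got a handler.
     */
    public static int countTimedOut(int handlers, int outerCalls, long timeoutMs) {
        ExecutorService pool = Executors.newFixedThreadPool(handlers);
        // Makes the demo deterministic: nested requests are issued only once
        // all outer calls are running (or after timeoutMs if they cannot all run).
        CountDownLatch allStarted = new CountDownLatch(outerCalls);
        List<Future<Boolean>> outer = new ArrayList<>();
        for (int i = 0; i < outerCalls; i++) {
            outer.add(pool.submit(() -> {
                allStarted.countDown();
                allStarted.await(timeoutMs, TimeUnit.MILLISECONDS);
                Future<String> nested = pool.submit(() -> "row keys from index region");
                try {
                    nested.get(timeoutMs, TimeUnit.MILLISECONDS);
                    return false; // nested request found a free handler
                } catch (TimeoutException e) {
                    return true;  // every handler was busy waiting
                }
            }));
        }
        int timedOut = 0;
        try {
            for (Future<Boolean> f : outer) {
                if (f.get()) timedOut++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdownNow();
        }
        return timedOut;
    }

    public static void main(String[] args) {
        System.out.println(countTimedOut(2, 2, 300)); // all handlers busy: 2
        System.out.println(countTimedOut(4, 2, 300)); // spare handlers: 0
    }
}
```

In this model a larger pool only helps while concurrent outer calls stay below the handler count, so raising the handler count would make such a stall less likely rather than impossible; avoiding synchronous calls back to the same server from inside a handler removes the cycle.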



-- 

Best Regards!

Fei Ding
[email protected]
