Hi Michel, I indexed each column within a column family of a table, so that we can look up rows by a specific column value. By multi-index I mean using multiple indexes at the same time on a single query. That looks like a SQL select whose *where* clause has conditions on two indexed columns.
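To make the idea concrete, here is a rough sketch in plain Java. It is not HBase client code: a sorted TreeSet stands in for each index table, and the names (IndexSketch, indexKey, rowsWithValue, intersect) are made up for illustration. The index key is the column value followed by the row key of the indexed table; the \u0000 separator between them is an assumption, added so that a value that is a prefix of another value does not match. The intersection uses retainAll from java.util rather than CollectionUtils.intersection() only to keep the sketch free of dependencies.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Plain-Java stand-in for the index layout; not HBase client code. */
final class IndexSketch {
    // Hypothetical separator between column value and indexed-table row key.
    static final char SEP = '\u0000';

    /** Index row key = column value + separator + row key of the indexed table. */
    static String indexKey(String columnValue, String rowKey) {
        return columnValue + SEP + rowKey;
    }

    /**
     * Range scan over one index: every key in [startKey, endKey), where
     * endKey is the start key with its last character incremented by one.
     * Returns the row keys of the indexed table.
     */
    static Set<String> rowsWithValue(TreeSet<String> index, String value) {
        String startKey = value + SEP;
        String endKey = value + (char) (SEP + 1);
        Set<String> rows = new LinkedHashSet<>();
        for (String key : index.subSet(startKey, endKey)) {
            rows.add(key.substring(startKey.length()));
        }
        return rows;
    }

    /** Multi-index query: rows present in every per-index result set. */
    static Set<String> intersect(List<Set<String>> perIndexRows) {
        if (perIndexRows.isEmpty()) {
            return new LinkedHashSet<>();
        }
        Set<String> result = new LinkedHashSet<>(perIndexRows.get(0));
        for (Set<String> rows : perIndexRows.subList(1, perIndexRows.size())) {
            result.retainAll(rows);
        }
        return result;
    }
}
```

A query on two indexed columns is then one rowsWithValue call per index followed by intersect on the results, which corresponds to the SQL-like query with two conditions.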
The row key of the index table is made up of the column value followed by the row key of the indexed table. For set intersection I used the utility method CollectionUtils.intersection() from the Apache Commons Collections package. I make no assumption about the sort order of the indexes. A scan on the index table with the column value as startKey and column value+1 as endKey returns all rows of the indexed table that have that column value.

For multi-index queries, I previously ran one scan per indexed column and intersected the result sets to get the rows I want, but the query time was too long. So I decided to move the computation of the intersection to the server side and reduce the amount of data transferred.

Do you have any better idea?

On Mon, May 14, 2012 at 8:17 PM, Michel Segel <[email protected]> wrote:

> Need a little clarification...
>
> You said that you need to do multi-index queries.
>
> Did you mean to say multiple people running queries at the same time, or
> did you mean you wanted to do multi-key indexes where the key is a
> multi-key part.
>
> Or did you mean that you really wanted to use multiple indexes at the same
> time on a single query?
>
> If its the latter, not really a good idea...
> How do you handle the intersection of the two sets? (3 sets or more?)
> Can you assume that the indexes are in sort order?
>
> What happens when the results from the indexes exceed the amount of
> allocated memory?
>
> What I am suggesting you to do is to set aside the underpinnings of HBase
> and look at the problem you are trying to solve in general terms. Not an
> easy one...
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On May 14, 2012, at 4:35 AM, fding hbase <[email protected]> wrote:
>
> > Hi all,
> >
> > Is it possible to use table scanner (different from the host table
> > region) or
> > execute coprocessor of another table, in the endpoint coprocessor?
> > It looks like chaining coprocessors. But I found a possible deadlock!
> > Can anyone help me with this?
> >
> > In my testing environment I deployed the 0.92.0 version from CDH.
> > I wrote an Endpoint coprocessor to do composite secondary index queries.
> > The index is stored in another table and the index update is maintained
> > by the client through a extended HTable. While a single index query
> > works fine through Scanners of index table, soon after we realized
> > we need to do multi-index queries at the same time.
> > At first we tried to pull every row keys queried from a single index
> > table and do the merge (just set intersection) on the client,
> > but that overruns the network bandwidth. So I proposed to try
> > the endpoint coprocessor. The idea is to use coprocessors, one
> > in master table (the indexed table) and the other for each index table
> > regions.
> > Each master table region coprocessor instance invokes the index table
> > coprocessor instances with its regioninfo (the startKey and endKey) and
> > the scan,
> > the index table region coprocessor instance scans and returns the row
> > keys within the range of startKey and endKey passed in.
> >
> > The cluster blocks sometimes in invoking the index table coprocessor. I
> > traced into the code and found that when HConnection locates regions it
> > will rpc to the same regionserver.
> >
> > (After a while I found the index table coprocessor is equivalent to
> > just a plain scan with filter, so I switched to scanners with filter, but
> > the problem remains.)
>

--

Best Regards!

Fei Ding
[email protected]
