[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing

2012-03-05 Thread Anoop Sam John (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222340#comment-13222340
 ] 

Anoop Sam John commented on HBASE-2038:
---

Or maybe we could give seek() and reseek() on RegionScanner the signatures 
seek(byte[] rowKey) and reseek(byte[] rowKey)?
That way the seek would always go to the first KV of the row in every CF (if 
the CF contains that key).

 Coprocessors: Region level indexing
 ---

 Key: HBASE-2038
 URL: https://issues.apache.org/jira/browse/HBASE-2038
 Project: HBase
  Issue Type: New Feature
  Components: coprocessors
Reporter: Andrew Purtell
Priority: Minor

 HBASE-2037 is a good candidate to be done as a coprocessor. It also serves as a 
 good goalpost for coprocessor environment design -- there should be enough of 
 it so region level indexing can be reimplemented as a coprocessor without any 
 loss of functionality. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing

2012-03-03 Thread Anoop Sam John (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221541#comment-13221541
 ] 

Anoop Sam John commented on HBASE-2038:
---

@Lars

{quote}
Seeking is done one level (or actually two) deeper.
Seeking is done in the StoreScanners, coprocessors see RegionScanners.

It is not entirely clear to me where to hook this up in that API.
{quote}

Yes, at the RegionScanner level we don't have seek() or reseek(). They are one 
level down, at the KeyValueHeap level.
Would it be correct to add seek() and reseek() at the RegionScanner level? [We 
would just need to delegate the seek() or reseek() calls to the KeyValueHeap 
object within the RegionScanner...]

If so, it would be very easy to do a reseek() to the needed row in the 
coprocessor's preScannerNext().
next() would then return the needed row.

What do you say? Correct me if my suggestion is wrong.
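
A minimal sketch of the delegation idea, using toy types rather than the real HBase classes: `ToyHeap` and `ToyRegionScanner` are hypothetical stand-ins for KeyValueHeap and RegionScanner, `ToyIndexHook` plays the role of preScannerNext(), and rows are plain strings. It only illustrates that a reseek() exposed one level up can forward to the heap, so that a hook running before next() can position the scanner.

```java
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for KeyValueHeap: holds row keys in sorted order.
class ToyHeap {
    private final TreeSet<String> rows = new TreeSet<>();
    private String current; // next row to return, or null when exhausted

    ToyHeap(List<String> rowKeys) {
        rows.addAll(rowKeys);
        current = rows.isEmpty() ? null : rows.first();
    }

    // Move forward to the first row >= target; never moves backwards.
    boolean reseek(String target) {
        if (current == null) return false;
        if (target.compareTo(current) > 0) {
            current = rows.ceiling(target);
        }
        return current != null;
    }

    // Return the current row and advance to the one after it.
    String next() {
        String row = current;
        if (row != null) current = rows.higher(row);
        return row;
    }
}

// Hypothetical stand-in for RegionScanner with the proposed method added.
class ToyRegionScanner {
    private final ToyHeap heap;

    ToyRegionScanner(ToyHeap heap) { this.heap = heap; }

    // Proposed addition: simply forward to the heap.
    boolean reseek(String row) { return heap.reseek(row); }

    String next() { return heap.next(); }
}

// Hypothetical hook, playing the role of preScannerNext().
class ToyIndexHook {
    static void preScannerNext(ToyRegionScanner scanner, String wantedRow) {
        scanner.reseek(wantedRow);
    }
}
```

In this toy model a hook call with a row that lies behind the current position leaves the scanner where it is, which mirrors the forward-only nature of a reseek.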





[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing

2012-03-03 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221640#comment-13221640
 ] 

ramkrishna.s.vasudevan commented on HBASE-2038:
---

We also need a provision for the coprocessor to use the nbRows value that is 
passed while scanning, so that the normal scanner.next() stays in sync with the 
cached preScannerNext that we do with nbRows. 





[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing

2012-03-03 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221759#comment-13221759
 ] 

Lars Hofhansl commented on HBASE-2038:
--

@Anoop: We can certainly try to expose this at the RegionScanner level. 
However, I feel it might actually be harder than you think, as the seeking is 
dealt with on a per-store basis, and we do not want to inhibit the ability to 
deal with Stores in parallel in the future.
RegionScanner.seek would have to go through all Stores and, for each Store, seek 
the MemstoreScanner and all StoreFileScanners. Seeking this way across stores 
is only valid if we seek on row boundaries (each Store, i.e. column family, 
has its own set of columns, which could even have the same names between 
stores).
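
To illustrate why such a seek only makes sense on row boundaries, here is a toy sketch (not HBase code). `ToyCell`, `ToyStore` and `ToyRegion` are hypothetical names: a store is modelled as several sorted scanners (a memstore plus store files) and a cell as a (row, qualifier) pair. The row key is the only coordinate shared by all column families, so that is what the region-level seek takes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy cell: ordered by row, then by qualifier.
final class ToyCell implements Comparable<ToyCell> {
    final String row;
    final String qualifier;

    ToyCell(String row, String qualifier) {
        this.row = row;
        this.qualifier = qualifier;
    }

    @Override
    public int compareTo(ToyCell other) {
        int c = row.compareTo(other.row);
        return c != 0 ? c : qualifier.compareTo(other.qualifier);
    }

    @Override
    public String toString() { return row + "/" + qualifier; }
}

// Hypothetical stand-in for one Store (column family): a memstore scanner
// plus any number of store file scanners, each a sorted set of cells.
final class ToyStore {
    private final List<TreeSet<ToyCell>> scanners = new ArrayList<>();

    void addScanner(List<ToyCell> cells) { scanners.add(new TreeSet<>(cells)); }

    // Seek every scanner to the first cell of `row` (the empty qualifier
    // sorts first) and return the smallest cell found, or null if none.
    ToyCell seekToRow(String row) {
        ToyCell probe = new ToyCell(row, "");
        ToyCell best = null;
        for (TreeSet<ToyCell> scanner : scanners) {
            ToyCell c = scanner.ceiling(probe);
            if (c != null && (best == null || c.compareTo(best) < 0)) best = c;
        }
        return best;
    }
}

final class ToyRegion {
    // A region-level seek has to visit every store in turn.
    static List<ToyCell> seekAllStores(List<ToyStore> stores, String row) {
        List<ToyCell> heads = new ArrayList<>();
        for (ToyStore store : stores) heads.add(store.seekToRow(row));
        return heads;
    }
}
```

Each store ends up at its own first cell at or after the requested row; a seek to a particular column would have no such common meaning across stores.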

@HBASE-5521: I started working on that, but I am starting to question the 
usefulness.
A filter is per KeyValue (at least the method that allows for seeking). So, 
many KeyValues flow through the Filter for a single row, and the filter needs 
to seek separately for each ColumnFamily (as explained above and on the mailing 
list).
So the gain from this would be fairly minimal (which I guess is why we do not 
have this).
For example, a row with many columns would need to issue many INCLUDEs and only 
for the last KeyValue (and how would it know it's the last?) issue 
INCLUDE_AND_SEEK...





[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing

2012-03-03 Thread Anoop Sam John (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221797#comment-13221797
 ] 

Anoop Sam John commented on HBASE-2038:
---

{quote}
@HBASE-5521: I started working on that, but I am starting to question the 
usefulness.
A filter is per KeyValue (at least the method that allows for seeking). So, 
many KeyValues flow through the Filter for a single row, and the filter needs 
to seek separately for each ColumnFamily (as explained above and on the mailing 
list).
So the gain from this would be fairly minimal (which I guess is why we do not 
have this).
For example, a row with many columns would need to issue many INCLUDEs and only 
for the last KeyValue (and how would it know it's the last?) issue 
INCLUDE_AND_SEEK..
{quote}

Lars, I was also thinking about this yesterday after seeing the patch. I wanted 
to try a test case before commenting :) 

Regarding your first comment: in the seek() scenario discussed above we need a 
row boundary seek. Yes, all the stores (the memstore and all store files in 
each store) need to be seeked to the needed point. Let me look into this more 
on Monday. We had made small changes and tested this once; I mean we were able 
to seek to row boundaries.

Thanks a lot, Lars, for your work and suggestions.

@Ram: Yes, we can file a Jira for coprocessor support for next(int nbRows).






[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing

2012-03-03 Thread Anoop Sam John (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221804#comment-13221804
 ] 

Anoop Sam John commented on HBASE-2038:
---

Created https://issues.apache.org/jira/browse/HBASE-5517 for the coprocessor 
change.





[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing

2012-03-02 Thread Anoop Sam John (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220981#comment-13220981
 ] 

Anoop Sam John commented on HBASE-2038:
---

Hi Lars,
{quote}It might be possible to provide a custom filter to do that.{quote}

- What we want from the filter is to include a row and then seek to the next 
row we are interested in. I can't see such a facility in our Filter right 
now. Correct me if I am wrong. Suppose we have already seeked to one row and it 
needs to be included in the result; then the Filter should return INCLUDE. Only 
when the next next() call happens can we return a SEEK_USING_HINT. 
So one extra row read is needed. This might even cause one unwanted 
HFileBlock fetch (who knows).
Can we add reseek() at a higher level?
If you have a suggestion, please let me know.

Thanks
Anoop





[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing

2012-03-02 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221153#comment-13221153
 ] 

Lars Hofhansl commented on HBASE-2038:
--

@Alex: Looks like preScannerOpen could actually change the passed Scan object 
and add a filter. The API is a bit strange. Scan is marked final, but it is 
perfectly OK (and possible, and final does not prevent that) to change it here. 
postScannerOpen also gets the Scan object, but modifying it there is pointless.
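
A small, self-contained illustration of the Java point made here (not HBase code): `final` on a parameter only prevents reassigning the reference, it does not stop the method from mutating the object behind it. `ToyScan` and `ToyObserver` are hypothetical stand-ins for Scan and a region observer, and the filter name is made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Scan: a mutable holder of filter names.
class ToyScan {
    private final List<String> filters = new ArrayList<>();

    void addFilter(String name) { filters.add(name); }

    List<String> getFilters() { return filters; }
}

// Hypothetical stand-in for a region observer hook.
class ToyObserver {
    // `final` forbids `scan = new ToyScan()` inside this method,
    // but calling mutators on the referenced object is still allowed.
    static void preScannerOpen(final ToyScan scan) {
        scan.addFilter("index-seek-filter");
    }
}
```

After the hook returns, the caller's ToyScan carries the added filter, which is the effect described for preScannerOpen.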

@Anoop: Yep, for that we'd need to add INCLUDE_AND_SEEK_USING_HINT (similar to 
the INCLUDE_AND_SEEK_NEXT_ROW that we already have). Shouldn't be hard to add, 
I'm happy to do that, if that's the route we want to go with this.
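
The potential saving can be made concrete with a toy model (this is not the HBase Filter API). Rows are integers, the filter wants a fixed set of rows, and `ToyHintScan` with its `Code` enum is hypothetical. With only INCLUDE and SEEK_USING_HINT, the scanner has to hand the filter the row after each match before the filter can ask for a seek; a combined code would avoid that extra read.

```java
import java.util.TreeSet;

class ToyHintScan {
    enum Code { INCLUDE, SEEK_USING_HINT, INCLUDE_AND_SEEK_USING_HINT }

    // The toy filter: decide what to do with the row just read.
    static Code decide(int row, TreeSet<Integer> wanted, boolean combinedCode) {
        if (!wanted.contains(row)) return Code.SEEK_USING_HINT;
        return combinedCode ? Code.INCLUDE_AND_SEEK_USING_HINT : Code.INCLUDE;
    }

    // Scans rows 0..rowCount-1 and returns how many rows had to be read
    // in order to return every row in `wanted`.
    static int rowsRead(int rowCount, TreeSet<Integer> wanted, boolean combinedCode) {
        int read = 0;
        int pos = 0;
        while (pos < rowCount) {
            read++; // the scanner materialises the row at `pos`
            Integer hint;
            switch (decide(pos, wanted, combinedCode)) {
                case INCLUDE:
                    pos++; // no seek possible yet; fall through to the next row
                    break;
                case SEEK_USING_HINT:
                    hint = wanted.ceiling(pos);
                    if (hint == null) return read;
                    pos = hint;
                    break;
                default: // INCLUDE_AND_SEEK_USING_HINT
                    hint = wanted.higher(pos);
                    if (hint == null) return read;
                    pos = hint;
                    break;
            }
        }
        return read;
    }
}
```

In this model the two-step variant reads one additional row per matching row; whether that translates into an extra block fetch in practice depends on where the rows sit in the file.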





[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing

2012-03-02 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221185#comment-13221185
 ] 

Zhihong Yu commented on HBASE-2038:
---

I logged HBASE-5512 for adding INCLUDE_AND_SEEK_USING_HINT





[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing

2012-02-06 Thread Anoop Sam John (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201205#comment-13201205
 ] 

Anoop Sam John commented on HBASE-2038:
---

Hi Lars,
  I am also working on a secondary index, and the IHBase concept looks good 
to me. But we need it moved to a coprocessor-based design so that 
the HBase kernel code need not differ for the secondary index. IHBase 
makes the scan go through all the regions (as you said), but it skips ahead and 
seeks to later positions in the heap, avoiding many possible data reads from 
HDFS, etc.
Looking at the current coprocessor framework, we call preScannerNext() from 
HRegionServer next(final long scannerId, int nbRows) and pass the 
RegionScanner to the coprocessor. But as per the IHBase approach, within the 
coprocessor we should be able to seek to the correct row where the indexed 
column value equals our value. We cannot do this as of now because 
RegionScanner has no seek().

Also, preScannerNext() is called once before the actual next(final 
long scannerId, int nbRows) call happens on the region. Depending on the 
caching value at the client side, nbRows might be more than one. Now suppose 
nbRows=2 and the region has 2 matching rows, one somewhere in the middle 
of an HFile and the other in another HFile. As per IHBase we should first 
seek to the position of the first row and, after reading its data, seek to 
the next position. With the current way preScannerNext() is called, 
this won't be possible. So I think we might need some changes in this area. 
What do you say?

Meanwhile, what is your plan: to continue with the IHBase way of storing the 
index in memory for each region, or to change this?





[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing

2012-02-06 Thread Anoop Sam John (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202057#comment-13202057
 ] 

Anoop Sam John commented on HBASE-2038:
---

Hi Alex,
Thanks for your reply. Yes, I had seen your past comment. I am checking 
the trunk code for the coprocessor for this work as of now.

What is your comment on my first comment, that the HRegionServer next(final 
long scannerId, int nbRows) calls the coprocessor's preScannerNext() by passing 
the RegionScanner? We cannot do a seek() on it.

Thanks
Anoop






[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing

2012-02-06 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202088#comment-13202088
 ] 

Lars Hofhansl commented on HBASE-2038:
--

Unfortunately there is no seeking in the coprocessors yet. They work more like 
a filter on a real scan. Seeking is done one level (or actually two) 
deeper.
Seeking is done in the StoreScanners, coprocessors see RegionScanners.

It is not entirely clear to me where to hook this up in that API.

It might be possible to provide a custom filter to do that. Filters operate at 
the StoreScanner level, and so can (and do) provide seek hints to the calling 
scanner.





[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing

2012-01-05 Thread Alex Baranau (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180942#comment-13180942
 ] 

Alex Baranau commented on HBASE-2038:
-

Hi. Sorry for abandoning this issue for so long. I'd still love to work on it, 
but I think I'll only manage to dedicate enough time to it in 2-3 weeks. I plan 
to make some progress on it starting next week.
If I can't do anything in that time then someone else can take it over; I will 
not hold it any longer.

Is that OK? Or is there someone who wants to work on it right away?





[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing

2012-01-05 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180996#comment-13180996
 ] 

Lars Hofhansl commented on HBASE-2038:
--

Maybe there's a way to collaborate on this (although I cannot promise much of 
my time on this either).

From gleaning HBASE-2037, this would not require ITHBase and the indexes would 
always be consistent, correct? I have not looked at the 2.5 MB(!) patch in 
HBASE-2037 yet...
I was wondering if there's a summary somewhere about how it works. Specifically 
how does a client know where to look for an indexed value?






[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing

2012-01-05 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181010#comment-13181010
 ] 

Zhihong Yu commented on HBASE-2038:
---

There are already two projects building secondary indexes on top of HBase: Lily 
and Culvert.

If we provide native secondary indexing support in HBase, we should evaluate 
what ITHBase, Lily and Culvert have done so that the native support gives 
a better abstraction.





[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing

2012-01-05 Thread Alex Baranau (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181069#comment-13181069
 ] 

Alex Baranau commented on HBASE-2038:
-

+1 for collaboration!
re HBASE-2037 (aka IHBase): it is the base for this effort. Yes, it doesn't 
require ITHBase; it is an alternative implementation. The refactored code I 
pointed to above (https://github.com/abaranau/ihbase) is also based on the 
IHBase code.

In short (sorry, the description is not tied to classes, don't have them in 
front of me currently):

As far as I remember (I need to refresh my memory, though), the point is that an 
index is kept for each Region; it is loaded in RAM and not persisted. It is 
built during Region initialization (after an HBase restart, or when a new region 
is created after a split and such). When a scan is performed with indexed 
columns involved, it uses the index to find the next record to navigate to and 
*fast forwards* to that record (usually skipping some other records without 
even reading them). This is where it gains its speed.
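
As a rough illustration of the fast-forward idea, with toy types only (the real IHBase classes are not reproduced here): `ToyRegionIndex` is a hypothetical in-memory index from an indexed column value to the sorted row keys that hold it, rebuilt from the region's data whenever the region opens. A scan for a value then jumps from one matching row to the next instead of reading the rows in between.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class ToyRegionIndex {
    // value of the indexed column -> sorted row keys that hold it
    private final Map<String, TreeSet<String>> valueToRows = new HashMap<>();

    // Built in memory from the region's data (row -> indexed column value);
    // nothing is persisted, so it is rebuilt whenever the region opens.
    static ToyRegionIndex build(TreeMap<String, String> regionData) {
        ToyRegionIndex index = new ToyRegionIndex();
        for (Map.Entry<String, String> e : regionData.entrySet()) {
            index.valueToRows
                 .computeIfAbsent(e.getValue(), v -> new TreeSet<>())
                 .add(e.getKey());
        }
        return index;
    }

    // First matching row at or after `fromRow`, or null if there is none.
    String nextMatchingRow(String value, String fromRow) {
        TreeSet<String> rows = valueToRows.get(value);
        return rows == null ? null : rows.ceiling(fromRow);
    }

    // Scan using the index: only matching rows are ever visited.
    List<String> scan(String value) {
        List<String> visited = new ArrayList<>();
        TreeSet<String> rows = valueToRows.get(value);
        if (rows == null) return visited;
        String row = rows.first();
        while (row != null) {
            visited.add(row);
            row = rows.higher(row);
        }
        return visited;
    }
}
```

The memory cost of such a structure grows with the number of indexed values per region, which is the trade-off of keeping the index in RAM.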

As this was developed before CPs were added, a special API was developed, which 
is used by the client.

Hope this helps a bit. I will refresh my memory from the code and we'll discuss 
that a bit deeper.





[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing

2012-01-05 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181077#comment-13181077
 ] 

Lars Hofhansl commented on HBASE-2038:
--

I see. So if I wanted to do a Get and all I have is the value of an indexed 
column, I have to ask all regions, correct? (Because there is no way to 
identify the region with the column value ahead of time)






[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing

2012-01-05 Thread Alex Baranau (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181099#comment-13181099
 ] 

Alex Baranau commented on HBASE-2038:
-

I think that the IHBase implementation (the current one) implies that. For scans 
this should not be a big problem though: regions are usually configured to be 
big, and with the help of the index a whole region can be skipped at once. If we 
are talking about random single-read (Get) operations then this may mean a lot 
of useless work (compared to the amount of useful work).

Do you have a specific use case (a real one or just one in mind) you want to 
discuss? If so, maybe it makes sense to discuss on the ML or even in chat (Skype 
can work).





[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing

2012-01-05 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181183#comment-13181183
 ] 

Lars Hofhansl commented on HBASE-2038:
--

I have no particular use case... It's just that I have been trying to work out 
how I would do secondary indexes in HBase, and I always came back to this 
(either it needs some cross-region transaction when maintaining the index, or 
you need to ask all regions when you query the index).

Not a problem as such. Just confirming.
Such a get could be farmed out in parallel to many regions, so latency is not 
necessarily bad.
I'd be up for an offline chat. I'll send an email.
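
A minimal sketch of that fan-out, in plain Java rather than the HBase client API: each region is modelled as a hypothetical `ToyIndexedRegion` with its own local index, and `ToyFanOut` submits the lookup to every region at once, so overall latency is roughly that of the slowest region rather than the sum.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical region with a local index: indexed value -> row keys.
class ToyIndexedRegion {
    private final Map<String, List<String>> localIndex;

    ToyIndexedRegion(Map<String, List<String>> localIndex) {
        this.localIndex = localIndex;
    }

    List<String> lookup(String value) {
        List<String> rows = localIndex.get(value);
        return rows == null ? new ArrayList<String>() : rows;
    }
}

class ToyFanOut {
    // Ask every region, since the value alone does not say which region
    // holds it. Results are gathered in region order.
    static List<String> lookupEverywhere(List<ToyIndexedRegion> regions, String value) {
        List<String> rows = new ArrayList<>();
        if (regions.isEmpty()) return rows;
        ExecutorService pool = Executors.newFixedThreadPool(regions.size());
        try {
            List<Future<List<String>>> pending = new ArrayList<>();
            for (ToyIndexedRegion region : regions) {
                Callable<List<String>> task = () -> region.lookup(value);
                pending.add(pool.submit(task));
            }
            for (Future<List<String>> f : pending) {
                try {
                    rows.addAll(f.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for a region", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("region lookup failed", e.getCause());
                }
            }
            return rows;
        } finally {
            pool.shutdown();
        }
    }
}
```

One thread per region is only reasonable for a toy; a real client would bound the pool size.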





[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing

2012-01-04 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180068#comment-13180068
 ] 

Lars Hofhansl commented on HBASE-2038:
--

@Alex: Are you still planning to work on this?





[jira] Commented: (HBASE-2038) Coprocessors: Region level indexing

2010-11-21 Thread Alex Baranau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12934313#action_12934313
 ] 

Alex Baranau commented on HBASE-2038:
-

The following might be helpful. Some time ago I refactored IHBase a bit and 
extracted base interfaces to make it easier to develop a custom indexed 
implementation: IndexManager and IndexScannerContext (I just published the code 
here: https://github.com/abaranau/ihbase). To give an idea, here's how they 
look:
{noformat}
public abstract class IndexManager implements HeapSize {
  public abstract void initialize(IdxRegion region);
  public abstract Pair<Long, Callable<Void>> rebuildIndexes() throws IOException;
  public abstract void cleanup();
  public abstract IndexScannerContext newIndexScannerContext(Scan scan) throws IOException;
}

public interface IndexScannerContext {
  KeyValue getNextKey();
  void close();
}
{noformat}
This refactored code roughly corresponds to the BaseRegionObserverCoprocessor 
API (at least in my head):
* IndexManager's initialize() should be invoked at region open time,
* rebuildIndexes() during flush,
* cleanup() during region close,
* an IndexScannerContext should be created when a scan is opened,
* its getNextKey() (with a bit of additional code) should be invoked during the scan's next(),
* and finally, close() when the scan is closed in the region.
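
The mapping in the list above can be sketched in plain Java as follows. This is only an illustration: the interfaces are simplified stand-ins rather than the real IHBase or HBase types, and the hook names on IndexingObserver (postOpen, postFlush, preClose, postScannerOpen, preScannerNext, postScannerClose) only echo region observer callbacks, with signatures invented for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-ins for the interfaces quoted above (not the real IHBase types).
interface SimpleIndexScannerContext {
  String getNextKey();   // next indexed row key, or null when exhausted
  void close();
}

interface SimpleIndexManager {
  void initialize();
  void rebuildIndexes();
  void cleanup();
  SimpleIndexScannerContext newIndexScannerContext();
}

// Hypothetical observer: each hook delegates to the index manager call listed above.
class IndexingObserver {
  private final SimpleIndexManager indexManager;
  // One scanner context per open scanner, keyed by scanner id.
  private final Map<Long, SimpleIndexScannerContext> contexts = new ConcurrentHashMap<>();

  IndexingObserver(SimpleIndexManager indexManager) {
    this.indexManager = indexManager;
  }

  void postOpen()  { indexManager.initialize(); }      // region open
  void postFlush() { indexManager.rebuildIndexes(); }  // flush
  void preClose()  { indexManager.cleanup(); }         // region close

  void postScannerOpen(long scannerId) {
    contexts.put(scannerId, indexManager.newIndexScannerContext());
  }

  // Returns the row key the scan should jump to, or null if there is none.
  String preScannerNext(long scannerId) {
    SimpleIndexScannerContext ctx = contexts.get(scannerId);
    return ctx == null ? null : ctx.getNextKey();
  }

  void postScannerClose(long scannerId) {
    SimpleIndexScannerContext ctx = contexts.remove(scannerId);
    if (ctx != null) {
      ctx.close();
    }
  }
}

// Toy manager that records each call so the ordering can be inspected.
class RecordingIndexManager implements SimpleIndexManager {
  final List<String> calls = new ArrayList<>();
  private final List<String> indexedKeys;

  RecordingIndexManager(List<String> indexedKeys) {
    this.indexedKeys = indexedKeys;
  }

  public void initialize()     { calls.add("initialize"); }
  public void rebuildIndexes() { calls.add("rebuildIndexes"); }
  public void cleanup()        { calls.add("cleanup"); }

  public SimpleIndexScannerContext newIndexScannerContext() {
    calls.add("newIndexScannerContext");
    Iterator<String> it = indexedKeys.iterator();
    return new SimpleIndexScannerContext() {
      public String getNextKey() { return it.hasNext() ? it.next() : null; }
      public void close()        { calls.add("close"); }
    };
  }
}
```

Keeping the per-scanner context in a concurrent map is one way to let several scanners run against the same observer at once; whether the real hooks expose enough to act on the returned key is a separate question.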

I'm not saying we should use this refactored version of code, I'm just putting 
it here for better visualization purposes, just as a way to express the idea.

Please, correct my logic where needed! 
Thanks!

P.S. Please don't judge the class names or other rough pieces in the 
refactored version I've shared. I wanted to a) just *try* to add an additional 
abstraction so that a custom indexing implementation can be injected and b) make 
as few changes to the IHBase codebase as I can so that others can follow them 
easily. Here are the notes I took during refactoring (they may be helpful if 
someone wants to dig into the code):

1. Extracted interface IndexManager from IdxRegionIndexManager.
2. Extracted separate IdxRegionIndexManagerMBean from IdxRegionMbean with 
IndexManager-implementation-specific info
3. Created IndexScannerContext interface (with IdxScannerContext 
implementation, which encapsulates idxSearchContext and 
matchedExpressionIterator for existing code) which performs iteration over 
indexed expression keys.

NOTE: Didn't think about renaming class IdxRegionIndexManager and related.
NOTE: Didn't think about repackaging things.
NOTE: New code/classes lack javadocs, will add them
NOTE: Unit tests should be added for the refactoring (at least a check of 
IdxRegionIndexManagerMBean values)

 Coprocessors: Region level indexing
 ---

 Key: HBASE-2038
 URL: https://issues.apache.org/jira/browse/HBASE-2038
 Project: HBase
  Issue Type: Sub-task
Reporter: Andrew Purtell
Priority: Minor

 HBASE-2037 is a good candidate to be done as coprocessor. It also serves as a 
 good goalpost for coprocessor environment design -- there should be enough of 
 it so region level indexing can be reimplemented as a coprocessor without any 
 loss of functionality. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-2038) Coprocessors: Region level indexing

2010-11-21 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12934341#action_12934341
 ] 

Gary Helmling commented on HBASE-2038:
--

Hi Alex,

I'm not familiar with the internal IHBase code, but I'll provide whatever info 
I can on coprocessors.

bq. 1) Are coprocessors meant to be stateless? If not, then I assume that one 
instance is created and assigned to a region and that CP implementation 
should be thread-safe (e.g. multiple scanners can be handled at the same time 
for the regions).

No, coprocessor implementations do not need to be stateless.  If anything 
you'll need state for many interesting applications.  A single coprocessor 
instance is created per configured coprocessor implementation on region load.  
You can treat the postOpen() and preClose() methods as init() and destroy() 
methods in your implementation.  And yes, coprocessor implementations need to 
be thread safe.
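
A minimal sketch of what "stateful but thread safe" could look like, in plain Java without the HBase classes. CountingObserver and its onRead()/reads() methods are hypothetical; only the postOpen()/preClose() names come from the discussion above, used here as init and destroy points.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical coprocessor-like class that keeps per-region state across calls.
class CountingObserver {
  private final AtomicBoolean open = new AtomicBoolean(false);
  private final ConcurrentHashMap<String, AtomicLong> readsPerFamily = new ConcurrentHashMap<>();

  // Treated as init(): reset the state when the region opens.
  void postOpen() {
    readsPerFamily.clear();
    open.set(true);
  }

  // Treated as destroy(): stop accepting calls when the region closes.
  void preClose() {
    open.set(false);
  }

  // May be called by many handler threads at once, so all shared state
  // is held in concurrent or atomic structures.
  void onRead(String family) {
    if (!open.get()) {
      throw new IllegalStateException("region not open");
    }
    readsPerFamily.computeIfAbsent(family, f -> new AtomicLong()).incrementAndGet();
  }

  long reads(String family) {
    AtomicLong count = readsPerFamily.get(family);
    return count == null ? 0L : count.get();
  }
}
```

The single instance is shared by every request that reaches the region, which is why plain fields such as a HashMap or a long counter would not be safe here.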

bq. 2) During batch scan (something which was added in trunk but wasn't supported in 
previous HBase versions, and hence the current IHBase implementation doesn't take 
it into account) we need to return multiple rows from the scan's next() method. It 
looks like if we apply the current approach (from the current IHBase implementation) of 
fast-forwarding to the next value, we'll only fast-forward the scan to the first value 
of those to return. 

Sorry, I'm not familiar with how IHBase handles this or what changed in the 
scanner API, but I'm guessing RegionObserver.preScannerNext() does not provide much 
help in this fast-forwarding use case.  It seems like this would need much 
deeper hooks into HRegion.RegionScanner to interact with the positioning code.  
Alternately, you could expose your own indexed scanner functionality via the 
dynamic rpc hooks (HTable.coprocessorExec()), but that would require the client 
to differentiate on indexed vs. non-indexed usage and doesn't provide the 
transparency you're looking for.
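
To make the batch problem from question 2 concrete, here is a toy model in plain Java. It is not HBase code: the region is a sorted map of row keys, the index is an iterator of matching row keys that are assumed to exist in the region, and BatchSeekDemo with its two methods is hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableMap;

class BatchSeekDemo {
  // Fast-forward once at the start of next(), then read `batch` rows sequentially.
  // Only the first returned row is guaranteed to be an index match.
  static List<String> seekOncePerCall(NavigableMap<String, String> rows,
                                      Iterator<String> matchedKeys, int batch) {
    List<String> out = new ArrayList<>();
    if (!matchedKeys.hasNext()) {
      return out;
    }
    for (String key : rows.tailMap(matchedKeys.next(), true).keySet()) {
      if (out.size() == batch) {
        break;
      }
      out.add(key);
    }
    return out;
  }

  // Fast-forward before every row in the batch, so every returned row is a match.
  static List<String> seekPerRow(NavigableMap<String, String> rows,
                                 Iterator<String> matchedKeys, int batch) {
    List<String> out = new ArrayList<>();
    while (out.size() < batch && matchedKeys.hasNext()) {
      String target = matchedKeys.next();
      if (rows.containsKey(target)) {
        out.add(target);
      }
    }
    return out;
  }
}
```

The second variant needs to reposition the scanner between rows inside a single next() call, which is the kind of access to the positioning code that a hook invoked only before next() would not offer.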

bq. 3) Is it in general a good idea for me to take this initiative (transforming the 
IHBase implementation into a CP-based one)?

Sorry again, I don't have much of an answer on this one.  I'll help on anything 
I can on the coprocessor side of things, though!






[jira] Commented: (HBASE-2038) Coprocessors: Region level indexing

2010-11-21 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12934344#action_12934344
 ] 

Andrew Purtell commented on HBASE-2038:
---

Alex, 

bq. Is it in general a good idea for me to take this initiative (transforming the 
IHBase implementation into a CP-based one)?

Well I for one definitely think this is a good idea. 

bq. I believe it's a good time for this effort and hope that CP-based 
implementation of region-level indexing will confirm that CP API is complete 
and has all one might need (for now).

As do I. However there are additions and improvements to the CP API coming, see 
HBASE-3256 and HBASE-3257. The latter especially may be relevant.

bq.  I assume that one instance is created and assigned to a region and that 
CP implementation should be thread-safe (e.g. multiple scanners can be handled 
at the same time for the regions). 

Correct.

bq. I believe that CoprocessorEnvironment's get/put/remove methods are used to 
store intermediate data (aka attributes) between method calls (if we really 
need it). 

Correct, but see next answer below. 

bq. Is a CoprocessorEnvironment instance created one-per-region? 

No. This is created once per coprocessor. Each coprocessor has its own 
environment, which can be used to keep state between multiple threads of a 
coprocessor attached to one region, but not between multiple coprocessors.

CoprocessorHost is the object that is created once per region. 
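
The relationship described above (one host per region, one environment per coprocessor loaded into that host) can be sketched as below. ToyHost and ToyEnvironment are hypothetical simplifications in plain Java; the real CoprocessorHost and CoprocessorEnvironment have richer APIs, and only the get/put/remove names are taken from the discussion.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-coprocessor environment: an attribute store shared by all
// threads running that one coprocessor on that one region.
class ToyEnvironment {
  private final Map<String, Object> attributes = new ConcurrentHashMap<>();

  void put(String key, Object value) { attributes.put(key, value); }
  Object get(String key)             { return attributes.get(key); }
  Object remove(String key)          { return attributes.remove(key); }
}

// Hypothetical per-region host: owns one environment for each loaded coprocessor.
class ToyHost {
  private final String regionName;
  private final Map<String, ToyEnvironment> environments = new ConcurrentHashMap<>();

  ToyHost(String regionName) {
    this.regionName = regionName;
  }

  String regionName() {
    return regionName;
  }

  // Returns the environment for the named coprocessor, creating it on first load.
  ToyEnvironment load(String coprocessorName) {
    return environments.computeIfAbsent(coprocessorName, name -> new ToyEnvironment());
  }
}
```

In this model an attribute stored by one coprocessor is visible to every thread of that coprocessor on the region, but not to a second coprocessor loaded into the same host.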

bq. I can store some scan-related data using scanId passed to the scan-related 
callbacks (is it safe?)

This should be safe. 

I don't understand the current IHBase implementation enough to answer your 
question #2.

bq. This refactored code somehow correlates with BaseRegionObserverCoprocessor 
API (at least in my head)

I think that is a great start.
