[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing
[ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222340#comment-13222340 ]

Anoop Sam John commented on HBASE-2038:

Or maybe we could give seek() and reseek() on the RegionScanner the signatures seek(byte[] rowKey) and reseek(byte[] rowKey)? That way the seek would always go to the first KV of the row in every CF (if the CF contains that key).

Coprocessors: Region level indexing

Key: HBASE-2038
URL: https://issues.apache.org/jira/browse/HBASE-2038
Project: HBase
Issue Type: New Feature
Components: coprocessors
Reporter: Andrew Purtell
Priority: Minor

HBASE-2037 is a good candidate to be done as a coprocessor. It also serves as a good goalpost for coprocessor environment design -- there should be enough of it so region level indexing can be reimplemented as a coprocessor without any loss of functionality.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing
[ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221541#comment-13221541 ]

Anoop Sam John commented on HBASE-2038:

@Lars
{quote}
Seeking is done one level (or two, actually) deeper. Seeking is done in the StoreScanners, coprocessors see RegionScanners. It is not entirely clear to me where to hook this up in that API.
{quote}
Yes, at the RegionScanner level we don't have seek() or reseek(); they live one level down, at the KeyValueHeap level. Would it be correct to add seek() and reseek() behaviour at the RegionScanner level? (We would just need to delegate the seek() or reseek() calls to the KeyValueHeap object within the RegionScanner.) If so, it would be very easy to do a reseek() to the needed row in the coprocessor's preScannerNext(); next() would then pick up the needed row. What do you say? Correct me if my suggestion is wrong.
[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing
[ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221640#comment-13221640 ]

ramkrishna.s.vasudevan commented on HBASE-2038:

Also, we need a provision for the coprocessor to use the nbRows value that is passed while scanning, so that the normal scanner.next() can be used in sync with the cached preScannerNext() that we do with nbRows.
[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing
[ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221759#comment-13221759 ]

Lars Hofhansl commented on HBASE-2038:

@Anoop: We can certainly try to expose this at the RegionScanner level, although I feel it might actually be harder than you think: the seeking is dealt with on a per-store basis, and we do not want to inhibit the ability to deal with Stores in parallel in the future. RegionScanner.seek would have to go through all Stores and, for each Store, seek the MemstoreScanner and all StoreFileScanners. Seeking this way across stores is only valid if we seek on row boundaries (each Store, i.e. column family, has its own set of columns, which could even have the same names between stores).

@HBASE-5521: I started working on that, but I am starting to question the usefulness. A filter is per KeyValue (at least the method that allows for seeking). So many KeyValues flow through the Filter for a single row, and the filter needs to seek separately for each ColumnFamily (as explained above and on the mailing list). So the gain from this would be fairly minimal (which I guess is why we do not have this). For example, a row with many columns would need to issue many INCLUDEs and only for the last KeyValue (and how would it know it's the last?) issue INCLUDE_AND_SEEK...
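To illustrate the row-boundary point in Lars's comment, here is a minimal Java sketch of a region-level seek that fans out to every store. StoreCursor and RegionCursor are hypothetical stand-ins, not HBase classes, and each store is reduced to a sorted set of row keys. It only shows why a row key is the one position that means the same thing in every column family.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for one store (column family) scanner; not an HBase class.
class StoreCursor {
    private final TreeSet<String> rows;   // row keys present in this store, sorted
    private String currentRow;

    StoreCursor(String... rowKeys) {
        this.rows = new TreeSet<>(Arrays.asList(rowKeys));
        this.currentRow = rows.isEmpty() ? null : rows.first();
    }

    // Position on the first row at or after the requested row key.
    boolean seekToRow(String row) {
        currentRow = rows.ceiling(row);
        return currentRow != null;
    }

    String currentRow() {
        return currentRow;
    }
}

// Hypothetical region-level wrapper: a seek is forwarded to every store.
class RegionCursor {
    private final List<StoreCursor> stores = new ArrayList<>();

    void addStore(StoreCursor store) {
        stores.add(store);
    }

    // Seeks all stores to the row boundary and returns the smallest row
    // any store is now positioned on, or null if every store is exhausted.
    String seekToRow(String row) {
        String smallest = null;
        for (StoreCursor store : stores) {
            if (store.seekToRow(row)) {
                String candidate = store.currentRow();
                if (smallest == null || candidate.compareTo(smallest) < 0) {
                    smallest = candidate;
                }
            }
        }
        return smallest;
    }
}
```

Because the seek target is only a row key, each store can resolve it independently. A target that also named a column would be ambiguous, since column names are scoped to a single store.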
[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing
[ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221797#comment-13221797 ]

Anoop Sam John commented on HBASE-2038:

{quote}
@HBASE-5521: I started working on that, but I am starting to question the usefulness. A filter is per KeyValue (at least the method that allows for seeking). So many KeyValues flow through the Filter for a single row, and the filter needs to seek separately for each ColumnFamily (as explained above and on the mailing list). So the gain from this would be fairly minimal (which I guess is why we do not have this). For example, a row with many columns would need to issue many INCLUDEs and only for the last KeyValue (and how would it know it's the last?) issue INCLUDE_AND_SEEK..
{quote}
Lars, I was also thinking about this yesterday after seeing the patch. I wanted to try a test case run before commenting :)

Regarding your first comment: in the seek() scenario we discussed above, we need a row-boundary seek. Yes, all the stores (the memstore and all store files in each store) need to be seeked to the needed point. Let me look into this more on Monday. We made small changes and tested this once; I mean, we were able to seek to row boundaries. Thanks a lot, Lars, for your work and suggestions.

@Ram: Yes, we can file a Jira for coprocessor support for next(int nbRows).
[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing
[ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221804#comment-13221804 ]

Anoop Sam John commented on HBASE-2038:

Created https://issues.apache.org/jira/browse/HBASE-5517 for the coprocessor change.
[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing
[ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220981#comment-13220981 ]

Anoop Sam John commented on HBASE-2038:

Hi Lars,
{quote}It might be possible to provide a custom filter to do that.{quote}
What we want from the filter is to include a row and then seek to the next row we are interested in. I can't see such a facility in our Filter right now. Correct me if I am wrong.

Suppose we have already seeked to one row and it needs to be included in the result; then the Filter should return INCLUDE. Only when the next next() call happens can we return a SEEK_USING_HINT. So one extra row read is needed. This might even cause one unwanted HFileBlock fetch (who knows). Can we add reseek() at a higher level? If you have a suggestion, please let me know.

Thanks
Anoop
[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing
[ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221153#comment-13221153 ]

Lars Hofhansl commented on HBASE-2038:

@Alex: Looks like preScannerOpen could actually change the passed Scan object and add a filter. The API is a bit strange: the Scan parameter is marked final, but it is perfectly OK (and possible, since final does not prevent it) to modify the object here. postScannerOpen also gets the Scan object, but modifying it there is pointless.

@Anoop: Yep, for that we'd need to add INCLUDE_AND_SEEK_USING_HINT (similar to the INCLUDE_AND_SEEK_NEXT_ROW that we already have). It shouldn't be hard to add, and I'm happy to do that if that's the route we want to go with this.
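The cost of the extra read discussed here can be sketched with a small, self-contained Java simulation. The Code enum below only borrows names from this thread and is not the HBase Filter.ReturnCode type. The table is reduced to a sorted set of row keys, and the "wanted" set stands in for whatever an index would supply as seek hints.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class HintScanSketch {
    // Hypothetical return codes, named after the ones mentioned in the thread.
    enum Code { INCLUDE, SEEK_USING_HINT, INCLUDE_AND_SEEK_USING_HINT }

    // Number of rows touched by the most recent scan() call.
    static int rowsRead;

    // table: all row keys in the region; wanted: rows the index points at;
    // combined: whether the filter may include a row and seek in one step.
    static List<String> scan(TreeSet<String> table, TreeSet<String> wanted, boolean combined) {
        rowsRead = 0;
        List<String> out = new ArrayList<>();
        String row = wanted.isEmpty() ? null : table.ceiling(wanted.first());
        while (row != null) {
            rowsRead++;
            Code code;
            if (wanted.contains(row)) {
                code = combined ? Code.INCLUDE_AND_SEEK_USING_HINT : Code.INCLUDE;
            } else {
                code = Code.SEEK_USING_HINT;
            }
            if (code != Code.SEEK_USING_HINT) {
                out.add(row);
            }
            if (code == Code.INCLUDE) {
                // Plain INCLUDE: the scanner just moves on, possibly to an unwanted row.
                row = table.higher(row);
            } else {
                // Either seek code: jump straight to the next row the index names.
                String hint = wanted.higher(row);
                row = hint == null ? null : table.ceiling(hint);
            }
        }
        return out;
    }
}
```

With rows r1 through r9 and wanted rows r2 and r7, the separate INCLUDE and SEEK_USING_HINT codes touch four rows, while the combined code touches two. That difference is the extra read per included row that Anoop describes.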
[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing
[ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221185#comment-13221185 ]

Zhihong Yu commented on HBASE-2038:

I logged HBASE-5512 for adding INCLUDE_AND_SEEK_USING_HINT.
[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing
[ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201205#comment-13201205 ]

Anoop Sam John commented on HBASE-2038:

Hi Lars,
I am also working on a secondary index, and the IHBase concept looks good to me. But we need it moved to a coprocessor-based design so that the HBase kernel code need not differ for the secondary index. IHBase makes the scan go through all the regions (as you said), but it skips and seeks to later positions in the heap, avoiding many possible data reads from HDFS.

In the current coprocessor framework, preScannerNext() is called from HRegionServer next(final long scannerId, int nbRows), and the RegionScanner is passed to the coprocessor. But in the IHBase approach, within the coprocessor we should be able to seek to the correct row where the indexed column value equals our value. We cannot do this as of now, because RegionScanner has no seek().

Also, preScannerNext() is called once before the actual next(final long scannerId, int nbRows) call happens on the region. Depending on the caching value at the client side, nbRows might be more than one. Suppose nbRows=2 and in the region we have 2 rows, one somewhere in the middle of an HFile and the other in another HFile. As per IHBase, we should first seek to the position of the first row and, after reading its data, seek to the next position. With the current way preScannerNext() is called, this won't be possible. So I think we might need some change in this area. What do you say?

Meanwhile, what is your plan: to continue with the IHBase way of storing the index in memory for each region, or to change this?
[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing
[ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202057#comment-13202057 ]

Anoop Sam John commented on HBASE-2038:

Hi Alex,
Thanks for your reply. Yes, I had seen your past comment. I am checking the trunk coprocessor code for this work as of now. What do you think about my first comment, that HRegionServer next(final long scannerId, int nbRows) calls the coprocessor's preScannerNext() by passing the RegionScanner? On this we cannot do a seek().

Thanks
Anoop
[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing
[ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202088#comment-13202088 ]

Lars Hofhansl commented on HBASE-2038:

Unfortunately there is no seeking in the coprocessors yet. They work more like a filter of a real scan. Seeking is done one level (or two, actually) deeper: seeking is done in the StoreScanners, while coprocessors see RegionScanners. It is not entirely clear to me where to hook this up in that API.

It might be possible to provide a custom filter to do that. Filters operate at the StoreScanner level, and so can (and do) provide seek hints to the calling scanner.
[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing
[ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180942#comment-13180942 ]

Alex Baranau commented on HBASE-2038:

Hi. Sorry for abandoning this issue for so long. I'd still love to work on it, but I think I'll only manage to dedicate enough time to it in 2-3 weeks. I plan to make some progress on it starting next week. If I can't do anything in that time, someone else can take it over; I will not hold it any longer. Is that OK? Or is there someone who wants to work on it right away?
[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing
[ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180996#comment-13180996 ]

Lars Hofhansl commented on HBASE-2038:

Maybe there's a way to collaborate on this (although I cannot promise much of my time on this either). From gleaning HBASE-2037, this would not require ITHBase and the indexes would always be consistent, correct? I have not looked at the 2.5 MB(!) patch in HBASE-2037 yet... I was wondering if there's a summary somewhere of how it works. Specifically, how does a client know where to look for an indexed value?
[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing
[ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181010#comment-13181010 ]

Zhihong Yu commented on HBASE-2038:

There are already two projects building secondary indexes on top of HBase: Lily and Culvert. If we provide native secondary indexing support in HBase, we should evaluate what ITHBase, Lily and Culvert have done so that the native support gives a better abstraction.
[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing
[ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181069#comment-13181069 ]

Alex Baranau commented on HBASE-2038:

+1 for collaboration!

Re HBASE-2037 (aka IHBase): it is the base for this effort. Yes, it doesn't require ITHBase; it is an alternative implementation. The refactored code I pointed to above (https://github.com/abaranau/ihbase) is also based on the IHBase code.

In short (sorry, the description is not tied to classes, as I don't have them in front of me currently): as far as I remember (I need to refresh my memory, though), the point is that an index is kept for each Region; it is loaded in RAM and is not persistent. It is built during Region initialization (after an HBase restart, or when a new region is created after a split and such). When a scan involving indexed columns is performed, it uses the index to find the next record to navigate to and *fast forwards* to that record (usually skipping some other records without even reading them). This is where it gains its speed. As this was developed before CPs were added, a special API was developed for the client to use.

Hope this helps a bit. I will refresh my memory from the code and we'll discuss this in more depth.
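The per-region, in-memory index and the fast-forward behaviour described above can be sketched in a few lines of Java. This is an illustration under simplifying assumptions, not the IHBase implementation: RegionIndexSketch is a hypothetical class, the region holds a single indexed column, and the index is rebuilt explicitly where IHBase would rebuild it at region initialization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class RegionIndexSketch {
    // Region data: row key -> value of the one indexed column.
    private final TreeMap<String, String> rows = new TreeMap<>();
    // Non-persistent index: column value -> sorted row keys holding that value.
    private final Map<String, TreeSet<String>> index = new TreeMap<>();

    void put(String row, String value) {
        rows.put(row, value);
    }

    // Rebuilds the whole index from region data (assumed to run at region open).
    void rebuildIndex() {
        index.clear();
        for (Map.Entry<String, String> e : rows.entrySet()) {
            index.computeIfAbsent(e.getValue(), v -> new TreeSet<>()).add(e.getKey());
        }
    }

    // Next row at or after 'from' whose indexed column equals 'value', or null.
    String nextKey(String value, String from) {
        TreeSet<String> hits = index.get(value);
        return hits == null ? null : hits.ceiling(from);
    }

    // Scan that jumps from match to match instead of visiting every row.
    List<String> scan(String value) {
        List<String> out = new ArrayList<>();
        if (rows.isEmpty()) {
            return out;
        }
        String row = nextKey(value, rows.firstKey());
        while (row != null) {
            out.add(row);
            String after = rows.higherKey(row);
            row = after == null ? null : nextKey(value, after);
        }
        return out;
    }
}
```

The scan never inspects a non-matching row. A real region scanner would turn each nextKey() result into a seek on the underlying store scanners.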
[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing
[ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181077#comment-13181077 ]

Lars Hofhansl commented on HBASE-2038:

I see. So if I wanted to do a Get and all I have is the value of an indexed column, I have to ask all regions, correct? (Because there is no way to identify the region with the column value ahead of time.)
[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing
[ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181099#comment-13181099 ]

Alex Baranau commented on HBASE-2038:

I think that the (current) IHBase implementation implies that. For scans this should not be a big problem, though: regions are usually configured to be big, and with the help of the index a whole region is skipped at once. If we are talking about random single-read (Get) operations, then this may mean a lot of useless work (compared to the amount of useful work).

Do you have a specific use case (a real one, or just one in mind) you want to discuss? If so, maybe it makes sense to discuss it on the ML or even in chat (Skype can work).
[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing
[ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181183#comment-13181183 ]

Lars Hofhansl commented on HBASE-2038:

I have no particular use case... Just that I have been trying to work out how I would do secondary indexes in HBase, and I always came back to this (either it needs some cross-region transaction when maintaining the index, or you need to ask all regions when you query the index). Not a problem as such, just confirming. Such a Get could be farmed off in parallel to many regions, so latency is not necessarily bad.

I'd be up for an offline chat. I'll send an email.
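As a rough illustration of farming such a lookup off to many regions in parallel, here is a self-contained Java sketch. FanOutLookupSketch is hypothetical and involves no HBase client or RPC. Each map stands in for one region's row-to-value data, and a thread pool stands in for the parallel region calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FanOutLookupSketch {
    // Asks every "region" for rows whose indexed column equals 'value'.
    // Results are concatenated in region order.
    static List<String> lookup(List<Map<String, String>> regions, String value) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, regions.size()));
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (Map<String, String> region : regions) {
                tasks.add(() -> {
                    // Stand-in for a per-region index lookup.
                    List<String> hits = new ArrayList<>();
                    for (Map.Entry<String, String> e : region.entrySet()) {
                        if (e.getValue().equals(value)) {
                            hits.add(e.getKey());
                        }
                    }
                    return hits;
                });
            }
            List<String> all = new ArrayList<>();
            // invokeAll returns futures in the same order as the submitted tasks.
            for (Future<List<String>> f : pool.invokeAll(tasks)) {
                all.addAll(f.get());
            }
            return all;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Latency is then bounded by the slowest region rather than by the sum of all regions, though the total work still grows with the number of regions asked.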
[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing
[ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180068#comment-13180068 ]

Lars Hofhansl commented on HBASE-2038:

@Alex: Are you still planning to work on this?
[jira] Commented: (HBASE-2038) Coprocessors: Region level indexing
[ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12934313#action_12934313 ]

Alex Baranau commented on HBASE-2038:
-------------------------------------

The following might be helpful. Some time ago I refactored IHBase a bit and extracted base interfaces to make it easier to develop a custom indexed implementation: IndexManager and IndexScannerContext (I just published the code here: https://github.com/abaranau/ihbase). To give an idea, here's how they look:

{noformat}
public abstract class IndexManager implements HeapSize {
  public abstract void initialize(IdxRegion region);
  public abstract Pair<Long, Callable<Void>> rebuildIndexes() throws IOException;
  public abstract void cleanup();
  public abstract IndexScannerContext newIndexScannerContext(Scan scan) throws IOException;
}

public interface IndexScannerContext {
  KeyValue getNextKey();
  void close();
}
{noformat}

This refactored code roughly corresponds to the BaseRegionObserverCoprocessor API (at least in my head):
* IndexManager's initialize() should be invoked at region open time,
* rebuildIndexes() during flush,
* cleanup() during region close,
* an IndexScannerContext should be created when a scan is opened,
* its getNextKey() (with a bit of additional code) should be invoked during the scan's next(),
* and finally, close() when the scan is closed in the region.

I'm not saying we should use this refactored version of the code; I'm putting it here for visualization, as a way to express the idea. Please correct my logic where needed! Thanks!

P.S. Please don't judge the class names or the other rough pieces in the refactored version I've shared. I wanted to a) just *try* to add an additional abstraction so that a custom indexing implementation can be injected, and b) make as few changes to the IHBase codebase as I can so that others can follow them easily. Copy-pasting the notes I took during refactoring (may be helpful if someone wants to dig into the code):
1. Extracted interface IndexManager from IdxRegionIndexManager.
2. Extracted a separate IdxRegionIndexManagerMBean from IdxRegionMbean with IndexManager-implementation-specific info.
3. Created the IndexScannerContext interface (with an IdxScannerContext implementation, which encapsulates idxSearchContext and matchedExpressionIterator for existing code), which iterates over indexed expression keys.

NOTE: Didn't think about renaming class IdxRegionIndexManager and related classes.
NOTE: Didn't think about repackaging things.
NOTE: New code/classes lack javadocs; will add them.
NOTE: Unit tests should be added to cover the refactoring (at least add a check of the IdxRegionIndexManagerMBean values).

Coprocessors: Region level indexing
-----------------------------------
Key: HBASE-2038
URL: https://issues.apache.org/jira/browse/HBASE-2038
Project: HBase
Issue Type: Sub-task
Reporter: Andrew Purtell
Priority: Minor

HBASE-2037 is a good candidate to be done as coprocessor. It also serves as a good goalpost for coprocessor environment design -- there should be enough of it so region level indexing can be reimplemented as a coprocessor without any loss of functionality.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
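Editorial sketch: the lifecycle mapping Alex describes (region open, flush, region close, scanner open/next/close) can be pictured with a small stand-alone Java example. Everything here is hypothetical: Manager and ScannerContext are simplified stand-ins for IndexManager and IndexScannerContext, and the on*() method names are illustrative placeholders, not the actual RegionObserver hooks. It only shows which index call would be triggered by which region event.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class IndexLifecycleSketch {

    /** Hypothetical stand-in for IndexScannerContext; keys are plain Strings here. */
    public interface ScannerContext {
        String getNextKey(); // null once no indexed keys remain
        void close();
    }

    /** Hypothetical stand-in for IndexManager, reduced to its lifecycle calls. */
    public interface Manager {
        void initialize();
        void rebuildIndexes();
        void cleanup();
        ScannerContext newScannerContext();
    }

    /** Hypothetical observer: the on*() names are illustrative, not the coprocessor API. */
    public static class IndexingObserver {
        private final Manager manager;
        // One context per open scanner; a concurrent map because scanners may run in parallel.
        private final Map<Long, ScannerContext> contexts = new ConcurrentHashMap<>();

        public IndexingObserver(Manager manager) { this.manager = manager; }

        public void onRegionOpen()  { manager.initialize(); }
        public void onFlush()       { manager.rebuildIndexes(); }
        public void onRegionClose() { manager.cleanup(); }

        public void onScannerOpen(long scannerId) {
            contexts.put(scannerId, manager.newScannerContext());
        }

        public String onScannerNext(long scannerId) {
            ScannerContext ctx = contexts.get(scannerId);
            return ctx == null ? null : ctx.getNextKey();
        }

        public void onScannerClose(long scannerId) {
            ScannerContext ctx = contexts.remove(scannerId);
            if (ctx != null) ctx.close();
        }
    }

    /** Drives one region lifecycle against a recording Manager and returns the call log. */
    public static List<String> runDemo() {
        List<String> log = new ArrayList<>();
        Manager recording = new Manager() {
            public void initialize()     { log.add("initialize"); }
            public void rebuildIndexes() { log.add("rebuildIndexes"); }
            public void cleanup()        { log.add("cleanup"); }
            public ScannerContext newScannerContext() {
                log.add("newScannerContext");
                return new ScannerContext() {
                    public String getNextKey() { log.add("getNextKey"); return "row-1"; }
                    public void close()        { log.add("close"); }
                };
            }
        };
        IndexingObserver observer = new IndexingObserver(recording);
        observer.onRegionOpen();
        observer.onScannerOpen(1L);
        observer.onScannerNext(1L);
        observer.onScannerClose(1L);
        observer.onFlush();
        observer.onRegionClose();
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

The per-scanner map is one possible way to keep scan state between callbacks; whether the real coprocessor hooks expose a usable scanner identifier is a question raised later in this thread.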
[jira] Commented: (HBASE-2038) Coprocessors: Region level indexing
[ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12934341#action_12934341 ]

Gary Helmling commented on HBASE-2038:
--------------------------------------

Hi Alex, I'm not familiar with the internal IHBase code, but I'll provide whatever info I can on coprocessors.

bq. 1) Are coprocessors meant to be stateless? If not, then I assume that one instance is created and assigned to a region and that CP implementation should be thread-safe (e.g. multiple scanners can be handled at the same time for the regions).

No, coprocessor implementations do not need to be stateless. If anything, you'll need state for many interesting applications. A single coprocessor instance is created per configured coprocessor implementation on region load. You can treat the postOpen() and preClose() methods as init() and destroy() methods in your implementation. And yes, coprocessor implementations need to be thread-safe.

bq. 2) During batch scan (something which was added in trunk but wasn't supported in previous HBase versions, and hence the current IHBase implementation doesn't take it into account) we need to return multiple rows from the scan's next() method. It looks like if we apply the current approach (from the current IHBase implementation) of fast-forwarding to the next value, we'll only fast-forward the scan to the first of the values to return.

Sorry, I'm not familiar with how IHBase handles this or what changed in the scanner API, but I'm guessing RegionObserver.preScannerNext() doesn't provide much help in this fast-forwarding use case. It seems like this would need much deeper hooks into HRegion.RegionScanner to interact with the positioning code. Alternatively, you could expose your own indexed scanner functionality via the dynamic RPC hooks (HTable.coprocessorExec()), but that would require the client to differentiate between indexed and non-indexed usage and doesn't provide the transparency you're looking for.

bq. 3) Is it in general a good idea for me to take this initiative (transform the IHBase implementation into a CP-based one)?

Sorry again, I don't have much of an answer on this one. I'll help with anything I can on the coprocessor side of things, though!
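Editorial sketch: the batch-scan concern in question 2 can be illustrated with a toy scanner over sorted row keys. This is a simplified model under stated assumptions, not HBase code: rows are plain Strings, the "index" is a TreeSet of matching row keys that is assumed to contain only keys present in the row list, and both next methods are hypothetical. It contrasts seeking once per next() call with seeking before every row in the batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

public class BatchFastForwardSketch {
    private final List<String> rows;       // all row keys in the region, sorted
    private final TreeSet<String> indexed; // row keys the index reports as matching
    private int pos = 0;                   // current scanner position

    public BatchFastForwardSketch(List<String> rows, TreeSet<String> indexed) {
        this.rows = rows;
        this.indexed = indexed;
    }

    /** Moves pos to the next indexed row at or after the current position. */
    private void seekToNextIndexed() {
        if (pos >= rows.size()) return;
        String target = indexed.ceiling(rows.get(pos));
        // Assumption: every indexed key exists in rows, so binarySearch finds it.
        pos = (target == null) ? rows.size() : Collections.binarySearch(rows, target);
    }

    /** Fast-forwards only once, then returns consecutive rows: later rows may not match. */
    public List<String> nextSeekOncePerCall(int batch) {
        List<String> out = new ArrayList<>();
        seekToNextIndexed();
        while (out.size() < batch && pos < rows.size()) {
            out.add(rows.get(pos++));
        }
        return out;
    }

    /** Fast-forwards before every row, so each returned row is an indexed match. */
    public List<String> nextSeekPerRow(int batch) {
        List<String> out = new ArrayList<>();
        while (out.size() < batch) {
            seekToNextIndexed();
            if (pos >= rows.size()) break;
            out.add(rows.get(pos++));
        }
        return out;
    }

    /** Six rows, of which only r2 and r5 are indexed matches. */
    public static BatchFastForwardSketch demoScanner() {
        return new BatchFastForwardSketch(
                Arrays.asList("r1", "r2", "r3", "r4", "r5", "r6"),
                new TreeSet<>(Arrays.asList("r2", "r5")));
    }

    public static void main(String[] args) {
        System.out.println(demoScanner().nextSeekOncePerCall(2));
        System.out.println(demoScanner().nextSeekPerRow(2));
    }
}
```

In this model a hook that runs only before next() corresponds to the first method, which is why positioning inside the scanner itself (the "deeper hooks" mentioned above) would be needed for batches larger than one row.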
[jira] Commented: (HBASE-2038) Coprocessors: Region level indexing
[ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12934344#action_12934344 ]

Andrew Purtell commented on HBASE-2038:
---------------------------------------

Alex,

bq. Is it in general a good idea for me to take this initiative (transform the IHBase implementation into a CP-based one)?

Well, I for one definitely think this is a good idea.

bq. I believe it's a good time for this effort and hope that a CP-based implementation of region-level indexing will confirm that the CP API is complete and has all one might need (for now).

As do I. However, there are additions and improvements to the CP API coming; see HBASE-3256 and HBASE-3257. The latter especially may be relevant.

bq. I assume that one instance is created and assigned to a region and that CP implementation should be thread-safe (e.g. multiple scanners can be handled at the same time for the regions).

Correct.

bq. I believe that CoprocessorEnvironment's get/put/remove methods are used to store intermediate data (aka attributes) between method calls (if we really need it).

Correct, but see the next answer below.

bq. Is a CoprocessorEnvironment instance created once per region?

No. It is created once per coprocessor. Each coprocessor has its own environment, which can be used to keep state between multiple threads of a coprocessor attached to one region, but not between multiple coprocessors. CoprocessorHost is the object that is created once per region.

bq. I can store some scan-related data using scanId passed to the scan-related callbacks (is it safe?)

This should be safe. I don't understand the current IHBase implementation well enough to answer your question #2.

bq. This refactored code somehow correlates with BaseRegionObserverCoprocessor API (at least in my head)

I think that is a great start.
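Editorial sketch: the scoping Andrew describes (one host per region, one environment per coprocessor, state not shared between coprocessors) can be modelled in a few lines of stand-alone Java. The Host and Environment classes below are hypothetical simplifications written for this illustration; they are not the real CoprocessorHost or CoprocessorEnvironment classes, and only the get/put/remove naming is taken from the discussion.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class EnvironmentScopeSketch {

    /** Hypothetical per-coprocessor environment with get/put/remove attribute storage. */
    public static class Environment {
        // Concurrent map: several scanner threads of one coprocessor may share it.
        private final Map<String, Object> attributes = new ConcurrentHashMap<>();

        public Object get(String key)             { return attributes.get(key); }
        public void put(String key, Object value) { attributes.put(key, value); }
        public Object remove(String key)          { return attributes.remove(key); }
    }

    /** Hypothetical per-region host: it owns one Environment per loaded coprocessor. */
    public static class Host {
        private final Map<String, Environment> environments = new LinkedHashMap<>();

        /** Returns the environment for the named coprocessor, creating it on first load. */
        public synchronized Environment load(String coprocessorName) {
            return environments.computeIfAbsent(coprocessorName, n -> new Environment());
        }
    }

    public static void main(String[] args) {
        Host regionHost = new Host();
        Environment indexerEnv = regionHost.load("indexer");
        Environment auditEnv = regionHost.load("audit");

        // State keyed by a scan identifier is visible to the indexer coprocessor only.
        indexerEnv.put("scan:42", "cursor@row-7");
        System.out.println(indexerEnv.get("scan:42"));
        System.out.println(auditEnv.get("scan:42"));
    }
}
```

Keying attributes by a scan identifier, as in the "scan:42" entry, is one way to read the scanId suggestion above; the key format is invented for the example.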