[jira] [Commented] (HBASE-7233) Serializing KeyValues over RPC
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13602394#comment-13602394 ] ramkrishna.s.vasudevan commented on HBASE-7233: ---
bq. Tags are a little different than our current fields because there are multiple per Cell
The above comment is from Matt. So basically we are saying that a Cell will have multiple tags. That means a Cell, which internally is now a representation of a KV, will have more than one additional attribute (tags) added to it, and among them will be an ACL tag, a visibility tag, etc. So how do we say which tag to look at if I want to know only the visibility part of the Cell? I could see a tagIterator() API added that iterates through the tags, so do we have to iterate every time to find out which one is my visibility tag? Will there be a mechanism which says the visibility tag should be the first tag or the second, something like that?
Serializing KeyValues over RPC -- Key: HBASE-7233 URL: https://issues.apache.org/jira/browse/HBASE-7233 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Blocker Fix For: 0.95.0 Attachments: 7233sketch.txt, 7233.txt, 7233v10.txt, 7233-v2.txt, 7233v3_encoders.txt, 7233v4_encoders.txt, 7233v5_encoders.txt, 7233v6_encoder.txt, 7233v7.txt, 7233v9.txt Undo KeyValue being a Writable. This issue wandered and became general discussion of KeyValue serialization, in particular, how to pass lots of KeyValues across rpc. It was noticed that what we were passing over the wire for KeyValues was not protobuf'd KeyValues but the old serialization which assumes the KeyValue version 1 format. 
After a bunch of good discussion working out rpc formats, was decided to close this issue in favor of more specific issues: see summary at https://issues.apache.org/jira/browse/HBASE-7233?focusedCommentId=13573259page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13573259 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
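The question raised in the comment above (how to find only the visibility tag when a Cell carries several) can be pictured with a small sketch. Everything in it is hypothetical and for illustration only: the `Tag` class, the type codes `ACL_TAG_TYPE` and `VISIBILITY_TAG_TYPE`, and the `findTag` helper are not HBase API. It assumes each tag carries a one-byte type, so that lookup goes by type rather than by a fixed position in the tag list.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;

final class TagLookupSketch {
  // Hypothetical type codes; the thread does not define any.
  static final byte ACL_TAG_TYPE = 1;
  static final byte VISIBILITY_TAG_TYPE = 2;

  /** Hypothetical tag: a one-byte type plus an opaque value. */
  static final class Tag {
    final byte type;
    final byte[] value;

    Tag(byte type, byte[] value) {
      this.type = type;
      this.value = value;
    }
  }

  /** Scans the tags once and returns the first one of the wanted type, if any. */
  static Optional<Tag> findTag(Iterator<Tag> tags, byte wantedType) {
    while (tags.hasNext()) {
      Tag t = tags.next();
      if (t.type == wantedType) {
        return Optional.of(t);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    List<Tag> tags = Arrays.asList(
        new Tag(ACL_TAG_TYPE, "user1:RW".getBytes(StandardCharsets.UTF_8)),
        new Tag(VISIBILITY_TAG_TYPE, "secret".getBytes(StandardCharsets.UTF_8)));
    Optional<Tag> vis = findTag(tags.iterator(), VISIBILITY_TAG_TYPE);
    System.out.println(vis.isPresent()
        ? new String(vis.get().value, StandardCharsets.UTF_8)
        : "no visibility tag");
  }
}
```

With a type byte on each tag, the order of tags does not matter; the cost is a linear scan per lookup, which is the trade-off the comment is asking about.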
[jira] [Commented] (HBASE-7233) Serializing KeyValues over RPC
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13583456#comment-13583456 ] stack commented on HBASE-7233: -- I made HBASE-7898 for passing Cells over RPC and HBASE-7897 for adding tags to Cells. Let me close this issue now as won't fix. I believe items raised here now have dedicated jiras.
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573744#comment-13573744 ] stack commented on HBASE-7233: -- [~mcorgan] Yeah, I think that is the way to go. The client can send over that it favors v2 but that it can do v1 too... The server will do the best it can. This won't help [~andrew.purt...@gmail.com] if he wants to refer to tags via the Cell Interface? He'll have to cast to KV2, which will 'break'? So, wondering if we should add support for tags to the Cell Interface now? The only other feature we have ever talked of is adding an mvcc into the key, but the Interface already has that, so a shift in how it is implemented should have no effect on the Cell Interface.
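The "client favors v2 but can do v1 too" idea above amounts to a preference-ordered negotiation. The following is only a sketch of that idea under stated assumptions: the `chooseCodec` helper is hypothetical, the codec names are used as plain strings, and nothing here reflects how the actual RPC handshake was implemented.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

final class CodecNegotiationSketch {
  /**
   * Returns the first codec in the client's preference order that the server
   * also supports, or empty if there is no codec in common.
   */
  static Optional<String> chooseCodec(List<String> clientPreference,
                                      Set<String> serverSupported) {
    for (String name : clientPreference) {
      if (serverSupported.contains(name)) {
        return Optional.of(name);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Client favors V2 but can fall back to V1.
    List<String> client = Arrays.asList("BasicCellEncoderV2", "BasicCellEncoderV1");
    // An older server that only knows V1.
    Set<String> oldServer = new HashSet<>(Arrays.asList("BasicCellEncoderV1"));
    System.out.println(chooseCodec(client, oldServer).orElse("no common codec"));
  }
}
```

Against a server that supports both versions the same call would pick the V2 name, since it comes first in the client's list.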
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573756#comment-13573756 ] Andrew Purtell commented on HBASE-7233: --- If we version the codecs like BasicCellEncoderV1, then could we have accessors for tags in the Cell interface, where BasicCellEncoderV1 would throw UnsupportedOperationException while a BasicCellEncoderV2 would support them? And/or a method in the interface that a user can interrogate for capabilities, i.e. whether it can do tags or not?
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573757#comment-13573757 ] stack commented on HBASE-7233: -- [~andrew.purt...@gmail.com] I was thinking it would just return no tags in v1 rather than unsupported?
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573760#comment-13573760 ] Andrew Purtell commented on HBASE-7233: ---
bq. I was thinking would just return no tags in v1 rather than unsupported?
Sure, easy. Let's at least get the placeholders in. Both HBASE-7662 and HBASE-7663 would need efficiently stored and accessed tags in KVs, and [~enis] was also recently talking about using KV tags for holding metadata for grouping rows, if I recall correctly. It would be even better if there was a BasicCellEncoderV2 that could actually store and retrieve tags, at least in unit tests, even if not baked enough to actually use until a later release. Something to build on.
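The "return no tags in v1 rather than unsupported" option agreed on above could look roughly like the following. This is a sketch only: `TaggedCell`, `CellV1` and `CellV2` are hypothetical, cut-down stand-ins, not the real Cell interface or any codec class from the patches, and tags are modelled as plain byte arrays.

```java
import java.util.Collections;
import java.util.List;

final class CellTagsSketch {
  /** Hypothetical, cut-down stand-in for the Cell interface discussed in the thread. */
  interface TaggedCell {
    byte[] getValue();

    /** Never null; an implementation without tag support returns an empty list. */
    List<byte[]> getTags();
  }

  /** A "v1" cell: the format has no room for tags, so it reports none. */
  static final class CellV1 implements TaggedCell {
    private final byte[] value;

    CellV1(byte[] value) {
      this.value = value;
    }

    @Override public byte[] getValue() { return value; }

    @Override public List<byte[]> getTags() { return Collections.emptyList(); }
  }

  /** A "v2" cell that actually carries tags. */
  static final class CellV2 implements TaggedCell {
    private final byte[] value;
    private final List<byte[]> tags;

    CellV2(byte[] value, List<byte[]> tags) {
      this.value = value;
      this.tags = tags;
    }

    @Override public byte[] getValue() { return value; }

    @Override public List<byte[]> getTags() { return Collections.unmodifiableList(tags); }
  }

  /** Caller code is the same for both versions: no cast and no exception handling. */
  static int countTags(TaggedCell cell) {
    return cell.getTags().size();
  }

  public static void main(String[] args) {
    TaggedCell v1 = new CellV1(new byte[] {42});
    TaggedCell v2 = new CellV2(new byte[] {42}, Collections.singletonList(new byte[] {7}));
    System.out.println(countTags(v1) + " " + countTags(v2)); // prints "0 1"
  }
}
```

The design point is that callers written against the interface never need to know which version they hold, which is the concern behind not wanting to cast to KV2.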
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573784#comment-13573784 ] Matt Corgan commented on HBASE-7233:
{quote}So, wondering if we should add to Cell interface now support for tags?{quote}
Guess I'm confused why we need to add it to the Cell interface now, since the reason for the versioning is to enable us to add it later. Would it help future-proof the KeyValueEncoderV1? I think Andy has found a way to work the tags into the current KeyValue serialization, so we might not even need a V2 for that.
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573823#comment-13573823 ] Andrew Purtell commented on HBASE-7233: --- Yeah, I guess there is a little bit of confusion here. I was thinking Cell can support getting tag data, and the encoders might not support it (v1) or could (v2). Should tags be a concern of Cell or KV? As maybe an interesting consideration, I only need tags in the on-disk representation and actually should not send the ones I'd be working with over the wire to the client.
bq. I think Andy has found a way to work the tags into the current KeyValue serialization
I did, but it is ugly IMHO: I store the value length as negative, prepend delimited tag data to the user value data, and parse the tags into in-memory metadata and fix up offsets on KV instantiation. Do we actually want this? If so, then I guess we can have tagged KVs mixed with old KVs in a backwards compatible way.
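To make the negative-value-length trick described above concrete, here is a minimal sketch. The exact byte layout is an assumption made for illustration (a 4-byte length, then a 2-byte tag-block length, the tag bytes, and the user value); the comment only says the tag data is "delimited", and this is not the real KeyValue format or Andrew's actual patch. All class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class NegativeLengthTagSketch {
  /** Decoded result: the user value plus raw tag bytes (empty when untagged). */
  static final class Decoded {
    final byte[] value;
    final byte[] tags;

    Decoded(byte[] value, byte[] tags) {
      this.value = value;
      this.tags = tags;
    }
  }

  /** Old-style section: [int valueLength >= 0][value]. */
  static byte[] encodeUntagged(byte[] value) {
    return ByteBuffer.allocate(4 + value.length)
        .putInt(value.length)
        .put(value)
        .array();
  }

  /** Assumed tagged layout: [int -(2 + tagsLength + valueLength)][short tagsLength][tags][value]. */
  static byte[] encodeTagged(byte[] tags, byte[] value) {
    if (tags.length > Short.MAX_VALUE) {
      throw new IllegalArgumentException("tag block too long for a 2-byte length");
    }
    int total = 2 + tags.length + value.length;
    return ByteBuffer.allocate(4 + total)
        .putInt(-total)                 // the negative sign marks "tags are prepended"
        .putShort((short) tags.length)
        .put(tags)
        .put(value)
        .array();
  }

  /** Reads either form: a non-negative length means an old, untagged section. */
  static Decoded decode(byte[] section) {
    ByteBuffer in = ByteBuffer.wrap(section);
    int length = in.getInt();
    if (length >= 0) {
      byte[] value = new byte[length];
      in.get(value);
      return new Decoded(value, new byte[0]);
    }
    int total = -length;
    int tagsLength = in.getShort();
    byte[] tags = new byte[tagsLength];
    in.get(tags);
    byte[] value = new byte[total - 2 - tagsLength]; // fix up the real value length
    in.get(value);
    return new Decoded(value, tags);
  }

  public static void main(String[] args) {
    Decoded d = decode(encodeTagged(new byte[] {9}, new byte[] {1, 2}));
    System.out.println(Arrays.toString(d.tags) + " " + Arrays.toString(d.value));
  }
}
```

In this sketch a reader that understands the sign convention can read both forms, which is the sense in which tagged and old sections can be mixed; a reader that does not know the convention would misread a negative length.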
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573883#comment-13573883 ] stack commented on HBASE-7233: -- Tags should be a concern of Cell, else you will have to cast to KV2 before getting them. I'd think we'd add tag support to Cell so we don't have to change it later (that'd be painful, wouldn't it?)
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13574120#comment-13574120 ] Matt Corgan commented on HBASE-7233: I don't see a big problem adding them to the interface. Current encoders may ignore them when writing a file (which would be bad for security), but future or modified encoders could add support for them.
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573189#comment-13573189 ] Ted Yu commented on HBASE-7233: --- For KeyValueDecoder:
{code}
+  public boolean next() {
+    if (!this.hasNext) return !this.hasNext;
{code}
I think this.hasNext should be returned. For TestBasicCellCodec:
{code}
+  public void testOne() throws IOException {
...
+  public void testThree() throws IOException {
{code}
Would testOneKeyValue(), testThreeKeyValue() be better names? Similar comment for TestCellMessageCodec.testOne(). For ProtobufUtil.java:
{code}
-builder.addKeyValue(toKeyValue(c));
+builder.addKeyValue(toCell(c));
{code}
It would be nice if the method name for builder could be changed to addCell().
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573204#comment-13573204 ] stack commented on HBASE-7233: -- Thanks for the comments. First one is good. Was sort of looking for more high-level commentary on whether this is a good direction or not.
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573210#comment-13573210 ] Ted Yu commented on HBASE-7233: ---
{code}
+public class BasicCellDecoder implements CellScanner {
+public class CellMessageDecoder implements CellScanner {
+public class KeyValueDecoder implements CellScanner {
{code}
The decoders all depend on InputStream#available(). It would be nice if class javadoc is added explaining the context where each of them would be used.
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573220#comment-13573220 ] stack commented on HBASE-7233: -- Should have said more plainly that this is experimental/prototyping code for ideas mentioned in the cited google doc and elsewhere over in the rpc spec issue. Patches are posted for high-level "does this look right" feedback, not "javadoc could be better" or method naming suggestions. Thanks.
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573232#comment-13573232 ] Ted Yu commented on HBASE-7233: --- I read the google doc linked above. I think the patch reflects what the doc describes. Meaning the direction is good. RPC spec mentions EncodedDataBlock in several places. I am not sure the spec has been updated. Will go over the spec and patch in HBASE-7533 tomorrow so that I can gain better understanding of these two JIRAs.
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573241#comment-13573241 ] stack commented on HBASE-7233: -- [~ted_yu] Beware it is a work in progress
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573258#comment-13573258 ] Matt Corgan commented on HBASE-7233: I like it. Nice and simple. What are your thoughts on BasicCellEncoder vs KeyValueEncoder? Are you introducing the new BasicCell format because it's easier for a non-java client to decode? KeyValueEncoder appears faster in this benchmark because the serialization is less granular, but I think that will become irrelevant over time if we get the Cell interface all the way up the read path. It's faster now because you know the input cell is a KeyValue, so you can just cast. If you don't know the input Cell implementation, you'd have to append each KeyValue field separately. It would probably be good to include both BasicCellEncoder and KeyValueEncoder just to make sure that the necessary abstractions are in place to add future encodings. Later on, my guess is one of the delta-style encoders will be best for java client RPC.
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573259#comment-13573259 ] stack commented on HBASE-7233: -- I think we should close this issue now. Its scope has wandered and we got a bunch of use out of it, but I think its time has come. The original intent was removing Writable from KV, which has happened over in another issue. How KVs go across MapReduce when not Writable came up, and that got fixed in other issues also. We then did a bunch of back and forth on how to serialize KVs, particularly across RPC. Most of the discussion has been captured here: https://docs.google.com/document/d/1WEtrq-JTIUhlnlnvA0oYRLp0F8MKpEBeBSCFcQiacdw/edit# ...and will be realized in code over in the rpc revamp issue, HBASE-7533 (patches added here will reappear in that issue).

Regarding Andrew's request above, for HBASE-6222, to be able to pass a KeyValue version 2: a v2 should make it across the rpc and even into hfiles (after doing some work on EncodedDataBlocks so they sling Cells instead of KVs), but what is required will not happen in this issue. The client and server will all need to be moved to reference the Cell Interface rather than KV1 as they currently do. Only then could a KV2 traverse the client and server. That is work to do (as Lars Hofhansl found out, doing this often makes for speedups, since we are often realizing KVs when all we need is a piece). Let's take up the effort to change the servers to be Cell based rather than KV based elsewhere. The EncodedDataBlocks revamp should happen elsewhere too. Neither of the above should hold up the 0.96 release (a 0.96 client should be able to talk to a future server that can do KV version 2).

Should we add anything to the Cell Interface before 0.96 ships, e.g. the getTagsArray, etc., that Matt suggests above for Andrew's tag work? If so, let's get that in. Will close in a day or two unless there is objection.
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573267#comment-13573267 ] stack commented on HBASE-7233: --
bq. What are your thoughts on BasicCellEncoder vs KeyValueEncoder? Are you introducing the new BasicCell format because it's easier for a non-java client to decode?
Thanks [~mcorgan] for the review. I think KVEncoder will be the lowest common denominator. We should include BasicCellEncoder too. The CellMessageCodec was just to see; not worth including, I'd say. I should look again at BasicCellEncoder. Might be able to make it a bit better. Your point on the KV codec currently having an unfair advantage is indeed the case. Will move this code over to the rpc issue, HBASE-7533.
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573283#comment-13573283 ] Andrew Purtell commented on HBASE-7233: --- I'm good if we have placeholders for tags (HBASE-7448) and can build on that as the Cell work takes shape. The first drop for HBASE-6222 can use out of line storage for the ACL metadata with some slowdown.
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573284#comment-13573284 ] stack commented on HBASE-7233: -- [~andrew.purt...@gmail.com] What about additions to Cell Interface? The ones Matt suggested above? Too soon?
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573288#comment-13573288 ] Matt Corgan commented on HBASE-7233: One idea is to version the codecs like BasicCellEncoderV1, then if we want to add tags we make V2. To avoid historical version explosion we apply the normal policy of only supporting upgrade over a single major release. Just delete the old versions after that.
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13571080#comment-13571080 ] stack commented on HBASE-7233: -- Here are the last few runs encoding/decoding 100k KVs with small key and value (so, worst case). I did 30 cycles each encoding then decoding so hotspot would cut in. I print out the last three runs of each in below. In short, as has been said already, we can't have pb do Cell serialization. Its ten times slower both encoding and decoding (I had a look w/ profiler and doesn't seem to be anything particularly dumb going on... encoding, its just the copying of byte arrays out of Cell into pb ByteString and then decoding, its inching over the stream reading varints and allocating the arrays to copy into). {code} 13/02/04 21:57:05 INFO codec.CodecPerformance: 27 encoded count=10 in 13ms for encoder org.apache.hbase.codec.KeyValueEncoder@7fcebc9f 13/02/04 21:57:05 INFO codec.CodecPerformance: 28 encoded count=10 in 13ms for encoder org.apache.hbase.codec.KeyValueEncoder@5dc1ac46 13/02/04 21:57:05 INFO codec.CodecPerformance: 29 encoded count=10 in 13ms for encoder org.apache.hbase.codec.KeyValueEncoder@14718242 13/02/04 21:57:06 INFO codec.CodecPerformance: 26 decoded count=10 in 16ms for decoder org.apache.hbase.codec.KeyValueDecoder@962522b 13/02/04 21:57:06 INFO codec.CodecPerformance: 27 decoded count=10 in 27ms for decoder org.apache.hbase.codec.KeyValueDecoder@53ea0105 13/02/04 21:57:06 INFO codec.CodecPerformance: 28 decoded count=10 in 32ms for decoder org.apache.hbase.codec.KeyValueDecoder@25dd9891 13/02/04 21:57:06 INFO codec.CodecPerformance: 29 decoded count=10 in 16ms for decoder org.apache.hbase.codec.KeyValueDecoder@774b6b02 13/02/04 21:57:08 INFO codec.CodecPerformance: 27 encoded count=10 in 62ms for encoder org.apache.hbase.codec.BasicCellEncoder@407e75d2 13/02/04 21:57:08 INFO codec.CodecPerformance: 28 encoded count=10 in 62ms for encoder 
org.apache.hbase.codec.BasicCellEncoder@2e694f12 13/02/04 21:57:08 INFO codec.CodecPerformance: 29 encoded count=10 in 61ms for encoder org.apache.hbase.codec.BasicCellEncoder@4c309f9f 13/02/04 21:57:09 INFO codec.CodecPerformance: 27 decoded count=10 in 38ms for decoder org.apache.hbase.codec.BasicCellDecoder@76f1fad1 13/02/04 21:57:09 INFO codec.CodecPerformance: 28 decoded count=10 in 37ms for decoder org.apache.hbase.codec.BasicCellDecoder@5ee771f3 13/02/04 21:57:09 INFO codec.CodecPerformance: 29 decoded count=10 in 40ms for decoder org.apache.hbase.codec.BasicCellDecoder@1c8321c8 13/02/04 21:57:11 INFO codec.CodecPerformance: 7 decoded count=10 in 174ms for decoder org.apache.hbase.codec.CellMessageDecoder@64d1afd3 13/02/04 21:57:11 INFO codec.CodecPerformance: 8 decoded count=10 in 176ms for decoder org.apache.hbase.codec.CellMessageDecoder@4ecd200f 13/02/04 21:57:12 INFO codec.CodecPerformance: 9 decoded count=10 in 175ms for decoder org.apache.hbase.codec.CellMessageDecoder@151cc2a8 13/02/04 21:57:15 INFO codec.CodecPerformance: 27 decoded count=10 in 178ms for decoder org.apache.hbase.codec.CellMessageDecoder@4226c7da 13/02/04 21:57:15 INFO codec.CodecPerformance: 28 decoded count=10 in 177ms for decoder org.apache.hbase.codec.CellMessageDecoder@5083198c 13/02/04 21:57:15 INFO codec.CodecPerformance: 29 decoded count=10 in 186ms for decoder org.apache.hbase.codec.CellMessageDecoder@263b84ee {code} Serializing KeyValues - Key: HBASE-7233 URL: https://issues.apache.org/jira/browse/HBASE-7233 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Blocker Fix For: 0.96.0 Attachments: 7233sketch.txt, 7233.txt, 7233-v2.txt, 7233v3_encoders.txt, 7233v4_encoders.txt, 7233v5_encoders.txt, 7233v6_encoder.txt, 7233v7.txt, 7233v9.txt Undo KeyValue being a Writable. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
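The measurement procedure described in the comment above (repeat the work for a number of cycles so the JIT warms up, then report only the last few timings) can be sketched roughly as follows. This is a minimal illustration and not the CodecPerformance class from the patch; the `WarmupTimer` name and its `timeLastRuns` method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the CodecPerformance class attached to the issue.
final class WarmupTimer {
  private WarmupTimer() {}

  /**
   * Runs the work the given number of cycles and returns the elapsed
   * milliseconds of only the last `keep` cycles, after warm-up.
   */
  static List<Long> timeLastRuns(Runnable work, int cycles, int keep) {
    List<Long> kept = new ArrayList<>();
    for (int i = 0; i < cycles; i++) {
      long start = System.nanoTime();
      work.run();
      long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
      if (i >= cycles - keep) {
        kept.add(elapsedMs); // earlier cycles are treated as warm-up and discarded
      }
    }
    return kept;
  }
}
```

Calling `WarmupTimer.timeLastRuns(encodeAll, 30, 3)` with a Runnable that encodes the whole batch would mirror the 30-cycle, last-three-runs procedure; a dedicated harness such as JMH would give more trustworthy numbers.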
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571097#comment-13571097 ]
stack commented on HBASE-7233:

Updated https://docs.google.com/document/d/1WEtrq-JTIUhlnlnvA0oYRLp0F8MKpEBeBSCFcQiacdw/edit#
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569821#comment-13569821 ]
Ted Yu commented on HBASE-7233:

OldSchoolKeyValueDecoder.java and CodecException.java are missing license headers.

For OldSchoolKeyValueDecoder:
{code}
+  @Override
+  public boolean next() {
+    if (!this.hasNext) return !this.hasNext;
{code}
True is returned above when hasNext is false. Does this align with the javadoc for next()?
{code}
+  /**
+   * Advance the scanner 1 cell.
+   * @return true if the next cell is found and getCurrentCell() will return a valid Cell
+   */
+  boolean next();
{code}
There seems to be a dependency on HBASE-4676.
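One way to satisfy the javadoc contract quoted above is to return false once the input is exhausted. The sketch below is only an illustration of that contract: `ListCellScanner` is a hypothetical stand-in for the decoder in the patch, and cells are plain Strings to keep it self-contained.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the decoder under review; cells are Strings here.
final class ListCellScanner {
  private final Iterator<String> source;
  private String current; // null until next() has returned true

  ListCellScanner(List<String> cells) {
    this.source = cells.iterator();
  }

  /** Advance one cell; returns true only if getCurrentCell() is now valid. */
  boolean next() {
    if (!source.hasNext()) {
      current = null;
      return false; // exhausted: false, not !hasNext (which would be true)
    }
    current = source.next();
    return true;
  }

  String getCurrentCell() {
    return current;
  }
}
```

With this shape a caller can loop with `while (scanner.next()) { ... scanner.getCurrentCell() ... }` and never see an invalid cell.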
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546669#comment-13546669 ]
stack commented on HBASE-7233:

Some unfinished notes I've been keeping on how to pass KeyValues: https://docs.google.com/document/pub?id=1WEtrq-JTIUhlnlnvA0oYRLp0F8MKpEBeBSCFcQiacdw
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13531437#comment-13531437 ] Andrew Purtell commented on HBASE-7233: --- {quote} bq. I would really prefer not to double the number of kV types just to say foo with tags. And then double again for foo with tags and bar. That would be ugly, but at the same time it's difficult and maybe wasteful to future-proof it from every angle. Tags are already sort of a flexible future-proofing mechanism. Maybe tags can be added in a backwards compatible way to the existing encoders. I'd have to think about it for PrefixTree, probably punting them to a PREFIX_TREE2 encoder with some other additions/improvements. {quote} The use case I'm looking at is adding security policy information to KVs (HBASE-6222), could be either ACLs or visibility labels, both can be handled the same way. There's a 1:1 mapping, so it makes sense to store the policy information in the KV. This also has the nice property of reading in the ACL for free in the same op that reads in the KV. I'm not asking for specifically more than tagging KVs with this specific metadata but, given that tags could be easily made generic enough to support a number of other cases, I think it makes sense to do that. Then security is just one user of something more generally useful, we haven't done something fixed for security's sake only. Adding tag support to the encoders might be the right answer. Would we still have the trouble of teaching KeyValue about where in the bytebuffers coming out of the encoder the tag data resides? Any thoughts on how we might distinguish a KV with tags from one without? Maybe we don't, we just have the encoder add the discovered tag data to the KV by way of an API that adds out of band metadata to the KV's in memory representation? And likewise add tags to the blocks beyond the KV itself if they are present? 
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13531619#comment-13531619 ]
Matt Corgan commented on HBASE-7233:

{quote}Would we still have the trouble of teaching KeyValue about where in the bytebuffers coming out of the encoder the tag data resides?{quote}
Tags are a little different than our current fields because there are multiple per Cell. For performance sake, we may want to keep the Cell interface to these low level methods: getTagsArray(), getNumTags(), getTagsOffset(), getTagsLength(), and then have methods for parsing them like {code}Iterable<byte[]> tags = CellTool.getTagsIterator(cell){code}. So the requirement for the encoder/decoder would be to line them up in a single array: <vint length0><bytes tag0><vint length1><bytes tag1>, etc. Behind the scenes, tags could be encoded similarly to qualifiers (speaking for prefix-tree).

{quote}Any thoughts on how we might distinguish a KV with tags from one without?{quote}
Could just have Cell.getNumTags() return 0.
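The single-array layout described above can be sketched like this. It is a rough illustration under stated assumptions: the vint is taken to be an unsigned 7-bits-per-byte varint (the thread does not pin down the exact encoding), and `TagArrays`, `pack` and `unpack` are hypothetical names rather than CellTool or any HBase API.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; assumes an unsigned 7-bit-per-byte varint length prefix.
final class TagArrays {
  private TagArrays() {}

  /** Lines tags up in one array as <vint length><bytes> pairs. */
  static byte[] pack(List<byte[]> tags) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (byte[] tag : tags) {
      int len = tag.length;
      while ((len & ~0x7F) != 0) {       // more than 7 bits left
        out.write((len & 0x7F) | 0x80);  // low 7 bits plus continuation bit
        len >>>= 7;
      }
      out.write(len);
      out.write(tag, 0, tag.length);
    }
    return out.toByteArray();
  }

  /** Walks [offset, offset + length) and copies out each tag in order. */
  static List<byte[]> unpack(byte[] array, int offset, int length) {
    List<byte[]> tags = new ArrayList<>();
    int pos = offset;
    int end = offset + length;
    while (pos < end) {
      int len = 0;
      int shift = 0;
      byte b;
      do {                               // decode the varint length
        b = array[pos++];
        len |= (b & 0x7F) << shift;
        shift += 7;
      } while ((b & 0x80) != 0);
      byte[] tag = new byte[len];
      System.arraycopy(array, pos, tag, 0, len);
      pos += len;
      tags.add(tag);
    }
    return tags;
  }
}
```

A cell without tags is simply a zero-length region, so `unpack` returns an empty list, which lines up with the suggestion that getNumTags() return 0.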
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13531668#comment-13531668 ]
Andrew Purtell commented on HBASE-7233:

bq. For performance sake, we may want to keep the Cell interface to these low level methods: getTagsArray(), getNumTags(), getTagsOffset(), getTagsLength(), and then have methods for parsing them [an iterator]

Agreed. My hack-patch does approximately this.

I need to study up on how Cells would be materialized into KeyValues. In the AccessController we wrap InternalScanners with a filter that looks at each KeyValue on the way out to the client and evaluates its visibility to the user. Somehow from the KeyValue API we'd need to get to the cell tags iterator to extract the ACL (or visibility tag). A type byte, or even making tags name-value pairs, would avoid accumulation of ad-hoc means for distinguishing between them.

Tags stored on disk shouldn't necessarily be sent to clients, though for the sake of performance we can concede this where/if streaming the on-disk encoding directly to the client.
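The type-byte idea above could look roughly like the following: each tag carries a one-byte type, so a filter can pull out only the ACL or visibility tag without relying on tag position. Everything here is an assumption made for illustration: the `<2-byte length><1-byte type><payload>` layout, the type codes, and the `TypedTags` class are hypothetical and are not taken from any patch on this issue.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical layout: <2-byte length><1-byte type><payload>, length covers type + payload.
final class TypedTags {
  static final byte ACL_TYPE = 1;        // hypothetical type code
  static final byte VISIBILITY_TYPE = 2; // hypothetical type code

  private TypedTags() {}

  static byte[] encode(byte type, byte[] payload) {
    if (payload.length > 0xFFFF - 1) {
      throw new IllegalArgumentException("payload too long for a 2-byte length");
    }
    ByteBuffer buf = ByteBuffer.allocate(2 + 1 + payload.length);
    buf.putShort((short) (1 + payload.length));
    buf.put(type);
    buf.put(payload);
    return buf.array();
  }

  static byte[] concat(byte[]... encodedTags) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (byte[] t : encodedTags) {
      out.write(t, 0, t.length);
    }
    return out.toByteArray();
  }

  /** Returns the payload of the first tag of the wanted type, or null if absent. */
  static byte[] find(byte[] tags, int offset, int length, byte wanted) {
    ByteBuffer buf = ByteBuffer.wrap(tags, offset, length);
    while (buf.remaining() >= 3) {
      int len = buf.getShort() & 0xFFFF;
      byte type = buf.get();
      if (type == wanted) {
        byte[] payload = new byte[len - 1];
        buf.get(payload);
        return payload;
      }
      buf.position(buf.position() + len - 1); // skip the payload of other types
    }
    return null;
  }
}
```

A lookup still scans the tag region, but it skips payloads it does not care about, and no convention such as "visibility tag comes first" is needed.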
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13531685#comment-13531685 ]
Ted Yu commented on HBASE-7233:

bq. A type byte or

Cell interface already provides type byte:
{code}
  /**
   * see {@link #KeyValue.TYPE}
   * @return The byte representation of the KeyValue.TYPE of this cell: one of Put, Delete, etc
   */
  byte getTypeByte();
{code}
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13531686#comment-13531686 ]
Andrew Purtell commented on HBASE-7233:

[~ted_yu] A type byte for the tag, not the KeyValue
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529729#comment-13529729 ] Matt Corgan commented on HBASE-7233: {quote}I don't follow unless you are saying I should just use COS in place of Encoder{quote}argh, i guess i don't have a better solution. Was thinking encoder implementations implement the CellOutputStream, but the exceptions complicate it. Would it be too weird to have CellOutputStream extend Encoder, adding the IOException? {quote}I want a random seeker Interface. This looks like it has what we'd need.{quote}It's designed to be that, but might have a few more methods than hbase currently needs. After some confusion, I found the EncodedSeeker does almost everything it needs with just positionAtOrBefore(Cell key). {quote}Ok on the vints... ugh{quote} fyi - on the UVintTool, there's a method that pulls the value off an InputStream without allocating objects: UVIntTool.getInt(InputStream is). However, I think i'd recommend sticking to well-known hadoop formats for the basic RPC stuff. If people actually write high performance clients in other languages they would have to read/write these formats. Serializing KeyValues - Key: HBASE-7233 URL: https://issues.apache.org/jira/browse/HBASE-7233 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Blocker Fix For: 0.96.0 Attachments: 7233sketch.txt, 7233.txt, 7233-v2.txt, 7233v3_encoders.txt, 7233v4_encoders.txt Undo KeyValue being a Writable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530503#comment-13530503 ]
Matt Corgan commented on HBASE-7233:

Sounds good to me. The IOException on CellInputStream.read() may not be ideal since it will force its way all the way up through the StoreFileScanner, StoreHeap, StoreScanner, RegionHeap, RegionScanner, etc... I haven't thought of a better suggestion though. Can change later if we think of something.
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530750#comment-13530750 ]
Matt Corgan commented on HBASE-7233:

Sorry if I'm confusing things. Agree we need a well thought plan distinguishing between fine grained operations on individual cells through the scanners vs transferring blocks of cells to wire/disk.
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530761#comment-13530761 ]
stack commented on HBASE-7233:

I was just going to apologize for confusing this issue by taking your model, changing it a few times, and then in essence coming back to the model you had in the first place. Will be back to work on this for rpc after little cp-pb detour
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529644#comment-13529644 ]
stack commented on HBASE-7233:

bq. ...though now thinking it should not even have the resetToBeforeFirstEntryMethod()...

I think that is right. You need to have a markable stream or some such under it to do the above. Maybe you'd have resetToBeforeFirstEntryMethod on something that implemented CellSearcher? I pulled in CellScanner and its dependency into the patch I'm to attach here. I think adding IOE to CellOutputStream and its inverse CellScanner is probably right.
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529683#comment-13529683 ]
Matt Corgan commented on HBASE-7233:

Good stuff Stack. Some thoughts:
* Move the codec package up out of the io package? For readability, but also we may be doing some encoding/decoding that's purely in memory at some point (memstore).
* Do we need both Encoder and CellOutputStream interfaces?
* You think CodecException should extend IOException? I was thinking they're separate concepts that just happen to be used together a lot. Like if we encode the memstore it would throw IOExceptions. Looks to be from the relationship between CellOutputStream and Encoder, which I'm not clear on.
* I saw you grabbed the CellSearcher interface from prefix-tree as well. I'm not confident that the methods in that one are best for all of hbase, but we can change them later when we figure out what should be there. Same with ReversibleCellScanner.
* I saw you changed CellScanner.next() (an ambiguous word) to read(), which is fine. I'd throw advance() in as a candidate - i guess you're picturing RPC decoding and i'm picturing block decoding. Not important.

{quote}resetToBeforeFirstEntryMethod on something that implemented CellSearcher?{quote}
yep, CellSearcher

{quote}Wondering in particular if Interface will work for Encoders that compress; i.e. PrefixTree.{quote}
I think it will work great on the underlying DataBlockEncoders. Tricky part is figuring out how to modify the HFileDataBlockEncoderImpl to allow the streaming. Might be able to simplify that thing in the process. I wonder if it's time to ditch the separate disk/memory encoding feature as I have a feeling people don't use it.

{quote}Do you know if your vint stuff is faster than what is in hadoop in WritableUtils.vint?{quote}
Speed difference is probably negligible. I made that one because it encodes only positive numbers, so you can get 255 in 1b rather than only 127. It can actually matter when writing a lot of vint indexes into a token dictionary type thing. You're using it to write array lengths which are always positive, so probably a good fit, but i originally intended for it to be hidden in the prefix-tree's black box implementation.
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529711#comment-13529711 ] stack commented on HBASE-7233: -- bq. Move the codec package up out of io package? Yeah, could. Let me look. I was thinking codecs always going against stream but you make a good point if we memstore it. Will fix codecexception too. bq. Do we need both Encoder and CellOutputStream interfaces? I don't follow unless you are saying I should just use COS in place of Encoder ... I had Encoder extend COS for a while. It could work. What to do about the COS IOEs though? We'd have them bubble up through codec implementations? On CellSearcher, I grabbed it but am not using it. Will drop from patch for now. I want a random seeker Interface. This looks like it has what we'd need. I was thinking a codec could implement the Decoder or CellScanner AND CellSearcher. Would not be backed by a stream. On CellScanner#next vs #read, yeah, I changed it to #read but actually thought I'd put it back to #next. It was #read because I'd renamed CellScanner as CellInputStream to match CellOutputStream... but then went back on myself. Will fix. bq. I wonder if it's time to ditch the separate disk/memory encoding feature as I have a feeling people don't use it. Not well enough versed to say whether or which. I like idea of simplifying but at same time am afraid to touch and am more inclined to bump the hfile version and start writing new hfiles w/ new encoders keeping around the old encoding classes for reading legacy hfiles. Ok on the vints... ugh, I just noticed we have vint'ing in Bytes class copied from WritableUtils... so could get byte arrays rather than streams. Might use that. Will look around too Thanks for feedback. Yeah, I'm about rpc these times so good having differing perspectives on this stuff. 
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528179#comment-13528179 ]
stack commented on HBASE-7233:

[~mcorgan] I was looking for the inverse of CellOutputStream. It's not committed? I suppose it's CellScanner, looking at your prefix tree. Should I commit it as part of this patch to hbase-common? We should use the name CellInputStream I'd say... will make

On IOE, mind pointing me at an example where you are rethrowing IOEs as unchecked exceptions? You mean this?

+//  try {
+    os.write(b);
+//  } catch (IOException e) {
+//    throw new RuntimeException(e);
+//  }

Yeah, the IOE stuff is all over the place and I appreciate your trying to remove them, but I'm thinking that they are likely legit here of all places? The write should throw IOE in CellOutputStream too?
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13528196#comment-13528196 ] Matt Corgan commented on HBASE-7233: Yeah - inverse of CellOutputStream is CellScanner, though now thinking it should not even have the resetToBeforeFirstEntryMethod() for cases where you can no longer access the beginning of the stream, like when scanning chunks of cells coming across the wire. CellScanner is basically taking the sequential access methods of KeyValueScanner, while CellSearcher has the random access methods. I was thinking they should be separate because you can't always do random access, like when streaming server-client. I was leaving them in the prefix-tree module in case we wanted to adapt KeyValueScanner rather than replacing it. But it looks like you need something to work with in the mean time, so I'd say using the CellScanner is a good solution. I'm not sure about the IOException on CellOutputStream.write() given that the interface also has the flush() method. Like what does flush do if write is doing IO? All of my use cases use it more like a buffer/append method where you are writing it only to memory structures. That being said, looks like java.io.OutputStream.write() throws it, so i suppose we should follow suit. Serializing KeyValues - Key: HBASE-7233 URL: https://issues.apache.org/jira/browse/HBASE-7233 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Blocker Fix For: 0.96.0 Attachments: 7233sketch.txt, 7233.txt, 7233-v2.txt Undo KeyValue being a Writable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
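The split described above, sequential access in one interface and random access in another, might be sketched as follows. The interface and class names (`SeqCellScanner`, `SeekableCellScanner`, `SortedListSearcher`) are hypothetical, and Strings stand in for Cells; only the method names next(), positionAtOrBefore() and resetToBeforeFirstEntry() echo names used in this thread.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.List;

/** Sequential access only: enough for cells streaming off the wire. */
interface SeqCellScanner<C> {
  boolean next() throws IOException; // true if getCurrent() is now valid
  C getCurrent();
}

/** Random access on top of scanning: needs the whole block available. */
interface SeekableCellScanner<C> extends SeqCellScanner<C> {
  void resetToBeforeFirstEntry();
  boolean positionAtOrBefore(C key);
}

// Hypothetical in-memory implementation over a sorted list of String "cells".
final class SortedListSearcher implements SeekableCellScanner<String> {
  private final List<String> sorted;
  private int index = -1; // -1 means "before the first entry"

  SortedListSearcher(List<String> sorted) {
    this.sorted = sorted;
  }

  @Override
  public boolean next() {
    if (index + 1 >= sorted.size()) {
      return false;
    }
    index++;
    return true;
  }

  @Override
  public String getCurrent() {
    return index < 0 ? null : sorted.get(index);
  }

  @Override
  public void resetToBeforeFirstEntry() {
    index = -1;
  }

  @Override
  public boolean positionAtOrBefore(String key) {
    int found = Collections.binarySearch(sorted, key);
    // On a miss binarySearch returns -(insertionPoint) - 1; step back one entry.
    index = found >= 0 ? found : -found - 2;
    return index >= 0;
  }
}
```

A decoder reading from a socket would implement only the sequential interface, since it cannot rewind, while an in-memory block decoder can offer both.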
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527660#comment-13527660 ]
stack commented on HBASE-7233:

[~mcorgan] Why does CellOutputStream not throw IOE when you call write or flush? Where is the CellInputStream? It does not seem to be checked in? Or CellIterator whatever it was called?
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527666#comment-13527666 ]
Matt Corgan commented on HBASE-7233:

Here's CellOutputStream: https://github.com/apache/hbase/blob/trunk/hbase-common/src/main/java/org/apache/hbase/cell/CellOutputStream.java

I'm rethrowing IOExceptions as unchecked exceptions in the current code, or else they will need to be declared basically everywhere. I thought at one point there was a notion of reducing the checked exceptions, which i'm a big fan of, but I guess we haven't gone down that route yet. So yeah, flush() should throw IOException and i will stop converting them in the prefix-tree module.
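The conversion being discussed, catching a checked IOException and rethrowing it unchecked so callers need not declare it, can be illustrated as below. This is a generic sketch, not code from the prefix-tree module: `UncheckedWriter` is a hypothetical name, and it uses java.io.UncheckedIOException (Java 8 and later), whereas the snippet quoted earlier in the thread wraps in a plain RuntimeException.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical wrapper showing the checked-to-unchecked conversion.
final class UncheckedWriter {
  private final OutputStream os;

  UncheckedWriter(OutputStream os) {
    this.os = os;
  }

  /** Writes without declaring IOException; failures surface as unchecked. */
  void write(byte[] b) {
    try {
      os.write(b);
    } catch (IOException e) {
      throw new UncheckedIOException(e); // original exception kept as the cause
    }
  }
}
```

The trade-off raised in the thread still applies: the signature is cleaner, but callers doing real IO lose the compiler's reminder to handle failure.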
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527331#comment-13527331 ]
Matt Corgan commented on HBASE-7233:

I wouldn't mind dropping the DataBlock prefix as it gets a little unwieldy in places. It's confusing right now that there is a DataBlockEncoder which arranges bytes and an HFileDataBlockEncoder that sets up and triggers the DataBlockEncoder. There is also a NoOpDataBlockEncoder which should really be called NoOpHFileDataBlockEncoder.

Could do:
DataBlockEncoding -> Encoding
DataBlockEncoder -> Encoder
HFileDataBlockEncoder -> HFileEncoder
NoOpDataBlockEncoder -> NoOpHFileEncoder
HFileBlockEncodingContext -> HFileEncodingContext

Though the HFileEncoders are not really encoders at heart - they're just setting up the environment for the actual encoders. It's more like an HFileBlockConverter.
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526937#comment-13526937 ]
stack commented on HBASE-7233:

bq. Will clients ever want a value in the mvccVersion?

No. MVCC is server internals.

bq. In the replacement interface, we'll want to switch from encoding a ByteBuffer of KeyValue format bytes to a streaming interface where Cells are given to the encoder individually and a flush method is called when you want the encoded byte[] spit out.

Streaming Interface sounds good. Should we call these new base classes Encoder, Encoding, Context, etc., i.e. drop the DataBlock prefix? Hopefully DataBlockEncoder could inherit Encoder.
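The streaming shape quoted above (hand cells to the encoder one at a time, then call flush to get the encoded bytes) could be sketched like this. It is a minimal in-memory illustration: `BufferingCellEncoder` is a hypothetical name, a cell is reduced to a key and a value, and the 4-byte big-endian length prefixes are an assumption rather than the format used in the attached patches.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical streaming encoder: cells go in one by one, flush() emits the block.
final class BufferingCellEncoder {
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

  /** Appends one cell as <int keyLength><key><int valueLength><value>. */
  void write(byte[] key, byte[] value) {
    writeInt(key.length);
    buffer.write(key, 0, key.length);
    writeInt(value.length);
    buffer.write(value, 0, value.length);
  }

  /** Returns everything written since the last flush and starts a new block. */
  byte[] flush() {
    byte[] block = buffer.toByteArray();
    buffer.reset();
    return block;
  }

  private void writeInt(int v) {
    // Big-endian; ByteArrayOutputStream.write(int) keeps only the low 8 bits.
    buffer.write(v >>> 24);
    buffer.write(v >>> 16);
    buffer.write(v >>> 8);
    buffer.write(v);
  }
}
```

Because this version only touches memory it declares no IOException; an encoder writing straight to a socket or file would need to, which is the tension discussed elsewhere in the thread.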
[jira] [Commented] (HBASE-7233) Serializing KeyValues when passing them over RPC
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13511216#comment-13511216 ] Andrew Purtell commented on HBASE-7233: --- Perhaps the title of this JIRA should be shortened to simply Serializing KeyValues. Using any of protobufs, Avro, or Thrift for marshalling/unmarshalling the KeyValue is unlikely to be viable, lots of object creation churn, small copies, this will kill performance. However sending a protobuf encoded prologue to a stream of KVs to a client makes sense. I like the idea of KeyValue encoder. I also like the idea of negotiating KeyValue encoder selection at connection setup time. Beyond RPC, I've been looking at extending KeyValue to add tags as described in HBASE-6222. What I have is a transitional approach. No matter what else happens here, if KeyValue could be a versioned serialization that would be great, we could introduce tags without overloading existing fields in ugly ways (e.g. writing a negative value length to indicate the presence of tags). Or, without storing tags physically distinct from their KVs in a separate shadow column. I have implementations that do both, the latter has some undesirable cost as you might imagine. Versioning KeyValue is tricky if we must be backwards compatible with existing data, if migration does not involve a HFile rewrite step. How controversial is this? Serializing KeyValues when passing them over RPC Key: HBASE-7233 URL: https://issues.apache.org/jira/browse/HBASE-7233 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Blocker Attachments: 7233.txt, 7233-v2.txt Undo KeyValue being a Writable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7233) Serializing KeyValues when passing them over RPC
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13511220#comment-13511220 ] stack commented on HBASE-7233: -- bq. we can make a KEY_VALUE encoder that serializes cells in the current wire format which is pretty simple for other languages to parse. it can be a slightly more performant fallback than per-field protocol buffers So, set a pb header and then write out <length><bytearray> as we have now after we send the pb. It won't be evolvable, right? Unless we put a 'version' in the pb header, or the client, I suppose, could say what version of this it wants and the server would accommodate? Serializing KeyValues when passing them over RPC Key: HBASE-7233 URL: https://issues.apache.org/jira/browse/HBASE-7233 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Blocker Attachments: 7233.txt, 7233-v2.txt Undo KeyValue being a Writable.
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13511233#comment-13511233 ] stack commented on HBASE-7233: -- bq. I like the idea of KeyValue encoder. It'd write <length><bytearray><length><bytearray> and the byte array would be the backing array of a KV? The format version would be in the pb preamble. Client would volunteer what it could digest. We'd package the kv appropriately... version1 if that was what they asked for. If they asked for version2, they'd get Andrew's tags if any specified? A step above this would be a datablock encoder for sending lots of KVs in a compact form. bq. How controversial is this? Rewriting all hfiles? Pretty controversial I'd say. Maybe you were talking about how tricky versioning KV is? Changed title of issue. Moved its original intent, removing Writable from KV, to HBASE-7289. Serializing KeyValues - Key: HBASE-7233 URL: https://issues.apache.org/jira/browse/HBASE-7233 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Blocker Attachments: 7233.txt, 7233-v2.txt Undo KeyValue being a Writable.
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13511254#comment-13511254 ] Andrew Purtell commented on HBASE-7233: --- bq. We'd package the kv appropriately... version1 if that was what they asked for. If they asked for version2, they'd get Andrew's tags if any specified? On disk encoding. The tags should be serialized with the KV, inline, so they can be read with the KV data in the same read op. What I'm doing now, for backwards compatibility, is write the value length as a negative integer to flag the presence of tags and store the tags prepended to user data as part of the value section of the KV. It's ugly. Or, as mentioned, I store tags distinct from their associated KVs as KVs in a shadow column family. Especially when you up Blockcache pressure you can see a significant latency penalty on gets for the latter. Putting tags inline seems wise. How to get them in? Or, what about future evolution of KV? I would really prefer not to double the number of KV types just to say foo with tags. And then double again for foo with tags and bar. Serializing KeyValues - Key: HBASE-7233 URL: https://issues.apache.org/jira/browse/HBASE-7233 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Blocker Attachments: 7233.txt, 7233-v2.txt Undo KeyValue being a Writable.
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13512006#comment-13512006 ] Matt Corgan commented on HBASE-7233: doh - looks like i added an accidental quote tag after We could put a version in the PB header. so the remaining quotes are all inverted. I don't have permission to edit it. Serializing KeyValues - Key: HBASE-7233 URL: https://issues.apache.org/jira/browse/HBASE-7233 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Blocker Attachments: 7233.txt, 7233-v2.txt Undo KeyValue being a Writable.
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13512012#comment-13512012 ] Lars Hofhansl commented on HBASE-7233: -- I fixed it. Serializing KeyValues - Key: HBASE-7233 URL: https://issues.apache.org/jira/browse/HBASE-7233 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Blocker Attachments: 7233.txt, 7233-v2.txt Undo KeyValue being a Writable.
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13512052#comment-13512052 ] stack commented on HBASE-7233: -- [~andrew.purt...@gmail.com] Let's make it so KV is evolvable, else let's go home! Has to be backward compatible though -- yeah. Can you not leverage the hfile version and if older, transform old to new style blocks? (Sorry if that is a dumb idea. Did you look at overriding the key type to add in 'version' on the top few bits? Hmm... that is probably no good because you need to be able to find the type in the middle of the byte array ... ) bq. ...and store the tags prepended to user data as part of the value section of the KV. Ugh. Yeah, needs to be inline. So, we can say that KV is going to evolve so we need to just deal. [~mcorgan] We can't do pb kvs to put them into an hfile. Sorry if you got that impression. Would be just way too slow. I think a new KV/Cell format would require a new encoder, one that could send all in the new format. Clients would ask for the new encoder format only if they knew how to decode. Chatting w/ Todd, he had some good suggestions. I tried on him my concern that we would be putting ourselves in a ghetto if we are not spitting a well-known serialization like avro or thrift out the front door. He made Andrew's above argument that we can't do prefix-tree-like compression w/ thrift/avro and that a client that goes natively against hbase is already an undertaking, keeping a cache of regions etc., so it is not too much to ask that it be able to do at least a basic data block encoding/decoding. Rather than KVs, because they are too atomic an entity, we should probably send datablocks after we send a pb header (as per Matt). The most basic would serialize kvs as we do now (as per Matt).
Other interesting suggestions were sending the data first, before we send the pb header describing its content w/ say a DATAlength prefix so client accumulates the data and then reads the pb header to figure which encoder to use on it. So, at its base, our RPC becomes sending of DATAlength and PBUCserialized delimited pb. Serializing KeyValues - Key: HBASE-7233 URL: https://issues.apache.org/jira/browse/HBASE-7233 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Blocker Attachments: 7233.txt, 7233-v2.txt Undo KeyValue being a Writable.
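The data-first framing suggested above might look roughly like the following sketch. It is only an illustration under stated assumptions: the class and method names are hypothetical, both sections use a 4-byte big-endian length prefix, and the header is treated as opaque bytes standing in for the delimited pb message. The thread does not settle any of these details.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical sketch: encoded data first, then the header that describes it. */
public class DataThenHeaderFraming {

    /** Lays out [int dataLength][data][int headerLength][header] in one frame. */
    static byte[] writeFrame(byte[] encodedCells, byte[] header) {
        ByteBuffer frame = ByteBuffer.allocate(4 + encodedCells.length + 4 + header.length);
        frame.putInt(encodedCells.length).put(encodedCells);
        frame.putInt(header.length).put(header);
        return frame.array();
    }

    /** Returns {data, header}; the reader accumulates the data before it sees the header. */
    static byte[][] readFrame(byte[] frame) {
        ByteBuffer in = ByteBuffer.wrap(frame);
        byte[] data = new byte[in.getInt()];
        in.get(data);
        byte[] header = new byte[in.getInt()];
        in.get(header);
        return new byte[][] { data, header };
    }

    public static void main(String[] args) {
        byte[] cells = { 1, 2, 3 };   // stand-in for an encoded block of cells
        byte[] header = { 9 };        // stand-in for the serialized pb header
        byte[][] parts = readFrame(writeFrame(cells, header));
        System.out.println(Arrays.equals(parts[0], cells) && Arrays.equals(parts[1], header));
    }
}
```

A real header would name the encoder used for the data section, so the client can only pick a decoder after it has read past the data.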
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13526041#comment-13526041 ] stack commented on HBASE-7233: -- Looking at DataBlockEncoder, it has KeyValue and mvcc pollution. Its hfile origins are showing through. I'd think that we'd want a more basic Interface than this, a DataBlockEncoder that does Cells. Looking at pulling out the more basic Interface, it is a bit of work. I'm thinking that we try and get something going w/ DBE as it is and then come along later to do clean up after. It'll help us figure what in the current DBE is needed putting Cells on the wire. Serializing KeyValues - Key: HBASE-7233 URL: https://issues.apache.org/jira/browse/HBASE-7233 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Blocker Attachments: 7233.txt, 7233-v2.txt Undo KeyValue being a Writable.
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13526071#comment-13526071 ] Matt Corgan commented on HBASE-7233: Will clients ever want a value in the mvccVersion? We can probably nullify that when encoding for the client, so maybe the includesMemstoreTS parameter is necessary? In the replacement interface, we'll want to switch from encoding a ByteBuffer of KeyValue format bytes to a streaming interface where Cells are given to the encoder individually and a flush method is called when you want the encoded byte[] spit out. We should probably split the Encoder, Decoder, and Seeker interfaces as well. Serializing KeyValues - Key: HBASE-7233 URL: https://issues.apache.org/jira/browse/HBASE-7233 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Blocker Fix For: 0.96.0 Attachments: 7233.txt, 7233-v2.txt Undo KeyValue being a Writable.
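One way to picture the streaming interface described in this comment is the sketch below. Everything in it is an assumption for illustration: `SimpleCell`, `CellEncoder`, and `LengthPrefixedEncoder` are hypothetical names rather than HBase classes, and the encoding is a plain length-prefixed key and value rather than any real DataBlockEncoding.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

/** Hypothetical sketch of an encoder fed one cell at a time, with an explicit flush. */
public class StreamingEncoderSketch {

    /** Stand-in for a Cell: just a key and a value. */
    static final class SimpleCell {
        final byte[] key;
        final byte[] value;
        SimpleCell(byte[] key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Cells go in individually; flush() hands back the encoded bytes so far. */
    interface CellEncoder {
        void write(SimpleCell cell);
        byte[] flush();
    }

    /** Simplest possible encoder: [int length][bytes] for the key, then the value. */
    static final class LengthPrefixedEncoder implements CellEncoder {
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        @Override
        public void write(SimpleCell cell) {
            writeField(cell.key);
            writeField(cell.value);
        }

        @Override
        public byte[] flush() {
            byte[] encoded = buffer.toByteArray();
            buffer.reset(); // start a fresh block for the next batch of cells
            return encoded;
        }

        private void writeField(byte[] field) {
            byte[] length = ByteBuffer.allocate(4).putInt(field.length).array();
            buffer.write(length, 0, length.length);
            buffer.write(field, 0, field.length);
        }
    }

    public static void main(String[] args) {
        CellEncoder encoder = new LengthPrefixedEncoder();
        encoder.write(new SimpleCell(new byte[] { 1, 2 }, new byte[] { 3 }));
        System.out.println(encoder.flush().length); // 4 + 2 + 4 + 1 = 11
    }
}
```

Keeping the interface this narrow leaves room for a separate decoder and seeker, in line with the suggestion to split those interfaces.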
[jira] [Commented] (HBASE-7233) Serializing KeyValues when passing them over RPC
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13510732#comment-13510732 ] Todd Lipcon commented on HBASE-7233: For the RPC transport, I'd vote that we reuse some of the block encoder type stuff that we've got in HFile. That way we get prefix compression on the transport of a list of KVs within RPC, which should improve performance. Serializing KeyValues when passing them over RPC Key: HBASE-7233 URL: https://issues.apache.org/jira/browse/HBASE-7233 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Blocker Attachments: 7233.txt, 7233-v2.txt Undo KeyValue being a Writable.
[jira] [Commented] (HBASE-7233) Serializing KeyValues when passing them over RPC
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13510802#comment-13510802 ] Matt Corgan commented on HBASE-7233: Most of the ProtocolBuffer uses are not performance critical and PB gives great flexibility and a well-known paradigm, but sending big chunks of Cells over the wire as fast as possible in a long scan is worth a special case, I'd say. Using the DataBlockEncoding stuff might consume roughly the same cpu as PB encoding on the server, but will save a ton of network bandwidth for many tables and would be much easier for the client to decode. Serializing KeyValues when passing them over RPC Key: HBASE-7233 URL: https://issues.apache.org/jira/browse/HBASE-7233 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Blocker Attachments: 7233.txt, 7233-v2.txt Undo KeyValue being a Writable.
[jira] [Commented] (HBASE-7233) Serializing KeyValues when passing them over RPC
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13510838#comment-13510838 ] stack commented on HBASE-7233: -- As I see it then, we'll send a pb Result and then on the wire, it'll be directly followed by an encoded block of KVs. The Result will describe the block that is coming immediately after. Need to do same for Mutation sending in the data. Hopefully, can doctor the rpc so I can get better access to the channel. Currently we are composing the response in a bytebuffer that we give to a WritableByteChannel (this is after pb has done similar when we build the messages). The composing of the response in a bytebuffer is a known temporary stopgap while moving to pb but we'll need to undo it before we ship (except when doing secure connection.. there we need to sasl wrap the byte array response). Let me finish the baseline case where we do pure pb throughout. Then will have a go at trying to send a follow-along encoded block. Serializing KeyValues when passing them over RPC Key: HBASE-7233 URL: https://issues.apache.org/jira/browse/HBASE-7233 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Blocker Attachments: 7233.txt, 7233-v2.txt Undo KeyValue being a Writable.
[jira] [Commented] (HBASE-7233) Serializing KeyValues when passing them over RPC
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13511156#comment-13511156 ] stack commented on HBASE-7233: -- Yeah, will have to keep versions on datablockencoding. Clients other than hbase clients will be pretty hosed; if they are doing pure pb, hbase will be dog slow marshaling and unmarshaling, and if they want to go faster, they'll have to implement datablockencoding in whatever their language. Looking, avro would let us pass schema independent of data -- say at connection setup -- and because schema is external, could have tight on the wire representation. It lets you stream too it seems (haven't looked in code). Thrift supposedly too. Serializing KeyValues when passing them over RPC Key: HBASE-7233 URL: https://issues.apache.org/jira/browse/HBASE-7233 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Blocker Attachments: 7233.txt, 7233-v2.txt Undo KeyValue being a Writable.
[jira] [Commented] (HBASE-7233) Serializing KeyValues when passing them over RPC
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13511171#comment-13511171 ] Lars Hofhansl commented on HBASE-7233: -- bq. Yeah, will have to keep versions on datablockencoding. Will that be enough to have old clients talk to new server (or vice versa)? That's what Writable did, and it did not work so well. Client and Server have to pre-negotiate what they understand? Serializing KeyValues when passing them over RPC Key: HBASE-7233 URL: https://issues.apache.org/jira/browse/HBASE-7233 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Blocker Attachments: 7233.txt, 7233-v2.txt Undo KeyValue being a Writable.
[jira] [Commented] (HBASE-7233) Serializing KeyValues when passing them over RPC
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13511183#comment-13511183 ] stack commented on HBASE-7233: -- bq. Will that be enough to have old clients talk to new server (or vice versa)? Should have said, new server would also have to be able to do the old datablockencoding formats too -- whatever the client proffered -- or else fall back to lowest common denominator pb all the time. Serializing KeyValues when passing them over RPC Key: HBASE-7233 URL: https://issues.apache.org/jira/browse/HBASE-7233 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Blocker Attachments: 7233.txt, 7233-v2.txt Undo KeyValue being a Writable.
[jira] [Commented] (HBASE-7233) Serializing KeyValues when passing them over RPC
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13511196#comment-13511196 ] Matt Corgan commented on HBASE-7233: A few thoughts:
- We can make a KEY_VALUE encoder that serializes cells in the current wire format, which is pretty simple for other languages to parse. It can be a slightly more performant fallback than per-field protocol buffers.
- Encoders will have to be backwards compatible for a while on the server anyway because people have lots of hfiles encoded with them.
- Encoders could have versions, but they are also pretty intricate, so any changes might merit a whole new encoder like FAST_DIFF2.
- The client could pass a short list of encoder options in descending order of preference like FAST_DIFF2, KEY_VALUE, PB, where PB is the forever-supported fallback.
I'm a little skeptical that this will be the last client hbase ever supports. If something really major changes, we could make a whole new client and the server could translate things to support the old client.
Serializing KeyValues when passing them over RPC Key: HBASE-7233 URL: https://issues.apache.org/jira/browse/HBASE-7233 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Blocker Attachments: 7233.txt, 7233-v2.txt Undo KeyValue being a Writable.
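The preference-list negotiation from the last point could work along these lines. This is a minimal sketch, not HBase code: the `WireEncoder` enum and the `choose` method are hypothetical, the encoder names are simply the ones mentioned in the comment, and it assumes the server always supports PB as the fallback.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of picking a wire encoder from a client's preference list. */
public class EncoderNegotiation {

    /** Encoder names taken from the comment; not an actual HBase enum. */
    enum WireEncoder { FAST_DIFF2, KEY_VALUE, PB }

    /** Returns the first client preference the server supports, else the PB fallback. */
    static WireEncoder choose(List<WireEncoder> clientPreferences, Set<WireEncoder> serverSupported) {
        for (WireEncoder candidate : clientPreferences) {
            if (serverSupported.contains(candidate)) {
                return candidate;
            }
        }
        return WireEncoder.PB; // assumed to be supported forever
    }

    public static void main(String[] args) {
        // The client lists encoders in descending order of preference.
        List<WireEncoder> client =
            Arrays.asList(WireEncoder.FAST_DIFF2, WireEncoder.KEY_VALUE, WireEncoder.PB);
        // A server that does not know FAST_DIFF2.
        Set<WireEncoder> server = EnumSet.of(WireEncoder.KEY_VALUE, WireEncoder.PB);
        System.out.println(choose(client, server)); // prints KEY_VALUE
    }
}
```

Because the client's order decides the outcome, a newer server can add encoders without changing what older clients receive.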