[jira] [Commented] (HBASE-7233) Serializing KeyValues over RPC

2013-03-14 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13602394#comment-13602394
 ] 

ramkrishna.s.vasudevan commented on HBASE-7233:
---

bq.Tags are a little different than our current fields because there are 
multiple per Cell
The above comment is from Matt.
So basically we are saying there will be multiple tags per Cell.  
That means a Cell, which internally is now a representation of a KV, will have 
more than one additional attribute (the tags) added to it, and among them 
will be an ACL tag, a visibility tag, etc.
So how do we say which tag to look at if I want only the visibility 
part of the Cell?  
I could see a tagIterator() api added that iterates through the tags, so do we 
have to iterate every time to find out which one is my visibility tag?
Will there be a mechanism that says the visibility tag should be the first tag 
or the second, something like that?
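
To make the question concrete, here is a minimal Java sketch of the "iterate every time" option. It assumes each tag carries a one-byte type; the Tag class, the type codes and TagLookup are all hypothetical names for illustration, not anything in the patch.

```java
import java.util.Iterator;

// Hypothetical tag: a one-byte type plus an opaque payload. Not HBase API.
final class Tag {
  static final byte ACL_TYPE = 1;        // assumed type code
  static final byte VISIBILITY_TYPE = 2; // assumed type code

  final byte type;
  final byte[] value;

  Tag(byte type, byte[] value) {
    this.type = type;
    this.value = value;
  }
}

final class TagLookup {
  // Walks the iterator and returns the first tag of the wanted type,
  // or null if the cell carries no such tag.
  static Tag findFirst(Iterator<Tag> tags, byte wantedType) {
    while (tags.hasNext()) {
      Tag t = tags.next();
      if (t.type == wantedType) {
        return t;
      }
    }
    return null;
  }
}
```

With a type byte the lookup is a linear scan over what should be a handful of tags, so a fixed position for the visibility tag would only be needed if that scan turned out to matter.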

 Serializing KeyValues over RPC
 --

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.95.0

 Attachments: 7233sketch.txt, 7233.txt, 7233v10.txt, 7233-v2.txt, 
 7233v3_encoders.txt, 7233v4_encoders.txt, 7233v5_encoders.txt, 
 7233v6_encoder.txt, 7233v7.txt, 7233v9.txt


 Undo KeyValue being a Writable.
 This issue wandered and became general discussion of KeyValue serialization, 
 in particular, how to pass lots of KeyValues across rpc.  It was noticed that 
 what we were passing over the wire for KeyValues was not protobuf'd KeyValues 
 but the old serialization which assumes the KeyValue version 1 format.  After 
 a bunch of good discussion working out rpc formats, was decided to close this 
 issue in favor of more specific issues: see summary at 
 https://issues.apache.org/jira/browse/HBASE-7233?focusedCommentId=13573259page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13573259

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7233) Serializing KeyValues over RPC

2013-02-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13583456#comment-13583456
 ] 

stack commented on HBASE-7233:
--

I made HBASE-7898 for passing Cells over RPC and HBASE-7897 for adding tags to 
Cells.  Let me close this issue now as won't fix.  I believe items raised here 
now have dedicated jiras.

 Serializing KeyValues over RPC
 --

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 7233sketch.txt, 7233.txt, 7233v10.txt, 7233-v2.txt, 
 7233v3_encoders.txt, 7233v4_encoders.txt, 7233v5_encoders.txt, 
 7233v6_encoder.txt, 7233v7.txt, 7233v9.txt


 Undo KeyValue being a Writable.
 This issue wandered and became general discussion of KeyValue serialization, 
 in particular, how to pass lots of KeyValues across rpc.  It was noticed that 
 what we were passing over the wire for KeyValues was not protobuf'd KeyValues 
 but the old serialization which assumes the KeyValue version 1 format.  After 
 a bunch of good discussion working out rpc formats, was decided to close this 
 issue in favor of more specific issues: see summary at 
 https://issues.apache.org/jira/browse/HBASE-7233?focusedCommentId=13573259page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13573259



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2013-02-07 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573744#comment-13573744
 ] 

stack commented on HBASE-7233:
--

[~mcorgan] Yeah, I think that is the way to go.  The client can send over that 
it favors v2 but that it can do v1 too... The server will do the best it can.  
This won't help [~andrew.purt...@gmail.com] if he wants to refer to tags via 
the Cell Interface, though?  He'll have to cast to KV2, which will 'break'?  
So, wondering if we should add support for tags to the Cell Interface now?  
The only other feature we have ever talked of is adding an mvcc to the key, but 
the Interface already has that, so a shift in how it is implemented should have 
no effect on the Cell Interface.
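
The "client favors v2 but can do v1" exchange could be sketched roughly as below. This is only an illustration: CodecNegotiation and its choose method are made-up names, and modelling versions as plain integers is an assumption.

```java
import java.util.List;

// Hypothetical helper; versions are modelled as plain integers.
final class CodecNegotiation {
  // Returns the first version in the client's preference order that the server
  // also supports, or -1 when there is no overlap.
  static int choose(List<Integer> clientPreference, List<Integer> serverSupported) {
    for (int version : clientPreference) {
      if (serverSupported.contains(version)) {
        return version;
      }
    }
    return -1;
  }
}
```

A client that sends [2, 1] to a v1-only server ends up on v1, and moves to v2 without any client change once the server supports it.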

 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 7233sketch.txt, 7233.txt, 7233v10.txt, 7233-v2.txt, 
 7233v3_encoders.txt, 7233v4_encoders.txt, 7233v5_encoders.txt, 
 7233v6_encoder.txt, 7233v7.txt, 7233v9.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2013-02-07 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573756#comment-13573756
 ] 

Andrew Purtell commented on HBASE-7233:
---

If versioning the codecs like BasicCellEncoderV1, then could we have accessors 
for tags in the Cell interface but BasicCellEncoderV1 would throw 
UnsupportedOperationException, while a BasicCellEncoderV2 would support it? 
And/or a method in the interface that a user can interrogate for capabilities, 
i.e. can do tags or not?
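
Roughly what I have in mind, as a sketch only. TaggedCell, V1Cell, V2Cell and TagReader are hypothetical names standing in for the Cell interface and the cells each encoder version would produce; none of this is in the patch.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical interface, not the real HBase Cell.
interface TaggedCell {
  boolean supportsTags();   // capability check a caller can interrogate
  List<byte[]> getTags();
}

// Cell produced by a v1 encoding: no tag storage at all.
final class V1Cell implements TaggedCell {
  public boolean supportsTags() { return false; }
  public List<byte[]> getTags() {
    throw new UnsupportedOperationException("v1 encoding carries no tags");
  }
}

// Cell produced by a v2 encoding: tags are stored and returned.
final class V2Cell implements TaggedCell {
  private final List<byte[]> tags;
  V2Cell(List<byte[]> tags) { this.tags = tags; }
  public boolean supportsTags() { return true; }
  public List<byte[]> getTags() { return tags; }
}

final class TagReader {
  // Callers that check the capability first never see the exception.
  static List<byte[]> tagsOrEmpty(TaggedCell cell) {
    return cell.supportsTags() ? cell.getTags() : Collections.<byte[]>emptyList();
  }
}
```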



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2013-02-07 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573757#comment-13573757
 ] 

stack commented on HBASE-7233:
--

[~andrew.purt...@gmail.com] I was thinking would just return no tags in v1 
rather than unsupported?



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2013-02-07 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573760#comment-13573760
 ] 

Andrew Purtell commented on HBASE-7233:
---

bq. I was thinking would just return no tags in v1 rather than unsupported?

Sure, easy.

Let's at least get the placeholders in. Both HBASE-7662 and HBASE-7663 would 
need efficiently stored and accessed tags in KVs, and [~enis] was also talking 
about using KV tags for holding metadata for grouping rows recently if I recall 
correctly. It would be even better if there was a BasicCellEncoderV2 that 
could actually store and retrieve tags, at least in unit tests, even if not 
baked enough to actually use until a later release. Something to build on.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2013-02-07 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573784#comment-13573784
 ] 

Matt Corgan commented on HBASE-7233:


{quote}So, wondering if we should add to Cell interface now support for 
tags?{quote}Guess I'm confused why we need to add it to the Cell interface now 
since the reason for the versioning is to enable us to add it later.  It would 
help future-proof the KeyValueEncoderV1?  I think Andy has found a way to work 
the tags into the current KeyValue serialization, so might not even need a V2 
for that.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2013-02-07 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573823#comment-13573823
 ] 

Andrew Purtell commented on HBASE-7233:
---

Yeah I guess there is a little bit of confusion here. I was thinking Cell can 
support getting tag data, and the encoders might not support it (v1) or could 
(v2).

Should tags be a concern of Cell or KV? 

As maybe an interesting consideration, I only need tags in the on-disk 
representation, and the ones I'd be working with should actually not be sent 
over the wire to the client.

bq. I think Andy has found a way to work the tags into the current KeyValue 
serialization

I did, but it is ugly IMHO: I store the value length as negative, prepend 
delimited tag data to the user value data, and parse the tags into in-memory 
metadata and fix up offsets on KV instantiation. Do we actually want this? If 
so, then I guess we can have tagged KVs mixed with old KVs in a backwards 
compatible way.
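
A stripped-down sketch of the negative-length trick, covering only the value portion of a KV. The layout here (a length-prefixed tag block ahead of the user value) and the TaggedValueCodec name are assumptions for illustration; the real change works inside the KeyValue buffer and fixes up offsets rather than copying.

```java
import java.nio.ByteBuffer;

// Hypothetical codec for just the value portion of a KV.
final class TaggedValueCodec {
  // Writes [int length][payload]. When tags are present the length is stored
  // negated and the payload is [int tagsLength][tags][user value].
  static byte[] encode(byte[] tags, byte[] value) {
    if (tags.length == 0) {
      // Old-style record: positive length, value only.
      return ByteBuffer.allocate(4 + value.length).putInt(value.length).put(value).array();
    }
    int payload = 4 + tags.length + value.length;
    return ByteBuffer.allocate(4 + payload)
        .putInt(-payload).putInt(tags.length).put(tags).put(value).array();
  }

  // Returns {tags, value}; tags is empty for an untagged (old-style) record.
  static byte[][] decode(byte[] encoded) {
    ByteBuffer in = ByteBuffer.wrap(encoded);
    int length = in.getInt();
    byte[] tags = new byte[0];
    if (length < 0) {
      // Negative length marks a tagged record: strip the tag block first.
      length = -length;
      tags = new byte[in.getInt()];
      in.get(tags);
      length -= 4 + tags.length;
    }
    byte[] value = new byte[length];
    in.get(value);
    return new byte[][] {tags, value};
  }
}
```

The decoder reads both forms, which is the backwards-compatible mixing of tagged and old KVs. A reader that predates the change would see a negative length, which is part of why I call it ugly.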



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2013-02-07 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573883#comment-13573883
 ] 

stack commented on HBASE-7233:
--

Tags should be concern of Cell else you will have to cast to KV2 before getting 
them.

I'd think we'd add tag support to Cell so we don't have to change it later 
(that'd be painful, wouldn't it?)




[jira] [Commented] (HBASE-7233) Serializing KeyValues

2013-02-07 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13574120#comment-13574120
 ] 

Matt Corgan commented on HBASE-7233:


I don't see a big problem adding them to the interface.  Current encoders may 
ignore them when writing a file (which would be bad for security), but future 
or modified encoders could add support for them.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2013-02-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573189#comment-13573189
 ] 

Ted Yu commented on HBASE-7233:
---

For KeyValueDecoder:
{code}
+  public boolean next() {
+if (!this.hasNext) return !this.hasNext;
{code}
I think this.hasNext should be returned.
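
For illustration, a small stand-alone decoder whose next() returns the flag itself once the stream is exhausted. RecordDecoder and its length-prefixed record format are assumptions made for this sketch; it is not the KeyValueDecoder from the patch.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical decoder over [int length][bytes] records.
final class RecordDecoder {
  private final DataInputStream in;
  private boolean hasNext = true;
  private byte[] current;

  RecordDecoder(InputStream in) {
    this.in = new DataInputStream(in);
  }

  // Advances to the next record. Returns false, and keeps returning false,
  // once the stream is exhausted.
  boolean next() throws IOException {
    if (!hasNext) return hasNext;
    try {
      int length = in.readInt();
      current = new byte[length];
      in.readFully(current);
    } catch (EOFException e) {
      // End of stream (a truncated trailing record is treated the same way here).
      hasNext = false;
      current = null;
    }
    return hasNext;
  }

  byte[] current() {
    return current;
  }
}
```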

For TestBasicCellCodec:
{code}
+  public void testOne() throws IOException {
...
+  public void testThree() throws IOException {
{code}
Would testOneKeyValue(), testThreeKeyValue() be better names ?
Similar comment for TestCellMessageCodec.testOne()

For ProtobufUtil.java:
{code}
-builder.addKeyValue(toKeyValue(c));
+builder.addKeyValue(toCell(c));
{code}
It would be nice if the method name for builder can be changed to addCell().



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2013-02-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573204#comment-13573204
 ] 

stack commented on HBASE-7233:
--

Thanks for the comments.  First one is good.  Was sort of looking for more 
high-level commentary on whether this is a good direction or not.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2013-02-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573210#comment-13573210
 ] 

Ted Yu commented on HBASE-7233:
---

{code}
+public class BasicCellDecoder implements CellScanner {
+public class CellMessageDecoder implements CellScanner {
+public class KeyValueDecoder implements CellScanner {
{code}
The decoders all depend on InputStream#available(). It would be nice if class 
javadoc is added explaining the context where each of them would be used.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2013-02-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573220#comment-13573220
 ] 

stack commented on HBASE-7233:
--

Should have said more plainly that this is experimental/prototyping code for 
ideas mentioned in the cited google doc and elsewhere over in the rpc spec 
issue.  Patches are posted for high-level "does this look right?" feedback, not 
"javadoc could be better" or method-naming suggestions.  Thanks.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2013-02-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573232#comment-13573232
 ] 

Ted Yu commented on HBASE-7233:
---

I read the google doc linked above. I think the patch reflects what the doc 
describes.
Meaning the direction is good.

RPC spec mentions EncodedDataBlock in several places. I am not sure the spec 
has been updated.
Will go over the spec and patch in HBASE-7533 tomorrow so that I can gain 
better understanding of these two JIRAs.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2013-02-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573241#comment-13573241
 ] 

stack commented on HBASE-7233:
--

[~ted_yu] Beware it is a work in progress



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2013-02-06 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573258#comment-13573258
 ] 

Matt Corgan commented on HBASE-7233:


I like it.  Nice and simple.

What are your thoughts on BasicCellEncoder vs KeyValueEncoder?  Are you 
introducing the new BasicCell format because it's easier for a non-java 
client to decode?

KeyValueEncoder appears faster in this benchmark because the serialization is 
less granular, but I think that will become irrelevant over time if we get the 
Cell interface all the way up the read path.  It's faster now because you know 
the input cell is KeyValue so can just cast.  If you don't know the input Cell 
implementation, you'd have to append each KeyValue field separately.  
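
The difference between the two paths, as a toy Java sketch. SimpleCell, BufferCell and CellWriter are hypothetical stand-ins with a made-up two-field layout; they are not the Cell interface or the encoders in the patch.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

// Hypothetical two-field cell; the real Cell interface has more fields.
interface SimpleCell {
  byte[] row();
  byte[] value();
}

// A cell already backed by one contiguous serialized buffer, in the spirit of KeyValue.
final class BufferCell implements SimpleCell {
  final byte[] buffer; // layout: [1-byte row length][row][value]

  BufferCell(byte[] row, byte[] value) {
    buffer = new byte[1 + row.length + value.length];
    buffer[0] = (byte) row.length; // assumes row length < 256
    System.arraycopy(row, 0, buffer, 1, row.length);
    System.arraycopy(value, 0, buffer, 1 + row.length, value.length);
  }

  public byte[] row() {
    return Arrays.copyOfRange(buffer, 1, 1 + (buffer[0] & 0xff));
  }

  public byte[] value() {
    return Arrays.copyOfRange(buffer, 1 + (buffer[0] & 0xff), buffer.length);
  }
}

final class CellWriter {
  static void write(SimpleCell cell, OutputStream out) throws IOException {
    if (cell instanceof BufferCell) {
      // Fast path: the known implementation is already serialized, one bulk write.
      out.write(((BufferCell) cell).buffer);
      return;
    }
    // Generic path: unknown implementation, append each field separately.
    byte[] row = cell.row();
    out.write(row.length); // writes the low 8 bits; assumes row length < 256
    out.write(row);
    out.write(cell.value());
  }
}
```

Both paths put the same bytes on the wire; the generic one just makes more calls per cell, which is the granularity cost showing up in the benchmark.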

Would probably be good to include both BasicCellEncoder and KeyValueEncoder, 
just to make sure that the necessary abstractions are in place to add future 
encodings.  Later on, my guess is one of the delta-style encoders will be best 
for java client RPC.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2013-02-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573259#comment-13573259
 ] 

stack commented on HBASE-7233:
--

I think we should close this issue now.  Its scope has wandered and we got a 
bunch of use out of it but I think its time has come.

The original intent was removing Writable from KV which has happened over in 
another issue.  How KVs went across MapReduce when not Writable came up and 
that got fixed out in other issues also.

We then did a bunch of the back and forth on how to serialize KVs particularly 
across RPC.

Most of the discussion has been captured here:

https://docs.google.com/document/d/1WEtrq-JTIUhlnlnvA0oYRLp0F8MKpEBeBSCFcQiacdw/edit#

...and will be realized in code over in the rpc revamp issue, HBASE-7533 
(patches added here will reappear in that issue)

Regarding Andrew's request above, for HBASE-6222, to be able to pass a 
KeyValue version 2: a v2 should make it across the rpc and even into hfiles 
(after doing some work on EncodedDataBlocks so they sling Cells instead of 
KVs), but what is required will not happen in this issue.

The client and server will all need to be moved to reference the Cell Interface 
rather than KV1 as they currently do.  Only then could a KV2 traverse the 
client and server.  That is work to do (as Lars Hofhansl found out, doing this 
often makes for speedups since we are often realizing KVs when all we need is 
a piece).  Let's take up the effort to change the servers to be Cell based 
rather than KV based elsewhere.

The EncodedDataBlocks revamp should happen elsewhere too.

Neither of the above should hold up the 0.96 release (a 0.96 client should be 
able to talk to a future server that can do KV version 2).

Should we add anything to the Cell Interface before 0.96 ships, e.g. the 
getTagsArray, etc., that Matt suggests above for Andrew's tag work?  If so, 
let's get that in.

Will close in a day or two unless objection.




[jira] [Commented] (HBASE-7233) Serializing KeyValues

2013-02-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573267#comment-13573267
 ] 

stack commented on HBASE-7233:
--

bq. What are your thoughts on BasicCellEncoder vs KeyValueEncoder? Are you 
introducing the new BasicCell format because it's easier for a non-java 
client to decode?

Thanks [~mcorgan] for review.

I think KVEncoder will be lowest common denominator.  Should include 
BasicCellEncoder too.  The CellMessageCodec was just to see.  Not worth 
including I'd say.  I should look again at BasicCellEncoder.  Might be able to 
make it a bit better.  Your point on the KV codec having an unfair advantage 
currently is indeed the case.  Will move this code over to the rpc issue, 
HBASE-7533.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2013-02-06 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573283#comment-13573283
 ] 

Andrew Purtell commented on HBASE-7233:
---

I'm good if we have placeholders for tags (HBASE-7448) and can build on that as 
the Cell work takes shape. The first drop for HBASE-6222 can use out of line 
storage for the ACL metadata with some slowdown. 



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2013-02-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573284#comment-13573284
 ] 

stack commented on HBASE-7233:
--

[~andrew.purt...@gmail.com] What about additions to Cell Interface?  The ones 
Matt suggested above?  Too soon?



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2013-02-06 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573288#comment-13573288
 ] 

Matt Corgan commented on HBASE-7233:


One idea is to version the codecs like BasicCellEncoderV1, then if we want to 
add tags we make V2.  To avoid historical version explosion we apply the normal 
policy of only supporting upgrade over a single major release.  Just delete the 
old versions after that.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2013-02-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13571080#comment-13571080
 ] 

stack commented on HBASE-7233:
--

Here are the last few runs encoding/decoding 100k KVs with small key and value 
(so, worst case).  I did 30 cycles each of encoding then decoding so hotspot 
would cut in.  I print out the last three runs of each below.

In short, as has been said already, we can't have pb do Cell serialization.  
It's ten times slower both encoding and decoding (I had a look w/ a profiler 
and there doesn't seem to be anything particularly dumb going on... when 
encoding, it's just the copying of byte arrays out of Cell into pb ByteString, 
and when decoding, it's inching over the stream reading varints and allocating 
the arrays to copy into).

{code}
13/02/04 21:57:05 INFO codec.CodecPerformance: 27 encoded count=10 in 13ms 
for encoder org.apache.hbase.codec.KeyValueEncoder@7fcebc9f
13/02/04 21:57:05 INFO codec.CodecPerformance: 28 encoded count=10 in 13ms 
for encoder org.apache.hbase.codec.KeyValueEncoder@5dc1ac46
13/02/04 21:57:05 INFO codec.CodecPerformance: 29 encoded count=10 in 13ms 
for encoder org.apache.hbase.codec.KeyValueEncoder@14718242

13/02/04 21:57:06 INFO codec.CodecPerformance: 26 decoded count=10 in 16ms 
for decoder org.apache.hbase.codec.KeyValueDecoder@962522b
13/02/04 21:57:06 INFO codec.CodecPerformance: 27 decoded count=10 in 27ms 
for decoder org.apache.hbase.codec.KeyValueDecoder@53ea0105
13/02/04 21:57:06 INFO codec.CodecPerformance: 28 decoded count=10 in 32ms 
for decoder org.apache.hbase.codec.KeyValueDecoder@25dd9891
13/02/04 21:57:06 INFO codec.CodecPerformance: 29 decoded count=10 in 16ms 
for decoder org.apache.hbase.codec.KeyValueDecoder@774b6b02

13/02/04 21:57:08 INFO codec.CodecPerformance: 27 encoded count=10 in 62ms 
for encoder org.apache.hbase.codec.BasicCellEncoder@407e75d2
13/02/04 21:57:08 INFO codec.CodecPerformance: 28 encoded count=10 in 62ms 
for encoder org.apache.hbase.codec.BasicCellEncoder@2e694f12
13/02/04 21:57:08 INFO codec.CodecPerformance: 29 encoded count=10 in 61ms 
for encoder org.apache.hbase.codec.BasicCellEncoder@4c309f9f

13/02/04 21:57:09 INFO codec.CodecPerformance: 27 decoded count=10 in 38ms 
for decoder org.apache.hbase.codec.BasicCellDecoder@76f1fad1
13/02/04 21:57:09 INFO codec.CodecPerformance: 28 decoded count=10 in 37ms 
for decoder org.apache.hbase.codec.BasicCellDecoder@5ee771f3
13/02/04 21:57:09 INFO codec.CodecPerformance: 29 decoded count=10 in 40ms 
for decoder org.apache.hbase.codec.BasicCellDecoder@1c8321c8

13/02/04 21:57:11 INFO codec.CodecPerformance: 7 decoded count=10 in 174ms 
for decoder org.apache.hbase.codec.CellMessageDecoder@64d1afd3
13/02/04 21:57:11 INFO codec.CodecPerformance: 8 decoded count=10 in 176ms 
for decoder org.apache.hbase.codec.CellMessageDecoder@4ecd200f
13/02/04 21:57:12 INFO codec.CodecPerformance: 9 decoded count=10 in 175ms 
for decoder org.apache.hbase.codec.CellMessageDecoder@151cc2a8

13/02/04 21:57:15 INFO codec.CodecPerformance: 27 decoded count=10 in 178ms 
for decoder org.apache.hbase.codec.CellMessageDecoder@4226c7da
13/02/04 21:57:15 INFO codec.CodecPerformance: 28 decoded count=10 in 177ms 
for decoder org.apache.hbase.codec.CellMessageDecoder@5083198c
13/02/04 21:57:15 INFO codec.CodecPerformance: 29 decoded count=10 in 186ms 
for decoder org.apache.hbase.codec.CellMessageDecoder@263b84ee

{code}

 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 7233sketch.txt, 7233.txt, 7233-v2.txt, 
 7233v3_encoders.txt, 7233v4_encoders.txt, 7233v5_encoders.txt, 
 7233v6_encoder.txt, 7233v7.txt, 7233v9.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2013-02-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13571097#comment-13571097
 ] 

stack commented on HBASE-7233:
--

Updated 
https://docs.google.com/document/d/1WEtrq-JTIUhlnlnvA0oYRLp0F8MKpEBeBSCFcQiacdw/edit#

 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 7233sketch.txt, 7233.txt, 7233-v2.txt, 
 7233v3_encoders.txt, 7233v4_encoders.txt, 7233v5_encoders.txt, 
 7233v6_encoder.txt, 7233v7.txt, 7233v9.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2013-02-03 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13569821#comment-13569821
 ] 

Ted Yu commented on HBASE-7233:
---

OldSchoolKeyValueDecoder.java and CodecException.java are missing license headers.

For OldSchoolKeyValueDecoder:
{code}
+  @Override
+  public boolean next() {
+if (!this.hasNext) return !this.hasNext;
{code}
When hasNext is false, the line above returns true. Does this align with the javadoc for next()?
{code}
+  /**
+   * Advance the scanner 1 cell.
+   * @return true if the next cell is found and getCurrentCell() will return a valid Cell
+   */
+  boolean next();
{code}
There seems to be a dependency on HBASE-4676.
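
To make the next() contract from the javadoc concrete, here is a minimal sketch of a scanner that returns false once it is exhausted. `ListCellScanner` is a hypothetical stand-in backed by a list of strings; it is not the OldSchoolKeyValueDecoder from the patch.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical scanner illustrating the next()/getCurrentCell() contract. */
final class ListCellScanner {
  private final Iterator<String> source;
  private String current; // null until next() succeeds

  ListCellScanner(List<String> cells) {
    this.source = cells.iterator();
  }

  /** @return true only if a cell was found and getCurrentCell() is valid. */
  boolean next() {
    if (!source.hasNext()) {
      current = null;
      return false; // exhausted: report false, never true
    }
    current = source.next();
    return true;
  }

  String getCurrentCell() {
    return current;
  }
}
```

A caller can then loop with `while (scanner.next()) { ... }` and rely on getCurrentCell() being valid inside the loop body.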

 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 7233sketch.txt, 7233.txt, 7233-v2.txt, 
 7233v3_encoders.txt, 7233v4_encoders.txt, 7233v5_encoders.txt, 
 7233v6_encoder.txt, 7233v7.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2013-01-07 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13546669#comment-13546669
 ] 

stack commented on HBASE-7233:
--

Some unfinished notes I've been keeping on how to pass KeyValues: 
https://docs.google.com/document/pub?id=1WEtrq-JTIUhlnlnvA0oYRLp0F8MKpEBeBSCFcQiacdw

 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 7233sketch.txt, 7233.txt, 7233-v2.txt, 
 7233v3_encoders.txt, 7233v4_encoders.txt, 7233v5_encoders.txt, 
 7233v6_encoder.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2012-12-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13531437#comment-13531437
 ] 

Andrew Purtell commented on HBASE-7233:
---

{quote}
bq. I would really prefer not to double the number of KV types just to say foo 
with tags. And then double again for foo with tags and bar.
That would be ugly, but at the same time it's difficult and maybe wasteful to 
future-proof it from every angle. Tags are already sort of a flexible 
future-proofing mechanism. Maybe tags can be added in a backwards compatible 
way to the existing encoders. I'd have to think about it for PrefixTree, 
probably punting them to a PREFIX_TREE2 encoder with some other 
additions/improvements.
{quote}

The use case I'm looking at is adding security policy information to KVs 
(HBASE-6222). It could be either ACLs or visibility labels; both can be handled 
the same way. There's a 1:1 mapping, so it makes sense to store the policy 
information in the KV. This also has the nice property of reading in the ACL 
for free in the same op that reads in the KV. I'm not specifically asking for 
more than tagging KVs with this particular metadata but, given that tags could 
easily be made generic enough to support a number of other cases, I think it 
makes sense to do that. Then security is just one user of something more 
generally useful, and we haven't done something fixed for security's sake only.

Adding tag support to the encoders might be the right answer. Would we still 
have the trouble of teaching KeyValue about where in the bytebuffers coming out 
of the encoder the tag data resides? Any thoughts on how we might distinguish a 
KV with tags from one without? Maybe we don't; we just have the encoder add the 
discovered tag data to the KV by way of an API that adds out-of-band metadata 
to the KV's in-memory representation? And likewise add tags to the blocks 
beyond the KV itself if they are present?

 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 7233sketch.txt, 7233.txt, 7233-v2.txt, 
 7233v3_encoders.txt, 7233v4_encoders.txt, 7233v5_encoders.txt, 
 7233v6_encoder.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2012-12-13 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13531619#comment-13531619
 ] 

Matt Corgan commented on HBASE-7233:


{quote}Would we still have the trouble of teaching KeyValue about where in the 
bytebuffers coming out of the encoder the tag data resides?{quote}
Tags are a little different than our current fields because there are multiple 
per Cell.  For performance sake, we may want to keep the Cell interface to 
these low level methods: getTagsArray(), getNumTags(), getTagsOffset(), 
getTagsLength(), and then have methods for parsing them like 
{code}Iterable<byte[]> tags = CellTool.getTagsIterator(cell){code}.  So the 
requirement for the encoder/decoder would be to line them up in a single array: 
<vint length0><bytes tag0><vint length1><bytes tag1>etc.

Behind the scenes, tags could be encoded similarly to qualifiers (speaking for 
prefix-tree)

{quote}Any thoughts on how we might distinguish a KV with tags from one 
without? {quote}
Could just have Cell.getNumTags() return 0
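
The single-array layout described above can be sketched as follows. This is a minimal sketch, not code from any patch on this issue: `TagArraySketch` is a hypothetical name, and the length prefix is assumed to be an unsigned base-128 varint (7 bits per byte, low-order group first), which may differ from the vint format actually chosen.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: tags stored back to back as length prefix + bytes. */
final class TagArraySketch {
  private TagArraySketch() {}

  /** Lines the tags up in one array, each preceded by its varint length. */
  static byte[] encode(List<byte[]> tags) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (byte[] tag : tags) {
      int length = tag.length;
      while ((length & ~0x7F) != 0) {   // more than 7 bits left
        out.write((length & 0x7F) | 0x80);
        length >>>= 7;
      }
      out.write(length);
      out.write(tag, 0, tag.length);
    }
    return out.toByteArray();
  }

  /** Walks tagsLength bytes starting at tagsOffset and copies each tag out. */
  static List<byte[]> decode(byte[] tagsArray, int tagsOffset, int tagsLength) {
    List<byte[]> tags = new ArrayList<>();
    int pos = tagsOffset;
    int end = tagsOffset + tagsLength;
    while (pos < end) {
      int length = 0;
      int shift = 0;
      int b;
      do {
        if (pos >= end || shift > 28) {
          throw new IllegalArgumentException("bad tag length at " + pos);
        }
        b = tagsArray[pos++] & 0xFF;
        length |= (b & 0x7F) << shift;
        shift += 7;
      } while ((b & 0x80) != 0);
      if (length < 0 || length > end - pos) {
        throw new IllegalArgumentException("truncated tag at " + pos);
      }
      byte[] tag = new byte[length];
      System.arraycopy(tagsArray, pos, tag, 0, length);
      tags.add(tag);
      pos += length;
    }
    return tags;
  }
}
```

A Cell without tags simply has a tags length of 0, so decode returns an empty list, which lines up with the getNumTags() == 0 suggestion.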



 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 7233sketch.txt, 7233.txt, 7233-v2.txt, 
 7233v3_encoders.txt, 7233v4_encoders.txt, 7233v5_encoders.txt, 
 7233v6_encoder.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2012-12-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13531668#comment-13531668
 ] 

Andrew Purtell commented on HBASE-7233:
---

bq. For performance sake, we may want to keep the Cell interface to these low 
level methods: getTagsArray(), getNumTags(), getTagsOffset(), getTagsLength(), 
and then have methods for parsing them [an iterator]

Agreed. My hack-patch does approximately this. 

I need to study up on how Cells would be materialized into KeyValues. In the 
AccessController we wrap InternalScanners with a filter that looks at each 
KeyValue on the way out to the client and evaluates their visibility to the 
user. Somehow from the KeyValue API we'd need to get to the cell tags iterator 
to extract the ACL (or visibility tag).

A type byte or even making tags name-value pairs would avoid accumulation of 
ad-hoc means for distinguishing between them.

Tags stored on disk shouldn't necessarily be sent to clients, though for the 
sake of performance we can concede this where/if we stream the on-disk encoding 
directly to the client.
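
As a rough illustration of the type-byte idea, the sketch below picks one kind of tag out of a cell's tags by its type. `TypedTag` and the ACL/VISIBILITY codes are hypothetical names and values chosen for the example; nothing here is an agreed format.

```java
import java.util.Optional;

/** Hypothetical tag carrying a one-byte type in front of its payload. */
final class TypedTag {
  static final byte ACL = 1;        // assumed type code
  static final byte VISIBILITY = 2; // assumed type code

  final byte type;
  final byte[] value;

  TypedTag(byte type, byte[] value) {
    this.type = type;
    this.value = value;
  }

  /** Returns the first tag of the wanted type, regardless of its position. */
  static Optional<TypedTag> firstOfType(Iterable<TypedTag> tags, byte wanted) {
    for (TypedTag tag : tags) {
      if (tag.type == wanted) {
        return Optional.of(tag);
      }
    }
    return Optional.empty();
  }
}
```

With a type on each tag, a filter such as the one in the AccessController would not need to depend on tag ordering to find the ACL or visibility entry.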



 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 7233sketch.txt, 7233.txt, 7233-v2.txt, 
 7233v3_encoders.txt, 7233v4_encoders.txt, 7233v5_encoders.txt, 
 7233v6_encoder.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2012-12-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13531685#comment-13531685
 ] 

Ted Yu commented on HBASE-7233:
---

bq. A type byte or 
The Cell interface already provides a type byte:
{code}
  /**
   * see {@link #KeyValue.TYPE}
   * @return The byte representation of the KeyValue.TYPE of this cell: one of Put, Delete, etc
   */
  byte getTypeByte();
{code}

 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 7233sketch.txt, 7233.txt, 7233-v2.txt, 
 7233v3_encoders.txt, 7233v4_encoders.txt, 7233v5_encoders.txt, 
 7233v6_encoder.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2012-12-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13531686#comment-13531686
 ] 

Andrew Purtell commented on HBASE-7233:
---

[~ted_yu] A type byte for the tag, not the KeyValue

 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 7233sketch.txt, 7233.txt, 7233-v2.txt, 
 7233v3_encoders.txt, 7233v4_encoders.txt, 7233v5_encoders.txt, 
 7233v6_encoder.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2012-12-12 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529729#comment-13529729
 ] 

Matt Corgan commented on HBASE-7233:


{quote}I don't follow unless you are saying I should just use COS in place of 
Encoder{quote}
argh, I guess I don't have a better solution.  I was thinking encoder 
implementations would implement CellOutputStream, but the exceptions 
complicate it.  Would it be too weird to have CellOutputStream extend Encoder, 
adding the IOException?

{quote}I want a random seeker Interface. This looks like it has what we'd 
need.{quote}
It's designed to be that, but might have a few more methods than HBase 
currently needs.  After some confusion, I found the EncodedSeeker does almost 
everything it needs with just positionAtOrBefore(Cell key).

{quote}Ok on the vints... ugh{quote}
FYI, UVIntTool has a method that pulls the value off an InputStream without 
allocating objects: UVIntTool.getInt(InputStream is).  However, I think I'd 
recommend sticking to well-known Hadoop formats for the basic RPC stuff.  If 
people actually write high-performance clients in other languages, they would 
have to read/write these formats.

 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 7233sketch.txt, 7233.txt, 7233-v2.txt, 
 7233v3_encoders.txt, 7233v4_encoders.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2012-12-12 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530503#comment-13530503
 ] 

Matt Corgan commented on HBASE-7233:


Sounds good to me.  The IOException on CellInputStream.read() may not be ideal 
since it will force its way all the way up through the StoreFileScanner, 
StoreHeap, StoreScanner, RegionHeap, RegionScanner, etc...  I haven't thought 
of a better suggestion though.  Can change later if we think of something.

 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 7233sketch.txt, 7233.txt, 7233-v2.txt, 
 7233v3_encoders.txt, 7233v4_encoders.txt, 7233v5_encoders.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2012-12-12 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530750#comment-13530750
 ] 

Matt Corgan commented on HBASE-7233:


Sorry if I'm confusing things.  Agree we need a well-thought-out plan 
distinguishing between fine-grained operations on individual cells through the 
scanners vs transferring blocks of cells to wire/disk.

 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 7233sketch.txt, 7233.txt, 7233-v2.txt, 
 7233v3_encoders.txt, 7233v4_encoders.txt, 7233v5_encoders.txt, 
 7233v6_encoder.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2012-12-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530761#comment-13530761
 ] 

stack commented on HBASE-7233:
--

I was just going to apologize for confusing this issue by taking your model, 
changing it a few times, and then in essence coming back to the model you had 
in the first place.  Will be back to work on this for rpc after a little cp-pb 
detour.

 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 7233sketch.txt, 7233.txt, 7233-v2.txt, 
 7233v3_encoders.txt, 7233v4_encoders.txt, 7233v5_encoders.txt, 
 7233v6_encoder.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2012-12-11 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529644#comment-13529644
 ] 

stack commented on HBASE-7233:
--

bq. ...though now thinking it should not even have the 
resetToBeforeFirstEntryMethod()...

I think that is right.  You need to have a markable stream or some such under 
it to do the above.  Maybe you'd have resetToBeforeFirstEntryMethod on 
something that implemented CellSearcher?

I pulled CellScanner and its dependency into the patch I'm about to attach here.

I think adding IOE to CellOutputStream and its inverse CellScanner is probably 
right.

 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 7233sketch.txt, 7233.txt, 7233-v2.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2012-12-11 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529683#comment-13529683
 ] 

Matt Corgan commented on HBASE-7233:


Good stuff Stack.  Some thoughts:

* Move the codec package up out of the io package?  For readability, but also 
because we may be doing some encoding/decoding that's purely in memory at some 
point (memstore)
* Do we need both Encoder and CellOutputStream interfaces?
* You think CodecException should extend IOException?  I was thinking they're 
separate concepts that just happen to be used together a lot.  Like if we 
encode the memstore it would throw IOExceptions.  Looks to be from the 
relationship between CellOutputStream and Encoder which I'm not clear on.
* I saw you grabbed the CellSearcher interface from prefix-tree as well.  I'm 
not confident that the methods in that one are best for all of hbase, but we 
can change them later when we figure out what should be there.  Same with 
ReversibleCellScanner.
* I saw you changed CellScanner.next() (an ambiguous word) to read() which is 
fine.  I'd throw advance() in as a candidate - i guess you're picturing RPC 
decoding and i'm picturing block decoding.  Not important

{quote}resetToBeforeFirstEntryMethod on something that implemented 
CellSearcher?{quote}
yep, CellSearcher

{quote}Wondering in particular if Interface will work for Encoders that 
compress; i.e. PrefixTree.{quote}
I think it will work great on the underlying DataBlockEncoders.  Tricky part is 
figuring out how to modify the HFileDataBlockEncoderImpl to allow the 
streaming.  Might be able to simplify that thing in the process.  I wonder if 
it's time to ditch the separate disk/memory encoding feature as I have a 
feeling people don't use it.

{quote}Do you know if your vint stuff is faster than what is in hadoop in 
WritableUtils.vint?{quote}
Speed difference is probably negligible.  I made that one because it encodes 
only positive numbers, so you can get 255 in 1b rather than only 127.  It can 
actually matter when writing a lot of vint indexes into a token dictionary type 
thing.  You're using it to write array lengths which are always positive, so 
probably a good fit, but i originally intended for it to be hidden in the 
prefix-tree's black box implementation.


 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 7233sketch.txt, 7233.txt, 7233-v2.txt, 
 7233v3_encoders.txt, 7233v4_encoders.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2012-12-11 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529711#comment-13529711
 ] 

stack commented on HBASE-7233:
--

bq. Move the codec package up out of io package?

Yeah, could.  Let me look.  I was thinking of codecs as always going against a 
stream, but you make a good point if we encode the memstore.  Will fix 
CodecException too.

bq. Do we need both Encoder and CellOutputStream interfaces?

I don't follow unless you are saying I should just use COS in place of Encoder 
... I had Encoder extend COS for a while.  It could work.  What to do about the 
COS IOEs though?  We'd have them bubble up through codec implementations?

On CellSearcher, I grabbed it but am not using it.  Will drop from patch for 
now.  I want a random seeker Interface.  This looks like it has what we'd need.  
I was thinking a codec could implement the Decoder or CellScanner AND 
CellSearcher.  Would not be backed by a stream.

On CellScanner#next vs #read, yeah, I changed it to #read but actually thought 
I'd put it back to #next.  It was #read because I'd renamed CellScanner as 
CellInputStream to match CellOutputStream... but then went back on myself.  
Will fix.

bq. I wonder if it's time to ditch the separate disk/memory encoding feature as 
I have a feeling people don't use it.

Not well enough versed to say whether or which.  I like idea of simplifying but 
at same time am afraid to touch and am more inclined to bump the hfile version 
and start writing new hfiles w/ new encoders keeping around the old encoding 
classes for reading legacy hfiles.

Ok on the vints... ugh, I just noticed we have vint'ing in Bytes class 
copied from WritableUtils... so could get byte arrays rather than streams.  
Might use that.  Will look around too

Thanks for the feedback.  Yeah, I'm all about rpc these days, so it's good 
having differing perspectives on this stuff.




 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 7233sketch.txt, 7233.txt, 7233-v2.txt, 
 7233v3_encoders.txt, 7233v4_encoders.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2012-12-10 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13528179#comment-13528179
 ] 

stack commented on HBASE-7233:
--

[~mcorgan] I was looking for the inverse of CellOutputStream.  It's not 
committed?  I suppose it's CellScanner, looking at your prefix tree.  Should I 
commit it as part of this patch to hbase-common?  We should use the name 
CellInputStream I'd say... will make 

On IOE, mind pointing me at an example where you are rethrowing IOEs as 
unchecked exceptions?  You mean this?

+// try {
+os.write(b);
+// } catch (IOException e) {
+// throw new RuntimeException(e);
+// }

Yeah, the IOE stuff is all over the place and appreciate your trying to remove 
them but thinking that they are likely legit here of all places?  The write 
should throw IOE in CellOutputStream too?


 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 7233sketch.txt, 7233.txt, 7233-v2.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2012-12-10 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13528196#comment-13528196
 ] 

Matt Corgan commented on HBASE-7233:


Yeah - inverse of CellOutputStream is CellScanner, though now thinking it 
should not even have the resetToBeforeFirstEntryMethod() for cases where you 
can no longer access the beginning of the stream, like when scanning chunks of 
cells coming across the wire.  CellScanner is basically taking the sequential 
access methods of KeyValueScanner, while CellSearcher has the random access 
methods.  I was thinking they should be separate because you can't always do 
random access, like when streaming server-client.
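
The split between sequential and random access could look roughly like the sketch below. It is only an illustration under simplifying assumptions: cells are plain strings, and `SeqCellScanner`, `RandomCellSearcher`, `StreamCellScanner`, and `ArrayCellSearcher` are hypothetical names; only positionAtOrBefore and the reset-to-before-first-entry idea come from this thread.

```java
import java.util.Arrays;
import java.util.Iterator;

/** Sequential access only: enough for cells streaming in off the wire. */
interface SeqCellScanner {
  boolean next();   // true if a cell was found
  String current(); // the cell found by the last successful next()
}

/** Random access on top: only for sources that can revisit earlier cells. */
interface RandomCellSearcher extends SeqCellScanner {
  boolean positionAtOrBefore(String key);
  void resetToBeforeFirstEntry();
}

/** Backed by a one-pass iterator, so it cannot seek or reset. */
final class StreamCellScanner implements SeqCellScanner {
  private final Iterator<String> in;
  private String current;

  StreamCellScanner(Iterator<String> in) { this.in = in; }

  public boolean next() {
    current = in.hasNext() ? in.next() : null;
    return current != null;
  }

  public String current() { return current; }
}

/** Backed by a sorted in-memory array, so seeking and resetting are cheap. */
final class ArrayCellSearcher implements RandomCellSearcher {
  private final String[] sorted; // assumed to be in ascending order
  private int index = -1;        // -1 means "before the first entry"

  ArrayCellSearcher(String[] sorted) { this.sorted = sorted.clone(); }

  public boolean next() {
    if (index + 1 >= sorted.length) return false;
    index++;
    return true;
  }

  public String current() { return index >= 0 ? sorted[index] : null; }

  public boolean positionAtOrBefore(String key) {
    int pos = Arrays.binarySearch(sorted, key);
    // Not found: binarySearch returns -(insertionPoint) - 1; step back one slot.
    index = pos >= 0 ? pos : -pos - 2;
    return index >= 0;
  }

  public void resetToBeforeFirstEntry() { index = -1; }
}
```

Keeping the reset and seek methods off the sequential interface means a wire-backed scanner never has to pretend it can rewind.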

I was leaving them in the prefix-tree module in case we wanted to adapt 
KeyValueScanner rather than replacing it.  But it looks like you need something 
to work with in the mean time, so I'd say using the CellScanner is a good 
solution.

I'm not sure about the IOException on CellOutputStream.write() given that the 
interface also has the flush() method.  Like what does flush do if write is 
doing IO?  All of my use cases use it more like a buffer/append method where 
you are writing it only to memory structures.  That being said, looks like 
java.io.OutputStream.write() throws it, so i suppose we should follow suit.

 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 7233sketch.txt, 7233.txt, 7233-v2.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2012-12-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13527660#comment-13527660
 ] 

stack commented on HBASE-7233:
--

[~mcorgan] Why does CellOutputStream not throw IOE when you call write or 
flush?  Where is the CellInputStream?  It does not seem to be checked in?  Or 
CellIterator whatever it was called?



 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 7233sketch.txt, 7233.txt, 7233-v2.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2012-12-09 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13527666#comment-13527666
 ] 

Matt Corgan commented on HBASE-7233:


Here's CellOutputStream: 
https://github.com/apache/hbase/blob/trunk/hbase-common/src/main/java/org/apache/hbase/cell/CellOutputStream.java

I'm rethrowing IOExceptions as unchecked exceptions in the current code else 
they will need to be declared basically everywhere.  I thought at one point 
there was a notion of reducing the checked exceptions, which i'm a big fan of, 
but I guess we haven't gone down that route yet.  So yeah, flush() should throw 
IOException and i will stop converting them in prefix-tree module.
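
For comparison, the two options can be sketched side by side. This is a minimal sketch with hypothetical names (`CheckedCellSink`, `StreamCellSink`, `UncheckedCellSink`), with cells reduced to byte arrays; it uses java.io.UncheckedIOException (Java 8+) where the code discussed in this thread wrapped in a plain RuntimeException.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Option 1: declare IOException, as java.io.OutputStream does. */
interface CheckedCellSink {
  void write(byte[] cell) throws IOException;
  void flush() throws IOException;
}

/** Passes cells straight to an underlying stream. */
final class StreamCellSink implements CheckedCellSink {
  private final OutputStream os;

  StreamCellSink(OutputStream os) { this.os = os; }

  public void write(byte[] cell) throws IOException { os.write(cell); }

  public void flush() throws IOException { os.flush(); }
}

/** Option 2: hide the checked exception by rethrowing it unchecked. */
final class UncheckedCellSink {
  private final CheckedCellSink delegate;

  UncheckedCellSink(CheckedCellSink delegate) { this.delegate = delegate; }

  void write(byte[] cell) {
    try {
      delegate.write(cell);
    } catch (IOException e) {
      throw new UncheckedIOException(e); // callers need no throws clause
    }
  }

  void flush() {
    try {
      delegate.flush();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Option 1 forces every caller up the stack to declare or handle the exception; option 2 keeps signatures clean at the cost of failures that the compiler no longer points out.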

 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 7233sketch.txt, 7233.txt, 7233-v2.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2012-12-08 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13527331#comment-13527331
 ] 

Matt Corgan commented on HBASE-7233:


I wouldn't mind dropping the DataBlock prefix as it gets a little unwieldy in 
places.  It's confusing right now that there is a DataBlockEncoder, which 
arranges bytes, and an HFileDataBlockEncoder, which sets up and triggers the 
DataBlockEncoder.  There is also a NoOpDataBlockEncoder, which should really be 
called NoOpHFileDataBlockEncoder.  Could do:

DataBlockEncoding -> Encoding
DataBlockEncoder -> Encoder
HFileDataBlockEncoder -> HFileEncoder
NoOpDataBlockEncoder -> NoOpHFileEncoder
HFileBlockEncodingContext -> HFileEncodingContext

Though the HFileEncoders are not really encoders at heart - they're just 
setting up the environment for the actual encoders.  It's more like an 
HFileBlockConverter.

 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 7233.txt, 7233-v2.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2012-12-07 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13526937#comment-13526937
 ] 

stack commented on HBASE-7233:
--

bq. Will clients ever want a value in the mvccVersion?

No.  MVCC is server internals.

bq. In the replacement interface, we'll want to switch from encoding a 
ByteBuffer of KeyValue format bytes to a streaming interface where Cells are 
given to the encoder individually and a flush method is called when you want 
the encoded byte[] spit out. 

Streaming interface sounds good.  Should we call these new base classes 
Encoder, Encoding, Context, etc., i.e. drop the DataBlock prefix?  Hopefully 
DataBlockEncoder could inherit from Encoder.

 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 7233.txt, 7233-v2.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues when passing them over RPC

2012-12-06 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13511216#comment-13511216
 ] 

Andrew Purtell commented on HBASE-7233:
---

Perhaps the title of this JIRA should be shortened to simply "Serializing 
KeyValues".

Using any of protobufs, Avro, or Thrift for marshalling/unmarshalling the 
KeyValue is unlikely to be viable: the object creation churn and small copies 
will kill performance.  However, sending a protobuf-encoded prologue ahead of a 
stream of KVs to a client makes sense.

I like the idea of KeyValue encoder.

I also like the idea of negotiating KeyValue encoder selection at connection 
setup time.

Beyond RPC, I've been looking at extending KeyValue to add tags as described in 
HBASE-6222.  What I have is a transitional approach.  No matter what else 
happens here, if KeyValue could be a versioned serialization, that would be 
great: we could introduce tags without overloading existing fields in ugly ways 
(e.g. writing a negative value length to indicate the presence of tags), and 
without storing tags physically apart from their KVs in a separate shadow 
column.  I have implementations that do both; the latter has some undesirable 
cost, as you might imagine.  Versioning KeyValue is tricky if we must be 
backwards compatible with existing data and migration does not involve an HFile 
rewrite step.  How controversial is this?

 Serializing KeyValues when passing them over RPC
 

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Attachments: 7233.txt, 7233-v2.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues when passing them over RPC

2012-12-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13511220#comment-13511220
 ] 

stack commented on HBASE-7233:
--

bq. we can make a KEY_VALUE encoder that serializes cells in the current wire 
format which is pretty simple for other languages to parse. it can be a 
slightly more performant fallback than per-field protocol buffers

So, set a pb header and then write out <length><bytearray> as we have now after 
we send the pb.  It won't be evolvable, right?  Unless we put a 'version' in 
the pb header, or the client, I suppose, could say what version of this it 
wants and the server would accommodate?

 Serializing KeyValues when passing them over RPC
 

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Attachments: 7233.txt, 7233-v2.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2012-12-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13511233#comment-13511233
 ] 

stack commented on HBASE-7233:
--

bq. I like the idea of KeyValue encoder.

It'd write <length><bytearray><length><bytearray> and the byte array would be 
the backing array of a KV?  The format version would be in the pb preamble.  
The client would volunteer what it could digest.  We'd package the KV 
appropriately... version1 if that was what they asked for.  If they asked for 
version2, they'd get Andrew's tags if any were specified?
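
A rough sketch of that framing, for illustration only.  A single int stands in 
for the format version that would really live in the pb preamble, and each KV 
is an opaque byte array written as a 4-byte length followed by the bytes.  
LengthPrefixedKvCodec and its methods are hypothetical names, not existing 
HBase API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class LengthPrefixedKvCodec {
    /** Writes the format version, then length + bytes for every KV. */
    static byte[] encode(int formatVersion, List<byte[]> kvs) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(formatVersion); // assumption: stand-in for the pb preamble
        for (byte[] kv : kvs) {
            out.writeInt(kv.length);
            out.write(kv);
        }
        out.flush();
        return buf.toByteArray();
    }

    /** Reads the KVs back, refusing a version the reader did not ask for. */
    static List<byte[]> decode(int expectedVersion, byte[] wire) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
        int version = in.readInt();
        if (version != expectedVersion) {
            throw new IOException("unsupported KV format version " + version);
        }
        List<byte[]> kvs = new ArrayList<>();
        while (in.available() > 0) {
            byte[] kv = new byte[in.readInt()];
            in.readFully(kv);
            kvs.add(kv);
        }
        return kvs;
    }
}
```

A reader in another language only needs to handle big-endian ints and raw 
bytes, which is the appeal of this format as a simple fallback.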

A step above this would be a datablock encoder for sending lots of KVs in a 
compact form.

bq. How controversial is this?

Rewriting all hfiles?  Pretty controversial I'd say.  Maybe you were talking 
about how tricky versioning KV is?

Changed title of issue.  Moved its original intent, removing Writable from KV, 
to HBASE-7289.



 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Attachments: 7233.txt, 7233-v2.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2012-12-06 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13511254#comment-13511254
 ] 

Andrew Purtell commented on HBASE-7233:
---

bq. We'd package the kv appropriately... version1 if that was what they asked 
for.  If they asked for version2, they'd get Andrew's tags if any specified?

On-disk encoding.  The tags should be serialized with the KV, inline, so they 
can be read with the KV data in the same read op. 

What I'm doing now, for backwards compatibility, is writing the value length as 
a negative integer to flag the presence of tags, and storing the tags prepended 
to user data as part of the value section of the KV.  It's ugly.  Or, as 
mentioned, I store tags apart from their associated KVs, as KVs in a shadow 
column family.  Especially when you up Blockcache pressure you can see a 
significant latency penalty on gets for the latter.  Putting tags inline seems 
wise.  How to get them in?  And what about future evolution of KV?  I would 
really prefer not to double the number of KV types just to say "foo with tags", 
and then double again for "foo with tags and bar".
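
To make the negative-length workaround concrete, here is a small sketch.  The 
layout is an assumption invented for this example (a 4-byte length, negative 
when tags are present, followed by a 4-byte tags length, the tags, then the 
user value); the actual transitional implementation is not shown in this 
thread.  TaggedValueSketch is a hypothetical name.

```java
import java.nio.ByteBuffer;

final class TaggedValueSketch {
    /** Encodes a value section; a negative length int signals prepended tags. */
    static byte[] encode(byte[] value, byte[] tags) {
        if (tags == null || tags.length == 0) {
            // Old-style layout: positive length, then the user value.
            ByteBuffer plain = ByteBuffer.allocate(4 + value.length);
            plain.putInt(value.length).put(value);
            return plain.array();
        }
        // Assumed layout: -(section length), tags length, tags, user value.
        int sectionLength = 4 + tags.length + value.length;
        ByteBuffer tagged = ByteBuffer.allocate(4 + sectionLength);
        tagged.putInt(-sectionLength).putInt(tags.length).put(tags).put(value);
        return tagged.array();
    }

    /** Returns {tags, value}; tags is empty when the length int is non-negative. */
    static byte[][] decode(byte[] encoded) {
        ByteBuffer in = ByteBuffer.wrap(encoded);
        int length = in.getInt();
        byte[] tags = new byte[0];
        if (length < 0) {
            tags = new byte[in.getInt()];
            in.get(tags);
            length = -length - 4 - tags.length; // what remains is the user value
        }
        byte[] value = new byte[length];
        in.get(value);
        return new byte[][] {tags, value};
    }
}
```

A reader that predates the convention would see a negative length and fail, 
which is part of why overloading the field this way is ugly compared with a 
versioned KV format.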

 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Attachments: 7233.txt, 7233-v2.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2012-12-06 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13512006#comment-13512006
 ] 

Matt Corgan commented on HBASE-7233:


doh - looks like I added an accidental quote tag after "We could put a version 
in the PB header", so the remaining quotes are all inverted.  I don't have 
permission to edit it.

 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Attachments: 7233.txt, 7233-v2.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2012-12-06 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13512012#comment-13512012
 ] 

Lars Hofhansl commented on HBASE-7233:
--

I fixed it.

 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Attachments: 7233.txt, 7233-v2.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2012-12-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13512052#comment-13512052
 ] 

stack commented on HBASE-7233:
--

[~andrew.purt...@gmail.com] Let's make it so KV is evolvable, else let's go 
home!  Has to be backward compatible though -- yeah.  Can you not leverage the 
hfile version and, if older, transform old to new style blocks?  (Sorry if 
that's a dumb idea.  Did you look at overriding the key type to add in 
'version' on the top few bits?  Hmm... that is probably no good because you 
need to be able to find the type in the middle of the byte array...)

bq. ...and store the tags prepended to user data as part of the value section 
of the KV.

Ugh.  Yeah, needs to be inline.

So, we can say that KV is going to evolve so we need to just deal.

[~mcorgan] We can't do pb kvs to put them into an hfile.  Sorry if you got that 
impression.  Would be just way too slow.

I think a new KV/Cell format would require a new encoder, one that could send 
all in the new format.  Clients would ask for the new encoder format only if 
they knew how to decode.

Chatting w/ Todd, he had some good suggestions.  I tried on him my concern that 
we would be putting ourselves in a ghetto if we are not spitting a well-known 
serialization like avro or thrift out the front door.  He made Andrew's 
argument above, that we can't do prefix-tree-like compression w/ thrift/avro, 
and added that a client that goes natively against hbase is already an 
undertaking (keeping a cache of regions etc.), so it is not too much to ask 
that it be able to do at least a basic data block encoding/decoding.

Rather than KVs, because they are too atomic an entity, we should probably send 
datablocks after we send a pb header (as per Matt).  The most basic would 
serialize kvs as we do now (as per Matt).

Other interesting suggestions were sending the data first, before we send the 
pb header describing its content w/ say a DATAlength prefix so client 
accumulates the data and then reads the pb header to figure which encoder to 
use on it.  So, at its base, our RPC becomes sending of DATAlength and 
PBUCserialized delimited pb.



 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Attachments: 7233.txt, 7233-v2.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2012-12-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13526041#comment-13526041
 ] 

stack commented on HBASE-7233:
--

Looking at DataBlockEncoder, it has KeyValue and mvcc pollution.  Its hfile 
origins are showing through.  I'd think that we'd want a more basic Interface 
than this, a DataBlockEncoder that does Cells.  Looking at pulling out the more 
basic Interface, it is a bit of work.  I'm thinking that we try and get 
something going w/ DBE as it is and then come along later to do the cleanup.  
It'll help us figure out what in the current DBE is needed for putting Cells on 
the wire.

 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Attachments: 7233.txt, 7233-v2.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues

2012-12-06 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13526071#comment-13526071
 ] 

Matt Corgan commented on HBASE-7233:


Will clients ever want a value in the mvccVersion?  We can probably nullify 
that when encoding for the client, so maybe the includesMemstoreTS parameter is 
necessary?

In the replacement interface, we'll want to switch from encoding a ByteBuffer 
of KeyValue format bytes to a streaming interface where Cells are given to the 
encoder individually and a flush method is called when you want the encoded 
byte[] spit out.  We should probably split the Encoder, Decoder, and Seeker 
interfaces as well.

 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 7233.txt, 7233-v2.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues when passing them over RPC

2012-12-05 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13510732#comment-13510732
 ] 

Todd Lipcon commented on HBASE-7233:


For the RPC transport, I'd vote that we reuse some of the block encoder type 
stuff that we've got in HFile. That way we get prefix compression on the 
transport of a list of KVs within RPC, which should improve performance.
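
As a toy illustration of what prefix compression over a sorted list of keys 
buys, the sketch below stores, for each key, only the number of leading bytes 
it shares with the previous key plus the differing suffix.  This is not the 
HFile encoder code; PrefixCompressionSketch is a hypothetical name, and the 
2-byte length fields assume keys shorter than 64 KB.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class PrefixCompressionSketch {
    /** Encodes each key as (shared prefix length, suffix length, suffix bytes). */
    static byte[] encode(List<byte[]> sortedKeys) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            int shared = 0;
            int max = Math.min(prev.length, key.length);
            while (shared < max && prev[shared] == key[shared]) {
                shared++;
            }
            out.writeShort(shared);
            out.writeShort(key.length - shared);
            out.write(key, shared, key.length - shared);
            prev = key;
        }
        out.flush();
        return buf.toByteArray();
    }

    /** Rebuilds every key from the previous key's prefix plus the stored suffix. */
    static List<byte[]> decode(byte[] block) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(block));
        List<byte[]> keys = new ArrayList<>();
        byte[] prev = new byte[0];
        while (in.available() > 0) {
            int shared = in.readUnsignedShort();
            int suffixLength = in.readUnsignedShort();
            byte[] key = new byte[shared + suffixLength];
            System.arraycopy(prev, 0, key, 0, shared);
            in.readFully(key, shared, suffixLength);
            keys.add(key);
            prev = key;
        }
        return keys;
    }
}
```

The savings grow with the amount of row and column-family prefix that 
neighbouring KVs share, which is typical of a scan result.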

 Serializing KeyValues when passing them over RPC
 

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Attachments: 7233.txt, 7233-v2.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues when passing them over RPC

2012-12-05 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13510802#comment-13510802
 ] 

Matt Corgan commented on HBASE-7233:


Most of the ProtocolBuffer uses are not performance critical, and PB gives 
great flexibility and a well-known paradigm, but sending big chunks of Cells 
over the wire as fast as possible in a long scan is worth a special case, I'd 
say.  Using the DataBlockEncoding stuff might consume roughly the same cpu as 
PB encoding on the server, but will save a ton of network bandwidth for many 
tables and would be much easier for the client to decode.

 Serializing KeyValues when passing them over RPC
 

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Attachments: 7233.txt, 7233-v2.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues when passing them over RPC

2012-12-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13510838#comment-13510838
 ] 

stack commented on HBASE-7233:
--

As I see it then, we'll send a pb Result and then on the wire, it'll be 
directly followed by an encoded block of KVs.  The Result will describe the 
block that is coming immediately after.  Need to do same for Mutation sending 
in the data.

Hopefully, can doctor the rpc so I can get better access to the channel.  
Currently we are composing the response in a bytebuffer that we give to a 
WritableByteChannel (this is after pb has done similar when we build the 
messages).  The composing of the response in a bytebuffer is a known temporary 
stopgap while moving to pb but we'll need to undo it before we ship (except 
when doing secure connection.. there we need to sasl wrap the byte array 
response).

Let me finish the baseline case where we do pure pb throughout.  Then will have 
a go at trying to send a follow-along encoded block.

 Serializing KeyValues when passing them over RPC
 

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Attachments: 7233.txt, 7233-v2.txt


 Undo KeyValue being a Writable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7233) Serializing KeyValues when passing them over RPC

2012-12-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13511156#comment-13511156
 ] 

stack commented on HBASE-7233:
--

Yeah, will have to keep versions on datablockencoding.

Clients other than hbase clients will be pretty hosed; if they are doing pure 
pb, hbase will be dog slow marshaling and unmarshaling, and if they want to go 
faster, they'll have to implement datablockencoding in whatever their language.

Looking, avro would let us pass schema independent of data -- say at connection 
setup -- and because schema is external, could have tight on the wire 
representation.  It lets you stream too it seems (haven't looked in code).  
Thrift supposedly too.

 Serializing KeyValues when passing them over RPC
 

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Attachments: 7233.txt, 7233-v2.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues when passing them over RPC

2012-12-05 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13511171#comment-13511171
 ] 

Lars Hofhansl commented on HBASE-7233:
--

bq. Yeah, will have to keep versions on datablockencoding.
Will that be enough to have old clients talk to a new server (or vice versa)? 
That's what Writable did, and it did not work so well.  Would client and server 
have to pre-negotiate what they understand?


 Serializing KeyValues when passing them over RPC
 

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Attachments: 7233.txt, 7233-v2.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues when passing them over RPC

2012-12-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13511183#comment-13511183
 ] 

stack commented on HBASE-7233:
--

bq. Will that be enough to have old clients talk to new server (or vice versa)? 

Should have said, new server would also have to be able to do the old 
datablockencoding formats too -- whatever the client proffered -- or else fall 
back to lowest common denominator pb all the time.

 Serializing KeyValues when passing them over RPC
 

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Attachments: 7233.txt, 7233-v2.txt


 Undo KeyValue being a Writable.



[jira] [Commented] (HBASE-7233) Serializing KeyValues when passing them over RPC

2012-12-05 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13511196#comment-13511196
 ] 

Matt Corgan commented on HBASE-7233:


A few thoughts:
- we can make a KEY_VALUE encoder that serializes cells in the current wire 
format, which is pretty simple for other languages to parse.  It can be a 
slightly more performant fallback than per-field protocol buffers
- encoders will have to be backwards compatible for a while on the server 
anyway because people have lots of hfiles encoded with them
- encoders could have versions, but they are also pretty intricate, so any 
changes might merit a whole new encoder like FAST_DIFF2
- the client could pass a short list of encoder options in descending order of 
preference like FAST_DIFF2, KEY_VALUE, PB, where PB is the forever-supported 
fallback

I'm a little skeptical that this will be the last client hbase ever supports.  
If something really major changes, we could make a whole new client and the 
server could translate things to support the old client.
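
The preference-list idea above could look roughly like this on the server 
side.  It is only a sketch: EncoderNegotiation and choose are hypothetical 
names, and encoders are identified by plain strings for simplicity.

```java
import java.util.List;
import java.util.Set;

final class EncoderNegotiation {
    /** The fallback every server is assumed to support forever. */
    static final String FALLBACK = "PB";

    /**
     * Walks the client's list in its stated order of preference and returns
     * the first encoder the server also supports, or the fallback if none match.
     */
    static String choose(List<String> clientPreferences, Set<String> serverSupported) {
        for (String name : clientPreferences) {
            if (serverSupported.contains(name)) {
                return name;
            }
        }
        return FALLBACK;
    }
}
```

For example, a client offering FAST_DIFF2, KEY_VALUE, PB to a server that only 
knows KEY_VALUE and PB would end up with KEY_VALUE.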

 Serializing KeyValues when passing them over RPC
 

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Attachments: 7233.txt, 7233-v2.txt


 Undo KeyValue being a Writable.
