[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives

2013-07-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13715979#comment-13715979
 ] 

Hadoop QA commented on HBASE-8693:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12593621/0001-HBASE-8693-Extensible-data-types-API.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 28 new 
or modified tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6432//console

This message is automatically generated.

> Implement extensible type API based on serialization primitives
> ---
>
> Key: HBASE-8693
> URL: https://issues.apache.org/jira/browse/HBASE-8693
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client
>Reporter: Nick Dimiduk
>Assignee: Nick Dimiduk
> Fix For: 0.95.2
>
> Attachments: 0001-HBASE-8693-Extensible-data-types-API.patch, 
> 0001-HBASE-8693-Extensible-data-types-API.patch, 
> 0001-HBASE-8693-Extensible-data-types-API.patch, 
> 0001-HBASE-8693-Extensible-data-types-API.patch, 
> 0002-HBASE-8693-example-Use-DataType-API-to-build-regionN.patch, 
> KijiFormattedEntityId.java
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives

2013-07-19 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713834#comment-13713834
 ] 

Nick Dimiduk commented on HBASE-8693:
-

The {{HDataType}} interface and the two codecs upon which its implementations 
rely are not schema management for HBase. {{HDataType}} can be used to manage 
encoding values into rowkeys, column qualifiers, or values. Use an instance of 
{{Struct}}, or don't, in any of those contexts. The use of {{Struct}} in the 
order-sensitive context has driven more design thought, but it generates a 
{{byte[]}} wherever it's used. Would an example of an Avro, Thrift, or 
Protobuf {{HDataType}} implementation help to drive this idea home?

My trouble with using the word "schema" for key-values is that the context is 
too narrow in scope. Being able to consistently read a value out of a cell does 
not tell me what the schema of the database is. HBase provides basic *table* 
definition management but not *data* definition management, which is the 
effective meaning of schema. Phoenix and Kiji both provide a layer of schema 
management on top of HBase. Through them you define the logical layout of data 
in tables, and you leave to them how that data is physically arranged and 
encoded. {{HDataType}} provides an API with which its user can control how data 
is physically arranged and encoded. Its user is still left to manage the 
logical layout, and its meaning to their application, for themselves.

This patch is not schema management. It provides a common set of primitives 
that other applications can consume -- be they user applications developed 
directly against HBase, or Phoenix or Kiji themselves. The consumers I've had 
in mind have always been myself and application developers like me, Hive, Pig, 
and Phoenix. The primary benefit is that all those applications gain some level 
of interoperability through data in HBase. That I was able to read Kiji's avdl 
file and in an afternoon understand how {{HDataType}} could be used to make its 
implementation simpler and more extensible is validation of its utility.



[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives

2013-07-18 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713153#comment-13713153
 ] 

stack commented on HBASE-8693:
--

IIRC, their avro idl is for all but the description of the rowkey.  When they 
talk about rowkey 'schema', it is allowed that it cannot evolve for reasons 
discussed above.  Adding to the right of a rowkey should be fine though.  Ditto 
when serializing column qualifiers.

High in this issue you raise: "Do you think we should have a similar kind of 
dichotomy for encoding into order-preserving context vs non-order-preserving 
context? My initial thinking is probably not (due to additional API surface 
area), but I want to have the conversation."

You allow that there are two contexts (and indeed Matteo asks for clarification 
on this) -- one where there is no way around it but you need to rewrite the 
data if you want to refer to it using a different struct/'schema'; e.g. a 
rowkey (caveat adding fields to the right) -- and then there are the contexts 
where you should be able to evolve the content; e.g. cell content and even to a 
higher level where you might impose a schema made of multiple column content 
(or full row), and so on.

This seems like a good split.  In the cell context, the area where you would 
like to be able to evolve, sort order preservation is not required.  In the 
simple case, an int16 type, you probably don't need versioning either?  Its 
serialization is unlikely to change but you might want to version even these 
primitive types just in case?  If there is a compound type in a cell, you would 
like to be able to evolve it; to add fields, etc.  So you could add a version 
to structs here?  (but why would a user use this lib over pb in this case?)  
Now you bleed over into higher level issues; schema and its follow-ons, where 
to store it and how to evolve, etc. (Matteo's concerns).

I suppose we are fine given you have 'schema' and 'schema evolution' as 
out-of-scope in your answer to Matteo.  We should be clear that these problems 
remain as to-be-solved (or solved by others -- see kiji) after this patch is 
done and be sure folks don't get the wrong impression.  Just saying.

On the adding fields to the right of your struct, where you have the 
application use the right struct version, pity your lib couldn't do that for 
the app.  PB has a lead-off serialized length which saves it reading off the 
end of the record.  You can't do that because you'll mess up your ordering.  
You can't lead the record with a version since that will also mess your sort 
order (as you say above).  A buffer where you check available would be 
expensive...
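
To make the length-prefix point concrete, here is a minimal sketch. It is not the patch's encoding: both encoders and the class name {{LengthPrefixOrderSketch}} are hypothetical, and the one-byte length limits values to 255 bytes. It compares raw bytes the way HBase compares keys (unsigned, left to right) and shows that a lead-off length reorders values while a trailing terminator does not.

```java
import java.nio.charset.StandardCharsets;

final class LengthPrefixOrderSketch {
  // Hypothetical encoding 1: one length byte, then the UTF-8 bytes.
  static byte[] lengthPrefixed(String s) {
    byte[] raw = s.getBytes(StandardCharsets.UTF_8);
    byte[] out = new byte[raw.length + 1];
    out[0] = (byte) raw.length;
    System.arraycopy(raw, 0, out, 1, raw.length);
    return out;
  }

  // Hypothetical encoding 2: the UTF-8 bytes followed by a 0x00 terminator.
  // Assumes the input never contains a NUL character.
  static byte[] terminated(String s) {
    byte[] raw = s.getBytes(StandardCharsets.UTF_8);
    byte[] out = new byte[raw.length + 1]; // last byte is left as 0x00
    System.arraycopy(raw, 0, out, 0, raw.length);
    return out;
  }

  // Unsigned lexicographic comparison of raw byte arrays.
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) {
        return d;
      }
    }
    return a.length - b.length;
  }
}
```

With these helpers, "aa" sorts before "b" as strings and as terminated bytes, but after "b" once a length byte leads the record, because the comparison is decided by the lengths 2 and 1 before any content is seen.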







[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives

2013-07-18 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713136#comment-13713136
 ] 

Nick Dimiduk commented on HBASE-8693:
-

bq. again, I think that your focus at the moment is more on the key side... and 
my guess is that the struct is fine for that. but this jira is "serialization 
primitives" without a "row-keys" in front... so I assume you plan to use this 
stuff also for the cell values, and from what I said above... I don't see an 
easy way to evolve my cell data, without rewriting every time or doing "manual" 
mappings for each struct version.

You're right, this implementation is too simplistic for storing complex 
entities in a Cell. You can do it, but you'll be a bit stuck as there's no 
concept of schema identification or of evolution. I can see how the title can 
be misleading. OrderedBytes and HDataType are no replacement for application 
use of \{protobuf,avro,thrift\}, particularly in the "entity-centric modeling" 
approach with fat key-values.



[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives

2013-07-18 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713076#comment-13713076
 ] 

Matteo Bertozzi commented on HBASE-8693:


Thanks for continuing to follow up on my out-of-scope questions.

Again, I think that I'm focusing more on the cell-value side instead of the key 
part, which is the one that will benefit from the ordered byte stuff and will 
probably have more restrictions on evolution, since this stuff is client side 
only and you have to deal with the raw byte sorting of hbase.

{quote}It's quite out of scope for my purposes, but I'm curious what you think 
about the future direction with schema. I think the Phoenix and Kiji folk will 
have some good insights.{quote}

(I'll talk only about cell-values here, so I'm not interested in the ordered 
stuff in this case)
I want to write my app today with this library.
I'll start off using a Struct, and it's ok until I have to add/remove a field.
so.. I can add a version/schema id.. but now I have the problem that I have to 
keep all the schemas and then project to the schema that I want to use.

Example:
- get row0 -> cell with schema 1
- get row1 -> cell with schema 2
- get row2 -> cell with schema 3
- Now the user/api has to handle these 3 different rows and project to a 
user-provided schema to get out something useful to the user...

In this case, you have to store all the schemas and you've to provide a mapping 
for each schema to the one that the user wants.

The other approach, more protobuf-like, is that each field has an id that must 
be unique. On read you provide your "read schema" and you load only the fields 
present in the "read schema".
Note that this can also work with just an api similar to what you have, 
"getField(field_id)", where the id is the unique id and not the index.

Again, I think that your focus at the moment is more on the key side... and my 
guess is that the struct is fine for that.
But this jira is "serialization primitives" without a "row-keys" in front... so 
I assume you plan to use this stuff also for the cell values, and from what I 
said above... I don't see an easy way to evolve my cell data, without rewriting 
every time or doing "manual" mappings for each struct version.
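
The protobuf-like alternative described above (unique field ids plus a "read schema") can be sketched roughly as follows. This is an illustration only: the class {{TaggedFieldSketch}}, the [id][length][value] layout, and both method names are assumptions, not part of the patch and not the protobuf wire format. The layout is not order-preserving, so it would only suit cell values.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class TaggedFieldSketch {
  // Writes each field as [id:int][length:int][value bytes].
  static byte[] encode(Map<Integer, byte[]> fields) {
    int size = 0;
    for (byte[] v : fields.values()) {
      size += 8 + v.length;
    }
    ByteBuffer buf = ByteBuffer.allocate(size);
    for (Map.Entry<Integer, byte[]> e : fields.entrySet()) {
      buf.putInt(e.getKey()).putInt(e.getValue().length).put(e.getValue());
    }
    return buf.array();
  }

  // Returns only the fields whose ids appear in readSchema.
  // Fields written by other schema versions are skipped by length.
  static Map<Integer, byte[]> decode(byte[] cell, Set<Integer> readSchema) {
    ByteBuffer buf = ByteBuffer.wrap(cell);
    Map<Integer, byte[]> out = new LinkedHashMap<>();
    while (buf.hasRemaining()) {
      int id = buf.getInt();
      int len = buf.getInt();
      if (readSchema.contains(id)) {
        byte[] v = new byte[len];
        buf.get(v);
        out.put(id, v);
      } else {
        buf.position(buf.position() + len); // skip a field we did not ask for
      }
    }
    return out;
  }
}
```

A reader asking for ids 1, 3 and 4 against a cell written with ids 1, 2 and 3 gets back 1 and 3; field 2 is skipped and field 4 is simply absent, with no per-version mapping code.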



[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives

2013-07-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713028#comment-13713028
 ] 

Hadoop QA commented on HBASE-8693:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12593082/KijiFormattedEntityId.java
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6404//console

This message is automatically generated.



[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives

2013-07-18 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713011#comment-13713011
 ] 

Nick Dimiduk commented on HBASE-8693:
-

To be fair, sort order also is of concern in column names. My choice of the 
word "schema" was unfortunate in my previous comment. I should have said "no 
composite structure is written into the concatenation." Because HBase's only 
native data type is byte[], encodings are necessary for any application value 
other than byte[], wherever it hits a rowkey, qualifier, or value.

It's quite out of scope for my purposes, but I'm curious what you think about 
the future direction with schema. I think the Phoenix and Kiji folk will have 
some good insights.



[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives

2013-07-17 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712084#comment-13712084
 ] 

Matteo Bertozzi commented on HBASE-8693:


above you talk about the sort order, and I guess just about the key.
but when I talk about data or schema I refer to the cell value, not the key.
For the key I think that the fixed or append-only, as you pointed out is good 
enough.

Again, maybe I'm out of scope.. but do you see those classes used only to 
encode the key? e.g. the struct mentions the key explicitly in the comment. I 
probably see this as more generic key/value serialization, knowing about the 
future direction with the schema.



[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives

2013-07-17 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13711775#comment-13711775
 ] 

Nick Dimiduk commented on HBASE-8693:
-

bq. Ok, make sense with this limited scope (no schema) have a fixed list of 
fields.

Right. In this implementation Struct is a simple concatenation of fields. No 
schema information is written into that concatenation because doing so would 
mess with sort order. Struct is merely API convenience. Now, the field 
encodings implemented in OrderedBytes include a header byte which is currently 
used to identify the type of encoded field that follows. The full space of 256 
available bit patterns in that header byte is not consumed by the current 
implementation. I've been thinking about extending that header byte to include 
some version bits at the very beginning. That would enable evolution of the 
individual field encodings (say, if you later want to re-implement blob-mid, 
for example). This doesn't address the user-level logical structure of a Struct 
data type, only evolution of the OrderedBytes codec.
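
As a rough sketch of the header-byte idea: the split below (2 version bits in the high positions, 6 type bits) and the class {{HeaderByteSketch}} are assumptions made for illustration; the actual OrderedBytes header values are not reproduced here.

```java
final class HeaderByteSketch {
  static final int VERSION_BITS = 2;                 // assumed split, not from the patch
  static final int TYPE_BITS = 8 - VERSION_BITS;     // 6 bits left for the type code
  static final int TYPE_MASK = (1 << TYPE_BITS) - 1; // 0x3f

  // Packs a codec version into the leading bits and a type code into the rest.
  static byte pack(int version, int typeCode) {
    if (version < 0 || version >= (1 << VERSION_BITS)) {
      throw new IllegalArgumentException("version out of range: " + version);
    }
    if (typeCode < 0 || typeCode > TYPE_MASK) {
      throw new IllegalArgumentException("type code out of range: " + typeCode);
    }
    return (byte) ((version << TYPE_BITS) | typeCode);
  }

  static int version(byte header) {
    return (header & 0xff) >>> TYPE_BITS;
  }

  static int typeCode(byte header) {
    return header & TYPE_MASK;
  }
}
```

One consequence of putting the version bits first in this sketch is that values written with different codec versions would sort into separate groups, since the header is the first byte compared.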

bq. My main concern is: I start use 96 with this struct encoding... is fixed so 
I can't add fields.. so I work around it adding a version number in front of 
the struct and then I do the switch for v1, v2, v3 with all the fixed struct 
that I know...

Prepending a version number to the Struct's members will impact sort order. 
Struct definition is fixed in that you can't prepend or interpose a new field 
in the middle of an existing encoded value. You're free to append fields. 
Appending a field would look like the following:

 # application defines Struct v0 with members [A,B,C]
 # application writes lots of data
 # application changes, Struct v1 becomes [A,B,C,D,E]
 # application writes lots more data

At step 3, the application now needs to become version aware. Because the 
fields of v0 are a subset of v1, the application can use the definition of 
struct v1 with the following safeguards. (1) Any place where v0 was used now 
needs to check for end-of-buffer and skip over the two new elements. (2) Any 
place where v1 is used needs to be mindful of truncated records and be 
prepared to receive only the v0 fields. Maybe the API defined around Struct 
can be improved to support these needs?
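
A minimal sketch of such a version-aware reader follows. It does not use the Struct API from the patch: the class {{AppendOnlyStructSketch}} and its fixed-width, sign-flipped int fields are a stand-in chosen to keep the example short. The reader is written against v1 (five fields) and checks for end-of-buffer before each field, so a v0 record simply yields nulls for D and E.

```java
import java.nio.ByteBuffer;

final class AppendOnlyStructSketch {
  static final int V1_FIELD_COUNT = 5; // [A, B, C, D, E]
  static final int FIELD_WIDTH = 4;    // each field is a 4-byte int in this toy

  // Concatenates the fields. Flipping the sign bit makes unsigned byte
  // order agree with numeric order for big-endian ints.
  static byte[] encode(int... fields) {
    ByteBuffer buf = ByteBuffer.allocate(FIELD_WIDTH * fields.length);
    for (int f : fields) {
      buf.putInt(f ^ Integer.MIN_VALUE);
    }
    return buf.array();
  }

  // Reads up to V1_FIELD_COUNT fields, stopping at end-of-buffer.
  // Fields missing from an older, shorter record are returned as null.
  static Integer[] decode(byte[] encoded) {
    ByteBuffer buf = ByteBuffer.wrap(encoded);
    Integer[] out = new Integer[V1_FIELD_COUNT];
    for (int i = 0; i < V1_FIELD_COUNT && buf.remaining() >= FIELD_WIDTH; i++) {
      out[i] = buf.getInt() ^ Integer.MIN_VALUE;
    }
    return out;
  }
}
```

Decoding a record written as [1, -2, 3] returns [1, -2, 3, null, null], while a record written as [1, -2, 3, 4, 5] returns all five values.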

Records of v0 and v1 can be intermixed, ie, as rowkeys in the same table. 
According to the documented sort semantics, they'll sort "left-to-right and 
depth-first". Meaning, they'll sort first according to v0 values and then 
within that group, by v1 values.
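
The intermixed ordering can be checked with a small sketch. The one-byte-per-field keys and the class {{MixedVersionSortSketch}} are hypothetical stand-ins for a real order-preserving codec; the comparator mimics HBase's unsigned, left-to-right byte comparison.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class MixedVersionSortSketch {
  // One unsigned byte per field (values 0..255); a toy encoding.
  static byte[] key(int... fields) {
    byte[] out = new byte[fields.length];
    for (int i = 0; i < fields.length; i++) {
      out[i] = (byte) fields[i];
    }
    return out;
  }

  // Unsigned lexicographic order; a shorter key sorts before any key it prefixes.
  static final Comparator<byte[]> RAW_BYTES = (a, b) -> {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) {
        return d;
      }
    }
    return a.length - b.length;
  };

  // Returns a sorted copy, leaving the input list untouched.
  static List<byte[]> sorted(List<byte[]> keys) {
    List<byte[]> copy = new ArrayList<>(keys);
    copy.sort(RAW_BYTES);
    return copy;
  }
}
```

Sorting the keys [1,2,3,9,9], [1,2,4], [1,2,3] and [1,2,3,0,5] gives [1,2,3], [1,2,3,0,5], [1,2,3,9,9], [1,2,4]: records group by their v0 fields first, and within the [1,2,3] group the v0 record comes first, followed by the v1 records ordered by D and E.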

We leave all of this up to user applications today, so this change management 
isn't mitigated. Changing a compound rowkey today requires rewriting data (or 
duplication into a new table). A smarter struct encoding, one that's able to 
preserve the sorted semantics I've described but that can also track more 
sophisticated schema change would be very useful indeed -- I don't think it 
exists.

Prepending a version field to a Struct will change the sorting behavior; v0 
will sort before v1, &c. IMHO, this is a less flexible migration strategy than 
the append behavior described above. It's also perfectly valid, and the user of 
the Struct API is free to do so in their own application. In that case, the 
application is still version-aware. Instead of being cautious about consuming 
potentially truncated records, it executes a scan for each version.
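
For contrast, a sketch of the version-prefix strategy, with a {{TreeSet}} standing in for a sorted table. Everything here is hypothetical: the class {{VersionPrefixScanSketch}}, the one-byte version and fields, and the {{prefixScan}} helper, which only imitates what a prefix-bounded scan would return.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

final class VersionPrefixScanSketch {
  // Unsigned lexicographic comparison of raw keys.
  static int compare(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) {
        return d;
      }
    }
    return a.length - b.length;
  }

  // An in-memory stand-in for a table sorted by rowkey.
  static TreeSet<byte[]> newTable() {
    return new TreeSet<>(VersionPrefixScanSketch::compare);
  }

  // Returns stored keys that start with the given prefix, in sorted order.
  static List<byte[]> prefixScan(TreeSet<byte[]> table, byte[] prefix) {
    List<byte[]> hits = new ArrayList<>();
    for (byte[] k : table.tailSet(prefix, true)) {
      boolean matches = k.length >= prefix.length
          && Arrays.equals(Arrays.copyOf(k, prefix.length), prefix);
      if (!matches) {
        break; // keys sharing a prefix are contiguous, so we can stop here
      }
      hits.add(k);
    }
    return hits;
  }
}
```

With keys laid out as [version, A, B, ...], every v0 key sorts ahead of every v1 key regardless of field values. Finding all records with A = 1 therefore takes one scan for prefix [0, 1] and another for prefix [1, 1].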

bq. as you said, data evolution is out of the scope. so if you consider this 
patch just as a "smarter" alternative to the Bytes encoding.

HBASE-8201 is a smarter alternative to Bytes and this ticket adds some 
higher-level APIs for manipulating them. In short, yes, schema definition and 
evolution is out of scope.



[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives

2013-07-17 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13711708#comment-13711708
 ] 

Matteo Bertozzi commented on HBASE-8693:


{quote}Struct is a programmatic data structure, not a tool for schema 
management. It has no concept of "upgrade Struct Foo, version 1 to Foo version 
2 by adding a new field in the middle here and changing the last one from X to 
Y." It's a convenience for manipulating complex byte[] structures. Schema 
management may become of concern for HBase, but that's out of scope.{quote}
Ok, make sense with this limited scope (no schema) have a fixed list of fields.

My main concern is: I start use 96 with this struct encoding... is fixed so I 
can't add  fields.. so I work around it adding a version number in front of the 
struct and then I do the switch for v1, v2, v3 with all the fixed struct that I 
know...

...later I switch to a future release that have the code for table schema that 
"half" relies on this patch. How can I map my data? since I've done some tricks 
for my versioning I probably can't do anything... and I must rewrite 
everything..

as you said, data evolution is out of the scope. so if you consider this patch 
just as a  "smarter" alternative to the Bytes encoding. feel free to ignore my 
comments since this stuff already looks good to me as it is.



[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives

2013-07-17 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13711250#comment-13711250
 ] 

Nick Dimiduk commented on HBASE-8693:
-

Moving [~mbertozzi]'s comment from the dev list back to JIRA:

bq. I was looking at the HBASE-8693 patch, and looks good to me for the 
primitive types.

Thanks, and I'm glad to hear it. Any comments about redundant or missing types 
a user would expect out of the box?

bq. but I can't see how do you plan to evolve stuff like the struct.

Struct is a programmatic data structure, not a tool for schema management. It 
has no concept of "upgrade Struct Foo, version 1 to Foo version 2 by adding a 
new field in the middle here and changing the last one from X to Y." It's a 
convenience for manipulating complex {{byte[]}} structures. Schema management 
may become of concern for HBase, but that's out of scope. Any chance this topic 
came up at yesterday's meetup?

bq.  By "evolve" I mean add/remove fields, or just query it with a subset of 
fields. the fields don't have an id, and on read you must specify all of them 
in the same order as you've used for write. (but maybe is just an 
immutable/fixed list of fields, and I'm ok with just adding that info to the 
comment on top of the class)

I discovered this missing API while working through the example use patch, 
above. The update I posted on RB yesterday adds an API for accessing a specific 
struct member by position. If RB links work, take a look at 
[Struct#read(ByteBuffer, 
int)|https://reviews.apache.org/r/12069/diff/2-3/#12.21].



[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives

2013-07-16 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710336#comment-13710336
 ] 

Nick Dimiduk commented on HBASE-8693:
-

RB is down at the moment. I have some incremental work on github, including an 
[example|https://github.com/ndimiduk/hbase/commit/c07c1c4f0187b78169b11f3a1ed20d9d268f5b65]
 of using {{HDataType}} in {{HRegionInfo}}.

I'm thinking I want to rename {{HDataType#\{read,write\}}} to 
{{HDataType#\{decode,encode\}}}. Thoughts?



[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives

2013-06-26 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694262#comment-13694262
 ] 

Nick Dimiduk commented on HBASE-8693:
-

bq. Should this work be in hbase-common rather than in hbase-client?

Initial conversations required the type stuff not be in common. I agree, it 
makes more sense there and I think that community opinion is changing. The 
current implementation doesn't bring in any dependencies, so it should be 
painless.

bq. What is Order here?

{{Order}} is a component from the {{OrderedBytes}} implementation (see patch on 
HBASE-8201). It enables users to store data sorted in ascending or descending 
order. Right now it's mostly a vestigial appendage; I don't know how the data 
types API wants to expose and consume this functionality. I'm hoping to gain 
insight from Phoenix, Kiji, &c in future reviews.
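
For readers unfamiliar with the idea, one common way to get descending order out of a byte-comparable encoding is to invert every byte of the ascending form. The sketch below illustrates that technique only; the nested {{Order}} enum, the class {{OrderSketch}}, and the int codec are hypothetical and are not the {{OrderedBytes}} implementation from HBASE-8201.

```java
import java.nio.ByteBuffer;

final class OrderSketch {
  // A local stand-in; it only borrows the name of the Order type discussed here.
  enum Order { ASCENDING, DESCENDING }

  // Big-endian int with the sign bit flipped sorts numerically as raw bytes.
  // Inverting every byte reverses that order.
  static byte[] encodeInt(int v, Order ord) {
    byte[] out = ByteBuffer.allocate(4).putInt(v ^ Integer.MIN_VALUE).array();
    if (ord == Order.DESCENDING) {
      invert(out);
    }
    return out;
  }

  static int decodeInt(byte[] encoded, Order ord) {
    byte[] copy = encoded.clone(); // do not mutate the caller's array
    if (ord == Order.DESCENDING) {
      invert(copy);
    }
    return ByteBuffer.wrap(copy).getInt() ^ Integer.MIN_VALUE;
  }

  private static void invert(byte[] b) {
    for (int i = 0; i < b.length; i++) {
      b[i] = (byte) ~b[i];
    }
  }

  // Unsigned lexicographic comparison of raw byte arrays.
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) {
        return d;
      }
    }
    return a.length - b.length;
  }
}
```

Under this sketch, -5 encodes to bytes that sort before those of 3 in ascending order and after them in descending order, and both forms decode back to the original value.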

bq. When would I use isCoercibleTo?

This comes from examination of Phoenix's {{PDataType}}. My understanding is, in 
the absence of secondary indices, the query planner can use type coercion to 
its advantage. This is the part of the data type API that I understand the 
least. I'm hoping for more clarity from [~giacomotaylor].

bq. I see a read on Union4

Sounds like a bug to me.

bq. How I describe a Struct outside of a Struct..?

Examples to follow.

bq. Whats a Binary?

Equivalent to SQL BLOB. This is how a user can inject good old-fashioned 
{{byte[]}}s into a {{Struct}} or {{Union}}.

bq. Do we need all these types?

Great question. That conversation is happening up on HBASE-8089. My preference 
is no, but I think the SQL guys want more of these for better interoperability 
between them.



[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives

2013-06-25 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693632#comment-13693632
 ] 

stack commented on HBASE-8693:
--

Should this work be in hbase-common rather than in hbase-client?  These types 
are a client facility at first, but one day they might go server-side.  Also, 
it is easier to add to hbase-common than to hbase-client.  Unless they have 
dependencies?

What is Order here?

+  /**
+   * Write instance v into buffer b.
+   */
+  public abstract void write(ByteBuffer b, T v, Order ord);


When would I use isCoercibleTo?

I see a read on Union4 but not a write.  Is that intentional?  Will the Union3 
take care of it?  Ditto Union3...

How do I describe a Struct outside of a Struct (JSON to describe how to make 
one)?

What's a Binary?

Agree that example usage would help.

Do we need all these types?

Good stuff N.

I think you should post today's slides here too; they are good on the 
high-level.





[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives

2013-06-25 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693214#comment-13693214
 ] 

Nick Dimiduk commented on HBASE-8693:
-

bq. You kept the Java name for all types except BigDecimal; is there a reason?

The data types come from the (outdated) spec posted on HBASE-8089. I believe 
there is value in choosing types that are meaningful in a SQL context, but we 
shouldn't limit our thinking on this to what was laid down 30 years ago.

bq. Some unit tests could help to see how it should be used.

Agreed. Look for those in a followup patch. 

bq. For example, the constructors are private...

The idea is that instances of {{HDataType}} are type definitions, not data 
values. For instance, the {{o.a.h.h.t.Decimal#DECIMAL}} instance is the 
definition of how to encode and decode values and how values of this type 
relate to values of other types. It does not represent a numeric value.
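
A minimal sketch of the "type definition, not data value" idea, assuming hypothetical names ({{SketchDataType}}, {{SketchLong}}) and a simplified two-method interface that is not the API from the patch. The private constructor leaves a single shared instance that knows how to encode and decode; the values themselves stay plain Java objects.

```java
import java.nio.ByteBuffer;

/** Hypothetical base class: an instance describes a type, it does not hold a value. */
abstract class SketchDataType<T> {
  /** Writes value v into buffer b. */
  abstract void write(ByteBuffer b, T v);

  /** Reads a value of this type from buffer b. */
  abstract T read(ByteBuffer b);
}

/** Hypothetical 64-bit integer type definition with one shared instance. */
final class SketchLong extends SketchDataType<Long> {
  static final SketchLong LONG = new SketchLong();

  // Private: callers never create per-value instances of the type.
  private SketchLong() {}

  @Override
  void write(ByteBuffer b, Long v) {
    b.putLong(v);
  }

  @Override
  Long read(ByteBuffer b) {
    return b.getLong();
  }
}

final class TypeDefinitionSketch {
  /** Encodes v with the shared type definition, then decodes it again. */
  static long roundTrip(long v) {
    ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
    SketchLong.LONG.write(buf, v);
    buf.flip();
    return SketchLong.LONG.read(buf);
  }

  public static void main(String[] args) {
    System.out.println(roundTrip(42L));
  }
}
```

Under this reading, there is nothing to construct from a Java {{double}} or {{long}}: the caller passes the plain value to the type definition's write method.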





[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives

2013-06-25 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692855#comment-13692855
 ] 

Nicolas Liochon commented on HBASE-8693:


You kept the Java name for all types except BigDecimal; is there a reason?
Some unit tests could help to show how it should be used. For example, the 
constructors are private; I was wondering whether one would want to create 
these objects from core Java classes (i.e., create an hbase.Double from a Java 
double).





[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives

2013-06-24 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692489#comment-13692489
 ] 

Nick Dimiduk commented on HBASE-8693:
-

A note regarding variable-length encodings. Variable- versus fixed-width 
encodings were a highlighted point during conversations around HBASE-7221 and 
HBASE-7692. These data type implementations make exclusive use of the 
{{OrderedBytes}} encodings. That is because my thinking around them thus far is 
focused on use as rowkeys and column qualifiers. However, order preservation 
isn't strictly necessary for use in values. I noticed a rough analogy in 
Postgres's data type implementation: the distinction between the encoding used 
to store data and the encoding used for an index entry.

Do you think we should have a similar kind of dichotomy for encoding into an 
order-preserving context versus a non-order-preserving context? My initial 
thinking is probably not (due to the additional API surface area), but I want 
to have the conversation.
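
To make the dichotomy concrete, here is a minimal sketch contrasting a plain encoding with an order-preserving one for a signed 32-bit integer. The class and method names are hypothetical, and the sign-bit flip is a common textbook technique used here for illustration; it is not the {{OrderedBytes}} format.

```java
import java.nio.ByteBuffer;

final class EncodingSketch {
  /** Plain big-endian two's complement: fine for values, wrong sort order for keys. */
  static byte[] plain(int v) {
    return ByteBuffer.allocate(4).putInt(v).array();
  }

  /** Flips the sign bit so unsigned byte order matches numeric order. */
  static byte[] orderPreserving(int v) {
    return ByteBuffer.allocate(4).putInt(v ^ Integer.MIN_VALUE).array();
  }

  /** Unsigned lexicographic comparison of two byte arrays. */
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = (a[i] & 0xff) - (b[i] & 0xff);
      if (cmp != 0) {
        return cmp;
      }
    }
    return a.length - b.length;
  }

  public static void main(String[] args) {
    // Plain encoding: -1 is FF FF FF FF, so it sorts after 1.
    System.out.println(compareBytes(plain(-1), plain(1)) > 0);
    // Order-preserving encoding: -1 sorts before 1, as it does numerically.
    System.out.println(compareBytes(orderPreserving(-1), orderPreserving(1)) < 0);
  }
}
```

Either encoding round-trips the value; only the second is safe where byte order must match value order, such as rowkeys and column qualifiers.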





[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives

2013-06-24 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692447#comment-13692447
 ] 

Nick Dimiduk commented on HBASE-8693:
-

on rb: https://reviews.apache.org/r/12069/

(cc [~dmeil], [~giacomotaylor], [~eclark], [~owen.omalley], [~ashutoshc], 
[~alangates])


