[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives
[ https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13715979#comment-13715979 ] Hadoop QA commented on HBASE-8693: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593621/0001-HBASE-8693-Extensible-data-types-API.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 28 new or modified tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6432//console This message is automatically generated. > Implement extensible type API based on serialization primitives > --- > > Key: HBASE-8693 > URL: https://issues.apache.org/jira/browse/HBASE-8693 > Project: HBase > Issue Type: Sub-task > Components: Client >Reporter: Nick Dimiduk >Assignee: Nick Dimiduk > Fix For: 0.95.2 > > Attachments: 0001-HBASE-8693-Extensible-data-types-API.patch, > 0001-HBASE-8693-Extensible-data-types-API.patch, > 0001-HBASE-8693-Extensible-data-types-API.patch, > 0001-HBASE-8693-Extensible-data-types-API.patch, > 0002-HBASE-8693-example-Use-DataType-API-to-build-regionN.patch, > KijiFormattedEntityId.java > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives
[ https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713834#comment-13713834 ] Nick Dimiduk commented on HBASE-8693:

This {{HDataType}} interface and the two codecs upon which the implementations rely are not schema management for HBase. {{HDataType}} can be used to manage encoding values into rowkeys, column qualifiers, or values. Use an instance of {{Struct}}, or don't, in any of those contexts. The use of {{Struct}} in the order-sensitive context has driven more design thought, but it generates a {{byte[]}} wherever it's used. Would an example of an Avro, Thrift, or Protobuf {{HDataType}} implementation help to drive this idea home?

My trouble with using the word "schema" for key-values is that the context is too narrow a scope. Being able to consistently read a value out of a cell does not tell me what the schema of the database is. HBase provides basic *table* definition management but not *data* definition management, the effective meaning of schema. Phoenix and Kiji both provide a layer of schema management on top of HBase. Through them you define the logical layout of data in tables, and you abandon to them how that data is physically arranged and encoded. {{HDataType}} provides an API with which its user can control how data is physically arranged and encoded. Its user is still left to manage the logical layout and its meaning to their application for themselves.

This patch is not schema management. It provides a common set of primitives that other applications can consume -- be they user applications developed directly against HBase, or Phoenix or Kiji themselves. The consumers I've always had in mind are myself and application developers like me, Hive, Pig, and Phoenix. The primary benefit is that all those applications gain some level of interoperability through data in HBase.

That I was able to read Kiji's avdl file and in an afternoon understand how HDataType could be used to make its implementation simpler and more extensible is validation of utility.
[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives
[ https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713153#comment-13713153 ] stack commented on HBASE-8693:

IIRC, their avro idl is for all but the description of the rowkey. When they talk about rowkey 'schema', it is allowed that it cannot evolve for reasons discussed above. Adding to the right of a rowkey should be fine though. Ditto when serializing column qualifiers.

High in this issue you raise: "Do you think we should have a similar kind of dichotomy for encoding into order-preserving context vs non-order-preserving context? My initial thinking is probably not (due to additional API surface area), but I want to have the conversation."

You allow that there are two contexts (and indeed Matteo asks for clarification on this) -- one where there is no way around it but you need to rewrite the data if you want to refer to it using a different struct/'schema'; e.g. a rowkey (caveat adding fields to the right) -- and then there are the contexts where you should be able to evolve the content; e.g. cell content and even to a higher level where you might impose a schema made of multiple column content (or full row), and so on. This seems like a good split.

In the cell context, the area where you would like to be able to evolve, sort order preservation is not required. In the simple case, an int16 type, you probably don't need versioning either? Its serialization is unlikely to change but you might want to version even these primitive types just in case? If it is a compound type in a cell, you would like to be able to evolve it; to add fields, etc. So you could add a version to structs here? (but why would a user use this lib over pb in this case?)

Now you bleed over into higher level issues; schema and its follow-ons, where to store it and how to evolve, etc. (Matteo's concerns). I suppose we are fine given you have 'schema' and 'schema evolution' as out-of-scope in your answer to Matteo. We should be clear that these problems remain as to-be-solved (or solved by others -- see kiji) after this patch is done and be sure folks don't get the wrong impression. Just saying.

On the adding fields to the right of your struct, where you have the application use the right struct version, pity your lib couldn't do that for the app. PB has a lead-off serialized length which saves it reading off the end of the record. You can't do that because you'll mess up your ordering. You can't lead the record with a version since that will also mess up your sort order (as you say above). A buffer where you check available would be expensive...
[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives
[ https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713136#comment-13713136 ] Nick Dimiduk commented on HBASE-8693:

bq. again, I think that your focus at the moment is more on the key side... and my guess is that the struct is fine for that. but this jira is "serialization primitives" without a "row-keys" in front... so I assume you plan to use this stuff also for the cell values, and from what I said above... I don't see an easy way to evolve my cell data, without rewrite every time or doing "manual" mappings for each struct version.

You're right, this implementation is too simplistic for storing complex entities in a Cell. You can do it, but you'll be a bit stuck as there's no concept of schema identification or of evolution. I can see how the title can be misleading. OrderedBytes and HDataType are no replacement for application use of \{protobuf,avro,thrift\}, particularly in the "entity-centric modeling" approach with fat key-values.
[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives
[ https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713076#comment-13713076 ] Matteo Bertozzi commented on HBASE-8693:

Thanks for continuing to follow up on my out of scope questions. again, I think that I'm focusing more on the cell-value side instead of the key part, which will be the one that benefits from the ordered byte stuff and will probably have more restrictions on evolution, since this stuff is client side only and you have to deal with the raw byte sorting of hbase.

{quote}It's quite out of scope for my purposes, but I'm curious what you think about the future direction with schema. I think the Phoenix and Kiji folk will have some good insights.{quote}

(I'll talk only about cell-values here, so I'm not interested in the ordered stuff in this case)

I want to write my app today with this library. I'll start off using a Struct, and it's ok until I have to add/remove a field. so.. I can add a version/schema id.. but now I have the problem that I have to keep all the schemas and then project to the schema that I want to use.

Example:
- get row0 -> cell with schema 1
- get row1 -> cell with schema 2
- get row2 -> cell with schema 3
- Now the user/api has to handle these 3 different rows and project to a user provided schema to get out something useful to the user...

In this case, you have to store all the schemas and you have to provide a mapping from each schema to the one that the user wants.

The other approach, more protobuf-like, is that each field has an id that must be unique. on read you provide your "read schema" and you load only the fields present in the "read schema". note that this can also work with just an api similar to what you have, "getField(field_id)", where the id is the unique id and not the index.

again, I think that your focus at the moment is more on the key side... and my guess is that the struct is fine for that. but this jira is "serialization primitives" without a "row-keys" in front... so I assume you plan to use this stuff also for the cell values, and from what I said above... I don't see an easy way to evolve my cell data, without rewrite every time or doing "manual" mappings for each struct version.
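For illustration, the protobuf-like alternative described in this comment (unique field ids, reader asks for a field by id) might look roughly like the sketch below. Nothing here comes from the patch: the class {{TaggedCellDemo}}, the one-byte id and length prefix, and the {{getField}} signature are all hypothetical, and payloads are limited to 255 bytes to keep the sketch short. This layout is also not order-preserving, which is fine for cell values but not for rowkeys.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class TaggedCellDemo {
    // Hypothetical layout: repeated [field id: 1 byte][length: 1 byte][payload].
    static byte[] encode(Map<Integer, String> fields) {
        ByteBuffer buf = ByteBuffer.allocate(1024);
        for (Map.Entry<Integer, String> e : fields.entrySet()) {
            byte[] payload = e.getValue().getBytes(StandardCharsets.UTF_8);
            buf.put((byte) e.getKey().intValue());
            buf.put((byte) payload.length);
            buf.put(payload);
        }
        byte[] out = new byte[buf.position()];
        buf.flip();
        buf.get(out);
        return out;
    }

    // Look a field up by its unique id, not by position. Returns null when the
    // cell was written before that field existed.
    static String getField(byte[] cell, int fieldId) {
        ByteBuffer buf = ByteBuffer.wrap(cell);
        while (buf.remaining() >= 2) {
            int id = buf.get() & 0xff;
            int len = buf.get() & 0xff;
            if (id == fieldId) {
                byte[] payload = new byte[len];
                buf.get(payload);
                return new String(payload, StandardCharsets.UTF_8);
            }
            buf.position(buf.position() + len); // skip a field the reader did not ask for
        }
        return null;
    }

    public static void main(String[] args) {
        Map<Integer, String> v1 = new LinkedHashMap<>();
        v1.put(1, "alice");                 // written with "schema 1": only field 1
        Map<Integer, String> v2 = new LinkedHashMap<>();
        v2.put(1, "bob");
        v2.put(2, "nyc");                   // "schema 2" added field 2

        byte[] oldCell = encode(v1);
        byte[] newCell = encode(v2);
        // One reader handles both cells without a per-version mapping.
        System.out.println(getField(oldCell, 1));
        System.out.println(getField(oldCell, 2));
        System.out.println(getField(newCell, 2));
    }
}
```

The point of the sketch is only that the reader needs one "read schema" (the set of ids it cares about) rather than a mapping per written version.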
[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives
[ https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713028#comment-13713028 ] Hadoop QA commented on HBASE-8693: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593082/KijiFormattedEntityId.java against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6404//console This message is automatically generated.
[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives
[ https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713011#comment-13713011 ] Nick Dimiduk commented on HBASE-8693: - To be fair, sort order also is of concern in column names. My choice of the word "schema" was unfortunate in my previous comment. I should have said "no composite structure is written into the concatenation." Because HBase's only native data type is byte[], encodings are necessary for any application value other than byte[], wherever it hits a rowkey, qualifier, or value. It's quite out of scope for my purposes, but I'm curious what you think about the future direction with schema. I think the Phoenix and Kiji folk will have some good insights.
[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives
[ https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712084#comment-13712084 ] Matteo Bertozzi commented on HBASE-8693:

above you talk about the sort order, and I guess just about the key. but when I talk about data or schema I refer to the cell value, not the key. For the key I think that the fixed or append-only approach, as you pointed out, is good enough.

again, maybe I'm out of scope.. but do you see those classes used only to encode the key? e.g. the struct explicitly mentions the key in the comment. I probably see this as more generic key/value serialization, knowing about the future direction with the schema.
[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives
[ https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13711775#comment-13711775 ] Nick Dimiduk commented on HBASE-8693:

bq. Ok, make sense with this limited scope (no schema) have a fixed list of fields.

Right. In this implementation Struct is a simple concatenation of fields. No schema information is written into that concatenation because to do so will mess with sort order. Struct is merely API convenience.

Now, the field encodings implemented in OrderedBytes include a header byte which is currently used to identify the type of encoded field that follows. The full space of 256 available bit patterns in that header byte is not consumed by the current implementation. I've been thinking about extending that header byte to include some version bits at the very beginning. That would enable evolution of the individual field encodings (say, if you later want to re-implement blob-mid). This doesn't address the user-level logical structure of a Struct data type, only evolution of the OrderedBytes codec.

bq. My main concern is: I start use 96 with this struct encoding... is fixed so I can't add fields.. so I work around it adding a version number in front of the struct and then I do the switch for v1, v2, v3 with all the fixed struct that I know...

Prepending a version number to the Struct's members will impact sort order. Struct definition is fixed in that you can't prepend or interpose a new field in the middle of an existing encoded value. You're free to append fields. Appending a field would look like the following:

# application defines Struct v0 with members [A,B,C]
# application writes lots of data
# application changes, Struct v1 becomes [A,B,C,D,E]
# application writes lots more data

At step 3, the application now needs to become version aware. Because the fields of v0 are a subset of v1, the application can use the definition of struct v1 with the following safe-guards. (1) Any place where v0 was used, it now needs to be sure to check for end-of-buffer and skip over the two new elements. (2) Anywhere v1 is used, it must be mindful of truncated records and be prepared to receive only the v0 fields. Maybe the API defined around Struct can be improved to support these needs?

Records of v0 and v1 can be intermixed, i.e., as rowkeys in the same table. According to the documented sort semantics, they'll sort "left-to-right and depth-first". Meaning, they'll sort first according to v0 values and then within that group, by v1 values.

We leave all of this up to user applications today, so this change management isn't mitigated. Changing a compound rowkey today requires rewriting data (or duplication into a new table). A smarter struct encoding, one that's able to preserve the sorted semantics I've described but that can also track more sophisticated schema change, would be very useful indeed -- I don't think it exists.

Prepending a version field to a Struct will change the sorting behavior; v0 will sort before v1, &c. IMHO, this is a less flexible migration strategy than the append behavior described above. It's also perfectly valid, and the user of the Struct API is free to do so in their own application. In that case, the application is still version-aware. Instead of being cautious about consuming potentially truncated records, it's executing a scan for each version.

bq. as you said, data evolution is out of the scope. so if you consider this patch just as a "smarter" alternative to the Bytes encoding.

HBASE-8201 is a smarter alternative to Bytes and this ticket adds some higher-level APIs for manipulating them. In short, yes, schema definition and evolution is out of scope.
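The append-versus-prepend sorting behaviour described in this comment can be sketched in a few lines. This is a minimal illustration under stated assumptions, not code from the patch: each struct member is encoded as a single unsigned byte (a stand-in for the real OrderedBytes encodings), and {{compare}} mimics the unsigned lexicographic byte ordering HBase applies to rowkeys. All class and method names are hypothetical.

```java
public class StructAppendSortDemo {
    // Unsigned lexicographic comparison; on a tie, the shorter array sorts first.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Hypothetical fixed-width encoding: one byte per struct member, concatenated.
    static byte[] encode(int... members) {
        byte[] out = new byte[members.length];
        for (int i = 0; i < members.length; i++) {
            out[i] = (byte) members[i];
        }
        return out;
    }

    // The alternative strategy: a version byte in front of the struct body.
    static byte[] withVersion(int version, byte[] body) {
        byte[] out = new byte[body.length + 1];
        out[0] = (byte) version;
        System.arraycopy(body, 0, out, 1, body.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] v0 = encode(1, 2, 3);         // Struct v0: [A,B,C]
        byte[] v1 = encode(1, 2, 3, 0, 9);   // Struct v1: same A,B,C plus D,E
        byte[] v0Next = encode(1, 2, 4);     // another v0 record with a larger C

        // Appended fields: records interleave by their shared [A,B,C] prefix.
        System.out.println(compare(v0, v1) < 0);
        System.out.println(compare(v1, v0Next) < 0);

        // Prepended version: every v0 record sorts before every v1 record,
        // even though v0Next has the larger C.
        System.out.println(compare(withVersion(0, v0Next), withVersion(1, v1)) < 0);
    }
}
```

With appended fields a single scan sees v0 and v1 records interleaved by their shared prefix; with a prepended version the two generations live in disjoint key ranges, hence the scan per version.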
[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives
[ https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13711708#comment-13711708 ] Matteo Bertozzi commented on HBASE-8693:

{quote}Struct is a programmatic data structure, not a tool for schema management. It has no concept of "upgrade Struct Foo, version 1 to Foo version 2 by adding a new field in the middle here and changing the last one from X to Y." It's a convenience for manipulating complex byte[] structures. Schema management may become of concern for HBase, but that's out of scope.{quote}

Ok, make sense with this limited scope (no schema) have a fixed list of fields.

My main concern is: I start use 96 with this struct encoding... is fixed so I can't add fields.. so I work around it adding a version number in front of the struct and then I do the switch for v1, v2, v3 with all the fixed struct that I know... ...later I switch to a future release that have the code for table schema that "half" relies on this patch. How can I map my data? since I've done some tricks for my versioning I probably can't do anything... and I must rewrite everything..

as you said, data evolution is out of the scope. so if you consider this patch just as a "smarter" alternative to the Bytes encoding. feel free to ignore my comments since this stuff already looks good to me as it is.
[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives
[ https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13711250#comment-13711250 ] Nick Dimiduk commented on HBASE-8693:

Moving [~mbertozzi]'s comment from the dev list back to JIRA:

bq. I was looking at the HBASE-8693 patch, and looks good to me for the primitive types.

Thanks, and I'm glad to hear it. Any comments about redundant or missing types a user would expect out of the box?

bq. but I can't see how do you plan to evolve stuff like the struct.

Struct is a programmatic data structure, not a tool for schema management. It has no concept of "upgrade Struct Foo, version 1 to Foo version 2 by adding a new field in the middle here and changing the last one from X to Y." It's a convenience for manipulating complex {{byte[]}} structures. Schema management may become of concern for HBase, but that's out of scope. Any chance this topic came up at yesterday's meetup?

bq. By "evolve" I mean add/remove fields, or just query it with a subset of fields. the fields don't have an id, and on read you must specify all of them in the same order as you've used for write. (but maybe is just an immutable/fixed list of fields, and I'm ok with just adding that info to the comment on top of the class)

I discovered this missing API while working through the example use patch, above. The update I posted on RB yesterday adds an API for accessing a specific struct member by position. If RB links work, take a look at [Struct#read(ByteBuffer, int)|https://reviews.apache.org/r/12069/diff/2-3/#12.21].
[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives
[ https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710336#comment-13710336 ] Nick Dimiduk commented on HBASE-8693: - RB is down at the moment. I have some incremental work on github, including an [example|https://github.com/ndimiduk/hbase/commit/c07c1c4f0187b78169b11f3a1ed20d9d268f5b65] of using {{HDataType}} in {{HRegionInfo}}. I'm thinking I want to rename {{HDataType#\{read,write\}}} to {{HDataType#\{decode,encode\}}}. Thoughts?
[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives
[ https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694262#comment-13694262 ] Nick Dimiduk commented on HBASE-8693:

bq. Should this work be in hbase-common rather than in hbase-client?

Initial conversations required the type stuff not be in common. I agree, it makes more sense there and I think that community opinion is changing. The current implementation doesn't bring in any dependencies, so it should be painless.

bq. What is Order here?

{{Order}} is a component from the {{OrderedBytes}} implementation (see patch on HBASE-8201). It enables users to store data sorted in ascending or descending order. Right now it's mostly a vestigial appendage; I don't know how the data types API wants to expose and consume this functionality. I'm hoping to gain insight from Phoenix, Kiji, &c in future reviews.

bq. When would I use isCoercibleTo?

This comes from examination of Phoenix's {{PDataType}}. My understanding is, in the absence of secondary indices, the query planner can use type coercion to its advantage. This is the part of the data type API that I understand the least. I'm hoping for more clarity from [~giacomotaylor].

bq. I see a read on Union4

Sounds like a bug to me.

bq. How I describe a Struct outside of a Struct..?

Examples to follow.

bq. Whats a Binary?

Equivalent to SQL BLOB. This is how a user can inject good old-fashioned {{byte[]}}s into a {{Struct}} or {{Union}}.

bq. Do we need all these types?

Great question. That conversation is happening up on HBASE-8089. My preference is no, but I think the SQL guys want more of these for better interoperability between them.
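As a rough illustration of what an ascending/descending {{Order}} buys for byte-sorted storage, consider the sketch below. The enum here is a stand-in written for this example, not the class from the HBASE-8201 patch, and obtaining descending order by inverting every encoded byte is an assumption of the sketch. Values are encoded as two big-endian bytes purely to keep it small.

```java
public class OrderDemo {
    // Stand-in for the Order component; the inversion trick is an assumption.
    enum Order {
        ASCENDING {
            @Override byte apply(byte b) { return b; }
        },
        DESCENDING {
            @Override byte apply(byte b) { return (byte) ~b; }
        };

        abstract byte apply(byte b);
    }

    // Encode an unsigned 16-bit value as two big-endian bytes in the given order.
    static byte[] encode(int v, Order ord) {
        return new byte[] { ord.apply((byte) (v >>> 8)), ord.apply((byte) v) };
    }

    // Unsigned lexicographic comparison, as applied to raw byte[] keys.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // Ascending: 5 sorts before 300.
        System.out.println(compare(encode(5, Order.ASCENDING), encode(300, Order.ASCENDING)) < 0);
        // Descending: the same two values sort the other way around.
        System.out.println(compare(encode(5, Order.DESCENDING), encode(300, Order.DESCENDING)) > 0);
    }
}
```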
[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives
[ https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693632#comment-13693632 ] stack commented on HBASE-8693:

Should this work be in hbase-common rather than in hbase-client? They are a client facility at first but one day they might go server-side. Also, easier adding to hbase-common than to hbase-client. Unless they have dependencies?

What is Order here?

+ /**
+ * Write instance v into buffer b.
+ */
+ public abstract void write(ByteBuffer b, T v, Order ord);

When would I use isCoercibleTo?

I see a read on Union4 but not a write. That intentional? The union3 will take care of it? Ditto union3...

How I describe a Struct outside of a Struct (JSON to describe how to make one?)

Whats a Binary?

Agree that example usage would help.

Do we need all these types?

Good stuff N. I think you should post today's slides here too; they are good on the high-level.
[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives
[ https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693214#comment-13693214 ]

Nick Dimiduk commented on HBASE-8693:

bq. You kept the java name for all types, except BigDecimal, is there a reason?

The data types come from the (outdated) spec posted on HBASE-8089. I believe there is value in choosing types that are meaningful in a SQL context, but we shouldn't limit our thinking on this to what was laid down 30 years ago.

bq. Some unit tests could help to see how it should be used.

Agreed. Look for those in a followup patch.

bq. For example, the constructors are private...

The idea is that instances of {{HDataType}} are type definitions, not data values. For instance, the {{o.a.h.h.t.Decimal#DECIMAL}} instance defines how to encode and decode values and how values of this type relate to values of other types. It does not represent a numeric value.
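To illustrate the "type definition, not data value" idea, here is a minimal sketch. It is not the patch's code: {{TypeDef}} and {{Int32Def}} are hypothetical names standing in for {{HDataType}} and one of its implementations, and the two-method interface is an assumption made for brevity. The point it shows is that the singleton carries no value of its own; callers hand it plain Java values to encode and get plain Java values back.

```java
import java.nio.ByteBuffer;

/** Hypothetical stand-in for HDataType; not the interface from the patch. */
interface TypeDef<T> {
    /** Encode value v into buffer b at its current position. */
    void write(ByteBuffer b, T v);

    /** Decode one value from buffer b at its current position. */
    T read(ByteBuffer b);
}

/** Definition of a 32-bit integer type. It describes an encoding and holds no value. */
final class Int32Def implements TypeDef<Integer> {
    /** The single shared definition, analogous in spirit to Decimal#DECIMAL. */
    static final Int32Def INT32 = new Int32Def();

    /** Private: one definition is enough, since it has no per-value state. */
    private Int32Def() {}

    @Override
    public void write(ByteBuffer b, Integer v) {
        b.putInt(v);
    }

    @Override
    public Integer read(ByteBuffer b) {
        return b.getInt();
    }
}

class TypeDefDemo {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(8);
        // The same definition encodes any number of values.
        Int32Def.INT32.write(buf, 42);
        Int32Def.INT32.write(buf, -7);
        buf.flip();
        System.out.println(Int32Def.INT32.read(buf));
        System.out.println(Int32Def.INT32.read(buf));
    }
}
```

Under this reading, there is nothing to construct from a Java double: the double itself is the value, and the type definition only says how to turn it into bytes and back.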
[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives
[ https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692855#comment-13692855 ]

Nicolas Liochon commented on HBASE-8693:

You kept the Java name for all types except BigDecimal; is there a reason?

Some unit tests would help show how it should be used. For example, the constructors are private; I was wondering whether one would want to create these objects from core Java classes (e.g., create an hbase.Double from a Java double).
[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives
[ https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692489#comment-13692489 ]

Nick Dimiduk commented on HBASE-8693:

A note regarding variable-length encodings. Variable- vs. fixed-width encoding was a point highlighted during the conversations around HBASE-7221 and HBASE-7692. These data type implementations make exclusive use of the {{OrderedBytes}} encodings, because my thinking around them thus far has focused on use in rowkeys and column qualifiers. However, order preservation isn't strictly necessary for use in values.

A rough analogy I noticed in Postgres's data type implementation is the distinction between the encoding used to store data and the encoding used for an index entry. Do you think we should have a similar dichotomy between encoding for an order-preserving context and encoding for a non-order-preserving context? My initial thinking is probably not (due to the additional API surface area), but I want to have the conversation.
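For readers less familiar with why the context matters, the following sketch contrasts a plain encoding with an order-preserving one for a 32-bit integer. It is an illustration only: the class and method names are hypothetical, and the byte layout is a common textbook technique (flip the sign bit of the big-endian form), not necessarily the exact format {{OrderedBytes}} writes. Rowkeys and qualifiers are compared as unsigned bytes, so the plain form sorts negative numbers after positive ones, while the sign-flipped form sorts in numeric order.

```java
import java.nio.ByteBuffer;

class OrderPreservingSketch {
    /** Plain big-endian two's complement. Fine for values, wrong sort order for keys. */
    static byte[] plain(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    /** Flip the sign bit so unsigned byte order agrees with numeric order. */
    static byte[] ordered(int v) {
        return ByteBuffer.allocate(4).putInt(v ^ Integer.MIN_VALUE).array();
    }

    /** Unsigned lexicographic comparison of two byte arrays. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // Plain encoding: -1 is FF FF FF FF, which sorts after 1 (00 00 00 01).
        System.out.println(compareBytes(plain(-1), plain(1)) < 0);     // false
        // Sign-flipped encoding: -1 becomes 7F FF FF FF, 1 becomes 80 00 00 01.
        System.out.println(compareBytes(ordered(-1), ordered(1)) < 0); // true
    }
}
```

Both encodings here happen to be the same width; the trade-off raised above shows up more sharply with variable-length or self-describing formats, where an order-preserving form may cost extra bytes that a value-only encoding would not need.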
[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives
[ https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692447#comment-13692447 ]

Nick Dimiduk commented on HBASE-8693:

on rb: https://reviews.apache.org/r/12069/ (cc [~dmeil], [~giacomotaylor], [~eclark], [~owen.omalley], [~ashutoshc], [~alangates])