[jira] [Commented] (HBASE-5612) Data types for HBase values
[ https://issues.apache.org/jira/browse/HBASE-5612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402677#comment-13402677 ]

Mikhail Bautin commented on HBASE-5612:
---

Enis: yes, it makes sense to develop a more fine-grained data model for rows/qualifiers/values, not just for values.

Data types for HBase values
---

Key: HBASE-5612
URL: https://issues.apache.org/jira/browse/HBASE-5612
Project: HBase
Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin

In many real-life applications all values in a certain column family are of a certain data type, e.g. 64-bit integer. We could specify that in the column descriptor and enable data type-specific compression such as variable-length integer encoding.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5612) Data types for HBase values
[ https://issues.apache.org/jira/browse/HBASE-5612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401899#comment-13401899 ]

Enis Soztutar commented on HBASE-5612:
---

At the recent HBase hackathon and the BOF sessions, we had some discussions about adding some kind of schemas/data types to HBase, and Ian gave a short talk about it. Beyond the use cases for this jira, having optional schema data has these advantages:
- HBase internals can make use of data types (block-level encoding, comparators for sub-fields in keys, etc.).
- The HBase shell can make use of the data types and display the data correctly.
- Hive/Pig can better map their own data types to HBase types, and their schemas to the HBase schema, instead of managing this themselves.
- Client-written coprocessors or system-level coprocessors can do data validation according to the schema and data types.

So, what I am trying to say is that we can start to think of a bigger picture for the data types, rather than doing something only for compression/block encoding. WDYT?
[jira] [Commented] (HBASE-5612) Data types for HBase values
[ https://issues.apache.org/jira/browse/HBASE-5612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401108#comment-13401108 ]

alex gemini commented on HBASE-5612:
---

For filters: if we know the data types, we can use a more meaningful comparison on column names, so that LESS and GREATER filters can filter out more data.
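To illustrate the filter point above, here is a minimal sketch of why type knowledge changes the outcome of a LESS/GREATER comparison. It assumes the bytes hold a big-endian signed 64-bit integer; the class and method names are hypothetical and not part of HBase. Untyped byte arrays are compared as unsigned lexicographic bytes, which orders -1 after 1, while a type-aware comparison decodes first.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not HBase code: raw byte order vs. typed order.
final class TypedCompareSketch {
    // Big-endian 8-byte encoding of a long (ByteBuffer's default byte order).
    static byte[] encodeLong(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Untyped comparison: unsigned, byte-by-byte lexicographic order.
    static int compareRaw(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Type-aware comparison: decode both sides as signed longs first.
    static int compareAsLong(byte[] a, byte[] b) {
        return Long.compare(ByteBuffer.wrap(a).getLong(), ByteBuffer.wrap(b).getLong());
    }

    public static void main(String[] args) {
        byte[] minusOne = encodeLong(-1L);
        byte[] one = encodeLong(1L);
        // Raw order puts -1 (0xFF...FF) after 1; typed order puts it before.
        System.out.println("raw:   " + Integer.signum(compareRaw(minusOne, one)));
        System.out.println("typed: " + Integer.signum(compareAsLong(minusOne, one)));
    }
}
```

With the raw comparison, a "less than 1" filter would miss -1; with the typed comparison it would match.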
[jira] [Commented] (HBASE-5612) Data types for HBase values
[ https://issues.apache.org/jira/browse/HBASE-5612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401110#comment-13401110 ]

alex gemini commented on HBASE-5612:
---

For fixed-length data types (byte, integer, double, etc.), we can store the key length with RLE encoding, as RCFile's metadata does. For a data type like String, e.g. for an email or address column, we can use prefix compression or dictionary compression and store the most frequently used words in the HFile's header; this would save both memory and disk space.
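As a rough sketch of the run-length idea mentioned above: when every entry in a column has the same fixed length, a sequence of lengths collapses into a handful of (length, count) pairs. This is only the general technique under assumed names (`LengthRleSketch` is hypothetical); it does not reproduce RCFile's actual on-disk format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: run-length encoding of a sequence of lengths.
final class LengthRleSketch {
    // Encode lengths as {length, runCount} pairs.
    static List<int[]> encode(int[] lengths) {
        List<int[]> runs = new ArrayList<>();
        for (int len : lengths) {
            int[] last = runs.isEmpty() ? null : runs.get(runs.size() - 1);
            if (last != null && last[0] == len) {
                last[1]++;                      // extend the current run
            } else {
                runs.add(new int[] {len, 1});   // start a new run
            }
        }
        return runs;
    }

    // Expand {length, runCount} pairs back into the original sequence.
    static int[] decode(List<int[]> runs) {
        int total = 0;
        for (int[] run : runs) {
            total += run[1];
        }
        int[] out = new int[total];
        int pos = 0;
        for (int[] run : runs) {
            Arrays.fill(out, pos, pos + run[1], run[0]);
            pos += run[1];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] lengths = {8, 8, 8, 8, 8, 4, 4};
        List<int[]> runs = encode(lengths);
        System.out.println("entries: " + lengths.length + ", runs: " + runs.size());
        System.out.println("round trip ok: " + Arrays.equals(lengths, decode(runs)));
    }
}
```

For a column of 64-bit integers, every length is 8, so the whole sequence would reduce to a single run regardless of how many entries there are.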
[jira] [Commented] (HBASE-5612) Data types for HBase values
[ https://issues.apache.org/jira/browse/HBASE-5612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234895#comment-13234895 ]

Nicolas Spiegelberg commented on HBASE-5612:
---

Do we want to do type-specific compression for a struct schema, or do we want to do something higher-level? For example, are you suggesting something like Megastore, or are you suggesting something like a JSON format where we can do optimizations like dictionary hashing?
[jira] [Commented] (HBASE-5612) Data types for HBase values
[ https://issues.apache.org/jira/browse/HBASE-5612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234898#comment-13234898 ]

Mikhail Bautin commented on HBASE-5612:
---

@Nicolas: it is not totally clear to me what you mean by a struct schema. I was discussing with Liyin data block encoding and its benefits when applied to our applications, and the simple example of variable-length integer encoding for counters came up. We can take it in many different directions, and I just want to open a discussion to gauge interest.
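For readers unfamiliar with the counter example mentioned above, the following is a minimal sketch of one common variable-length integer scheme: 7 payload bits per byte, with the high bit signalling that more bytes follow. It is a generic illustration under assumed names (`VarLongSketch` is hypothetical), not the encoding HBase uses or the one proposed in this issue.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch: base-128 variable-length encoding of a long.
final class VarLongSketch {
    // Emit 7 bits at a time, least significant group first.
    // The high bit of each byte is set when more bytes follow.
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;   // unsigned shift, so negative inputs terminate
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Reassemble the 7-bit groups until a byte without the high bit is seen.
    static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
        throw new IllegalArgumentException("truncated varint");
    }

    public static void main(String[] args) {
        // Small counters need far fewer than the 8 bytes of a fixed-width long.
        System.out.println("5   -> " + encode(5L).length + " byte(s)");
        System.out.println("300 -> " + encode(300L).length + " byte(s)");
        System.out.println("round trip: " + decode(encode(300L)));
    }
}
```

The saving depends on the value distribution: small non-negative counters shrink to one or two bytes, whereas negative values in this particular scheme expand to ten bytes, so a typed column would need to know its values are mostly small for this to pay off.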