[jira] [Commented] (HBASE-5612) Data types for HBase values

2012-06-27 Thread Mikhail Bautin (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402677#comment-13402677 ]

Mikhail Bautin commented on HBASE-5612:
---

Enis: yes, it makes sense to develop a more fine-grained data model for 
rows/qualifiers/values, not just for values.

 Data types for HBase values
 ---

 Key: HBASE-5612
 URL: https://issues.apache.org/jira/browse/HBASE-5612
 Project: HBase
 Issue Type: Improvement
 Reporter: Mikhail Bautin
 Assignee: Mikhail Bautin

 In many real-life applications, all values in a given column family share a 
 single data type, e.g. a 64-bit integer. We could declare that type in the 
 column descriptor and enable type-specific compression, such as 
 variable-length integer encoding.
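
To make the variable-length integer idea concrete, here is a minimal sketch of a 7-bits-per-byte varint, the scheme used by formats such as Protocol Buffers. It is an illustration only, not HBase's implementation; the function names are hypothetical and the sketch assumes non-negative values such as counters.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer using 7 payload bits per byte."""
    if n < 0:
        raise ValueError("this sketch handles non-negative values only")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def decode_varint(buf: bytes) -> int:
    """Decode a single varint produced by encode_varint."""
    result = 0
    shift = 0
    for b in buf:
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            return result
        shift += 7
    raise ValueError("truncated varint")
```

A small counter such as 5 takes one byte instead of the eight bytes of a fixed-width 64-bit integer, while values close to 2^63 take nine bytes, so the gain depends on the value distribution.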

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5612) Data types for HBase values

2012-06-26 Thread Enis Soztutar (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401899#comment-13401899 ]

Enis Soztutar commented on HBASE-5612:
--

At the recent HBase hackathon and the BoF sessions, we had some discussions 
about adding some kind of schemas/data types to HBase, and Ian gave a short 
talk about it. Beyond the use cases for this JIRA, having optional schema 
data has the following advantages:
 - HBase internals can make use of data types (e.g. for block-level encoding, 
comparators for sub-fields in keys, etc.)
 - The HBase shell can make use of the data types and display the data correctly.
 - Hive/Pig can better map their own data types to HBase types, and their 
schemas to the HBase schema, instead of managing the mapping themselves.
 - Client-written or system-level coprocessors can do data 
validation according to the schema and data types.
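
As a rough illustration of the validation and display points above, the following sketch checks and renders raw cell bytes against an optional schema. Everything here is hypothetical: the schema layout, the type names `int64` and `utf8`, and the function names are assumptions for illustration, not an existing HBase API.

```python
import struct

# Hypothetical optional schema: (column family, qualifier) -> declared type.
SCHEMA = {
    ("stats", "clicks"): "int64",
    ("info", "email"): "utf8",
}


def is_valid(family: str, qualifier: str, value: bytes) -> bool:
    """Return True if the raw bytes are acceptable for the declared type."""
    declared = SCHEMA.get((family, qualifier))
    if declared is None:
        return True  # schema is optional: untyped columns accept any bytes
    if declared == "int64":
        return len(value) == 8
    if declared == "utf8":
        try:
            value.decode("utf-8")
            return True
        except UnicodeDecodeError:
            return False
    raise ValueError(f"unknown type {declared!r}")


def render(family: str, qualifier: str, value: bytes) -> str:
    """Render a cell the way a type-aware shell might display it."""
    declared = SCHEMA.get((family, qualifier))
    if declared == "int64":
        return str(struct.unpack(">q", value)[0])  # big-endian signed long
    if declared == "utf8":
        return value.decode("utf-8")
    return value.hex()  # no declared type: fall back to raw hex
```

A coprocessor-style hook could call `is_valid` before accepting a write, and a shell could call `render` instead of printing escaped bytes.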

So, what I am trying to say is that we can start to think about the bigger 
picture for data types, rather than doing something only for compression/block 
encoding. WDYT? 





[jira] [Commented] (HBASE-5612) Data types for HBase values

2012-06-25 Thread alex gemini (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401108#comment-13401108 ]

alex gemini commented on HBASE-5612:


For filters: if we know the data types, we can use a more meaningful 
comparison filter on column names, so that LESS and GREATER filters can filter 
out more data.
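
A small sketch of why type knowledge matters for LESS/GREATER comparisons: raw bytes compare lexicographically as unsigned bytes, which gives the wrong order for negative numbers. The example assumes, purely for illustration, that the compared field is a big-endian signed 64-bit integer; the function names are hypothetical.

```python
import struct


def encode_long(n: int) -> bytes:
    """Serialize a signed 64-bit integer as 8 big-endian bytes."""
    return struct.pack(">q", n)


def raw_less(a: bytes, b: bytes) -> bool:
    """Untyped comparison: lexicographic order over unsigned bytes."""
    return a < b


def typed_less(a: bytes, b: bytes) -> bool:
    """Type-aware comparison: decode both sides as signed longs first."""
    return struct.unpack(">q", a)[0] < struct.unpack(">q", b)[0]
```

With these definitions, -1 sorts after 1 under `raw_less` (its encoding starts with 0xFF), while `typed_less` orders them numerically.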





[jira] [Commented] (HBASE-5612) Data types for HBase values

2012-06-25 Thread alex gemini (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401110#comment-13401110 ]

alex gemini commented on HBASE-5612:


For fixed-length data types (byte, integer, double, etc.), we can store the 
key lengths with run-length encoding (RLE), as RCFile does in its metadata. 
For data types like String, e.g. an email or address column, we can use prefix 
compression or dictionary compression, storing the most frequently used words 
at the head of the HFile. This would save both memory and disk space.
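
The two ideas can be sketched as follows: run-length encoding of value lengths, and a frequency-ordered dictionary for repeated strings. This is a simplified illustration with hypothetical function names. It encodes whole values rather than individual words, and it does not reflect the actual RCFile or HFile layouts.

```python
from collections import Counter
from itertools import groupby


def rle_lengths(values):
    """Run-length encode the value lengths as (length, run_count) pairs."""
    lengths = (len(v) for v in values)
    return [(length, sum(1 for _ in run)) for length, run in groupby(lengths)]


def dictionary_encode(values):
    """Replace each value by its index in a frequency-ordered dictionary."""
    counts = Counter(values)
    dictionary = [v for v, _ in counts.most_common()]  # most frequent first
    index = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [index[v] for v in values]


def dictionary_decode(dictionary, codes):
    """Invert dictionary_encode."""
    return [dictionary[c] for c in codes]
```

For a column of fixed-length values, `rle_lengths` collapses to a single pair, so the per-cell length fields cost almost nothing. For a column with few distinct strings, the dictionary is stored once and each cell shrinks to a small index.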





[jira] [Commented] (HBASE-5612) Data types for HBase values

2012-03-21 Thread Nicolas Spiegelberg (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234895#comment-13234895 ]

Nicolas Spiegelberg commented on HBASE-5612:


Do we want to do type-specific compression for a struct schema, or do we 
want to do something higher-level? For example, are you suggesting something 
like Megastore, or something like a JSON format where we can do 
optimizations like dictionary hashing?





[jira] [Commented] (HBASE-5612) Data types for HBase values

2012-03-21 Thread Mikhail Bautin (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234898#comment-13234898 ]

Mikhail Bautin commented on HBASE-5612:
---

@Nicolas: it is not entirely clear to me what you mean by a struct schema. 
Liyin and I were discussing data block encoding and its benefits for our 
applications, and the simple example of variable-length integer encoding for 
counters came up. We can take this in many different directions, and I just 
want to open a discussion to gauge interest.
