[jira] [Comment Edited] (HBASE-17257) Add column-aliasing capability to hbase-client

2016-12-14 Thread Daniel Vimont (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15728144#comment-15728144
 ] 

Daniel Vimont edited comment on HBASE-17257 at 12/15/16 5:24 AM:
-

Here are some specifications of what I’ve currently designed and coded for 
column-aliasing (and will soon be submitting as a patch)...


*COLUMN-ALIASING FOR THE END-USER*:
From the end-user perspective, column-aliasing entails the following two 
things...


(1) _Environmental configuration to enable aliasing_:
Aliasing makes use of the already-existing {{hbase.client.connection.impl}} 
configuration parameter. The following entry should be added to 
{{hbase-site.xml}}:
{code}
<property>
  <name>hbase.client.connection.impl</name>
  <value>org.apache.hadoop.hbase.client.AliasEnabledConnection</value>
</property>
{code}
Setting this parameter to this value results in 
{{ConnectionFactory#createConnection}} returning Connections of the new 
{{AliasEnabledConnection}} class (subclass of the {{ConnectionImplementation}} 
class).


(2) _Alias-enabling individual column families_:
Aliasing is enabled at the column-family level. When adding a column-descriptor 
(i.e. family) to a Table, the new method {{HColumnDescriptor#setAliasSize}} may 
be invoked to immutably set the fixed size (in bytes) of column-qualifier 
aliases for the column family. The default value of 0 (aliasing disabled) may 
be changed to either 1, 2, or 4.


Other than the above, the end-user-application code should neither require nor 
contain any “awareness” whatsoever that column-aliasing is being utilized for a 
column-family. An end-user-application continues to interact only with the 
standard interfaces of the client API ({{Connection}}, {{Table}}, 
{{BufferedMutator}}, and {{HTableMultiplexer}}).
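
As a rough illustration of what the alias sizes in item (2) imply: the issue description gives per-family limits of 255, 65,535, and 4,294,967,295 unique column-qualifiers for alias sizes of 1, 2, and 4 bytes. Those figures fit the formula 2^(8 * aliasSize) - 1. The sketch below only reproduces that arithmetic; the class and method names are hypothetical and are not part of the patch, and the reason one value is unavailable is an assumption on my part.

```java
// Hypothetical helper (not part of the patch): computes how many distinct
// column-qualifiers a family could hold for a given alias size.
final class AliasCapacity {

    // Assumption: one of the 2^(8 * aliasSize) possible alias values is
    // unavailable, which matches the limits quoted in the issue description.
    static long maxQualifiers(int aliasSize) {
        if (aliasSize != 1 && aliasSize != 2 && aliasSize != 4) {
            // 0 means "aliasing disabled", so it has no capacity to report.
            throw new IllegalArgumentException("aliasSize must be 1, 2, or 4");
        }
        return (1L << (8 * aliasSize)) - 1;
    }

    public static void main(String[] args) {
        for (int size : new int[] {1, 2, 4}) {
            System.out.println(size + "-byte alias: up to "
                + maxQualifiers(size) + " qualifiers");
        }
    }
}
```

Since the size is set immutably, a family expected to outgrow its limit would need the larger size chosen up front.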


*COLUMN-ALIASING INTERNALS*:
One of the overriding goals in designing the column-aliasing infrastructure is 
to minimize alterations and insertions into already-existing hbase-client code, 
and to have very-close-to-zero impact on already-existing functionality, 
particularly in those situations in which aliasing will NOT be used. The 
following is a comprehensive list of all new and modified modules, along with 
an explanation as to the role that the new or modified module plays in aliasing.


_Modified classes_:
*HColumnDescriptor*: as described above, the new method {{#setAliasSize}} has 
been added; also corresponding methods, {{#getAliasSize}} and 
{{#isAliasEnabled}} (returns “true” if aliasSize is not zero).
*HTableDescriptor*: new method {{#hasAliasEnabledFamily}} has been added 
(returns “true” if one or more of the table’s families are aliasEnabled).
(Corresponding modifications were also made to {{TestHColumnDescriptor}} and 
{{TestHTableDescriptor}}, to test the new methods appropriately.)


_New class_:
*AliasEnabledConnection* (subclass of {{ConnectionImplementation}}): provides 
overrides of {{#getTable}} and {{#getBufferedMutator}} to return objects of the 
{{AliasEnabledTable}} class and the {{AliasEnabledBufferedMutator}} class, 
respectively.


_Modified class_:
*HTableMultiplexer*: new static method added -- 
{{#getAliasEnabledTableMultiplexer}}, returns an {{HTableMultiplexer}} object 
that is actually an instance of the new subclass, 
{{AliasEnabledTableMultiplexer}}.


_New classes_:
*AliasEnabledTable* (subclass of {{HTable}})
*AliasEnabledBufferedMutator* (subclass of {{BufferedMutatorImpl}})
*AliasEnabledTableMultiplexer* (subclass of {{HTableMultiplexer}})
-- all the above contain overrides which allow for {{AliasManager}} methods to 
be invoked when needed -- to perform qualifier-to-alias conversions (for 
{{Get}}, {{Scan}}, and {{Mutation}} objects), and alias-to-qualifier 
conversions (for {{Result}} objects) -- for any Table for which 
{{HTableDescriptor#hasAliasEnabledFamily}} is “true”.
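
The override pattern described above can be sketched with toy classes. This is only an illustration under heavy assumptions: {{SketchTable}} and {{AliasEnabledSketchTable}} are hypothetical stand-ins for {{HTable}} and {{AliasEnabledTable}}, qualifiers and aliases are plain strings rather than byte arrays, and a local map takes the place of the {{AliasManager}}. None of this is the patch code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for HTable: stores cells keyed by "row/qualifier".
class SketchTable {
    protected final Map<String, String> store = new HashMap<>();

    void put(String row, String qualifier, String value) {
        store.put(row + "/" + qualifier, value);
    }

    String get(String row, String qualifier) {
        return store.get(row + "/" + qualifier);
    }
}

// Hypothetical stand-in for AliasEnabledTable: same public methods, but
// qualifiers are swapped for short aliases before reaching the superclass.
class AliasEnabledSketchTable extends SketchTable {
    private final boolean hasAliasEnabledFamily;
    // Assumption: a local map replaces the AliasManager for this sketch.
    private final Map<String, String> aliases = new HashMap<>();

    AliasEnabledSketchTable(boolean hasAliasEnabledFamily) {
        this.hasAliasEnabledFamily = hasAliasEnabledFamily;
    }

    @Override
    void put(String row, String qualifier, String value) {
        if (!hasAliasEnabledFamily) {
            super.put(row, qualifier, value); // no aliasing: behave like the parent
            return;
        }
        // Mutation path: create an alias the first time a qualifier is seen.
        String alias = aliases.get(qualifier);
        if (alias == null) {
            alias = Integer.toString(aliases.size() + 1);
            aliases.put(qualifier, alias);
        }
        super.put(row, alias, value);
    }

    @Override
    String get(String row, String qualifier) {
        if (!hasAliasEnabledFamily) {
            return super.get(row, qualifier);
        }
        // Query path: a qualifier with no alias cannot have stored data.
        String alias = aliases.get(qualifier);
        return alias == null ? null : super.get(row, alias);
    }
}
```

The caller uses the same {{put}} and {{get}} signatures in both cases; only the stored key changes when aliasing is on.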


_New class_:
*AliasManager*: performs all alias-oriented conversions -- qualifier-to-alias 
conversions for queries and mutations, and alias-to-qualifier conversions for 
results. It fully encapsulates all CRUD transactions against the 
{{aliasMappingTable}} (the HBase table in which qualifier-to-alias mappings are 
persisted for each alias-enabled column family). When a {{Mutation}} object 
contains a column-qualifier for which an alias entry does not yet exist, a new 
alias is generated and stored in a qualifier-to-alias mapping entry in the 
{{aliasMappingTable}}. The first time an {{AliasManager}} is instantiated 
against an HBase cluster, the {{aliasMappingTable}} will be created if it does 
not already exist.
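
A minimal sketch of the mapping behavior just described, for a single alias-enabled column family. It is not the {{AliasManager}} from the patch: {{InMemoryAliasMap}} is a hypothetical name, in-memory maps stand in for the {{aliasMappingTable}}, qualifiers are strings instead of byte arrays, and sequential alias assignment starting from 1 is an assumption.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the AliasManager's mapping logic.
final class InMemoryAliasMap {
    private final int aliasSize;   // fixed alias width in bytes: 1, 2, or 4
    private final long maxAliases; // 2^(8 * aliasSize) - 1
    private final Map<String, byte[]> qualifierToAlias = new HashMap<>();
    private final Map<Long, String> aliasToQualifier = new HashMap<>();
    private long nextAlias = 1;    // assumption: aliases are assigned sequentially

    InMemoryAliasMap(int aliasSize) {
        if (aliasSize != 1 && aliasSize != 2 && aliasSize != 4) {
            throw new IllegalArgumentException("aliasSize must be 1, 2, or 4");
        }
        this.aliasSize = aliasSize;
        this.maxAliases = (1L << (8 * aliasSize)) - 1;
    }

    // Mutation path: return the existing alias, or generate and record a new one.
    byte[] aliasForMutation(String qualifier) {
        byte[] existing = qualifierToAlias.get(qualifier);
        if (existing != null) {
            return existing.clone();
        }
        if (nextAlias > maxAliases) {
            throw new IllegalStateException("alias space exhausted");
        }
        long value = nextAlias++;
        byte[] alias = encode(value);
        qualifierToAlias.put(qualifier, alias);
        aliasToQualifier.put(value, qualifier);
        return alias.clone();
    }

    // Result path: convert a stored alias back to the user's qualifier,
    // or null when no mapping exists.
    String qualifierForAlias(byte[] alias) {
        return aliasToQualifier.get(decode(alias));
    }

    // Keep only the low-order aliasSize bytes of the big-endian value.
    private byte[] encode(long value) {
        byte[] full = ByteBuffer.allocate(Long.BYTES).putLong(value).array();
        return Arrays.copyOfRange(full, Long.BYTES - aliasSize, Long.BYTES);
    }

    private static long decode(byte[] alias) {
        long value = 0;
        for (byte b : alias) {
            value = (value << 8) | (b & 0xFF);
        }
        return value;
    }
}
```

A persisted version would also have to handle concurrent clients generating aliases for the same family, which this sketch does not attempt.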


_New reserved HBase table_:
*aliasMappingTable*: Each row on the {{aliasMappingTable}} corresponds to an 
aliasEnabled column-family on a user-table. The rowId of each 
{{aliasMappingTable}} row is in the format: {{[fully-qualified-user-table-name 
+ ":" + aliasEnabled-Family]}}. On each {{aliasMappingTable}} row, the column 
with an EMPTY_BYTE_ARRAY column-qualifier 

[jira] [Comment Edited] (HBASE-17257) Add column-aliasing capability to hbase-client

2016-12-10 Thread Daniel Vimont (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15738807#comment-15738807
 ] 

Daniel Vimont edited comment on HBASE-17257 at 12/11/16 1:20 AM:
-

Okay, when I Google about for "hbase review board", two different RB 
implementations come up: one hosted by Apache (review.apache.org), the other by 
Cloudera (review.cloudera.org). I'm starting out presuming that we're using the 
Apache-hosted one. My Apache account that already exists for logging into this 
JIRA system does not work for getting me into review.apache.org, so I created a 
new account to get in.

Curious item: when I Googled for "how to post to hbase review board", the first 
link that came up was by a certain "Ted Yu" ;) who was running into trouble 
(back in 2010) getting logged into Review Board and posting a diff file: 
http://grokbase.com/t/hbase/dev/10c5bdz188/putting-patch-up-on-review-board

Anyway, using my new account to get into review.apache.org, I selected "New 
Review Request", and in the "New Review Request for Pending Change" pane I 
uploaded my patch, whereupon the web UI asked me "What is the base directory 
for this diff?". Supposing the correct answer to this was the relative 
directory within the project in which I generated the patch, I entered 
"/hbase". Then I received the error, "The specified diff file could not be 
parsed. Line 43: No valid separator after the filename was found in the diff 
header".

So, must I manually alter the patch header to make it acceptable to RB? Or do I 
need to use some other git function to generate a differently formatted "diff" 
file? Or something else altogether?

BTW, after I get this all sorted out, I will definitely create a new JIRA entry 
for inserting basic info about the existence and usage of RB into the HBase 
Reference Guide!


> Add column-aliasing capability to hbase-client
> --
>
> Key: HBASE-17257
> URL: https://issues.apache.org/jira/browse/HBASE-17257
> Project: HBase
> Issue Type: New Feature
> Components: Client
> Affects Versions: 2.0.0
> Reporter: Daniel Vimont
> Assignee: Daniel Vimont
> Labels: features
> Attachments: HBASE-17257-v2.patch, HBASE-17257-v3.patch, 
> HBASE-17257.patch
>
>
> Column aliasing will provide the option for a 1, 2, or 4 byte alias value to 
> be stored in each cell of an "alias enabled" column-family, in place of the 
> full-length column-qualifier. Aliasing is intended to operate completely 
> invisibly to the end-user developer, with absolutely no "awareness" of 
> aliasing required to be coded into a front-end application. No new public 
> hbase-client interfaces are to be introduced, and only a few new public 
> methods should need to be added to existing interfaces, primarily to allow an 
> administrator to designate that a new column-family is to be alias-enabled by 
> setting its aliasSize attribute to 1, 2, or 4.
> To facilitate such functionality, new subclasses of HTable, 
> BufferedMutatorImpl, and HTableMultiplexer are to be provided. The overriding 
> methods of these new subclasses will invoke methods of 

[jira] [Comment Edited] (HBASE-17257) Add column-aliasing capability to hbase-client

2016-12-07 Thread Daniel Vimont (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15728157#comment-15728157
 ] 

Daniel Vimont edited comment on HBASE-17257 at 12/7/16 8:50 AM:


Submitting this patch -- not with the expectation that it will sail through "as 
is" (although I suppose anything is possible) -- but rather to allow 
examination, testing, bench-marking, commenting, etc. on what has been 
developed thus far for column-aliasing. I will soon be making further comments 
on areas that I see as open to (and needful of) discussion.

PLEASE SEE SPECIFICATIONS ABOVE FOR DETAILS ON PATCH CONTENT.


> Add column-aliasing capability to hbase-client
> --
>
> Key: HBASE-17257
> URL: https://issues.apache.org/jira/browse/HBASE-17257
> Project: HBase
> Issue Type: New Feature
> Components: Client
> Affects Versions: 2.0.0
> Reporter: Daniel Vimont
> Assignee: Daniel Vimont
> Labels: features
> Attachments: HBASE-17257.patch
>
>
> Column aliasing will provide the option for a 1, 2, or 4 byte alias value to 
> be stored in each cell of an "alias enabled" column-family, in place of the 
> full-length column-qualifier. Aliasing is intended to operate completely 
> invisibly to the end-user developer, with absolutely no "awareness" of 
> aliasing required to be coded into a front-end application. No new public 
> hbase-client interfaces are to be introduced, and only a few new public 
> methods should need to be added to existing interfaces, primarily to allow an 
> administrator to designate that a new column-family is to be alias-enabled by 
> setting its aliasSize attribute to 1, 2, or 4.
> To facilitate such functionality, new subclasses of HTable, 
> BufferedMutatorImpl, and HTableMultiplexer are to be provided. The overriding 
> methods of these new subclasses will invoke methods of the new AliasManager 
> class to facilitate qualifier-to-alias conversions (for user-submitted Gets, 
> Scans, and Mutations) and alias-to-qualifier conversions (for Results 
> returned from HBase) for any Table that has one or more alias-enabled column 
> families. All conversion logic will be encapsulated in the new AliasManager 
> class, and all qualifier-to-alias mappings will be persisted in a new 
> aliasMappingTable in a new, reserved namespace.
> An informal polling of HBase users at HBaseCon East and at the 
> Strata/Hadoop-World conference in Sept. 2016 showed that Column Aliasing 
> could be a popular enhancement to standard HBase functionality, due to the 
> fact that full column-qualifiers are stored in each cell, and reducing this 
> qualifier storage requirement down to 1, 2, or 4 bytes per cell could prove 
> beneficial in terms of reduced storage and bandwidth needs. Aliasing is 
> intended chiefly for column-families which are of the "narrow and tall" 
> variety (i.e., that are designed to use relatively few distinct 
> column-qualifiers throughout a large number of rows, throughout the lifespan 
> of the column-family). A column-family that is set up with an alias-size of 1 
> byte can contain up to 255 unique column-qualifiers; a 2 byte alias-size 
> allows for up to 65,535 unique column-qualifiers; and a 4 byte alias-size 
> allows for up to 4,294,967,295 unique column-qualifiers.
> Fuller specifications will be entered into the comments section below. Note 
> that it may well not be viable to add aliasing support in the new "async" 
> classes that appear to be currently under development.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-17257) Add column-aliasing capability to hbase-client

2016-12-05 Thread Daniel Vimont (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723952#comment-15723952
 ] 

Daniel Vimont edited comment on HBASE-17257 at 12/6/16 1:24 AM:


I very much hope that my original research is mistaken, and that a system 
table/region CAN be split (because arguably one of the ugliest aspects of my 
current design is the need for a separate namespace for the aliasMappingTable).

I will get out some details of the overall design soon, and I will review my 
original research which seemed to indicate that system tables/regions cannot be 
split.

