[jira] Commented: (PIG-759) HBaseStorage scheme for Load/Slice function
[ https://issues.apache.org/jira/browse/PIG-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12867428#action_12867428 ]

Alex Newman commented on PIG-759:
---------------------------------

I could recut it for 0.7.0 if that would be better. What do you guys want me to do?

HBaseStorage scheme for Load/Slice function
-------------------------------------------

Key: PIG-759
URL: https://issues.apache.org/jira/browse/PIG-759
Project: Pig
Issue Type: Bug
Reporter: Gunther Hagleitner
Fix For: 0.7.0
Attachments: patch.p1

We would like to change the HBaseStorage function to use a scheme when loading a table in Pig. The scheme we are thinking of is: hbase. So in order to load an HBase table in a Pig script, the statement should read:

{noformat}
table = load 'hbase://tablename' using HBaseStorage();
{noformat}

If the scheme is omitted, Pig would assume the tablename to be an HDFS path, and the storage function would use the last component of the path as the table name and output a warning.

For details on why, see JIRA issue PIG-758.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
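The resolution rule in the issue description can be sketched in a few lines. This is only an illustration in Python, not Pig's implementation: the helper name resolve_table_name is hypothetical, and the wording of the warning is an assumption.

```python
import warnings
from urllib.parse import urlsplit


def resolve_table_name(location: str) -> str:
    """Return the HBase table name for a load location (hypothetical helper)."""
    parts = urlsplit(location)
    if parts.scheme == "hbase":
        # 'hbase://tablename' -> the table name sits in the authority part.
        return parts.netloc
    # No hbase scheme: treat the location as an HDFS-style path and fall
    # back to its last component, warning the user as the issue proposes.
    table = location.rstrip("/").rsplit("/", 1)[-1]
    warnings.warn(f"no hbase:// scheme in {location!r}; using table {table!r}")
    return table
```

Under these assumptions, 'hbase://tablename' resolves directly, while a bare path such as '/user/alex/mytable' resolves to its last component and emits a warning.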
[jira] Commented: (PIG-759) HBaseStorage scheme for Load/Slice function
[ https://issues.apache.org/jira/browse/PIG-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12867558#action_12867558 ]

Dmitriy V. Ryaboy commented on PIG-759:
---------------------------------------

Alex,

Check out the modified HBase loader I have in ElephantBird that does this and more: http://github.com/kevinweil/elephant-bird/tree/master/src/java/com/twitter/elephantbird/pig/load/ (you want HBaseLoader and HBaseSlice).

It works with 0.6; a major backwards-incompatible change is that it doesn't expect the HBase table to contain string representations of everything, and tries to work at the byte level instead. Porting this to 0.7 might involve allowing the Caster interface to be user-specified, in which case it would be trivial to use the old String approach or the new Binary approach.

Feel free to take on porting this to 0.7; I probably won't get to that for at least a month. We are completely open to putting this back into Pig; the only reason it's in EB is that 0.6 was frozen when this was created, and we don't yet run 0.7 at Twitter :).

-D
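To make the String versus Binary distinction concrete, here is a minimal Python sketch of a user-selectable caster. It does not reproduce the Pig or ElephantBird interfaces: the function names, the CASTERS registry, and the 4-byte big-endian integer layout are all assumptions made for illustration.

```python
import struct


def int_from_string_bytes(raw: bytes) -> int:
    # "String" approach: the cell holds the decimal text of the number.
    return int(raw.decode("utf-8"))


def int_from_binary_bytes(raw: bytes) -> int:
    # "Binary" approach: the cell holds a 4-byte big-endian signed integer
    # (an assumed layout for this example).
    (value,) = struct.unpack(">i", raw)
    return value


# Hypothetical registry that lets the user pick the caster by name.
CASTERS = {"string": int_from_string_bytes, "binary": int_from_binary_bytes}


def load_int(raw: bytes, caster: str) -> int:
    """Decode one cell with the caster the user asked for."""
    return CASTERS[caster](raw)
```

The same logical value arrives as b"42" under the string convention and as b"\x00\x00\x00\x2a" under the binary one, which is why a loader that switches conventions is backwards-incompatible unless the caster can be chosen.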
[jira] Commented: (PIG-759) HBaseStorage scheme for Load/Slice function
[ https://issues.apache.org/jira/browse/PIG-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12867577#action_12867577 ]

Daniel Dai commented on PIG-759:
--------------------------------

Alex,

It was closed by rule when 0.7 was released, because it was marked for 0.7. Feel free to open a new Jira if you want to continue the work.
[jira] Commented: (PIG-759) HBaseStorage scheme for Load/Slice function
[ https://issues.apache.org/jira/browse/PIG-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12867580#action_12867580 ]

Alex Newman commented on PIG-759:
---------------------------------

Daniel: Harsh, I'll work the other way and fix a random other Pig JIRA. Thanks for the clarification though :)
[jira] Commented: (PIG-759) HBaseStorage scheme for Load/Slice function
[ https://issues.apache.org/jira/browse/PIG-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839884#action_12839884 ]

Dmitriy V. Ryaboy commented on PIG-759:
---------------------------------------

Olga, I am not sure why this got marked as invalid. It seems the point isn't validation of the path, but the ability to filter by start and end rows, avoiding unnecessary data scans. The patch clearly won't apply to 0.7, but maybe it should be marked as available for 0.6?
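The benefit of start and end rows can be illustrated with a small in-memory stand-in for a table sorted by row key. This is a Python sketch only: the SortedTable class is hypothetical, it does not call HBase, and the start-inclusive, stop-exclusive bounds are an assumption chosen for the example.

```python
from bisect import bisect_left


class SortedTable:
    """Toy table that keeps rows sorted by key (hypothetical stand-in)."""

    def __init__(self, items):
        pairs = sorted(items, key=lambda kv: kv[0])
        self.keys = [k for k, _ in pairs]
        self.values = [v for _, v in pairs]

    def scan(self, start_row=None, stop_row=None):
        # Binary search locates both bounds, so only rows inside
        # [start_row, stop_row) are ever touched.
        lo = 0 if start_row is None else bisect_left(self.keys, start_row)
        hi = len(self.keys) if stop_row is None else bisect_left(self.keys, stop_row)
        return list(zip(self.keys[lo:hi], self.values[lo:hi]))


table = SortedTable([(b"row3", "c"), (b"row1", "a"), (b"row2", "b"), (b"row4", "d")])
subset = table.scan(b"row2", b"row4")
```

With the bounds in place, the scan reads two rows instead of four; without them, every row would be read and then filtered afterwards in the script.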
[jira] Commented: (PIG-759) HBaseStorage scheme for Load/Slice function
[ https://issues.apache.org/jira/browse/PIG-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12752601#action_12752601 ]

Alan Gates commented on PIG-759:
--------------------------------

Things can be passed as bytes in Pig by passing them as bytearrays. This is the default if a type is not declared.

I can't assign the bug to you because you're not in the list of assignable people for Pig bugs. I think Olga has to add you to that list.
[jira] Commented: (PIG-759) HBaseStorage scheme for Load/Slice function
[ https://issues.apache.org/jira/browse/PIG-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12752668#action_12752668 ]

Hadoop QA commented on PIG-759:
-------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12416226/patch.p1
against trunk revision 811203.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    -1 javac. The applied patch generated 249 javac compiler warnings (more than the trunk's current 247 warnings).

    -1 findbugs. The patch appears to introduce 3 new Findbugs warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/2/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/2/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/2/console

This message is automatically generated.

HBaseStorage scheme for Load/Slice function
-------------------------------------------

Key: PIG-759
URL: https://issues.apache.org/jira/browse/PIG-759
Project: Pig
Issue Type: Bug
Reporter: Gunther Hagleitner
Fix For: 0.4.0
Attachments: patch.p1
[jira] Commented: (PIG-759) HBaseStorage scheme for Load/Slice function
[ https://issues.apache.org/jira/browse/PIG-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12718475#action_12718475 ]

Alex Newman commented on PIG-759:
---------------------------------

Indeed, I am down with this syntax.
[jira] Commented: (PIG-759) HBaseStorage scheme for Load/Slice function
[ https://issues.apache.org/jira/browse/PIG-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12702535#action_12702535 ]

David Ciemiewicz commented on PIG-759:
--------------------------------------

If HBase has named columns in its schema, why wouldn't it be appropriate to say something like:

    table = load '$tablename/$subsection' using HBaseStorage() as (a, b);

Since HBaseStorage() is specified:

1) Isn't hbase:// implicit?
2) Shouldn't I be able to just specify the names in the AS clause?
[jira] Commented: (PIG-759) HBaseStorage scheme for Load/Slice function
[ https://issues.apache.org/jira/browse/PIG-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12702602#action_12702602 ]

Alex Newman commented on PIG-759:
---------------------------------

I actually like the protocol specification, as it gives us the flexibility to hit HBase with another protocol like Thrift, but I may be overthinking it.
[jira] Commented: (PIG-759) HBaseStorage scheme for Load/Slice function
[ https://issues.apache.org/jira/browse/PIG-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12702002#action_12702002 ]

Alan Gates commented on PIG-759:
--------------------------------

Are you suggesting that the hbase scheme include ways to specify table subsections and column lists? Something like this:

{code}
table = load 'hbase://tablename/subsection?col=a,col=b' using HBaseStorage();
{code}
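For reference, the location string floated in this comment could be taken apart roughly as follows. This is a Python sketch of the proposal only: the syntax was never settled in this thread, the function name parse_hbase_location is hypothetical, and the comma-separated col=... query is read literally from the example rather than from any specification.

```python
from urllib.parse import urlsplit


def parse_hbase_location(location: str):
    """Split a proposed hbase:// location into (table, subsection, columns)."""
    parts = urlsplit(location)
    if parts.scheme != "hbase":
        raise ValueError(f"expected an hbase:// location, got {location!r}")
    table = parts.netloc
    # The path after the table name is the (still undefined) subsection.
    subsection = parts.path.lstrip("/") or None
    columns = []
    # The example uses 'col=a,col=b', so split on commas rather than '&'.
    for item in parts.query.split(","):
        name, _, value = item.partition("=")
        if name == "col" and value:
            columns.append(value)
    return table, subsection, columns


result = parse_hbase_location("hbase://tablename/subsection?col=a,col=b")
```

A location with no path or query, such as 'hbase://tablename', yields no subsection and an empty column list, which keeps the simpler form from the issue description valid.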
[jira] Commented: (PIG-759) HBaseStorage scheme for Load/Slice function
[ https://issues.apache.org/jira/browse/PIG-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12702057#action_12702057 ]

Alex Newman commented on PIG-759:
---------------------------------

Exactly, although I am not sure what the subsection syntax should look like.
[jira] Commented: (PIG-759) HBaseStorage scheme for Load/Slice function
[ https://issues.apache.org/jira/browse/PIG-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12697606#action_12697606 ]

Alex Newman commented on PIG-759:
---------------------------------

This looks like a great idea, although I am still a big fan of being able to specify subsections of an HBase table, along with specific columns, in the load statement.