[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14085973#comment-14085973 ] Lefty Leverenz commented on HIVE-6584: -- Does this need to be documented in the wiki? Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, HIVE-6584.13.patch, HIVE-6584.14.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14086306#comment-14086306 ] Carter Shanklin commented on HIVE-6584: --- I had to read the source code to get it to work, so I vote yes. Using Hive over HBase snapshots requires 2 variables to be set, hive.hbase.snapshot.name - The name of the HBase snapshot to be used when reading the HBase data. hive.hbase.snapshot.restoredir - A temporary directory into which the hbase snapshot is restored when queried using hive.hbase.snapshot.name. A number of directories and small files will be created under this directory, proportional to the number of regions in the HBase table. The table data itself will not be copied under this directory, only metadata. After query execution is complete, this directory can be removed. Example: set hive.hbase.snapshot.name=snapshot_2014_08_03; set hive.hbase.snapshot.restoredir=/tmp/restore select count(*) from hbase_table; After the job is complete, /tmp/restore and its subdirectories can be deleted. [~ndimiduk] talked about making hive.hbase.snapshot.restoredir an optional setting, he can comment whether he implemented this or not. Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, HIVE-6584.13.patch, HIVE-6584.14.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14086308#comment-14086308 ] Carter Shanklin commented on HIVE-6584: --- I can't edit my comment but due to some unwanted formatting let me clarify the example: {code} Example: set hive.hbase.snapshot.name=snapshot_2014_08_03; set hive.hbase.snapshot.restoredir=/tmp/restore select count(*) from hbase_table; After the job is complete, /tmp/restore and its subdirectories can be deleted. {code} Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, HIVE-6584.13.patch, HIVE-6584.14.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14086367#comment-14086367 ] Nick Dimiduk commented on HIVE-6584: Restore location is optional. It defaults to /tmp. The restore process creates a uniquely named (random uuid) directory under this path for any give restore, so users who never set this value will not conflict with each other. It would be nice if hive had some kind of post-job hook that could be used to clean up the restoredir artifacts after the input format is finished with them. Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, HIVE-6584.13.patch, HIVE-6584.14.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14084852#comment-14084852 ] Nick Dimiduk commented on HIVE-6584: Thanks folks! Any chance of getting a commit this week? :) Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, HIVE-6584.13.patch, HIVE-6584.14.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14085140#comment-14085140 ] Sushanth Sowmyan commented on HIVE-6584: Committed to trunk. Thanks Nick, and Thanks Navis for the review as well. :) Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, HIVE-6584.13.patch, HIVE-6584.14.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14084227#comment-14084227 ] Navis commented on HIVE-6584: - +1 Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, HIVE-6584.13.patch, HIVE-6584.14.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081249#comment-14081249 ] Nick Dimiduk commented on HIVE-6584: I updated RB as well, the interesting addition is HBaseTableSnapshotInputFormatUtil.java and its use: https://reviews.apache.org/r/23824/diff/1-2/#7 Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, HIVE-6584.13.patch, HIVE-6584.14.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081518#comment-14081518 ] Sushanth Sowmyan commented on HIVE-6584: I like the changes made to .14.patch, definitely cleaner, and will make resolution of HIVE-7534 trivial. I also like the new error message, makes it more obvious for the end-user as to what they need. +1. Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, HIVE-6584.13.patch, HIVE-6584.14.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081645#comment-14081645 ] Hive QA commented on HIVE-6584: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12658980/HIVE-6584.14.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5861 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/125/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/125/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-125/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12658980 Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, HIVE-6584.13.patch, HIVE-6584.14.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076331#comment-14076331 ] Nick Dimiduk commented on HIVE-6584: Thanks for having a look, [~navis]. As it is, this patch requires HBASE-11137, which has not been pack-ported to 0.96. There's no technical reason not to back-port it, simply that 0.96 is in maintenance mode only and we're encouraging folks to upgrade from 0.96.2 to 0.98.x. Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076742#comment-14076742 ] Sushanth Sowmyan commented on HIVE-6584: +1 on the patch. The one thing I'd change before committing is a word-wrap for the ASF header in conf/hive-default.xml.template, to retain old newline behaviour there. But otherwise, looks good to me. We'll need to update those TODOs in a bit once we upgrade to a newer version of HBase (0.98.5+) to pick up HBASE-11555. I would have suggested doing that in this patch itself, given that you're already bumping version up to 0.98.3, except that I see that that got resolved only recently, and I don't want to drag this patch out any further. Could you please open another jira to track that TODO? Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076782#comment-14076782 ] Nick Dimiduk commented on HIVE-6584: Thanks for having a look, [~sushanth]! bq. The one thing I'd change before committing is a word-wrap for the ASF header in conf/hive-default.xml.template, to retain old newline behaviour there. But otherwise, looks good to me. I believe HIVE-7496 drops conf/hive-default.xml.template all together. bq. We'll need to update those TODOs in a bit once we upgrade to a newer version of HBase (0.98.5+) to pick up HBASE-11555. I opened HIVE-7534 to track this. Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076810#comment-14076810 ] Sushanth Sowmyan commented on HIVE-6584: Aha, sounds good. And thanks for creating the new jira. +1 on .13.patch. Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, HIVE-6584.13.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14077132#comment-14077132 ] Hive QA commented on HIVE-6584: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12658238/HIVE-6584.13.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5786 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_join_hash org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/82/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/82/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-82/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12658238 Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, HIVE-6584.13.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14077304#comment-14077304 ] Navis commented on HIVE-6584: - [~ndimiduk] Could you refactor-out new codes in HBaseStorageHandler to a new utility class? Something like, {code} if (this.configureInputJobProps) { String snapshotName = HiveConf.getVar(jobConf, HiveConf.ConfVars.HIVE_HBASE_SNAPSHOT_NAME); if (snapshotName != null) { HBaseSnapshotUtil.configure(jobConf, hbaseConf, jobProperties); } } {code} By doing this, we can use hbase-0.96.0 if snapshot is not configured. Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, HIVE-6584.13.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075863#comment-14075863 ] Navis commented on HIVE-6584: - LGTM, and one question. Can we use pre-snapshot-hbase versions (hbase-0.96.0, for example) with this patch applied? TableSnapshotInputFormatImpl is referenced from HBaseStorageHandler and seemed to throw exception loading this class. Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073958#comment-14073958 ] Carter Shanklin commented on HIVE-6584: --- I tested the .12 version of this patch on a 20 node cluster to see what sort of performance gains might be expected. I did a YCSB load of 180m rows and ran a few simple SQL queries in Hive while simultaneously running a YCSB 32-thread workload. TLDR the snapshot approach provides a nice performance boost of about 2.5x across different types of queries. The more fields I queried the better the performance was. |Query|Run|Workload|Snapshot Time (s)|Direct Time (s)|Time X Factor| |count(*)|1|a|191.019|488.915|2.56x| |count(*)|2|a|200.641|480.837|2.40x| |Aggregate 1 field|1|a|214.452|499.304|2.33x| |Aggregate 1 field|2|a|217.744|500.07|2.30x| |Aggregate 9 fields|1|a|281.514|802.799|2.85x| |Aggregate 9 fields|2|a|272.358|785.816|2.89x| |Aggregate 1 with GBY|1|a|248.874|558.143|2.24x| |Aggregate 1 with GBY|2|a|269.658|533.562|1.98x| |count(*)|1|b|194.739|482.261|2.48x| |count(*)|2|b|195.178|481.437|2.47x| |Aggregate 1 field|1|b|220.325|498.956|2.26x| |Aggregate 1 field|2|b|227.117|489.27|2.15x| |Aggregate 9 fields|1|b|276.939|817.118|2.95x| |Aggregate 9 fields|2|b|290.288|876.753|3.02x| |Aggregate 1 with GBY|1|b|244.025|563.884|2.31x| |Aggregate 1 with GBY|2|b|225.431|570.723|2.53x| |count(*)|1|c|194.568|502.79|2.58x| |count(*)|2|c|205.418|508.319|2.47x| |Aggregate 1 field|1|c|209.709|531.39|2.53x| |Aggregate 1 field|2|c|217.551|526.878|2.42x| |Aggregate 9 fields|1|c|267.93|756.476|2.82x| |Aggregate 9 fields|2|c|273.107|723.459|2.65x| |Aggregate 1 with GBY|1|c|240.991|526.053|2.18x| |Aggregate 1 with GBY|2|c|258.06|527.845|2.05x| For those not familiar with YCSB it uses a table with 9 fields, each filled with random junk 100 characters long. It defines workloads A-F, of which I've used A-C. The main point to note is the more of the fields my query fetches, the better it works in snapshot mode. The other thing I measured was throughput as reported by the YCSB tool. For the most part, when running the query over a snapshot the throughput was much better. |Workload|Tput Snapshot|Tput Direct|Throughput Improvement (Snapshot)| |a|83443.11623|56267.34148|48.30%| |b|45709.15011|31224.30376|46.39%| |c|46634.58415|43224.86383|7.89%| The throughput when using the snapshot seems to be close to the throughput when not scanning data, but I didn't run the baseline tests long enough to get anything conclusive here. In any event this looks like a good patch, especially considering its small size. The numbers quoted here are for reference only, YMMV, etc. Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14071119#comment-14071119 ] Hive QA commented on HIVE-6584: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12657183/HIVE-6584.12.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5752 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_fail_8 org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes org.apache.hive.jdbc.TestJdbcDriver2.testParentReferences {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/9/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/9/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-9/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12657183 Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14071133#comment-14071133 ] Nick Dimiduk commented on HIVE-6584: I think these failed tests are unrelated. Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069319#comment-14069319 ] Nick Dimiduk commented on HIVE-6584: HBASE-11557 will remove the requirement of specifying high-scale-lib.jar in HADOOP_CLASSPATH. Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.10.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069649#comment-14069649 ] Hive QA commented on HIVE-6584: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12656962/HIVE-6584.10.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5750 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver.testCliDriver_hbase_bulk org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_script_pipe org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_fail_8 org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/889/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/889/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-889/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12656962 Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069829#comment-14069829 ] Hive QA commented on HIVE-6584: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12657009/HIVE-6584.11.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5752 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_handler_snapshot org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver.testCliDriver_hbase_bulk org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_fail_8 org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/893/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/893/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-893/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12657009 Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066930#comment-14066930 ] Nick Dimiduk commented on HIVE-6584: [~tenggyut]: bq. 1. HBaseStorageHandler.getInputFormatClass(): i am afraid that the returned inputformat will always be HiveHBaseTabelInputFormat (at least according to my test) My patch has the logic necessary to perform the switch at runtime. It does indeed work with the latest patch. bq. 2. in the method HBaseStorageHandler.preCreateTable, hive will check whether the HBase table exist or not, regardless the external table that hive gonna create is based on actual table or a snapshot. I'm not sure about this. Anyway that's not related to this feature. HBaseStorageHandler has no means of creating/dropping table snapshots. If you're seeing some issue here with StorageHandler DDL operations, please file a separate JIRA. bq. 3. the TableSnapshotRegionSplit used in TableSnapshotInputFormat is a direct subclass of InputSplit, not a subclass of tablesplit Nor should it be. The TableSnapshotRegionSplit is tracking different information from TableSplit. bq. 4. there is no public setScan method in TableSnapshotInputFormat.RecordReader, instead it will translate a string into a scan instance by using mapreduce.TableMapReduceUitls.convertStringToScan. Indeed, there is disparity between the HBase's mapred and mapreduce implementations. I opened HBASE-11179 for some cleanup on the HBase side. convertStringToScan details are HBase-private API as of 0.96. I opened HBASE-11163 to make necessary scanner support available in mapred API, but it's not yet been implemented. Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065667#comment-14065667 ] Hive QA commented on HIVE-6584: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12656322/HIVE-6584.9.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5726 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver.testCliDriver_hbase_bulk org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_temp_table org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/836/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/836/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-836/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12656322 Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063719#comment-14063719 ] Nick Dimiduk commented on HIVE-6584: Ouch. Most of these tests run/pass for me locally. Will investigate further. I'm also curious why the {{explain}} commands in {{hbase_handler_snapshot.q}} are not including the Input/OutputFormats. [~sushanth], [~ashutoshc] any ideas on this latter issue? Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063004#comment-14063004 ] Hive QA commented on HIVE-6584: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12655948/HIVE-6584.8.patch {color:red}ERROR:{color} -1 due to 17 failed/errored test(s), 5735 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_add_part_multiple org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_custom_key2 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_handler_bulk org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_handler_snapshot org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_key_range org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_pushdown org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_single_sourced_multi_insert org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats2 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats3 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats_empty_partition org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_ppd_key_ranges org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver.testCliDriver_hbase_bulk org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver_generatehfiles_require_family_path org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_temp_table org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/803/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/803/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-803/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 17 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12655948 Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14051499#comment-14051499 ] Hive QA commented on HIVE-6584: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12653797/HIVE-6584.7.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/672/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/672/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-672/ Messages: {noformat} This message was trimmed, see log for full details [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-contrib --- [INFO] Compiling 39 source files to /data/hive-ptest/working/apache-svn-trunk-source/contrib/target/classes [WARNING] /data/hive-ptest/working/apache-svn-trunk-source/contrib/src/java/org/apache/hadoop/hive/contrib/udaf/example/UDAFExampleMax.java: Some input files use or override a deprecated API. [WARNING] /data/hive-ptest/working/apache-svn-trunk-source/contrib/src/java/org/apache/hadoop/hive/contrib/udaf/example/UDAFExampleMax.java: Recompile with -Xlint:deprecation for details. [WARNING] /data/hive-ptest/working/apache-svn-trunk-source/contrib/src/java/org/apache/hadoop/hive/contrib/udf/example/UDFExampleStructPrint.java: /data/hive-ptest/working/apache-svn-trunk-source/contrib/src/java/org/apache/hadoop/hive/contrib/udf/example/UDFExampleStructPrint.java uses unchecked or unsafe operations. [WARNING] /data/hive-ptest/working/apache-svn-trunk-source/contrib/src/java/org/apache/hadoop/hive/contrib/udf/example/UDFExampleStructPrint.java: Recompile with -Xlint:unchecked for details. [INFO] [INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ hive-contrib --- [INFO] Using 'UTF-8' encoding to copy filtered resources. [INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-svn-trunk-source/contrib/src/test/resources [INFO] Copying 3 resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-contrib --- [INFO] Executing tasks main: [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/contrib/target/tmp [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/contrib/target/warehouse [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/contrib/target/tmp/conf [copy] Copying 5 files to /data/hive-ptest/working/apache-svn-trunk-source/contrib/target/tmp/conf [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ hive-contrib --- [INFO] Compiling 2 source files to /data/hive-ptest/working/apache-svn-trunk-source/contrib/target/test-classes [WARNING] /data/hive-ptest/working/apache-svn-trunk-source/contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestRegexSerDe.java: /data/hive-ptest/working/apache-svn-trunk-source/contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestRegexSerDe.java uses or overrides a deprecated API. [WARNING] /data/hive-ptest/working/apache-svn-trunk-source/contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestRegexSerDe.java: Recompile with -Xlint:deprecation for details. [INFO] [INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-contrib --- [INFO] Tests are skipped. [INFO] [INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-contrib --- [INFO] Building jar: /data/hive-ptest/working/apache-svn-trunk-source/contrib/target/hive-contrib-0.14.0-SNAPSHOT.jar [INFO] [INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ hive-contrib --- [INFO] [INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-contrib --- [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/contrib/target/hive-contrib-0.14.0-SNAPSHOT.jar to /data/hive-ptest/working/maven/org/apache/hive/hive-contrib/0.14.0-SNAPSHOT/hive-contrib-0.14.0-SNAPSHOT.jar [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/contrib/pom.xml to /data/hive-ptest/working/maven/org/apache/hive/hive-contrib/0.14.0-SNAPSHOT/hive-contrib-0.14.0-SNAPSHOT.pom [INFO] [INFO] [INFO] Building Hive HBase Handler 0.14.0-SNAPSHOT [INFO] [INFO] [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-hbase-handler --- [INFO] Deleting /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler (includes = [datanucleus.log, derby.log], excludes = []) [INFO] [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-hbase-handler --- [INFO]
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045804#comment-14045804 ] Hive QA commented on HIVE-6584: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12652773/HIVE-6584.6.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/617/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/617/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-617/ Messages: {noformat} This message was trimmed, see log for full details [INFO] - [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSnapshotSplit.java:[10,66] package org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat does not exist [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSnapshotSplit.java:[17,17] cannot find symbol symbol: class TableSnapshotRegionSplit location: class org.apache.hadoop.hive.hbase.HBaseSnapshotSplit [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSnapshotSplit.java:[24,29] cannot find symbol symbol: class TableSnapshotRegionSplit location: class org.apache.hadoop.hive.hbase.HBaseSnapshotSplit [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSnapshotSplit.java:[29,10] cannot find symbol symbol: class TableSnapshotRegionSplit location: class org.apache.hadoop.hive.hbase.HBaseSnapshotSplit [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[28,41] cannot find symbol symbol: class TableSnapshotInputFormat location: package org.apache.hadoop.hbase.mapreduce [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[31,66] package org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat does not exist [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[33,47] cannot find symbol symbol: class ColumnMapping location: class org.apache.hadoop.hive.hbase.HBaseSerDe [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[76,3] cannot find symbol symbol: class TableSnapshotInputFormat location: class org.apache.hadoop.hive.hbase.HiveHBaseTableSnapshotInputFormat [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSnapshotStorageHandler.java:[34,41] cannot find symbol symbol: class TableSnapshotInputFormatImpl location: package org.apache.hadoop.hbase.mapreduce [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSnapshotStorageHandler.java:[37,47] cannot find symbol symbol: class ColumnMapping location: class org.apache.hadoop.hive.hbase.HBaseSerDe [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSnapshotSplit.java:[21,17] cannot find symbol symbol: class TableSnapshotRegionSplit location: class org.apache.hadoop.hive.hbase.HBaseSnapshotSplit [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[76,43] cannot find symbol symbol: class TableSnapshotInputFormat location: class org.apache.hadoop.hive.hbase.HiveHBaseTableSnapshotInputFormat [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[87,10] cannot find symbol symbol: class ColumnMapping location: class org.apache.hadoop.hive.hbase.HiveHBaseTableSnapshotInputFormat [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[89,54] incompatible types required: java.util.ListColumnMapping found:org.apache.hadoop.hive.hbase.ColumnMappings [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[108,9] cannot find symbol symbol: variable HiveHBaseInputFormatUtil location: class org.apache.hadoop.hive.hbase.HiveHBaseTableSnapshotInputFormat [ERROR]
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039340#comment-14039340 ] Hive QA commented on HIVE-6584: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12651366/HIVE-6584.5.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/532/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/532/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-532/ Messages: {noformat} This message was trimmed, see log for full details [INFO] - [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSnapshotSplit.java:[10,66] package org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat does not exist [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSnapshotSplit.java:[17,17] cannot find symbol symbol: class TableSnapshotRegionSplit location: class org.apache.hadoop.hive.hbase.HBaseSnapshotSplit [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSnapshotSplit.java:[24,29] cannot find symbol symbol: class TableSnapshotRegionSplit location: class org.apache.hadoop.hive.hbase.HBaseSnapshotSplit [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSnapshotSplit.java:[29,10] cannot find symbol symbol: class TableSnapshotRegionSplit location: class org.apache.hadoop.hive.hbase.HBaseSnapshotSplit [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[28,41] cannot find symbol symbol: class TableSnapshotInputFormat location: package org.apache.hadoop.hbase.mapreduce [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[31,66] package org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat does not exist [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[33,47] cannot find symbol symbol: class ColumnMapping location: class org.apache.hadoop.hive.hbase.HBaseSerDe [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[76,3] cannot find symbol symbol: class TableSnapshotInputFormat location: class org.apache.hadoop.hive.hbase.HiveHBaseTableSnapshotInputFormat [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSnapshotStorageHandler.java:[34,41] cannot find symbol symbol: class TableSnapshotInputFormatImpl location: package org.apache.hadoop.hbase.mapreduce [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSnapshotStorageHandler.java:[37,47] cannot find symbol symbol: class ColumnMapping location: class org.apache.hadoop.hive.hbase.HBaseSerDe [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSnapshotSplit.java:[21,17] cannot find symbol symbol: class TableSnapshotRegionSplit location: class org.apache.hadoop.hive.hbase.HBaseSnapshotSplit [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[76,43] cannot find symbol symbol: class TableSnapshotInputFormat location: class org.apache.hadoop.hive.hbase.HiveHBaseTableSnapshotInputFormat [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[87,10] cannot find symbol symbol: class ColumnMapping location: class org.apache.hadoop.hive.hbase.HiveHBaseTableSnapshotInputFormat [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[89,54] incompatible types required: java.util.ListColumnMapping found:org.apache.hadoop.hive.hbase.ColumnMappings [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[108,9] cannot find symbol symbol: variable HiveHBaseInputFormatUtil location: class org.apache.hadoop.hive.hbase.HiveHBaseTableSnapshotInputFormat [ERROR]
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039467#comment-14039467 ] Nick Dimiduk commented on HIVE-6584: Can you regenerate your patch, rooted in the trunk directory instead of above it? That's the reason this patch fails the buildbot. Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14036299#comment-14036299 ] Sushanth Sowmyan commented on HIVE-6584: [~zjkylyzjkyly]/[~tenggyut] : Could you please upload a patch for your proposal so we might compare/contrast? Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033591#comment-14033591 ] zjkyly commented on HIVE-6584: -- Teng YuTong and I are colleagues. we have a patch for HIVE-6584 and a patch for HBASE-11163 , and we modify org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat(line 93) from: static class TableSnapshotRegionSplit extends InputSplit implements Writable to: public static class TableSnapshotRegionSplit extends InputSplit implements Writable we can run mapred on snapshot. mapred (count(1)) result : 2014-06-17 16:29:34,540 Stage-1 map = 100%, reduce = 32%, Cumulative CPU 2467.57 sec 2014-06-17 16:29:35,578 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2468.35 sec MapReduce Total cumulative CPU time: 41 minutes 8 seconds 350 msec Ended Job = job_1402970116480_0015 MapReduce Jobs Launched: Job 0: Map: 64 Reduce: 1 Cumulative CPU: 2468.35 sec HDFS Read: 18334 HDFS Write: 9 SUCCESS Total MapReduce CPU Time Spent: 41 minutes 8 seconds 350 msec OK 65497163 Time taken: 429.647 seconds, Fetched: 1 row(s) hbase count result: Current count: 6540, row: user987684650651905350 65497163 row(s) in 1446.2310 seconds = 65497163 but hfile has different versions of the record. We can not solve this problem. So, we set the version of hbase table is 1, and run major compact before snapshot table. Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033626#comment-14033626 ] zjkyly commented on HIVE-6584: -- hi Nick Dimiduk and Teng Yutong. what we can do at repsent is to scan all kv records of snapshot. we weren't able to solve the issue that hfile had different version for record, so we set default version of hbase table as 1, and run major compact before doing snapshot table. we don't know the open soure community, is it necessary to solve the problem of same kv version? shall we find all kv version ? or just find the latest one? We will try to solve the problem of multiple versions. Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031894#comment-14031894 ] Hive QA commented on HIVE-6584: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12650362/HIVE-6584.4.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5536 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_columnar org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/472/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/472/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-472/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12650362 Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030369#comment-14030369 ] Sushanth Sowmyan commented on HIVE-6584: Teng, I'd be interested in how your patch winds up being. If you mean that at runtime, the HBaseStorageHandler decides to deputize a subclass of itself to do the work, then that might work. But if you mean that your approach would lead to the user having to create a separate table (kinda like a view) that associates with a snapshot, then speaking from the hive side, I think I would prefer having only one SH to deal with, and having it decide what to do with various set parameters as opposed to creating separate hive tables with a different SH in hive. That way, using the same hive table definition, a query could decide to use a snapshot or not. Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, HIVE-6584.3.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029497#comment-14029497 ] Nick Dimiduk commented on HIVE-6584: Thanks for the insightful comments, [~tenggyut]. bq. 1. HBaseStorageHandler.getInputFormatClass(): i am afraid that the returned inputformat will always be HiveHBaseTabelInputFormat (at least according to my test) I was afraid of this in my initial design thinking, but my experiments proved otherwise. Can you elaborate on your tests? I'd like to reproduce this issue if I'm able. bq. 2. in the method HBaseStorageHandler.preCreateTable, hive will check whether the HBase table exist or not, regardless the external table that hive gonna create is based on actual table or a snapshot. I haven't yet looked at the use-case of consuming a snapshot for which there is no table in HBase. I planned to approach this kind of feature in follow-on work; the goal here is to get jus the basics working. bq. 3, 4 [snip] These are both true. bq. So I suggest adding a subclass of HBaseStorageHandler(and other necessary classes) ,say HBaseSnapshotStorageHandler, to deal with the hbase snapshot situation. A goal of this patch is to be able to query snapshots created from online tables already registered with Hive using the HBaseStorageHandler. Implementing HBaseSnapshotStorageHandler requires a separate table registration for the snapshot. I think that's undesirable. Regarding the hbase snapshot situation, let's make it better on the HBase side. What do you recommend? Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, HIVE-6584.3.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029572#comment-14029572 ] Hive QA commented on HIVE-6584: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12649918/HIVE-6584.3.patch {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 5610 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_external_table_ppd org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_storage_queries org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hadoop.hive.metastore.txn.TestCompactionTxnHandler.testRevokeTimedOutWorkers org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/446/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/446/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-446/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12649918 Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, HIVE-6584.3.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028753#comment-14028753 ] Teng Yutong commented on HIVE-6584: --- hi nick, i have some concerns about these patches: 1. HBaseStorageHandler.getInputFormatClass(): i am afraid that the returned inputformat will always be HiveHBaseTabelInputFormat (at least according to my test) 2. in the method HBaseStorageHandler.preCreateTable, hive will check whether the HBase table exist or not, regardless the external table that hive gonna create is based on actual table or a snapshot. 3. the TableSnapshotRegionSplit used in TableSnapshotInputFormat is a direct subclass of InputSplit, not a subclass of tablesplit 4. there is no public setScan method in TableSnapshotInputFormat.RecordReader, instead it will translate a string into a scan instance by using mapreduce.TableMapReduceUitls.convertStringToScan. So I suggest adding a subclass of HBaseStorageHandler(and other necessary classes) ,say HBaseSnapshotStorageHandler, to deal with the hbase snapshot situation. In fact, I have already finished the necessary code changes and done some tests. The tests show that my modification works out. i will upload my patch soon Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, HIVE-6584.3.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004558#comment-14004558 ] Nick Dimiduk commented on HIVE-6584: Thanks [~tenggyut]. Any thoughts regarding how to test this? Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13924317#comment-13924317 ] Swarnim Kulkarni commented on HIVE-6584: +1 on this one. This should be very nice addition to the existing integration. Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)