[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-08-05 Thread Lefty Leverenz (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085973#comment-14085973
 ] 

Lefty Leverenz commented on HIVE-6584:
--

Does this need to be documented in the wiki?

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, 
 HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, 
 HIVE-6584.13.patch, HIVE-6584.14.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, 
 HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, 
 HIVE-6584.8.patch, HIVE-6584.9.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapshots.
 This allows an MR job to consume a stable, read-only view of an HBase table
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-08-05 Thread Carter Shanklin (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14086306#comment-14086306
]

Carter Shanklin commented on HIVE-6584:
---

I had to read the source code to get it to work, so I vote yes.

Using Hive over HBase snapshots requires two variables to be set:
hive.hbase.snapshot.name - The name of the HBase snapshot to be used when
reading the HBase data.
hive.hbase.snapshot.restoredir - A temporary directory into which the HBase
snapshot is restored when querying with hive.hbase.snapshot.name set. A number
of directories and small files will be created under this directory,
proportional to the number of regions in the HBase table. Only metadata is
copied under this directory, not the table data itself. After query execution
is complete, this directory can be removed.

Example:
set hive.hbase.snapshot.name=snapshot_2014_08_03;
set hive.hbase.snapshot.restoredir=/tmp/restore;
select count(*) from hbase_table;
After the job is complete, /tmp/restore and its subdirectories can be deleted.

[~ndimiduk] talked about making hive.hbase.snapshot.restoredir an optional
setting; he can comment on whether he implemented this.

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-08-05 Thread Carter Shanklin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14086308#comment-14086308
 ] 

Carter Shanklin commented on HIVE-6584:
---

I can't edit my comment, and some unwanted formatting crept into it, so let me
clarify the example:

{code}
-- Example:
set hive.hbase.snapshot.name=snapshot_2014_08_03;
set hive.hbase.snapshot.restoredir=/tmp/restore;
select count(*) from hbase_table;
-- After the job is complete, /tmp/restore and its subdirectories can be deleted.
{code}

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-08-05 Thread Nick Dimiduk (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14086367#comment-14086367
]

Nick Dimiduk commented on HIVE-6584:

Restore location is optional. It defaults to /tmp. The restore process creates
a uniquely named (random UUID) directory under this path for any given restore,
so users who never set this value will not conflict with each other.

It would be nice if Hive had some kind of post-job hook that could be used to
clean up the restoredir artifacts after the input format is finished with them.
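
As a rough illustration of the behaviour described in this comment (not the
code from the patch), the sketch below creates a uniquely named directory per
restore under a base path and removes it once it is no longer needed. It uses
the local filesystem through java.nio purely for demonstration, whereas the
real restore happens on HDFS. The names RestoreDirSketch, newRestoreDir and
cleanUp are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the patch. Assumes a local filesystem;
// the actual snapshot restore directory lives on HDFS.
class RestoreDirSketch {

  // Create a uniquely named (random UUID) directory under the base path.
  static Path newRestoreDir(Path base) throws IOException {
    Path dir = base.resolve(UUID.randomUUID().toString());
    return Files.createDirectories(dir);
  }

  // Recursively delete a directory, children before parents.
  static void cleanUp(Path dir) throws IOException {
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(dir)) {
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : paths) {
      Files.delete(p);
    }
  }

  public static void main(String[] args) throws IOException {
    Path base = Files.createTempDirectory("restore-base");
    Path first = newRestoreDir(base);
    Path second = newRestoreDir(base);
    // Two restores under the same base get different directories.
    System.out.println(first.getFileName() + " vs " + second.getFileName());
    cleanUp(base);
  }
}
```

Because each restore gets its own subdirectory, cleanup can target a single
restore without touching directories that belong to other users or queries.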

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-08-04 Thread Nick Dimiduk (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084852#comment-14084852
]

Nick Dimiduk commented on HIVE-6584:

Thanks folks! Any chance of getting a commit this week? :)

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-08-04 Thread Sushanth Sowmyan (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085140#comment-14085140
]

Sushanth Sowmyan commented on HIVE-6584:

Committed to trunk. Thanks Nick, and thanks Navis for the review as well. :)

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-08-03 Thread Navis (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084227#comment-14084227
 ] 

Navis commented on HIVE-6584:
-

+1

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-31 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081249#comment-14081249
 ] 

Nick Dimiduk commented on HIVE-6584:


I updated RB as well; the interesting addition is
HBaseTableSnapshotInputFormatUtil.java and its use: 
https://reviews.apache.org/r/23824/diff/1-2/#7

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-31 Thread Sushanth Sowmyan (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081518#comment-14081518
]

Sushanth Sowmyan commented on HIVE-6584:

I like the changes made in .14.patch: they are definitely cleaner and will make
resolving HIVE-7534 trivial. I also like the new error message, which makes it
more obvious to the end user what they need.

+1.

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-31 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081645#comment-14081645
 ] 

Hive QA commented on HIVE-6584:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12658980/HIVE-6584.14.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5861 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/125/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/125/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-125/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12658980

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-28 Thread Nick Dimiduk (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076331#comment-14076331
]

Nick Dimiduk commented on HIVE-6584:

Thanks for having a look, [~navis]. As it is, this patch requires HBASE-11137,
which has not been back-ported to 0.96. There's no technical reason not to
back-port it; it's simply that 0.96 is in maintenance mode only and we're
encouraging folks to upgrade from 0.96.2 to 0.98.x.

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-28 Thread Sushanth Sowmyan (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076742#comment-14076742
]

Sushanth Sowmyan commented on HIVE-6584:

+1 on the patch.

The one thing I'd change before committing is a word-wrap for the ASF header in
conf/hive-default.xml.template, to retain old newline behaviour there. But
otherwise, looks good to me.

We'll need to update those TODOs in a bit once we upgrade to a newer version of
HBase (0.98.5+) to pick up HBASE-11555. I would have suggested doing that in
this patch itself, given that you're already bumping version up to 0.98.3,
except that I see that that got resolved only recently, and I don't want to
drag this patch out any further. Could you please open another jira to track
that TODO?

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-28 Thread Nick Dimiduk (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076782#comment-14076782
]

Nick Dimiduk commented on HIVE-6584:

Thanks for having a look, [~sushanth]!

bq. The one thing I'd change before committing is a word-wrap for the ASF
header in conf/hive-default.xml.template, to retain old newline behaviour
there. But otherwise, looks good to me.

I believe HIVE-7496 drops conf/hive-default.xml.template altogether.

bq. We'll need to update those TODOs in a bit once we upgrade to a newer
version of HBase (0.98.5+) to pick up HBASE-11555.

I opened HIVE-7534 to track this.

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-28 Thread Sushanth Sowmyan (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076810#comment-14076810
]

Sushanth Sowmyan commented on HIVE-6584:

Aha, sounds good. And thanks for creating the new jira.

+1 on .13.patch.

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-28 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077132#comment-14077132
 ] 

Hive QA commented on HIVE-6584:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12658238/HIVE-6584.13.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5786 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_join_hash
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/82/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/82/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-82/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12658238

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-28 Thread Navis (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077304#comment-14077304
 ] 

Navis commented on HIVE-6584:
-

[~ndimiduk] Could you refactor the new code in HBaseStorageHandler out into a
new utility class? Something like,
{code}
if (this.configureInputJobProps) {
  String snapshotName = HiveConf.getVar(jobConf, HiveConf.ConfVars.HIVE_HBASE_SNAPSHOT_NAME);
  if (snapshotName != null) {
    // Only touch the snapshot-specific classes when a snapshot is configured.
    HBaseSnapshotUtil.configure(jobConf, hbaseConf, jobProperties);
  }
}
{code}
By doing this, we can use hbase-0.96.0 if a snapshot is not configured.
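
The idea behind this suggestion can be sketched with plain Java, independent of
Hive and HBase: code that sits in a separate helper class is not initialized
until the guarded branch actually calls it. The sketch below is only an
illustration under that assumption. LazyInitSketch, SnapshotHelper and
configureInput are hypothetical names, and the sketch demonstrates lazy class
initialization only; whether a missing class on the classpath is tolerated also
depends on how the JVM links and verifies the referencing class, which is not
shown here.

```java
// Hypothetical sketch, not code from the patch.
class LazyInitSketch {

  // Set by the helper's static initializer so callers can observe when it runs.
  static boolean helperInitialized = false;

  // Stands in for a utility class that holds all snapshot-specific references.
  static class SnapshotHelper {
    static {
      helperInitialized = true;
    }

    static void configure(String snapshotName) {
      // Snapshot-specific setup would go here.
    }
  }

  // Returns true if the snapshot code path was taken.
  static boolean configureInput(String snapshotName) {
    if (snapshotName != null) {
      // First use of SnapshotHelper triggers its initialization.
      SnapshotHelper.configure(snapshotName);
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    configureInput(null);
    System.out.println("after no snapshot: " + helperInitialized);
    configureInput("snapshot_2014_08_03");
    System.out.println("after snapshot:    " + helperInitialized);
  }
}
```

Keeping the version-sensitive references in one helper also gives a single
place to change when the underlying dependency is upgraded.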

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-27 Thread Navis (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075863#comment-14075863
]

Navis commented on HIVE-6584:
-

LGTM, and one question. Can we use pre-snapshot HBase versions (hbase-0.96.0,
for example) with this patch applied? TableSnapshotInputFormatImpl is
referenced from HBaseStorageHandler, and loading that class seems to throw an
exception.

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-24 Thread Carter Shanklin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073958#comment-14073958
 ] 

Carter Shanklin commented on HIVE-6584:
---

I tested the .12 version of this patch on a 20 node cluster to see what sort of 
performance gains might be expected.

I did a YCSB load of 180m rows and ran a few simple SQL queries in Hive while 
simultaneously running a YCSB 32-thread workload.

TL;DR: the snapshot approach provides a performance boost of about 2.5x
across different types of queries. The more fields I queried, the larger the
speedup was.

|Query|Run|Workload|Snapshot Time (s)|Direct Time (s)|Speedup (Direct / Snapshot)|
|count(*)|1|a|191.019|488.915|2.56x|
|count(*)|2|a|200.641|480.837|2.40x|
|Aggregate 1 field|1|a|214.452|499.304|2.33x|
|Aggregate 1 field|2|a|217.744|500.07|2.30x|
|Aggregate 9 fields|1|a|281.514|802.799|2.85x|
|Aggregate 9 fields|2|a|272.358|785.816|2.89x|
|Aggregate 1 with GBY|1|a|248.874|558.143|2.24x|
|Aggregate 1 with GBY|2|a|269.658|533.562|1.98x|
|count(*)|1|b|194.739|482.261|2.48x|
|count(*)|2|b|195.178|481.437|2.47x|
|Aggregate 1 field|1|b|220.325|498.956|2.26x|
|Aggregate 1 field|2|b|227.117|489.27|2.15x|
|Aggregate 9 fields|1|b|276.939|817.118|2.95x|
|Aggregate 9 fields|2|b|290.288|876.753|3.02x|
|Aggregate 1 with GBY|1|b|244.025|563.884|2.31x|
|Aggregate 1 with GBY|2|b|225.431|570.723|2.53x|
|count(*)|1|c|194.568|502.79|2.58x|
|count(*)|2|c|205.418|508.319|2.47x|
|Aggregate 1 field|1|c|209.709|531.39|2.53x|
|Aggregate 1 field|2|c|217.551|526.878|2.42x|
|Aggregate 9 fields|1|c|267.93|756.476|2.82x|
|Aggregate 9 fields|2|c|273.107|723.459|2.65x|
|Aggregate 1 with GBY|1|c|240.991|526.053|2.18x|
|Aggregate 1 with GBY|2|c|258.06|527.845|2.05x|

For those not familiar with YCSB, it uses a table with 9 fields, each filled
with random junk 100 characters long. It defines workloads A-F, of which I've
used A-C.

The main point to note is that the more fields my query fetches, the better
it works in snapshot mode.

The other thing I measured was throughput as reported by the YCSB tool. For the 
most part, when running the query over a snapshot the throughput was much 
better.
|Workload|Throughput (Snapshot)|Throughput (Direct)|Throughput Improvement (Snapshot vs. Direct)|
|a|83443.11623|56267.34148|48.30%|
|b|45709.15011|31224.30376|46.39%|
|c|46634.58415|43224.86383|7.89%|

The throughput when using the snapshot seems to be close to the throughput when 
not scanning data, but I didn't run the baseline tests long enough to get 
anything conclusive here.
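
For readers who want to check the derived columns, the last column of each
table above appears to follow from simple ratios: direct time divided by
snapshot time for the query table, and the relative gain over the direct
throughput for the YCSB table. The sketch below recomputes the first row of
each table under that assumption. BenchmarkMath and its method names are
hypothetical.

```java
import java.util.Locale;

// Hypothetical helper for re-deriving the ratio columns of the two tables.
class BenchmarkMath {

  // How many times faster the snapshot run was than the direct run.
  static double speedup(double directSeconds, double snapshotSeconds) {
    return directSeconds / snapshotSeconds;
  }

  // Relative throughput gain of the snapshot run over the direct run, in percent.
  static double improvementPercent(double snapshotTput, double directTput) {
    return (snapshotTput - directTput) / directTput * 100.0;
  }

  public static void main(String[] args) {
    // count(*), run 1, workload a: rounds to the 2.56x reported in the table.
    System.out.println(String.format(Locale.ROOT, "%.2fx", speedup(488.915, 191.019)));
    // Workload a throughput: rounds to the 48.30% reported in the table.
    System.out.println(String.format(Locale.ROOT, "%.2f%%",
        improvementPercent(83443.11623, 56267.34148)));
  }
}
```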

In any event this looks like a good patch, especially considering its small 
size.

The numbers quoted here are for reference only, YMMV, etc.

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-22 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071119#comment-14071119
 ] 

Hive QA commented on HIVE-6584:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12657183/HIVE-6584.12.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5752 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_fail_8
org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.jdbc.TestJdbcDriver2.testParentReferences
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/9/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/9/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-9/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12657183

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-22 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14071133#comment-14071133
 ] 

Nick Dimiduk commented on HIVE-6584:


I think these failed tests are unrelated.

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, 
 HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, 
 HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, 
 HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch



[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-21 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069319#comment-14069319
 ] 

Nick Dimiduk commented on HIVE-6584:


HBASE-11557 will remove the requirement of specifying high-scale-lib.jar in 
HADOOP_CLASSPATH.

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, 
 HIVE-6584.10.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, 
 HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, 
 HIVE-6584.9.patch



[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-21 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069649#comment-14069649
 ] 

Hive QA commented on HIVE-6584:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12656962/HIVE-6584.10.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5750 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver.testCliDriver_hbase_bulk
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_script_pipe
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_fail_8
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/889/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/889/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-889/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12656962

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, 
 HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, 
 HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, 
 HIVE-6584.8.patch, HIVE-6584.9.patch



[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-21 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069829#comment-14069829
 ] 

Hive QA commented on HIVE-6584:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12657009/HIVE-6584.11.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5752 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_handler_snapshot
org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver.testCliDriver_hbase_bulk
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_fail_8
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/893/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/893/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-893/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12657009

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, 
 HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, 
 HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, 
 HIVE-6584.8.patch, HIVE-6584.9.patch



[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-18 Thread Nick Dimiduk (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066930#comment-14066930
]

Nick Dimiduk commented on HIVE-6584:

[~tenggyut]:

bq. 1. HBaseStorageHandler.getInputFormatClass(): I am afraid that the returned
input format will always be HiveHBaseTableInputFormat (at least according to my
test)

My patch has the logic necessary to perform the switch at runtime. It does
indeed work with the latest patch.

bq. 2. In the method HBaseStorageHandler.preCreateTable, Hive checks whether
the HBase table exists, regardless of whether the external table being created
is based on an actual table or a snapshot.

I'm not sure about this. Anyway that's not related to this feature.
HBaseStorageHandler has no means of creating/dropping table snapshots. If
you're seeing some issue here with StorageHandler DDL operations, please file a
separate JIRA.

bq. 3. The TableSnapshotRegionSplit used in TableSnapshotInputFormat is a
direct subclass of InputSplit, not a subclass of TableSplit.

Nor should it be. The TableSnapshotRegionSplit is tracking different
information from TableSplit.

bq. 4. There is no public setScan method in
TableSnapshotInputFormat.RecordReader; instead, it translates a string into
a Scan instance using mapreduce.TableMapReduceUtil.convertStringToScan.

Indeed, there is a disparity between HBase's mapred and mapreduce
implementations. I opened HBASE-11179 for some cleanup on the HBase side.
The convertStringToScan details are HBase-private API as of 0.96. I opened
HBASE-11163 to make the necessary scanner support available in the mapred API,
but it has not yet been implemented.
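
As an illustration of the runtime switch described in point 1, here is a
minimal, self-contained sketch. It is not the code from the patch: the class
InputFormatSwitchSketch, the method chooseInputFormat, and the use of a plain
Map in place of a job configuration are all hypothetical, and the property name
hive.hbase.snapshot.name is an assumption. The idea is only that the handler
inspects the configuration for each query and returns the snapshot-backed input
format when a snapshot name is set, and the online one otherwise.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a storage handler that picks an input format per query.
// Not taken from the HIVE-6584 patch.
class InputFormatSwitchSketch {
    // Assumed property name; check the actual Hive configuration before relying on it.
    static final String SNAPSHOT_NAME_KEY = "hive.hbase.snapshot.name";

    static final String ONLINE_FORMAT = "HiveHBaseTableInputFormat";
    static final String SNAPSHOT_FORMAT = "HiveHBaseTableSnapshotInputFormat";

    // Returns the snapshot-backed format only when a non-blank snapshot name is configured.
    static String chooseInputFormat(Map<String, String> conf) {
        String snapshot = conf.get(SNAPSHOT_NAME_KEY);
        boolean useSnapshot = snapshot != null && !snapshot.trim().isEmpty();
        return useSnapshot ? SNAPSHOT_FORMAT : ONLINE_FORMAT;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(chooseInputFormat(conf)); // no snapshot configured

        conf.put(SNAPSHOT_NAME_KEY, "my_snapshot"); // hypothetical snapshot name
        System.out.println(chooseInputFormat(conf)); // snapshot configured
    }
}
```

Because the decision is made from the configuration at call time rather than
fixed in the table definition, the same table definition can serve both online
and snapshot reads.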

Add HiveHBaseTableSnapshotInputFormat
-

Key: HIVE-6584
URL: https://issues.apache.org/jira/browse/HIVE-6584
Project: Hive
Issue Type: Improvement
Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Fix For: 0.14.0

Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch,
HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch,
HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch


[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-17 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065667#comment-14065667
 ] 

Hive QA commented on HIVE-6584:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12656322/HIVE-6584.9.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5726 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver.testCliDriver_hbase_bulk
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_temp_table
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/836/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/836/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-836/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12656322

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, 
 HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, 
 HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch



[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-16 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063719#comment-14063719
 ] 

Nick Dimiduk commented on HIVE-6584:


Ouch. Most of these tests run/pass for me locally. Will investigate further. 
I'm also curious why the {{explain}} commands in {{hbase_handler_snapshot.q}} 
are not including the Input/OutputFormats.

[~sushanth], [~ashutoshc] any ideas on this latter issue?

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, 
 HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, 
 HIVE-6584.7.patch, HIVE-6584.8.patch



[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-15 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063004#comment-14063004
 ] 

Hive QA commented on HIVE-6584:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12655948/HIVE-6584.8.patch

{color:red}ERROR:{color} -1 due to 17 failed/errored test(s), 5735 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_add_part_multiple
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_custom_key2
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_handler_bulk
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_handler_snapshot
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_key_range
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_pushdown
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_single_sourced_multi_insert
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats2
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats3
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats_empty_partition
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_ppd_key_ranges
org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver.testCliDriver_hbase_bulk
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver_generatehfiles_require_family_path
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_temp_table
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/803/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/803/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-803/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 17 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12655948

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, 
 HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, 
 HIVE-6584.7.patch, HIVE-6584.8.patch



[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-03 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14051499#comment-14051499
 ] 

Hive QA commented on HIVE-6584:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12653797/HIVE-6584.7.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/672/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/672/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-672/

Messages:
{noformat}
 This message was trimmed, see log for full details 
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-contrib 
---
[INFO] Compiling 39 source files to 
/data/hive-ptest/working/apache-svn-trunk-source/contrib/target/classes
[WARNING] 
/data/hive-ptest/working/apache-svn-trunk-source/contrib/src/java/org/apache/hadoop/hive/contrib/udaf/example/UDAFExampleMax.java:
 Some input files use or override a deprecated API.
[WARNING] 
/data/hive-ptest/working/apache-svn-trunk-source/contrib/src/java/org/apache/hadoop/hive/contrib/udaf/example/UDAFExampleMax.java:
 Recompile with -Xlint:deprecation for details.
[WARNING] 
/data/hive-ptest/working/apache-svn-trunk-source/contrib/src/java/org/apache/hadoop/hive/contrib/udf/example/UDFExampleStructPrint.java:
 
/data/hive-ptest/working/apache-svn-trunk-source/contrib/src/java/org/apache/hadoop/hive/contrib/udf/example/UDFExampleStructPrint.java
 uses unchecked or unsafe operations.
[WARNING] 
/data/hive-ptest/working/apache-svn-trunk-source/contrib/src/java/org/apache/hadoop/hive/contrib/udf/example/UDFExampleStructPrint.java:
 Recompile with -Xlint:unchecked for details.
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ 
hive-contrib ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 
/data/hive-ptest/working/apache-svn-trunk-source/contrib/src/test/resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-contrib ---
[INFO] Executing tasks

main:
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/contrib/target/tmp
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/contrib/target/warehouse
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/contrib/target/tmp/conf
 [copy] Copying 5 files to 
/data/hive-ptest/working/apache-svn-trunk-source/contrib/target/tmp/conf
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ 
hive-contrib ---
[INFO] Compiling 2 source files to 
/data/hive-ptest/working/apache-svn-trunk-source/contrib/target/test-classes
[WARNING] 
/data/hive-ptest/working/apache-svn-trunk-source/contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestRegexSerDe.java:
 
/data/hive-ptest/working/apache-svn-trunk-source/contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestRegexSerDe.java
 uses or overrides a deprecated API.
[WARNING] 
/data/hive-ptest/working/apache-svn-trunk-source/contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestRegexSerDe.java:
 Recompile with -Xlint:deprecation for details.
[INFO] 
[INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-contrib ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-contrib ---
[INFO] Building jar: 
/data/hive-ptest/working/apache-svn-trunk-source/contrib/target/hive-contrib-0.14.0-SNAPSHOT.jar
[INFO] 
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ 
hive-contrib ---
[INFO] 
[INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-contrib ---
[INFO] Installing 
/data/hive-ptest/working/apache-svn-trunk-source/contrib/target/hive-contrib-0.14.0-SNAPSHOT.jar
 to 
/data/hive-ptest/working/maven/org/apache/hive/hive-contrib/0.14.0-SNAPSHOT/hive-contrib-0.14.0-SNAPSHOT.jar
[INFO] Installing 
/data/hive-ptest/working/apache-svn-trunk-source/contrib/pom.xml to 
/data/hive-ptest/working/maven/org/apache/hive/hive-contrib/0.14.0-SNAPSHOT/hive-contrib-0.14.0-SNAPSHOT.pom
[INFO] 
[INFO] 
[INFO] Building Hive HBase Handler 0.14.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-hbase-handler ---
[INFO] Deleting /data/hive-ptest/working/apache-svn-trunk-source/hbase-handler 
(includes = [datanucleus.log, derby.log], excludes = [])
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ 
hive-hbase-handler ---
[INFO]

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-06-27 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045804#comment-14045804
 ] 

Hive QA commented on HIVE-6584:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12652773/HIVE-6584.6.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/617/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/617/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-617/

Messages:
{noformat}
 This message was trimmed, see log for full details 
[INFO] -
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSnapshotSplit.java:[10,66]
 package org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat does not 
exist
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSnapshotSplit.java:[17,17]
 cannot find symbol
  symbol:   class TableSnapshotRegionSplit
  location: class org.apache.hadoop.hive.hbase.HBaseSnapshotSplit
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSnapshotSplit.java:[24,29]
 cannot find symbol
  symbol:   class TableSnapshotRegionSplit
  location: class org.apache.hadoop.hive.hbase.HBaseSnapshotSplit
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSnapshotSplit.java:[29,10]
 cannot find symbol
  symbol:   class TableSnapshotRegionSplit
  location: class org.apache.hadoop.hive.hbase.HBaseSnapshotSplit
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[28,41]
 cannot find symbol
  symbol:   class TableSnapshotInputFormat
  location: package org.apache.hadoop.hbase.mapreduce
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[31,66]
 package org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat does not 
exist
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[33,47]
 cannot find symbol
  symbol:   class ColumnMapping
  location: class org.apache.hadoop.hive.hbase.HBaseSerDe
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[76,3]
 cannot find symbol
  symbol:   class TableSnapshotInputFormat
  location: class org.apache.hadoop.hive.hbase.HiveHBaseTableSnapshotInputFormat
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSnapshotStorageHandler.java:[34,41]
 cannot find symbol
  symbol:   class TableSnapshotInputFormatImpl
  location: package org.apache.hadoop.hbase.mapreduce
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSnapshotStorageHandler.java:[37,47]
 cannot find symbol
  symbol:   class ColumnMapping
  location: class org.apache.hadoop.hive.hbase.HBaseSerDe
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSnapshotSplit.java:[21,17]
 cannot find symbol
  symbol:   class TableSnapshotRegionSplit
  location: class org.apache.hadoop.hive.hbase.HBaseSnapshotSplit
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[76,43]
 cannot find symbol
  symbol:   class TableSnapshotInputFormat
  location: class org.apache.hadoop.hive.hbase.HiveHBaseTableSnapshotInputFormat
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[87,10]
 cannot find symbol
  symbol:   class ColumnMapping
  location: class org.apache.hadoop.hive.hbase.HiveHBaseTableSnapshotInputFormat
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[89,54]
 incompatible types
  required: java.util.List<ColumnMapping>
  found:    org.apache.hadoop.hive.hbase.ColumnMappings
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[108,9]
 cannot find symbol
  symbol:   variable HiveHBaseInputFormatUtil
  location: class org.apache.hadoop.hive.hbase.HiveHBaseTableSnapshotInputFormat
[ERROR]

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-06-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039340#comment-14039340
 ] 

Hive QA commented on HIVE-6584:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12651366/HIVE-6584.5.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/532/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/532/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-532/

Messages:
{noformat}
 This message was trimmed, see log for full details 
[INFO] -
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSnapshotSplit.java:[10,66]
 package org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat does not 
exist
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSnapshotSplit.java:[17,17]
 cannot find symbol
  symbol:   class TableSnapshotRegionSplit
  location: class org.apache.hadoop.hive.hbase.HBaseSnapshotSplit
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSnapshotSplit.java:[24,29]
 cannot find symbol
  symbol:   class TableSnapshotRegionSplit
  location: class org.apache.hadoop.hive.hbase.HBaseSnapshotSplit
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSnapshotSplit.java:[29,10]
 cannot find symbol
  symbol:   class TableSnapshotRegionSplit
  location: class org.apache.hadoop.hive.hbase.HBaseSnapshotSplit
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[28,41]
 cannot find symbol
  symbol:   class TableSnapshotInputFormat
  location: package org.apache.hadoop.hbase.mapreduce
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[31,66]
 package org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat does not 
exist
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[33,47]
 cannot find symbol
  symbol:   class ColumnMapping
  location: class org.apache.hadoop.hive.hbase.HBaseSerDe
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[76,3]
 cannot find symbol
  symbol:   class TableSnapshotInputFormat
  location: class org.apache.hadoop.hive.hbase.HiveHBaseTableSnapshotInputFormat
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSnapshotStorageHandler.java:[34,41]
 cannot find symbol
  symbol:   class TableSnapshotInputFormatImpl
  location: package org.apache.hadoop.hbase.mapreduce
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSnapshotStorageHandler.java:[37,47]
 cannot find symbol
  symbol:   class ColumnMapping
  location: class org.apache.hadoop.hive.hbase.HBaseSerDe
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSnapshotSplit.java:[21,17]
 cannot find symbol
  symbol:   class TableSnapshotRegionSplit
  location: class org.apache.hadoop.hive.hbase.HBaseSnapshotSplit
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[76,43]
 cannot find symbol
  symbol:   class TableSnapshotInputFormat
  location: class org.apache.hadoop.hive.hbase.HiveHBaseTableSnapshotInputFormat
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[87,10]
 cannot find symbol
  symbol:   class ColumnMapping
  location: class org.apache.hadoop.hive.hbase.HiveHBaseTableSnapshotInputFormat
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[89,54]
 incompatible types
  required: java.util.List<ColumnMapping>
  found:    org.apache.hadoop.hive.hbase.ColumnMappings
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java:[108,9]
 cannot find symbol
  symbol:   variable HiveHBaseInputFormatUtil
  location: class org.apache.hadoop.hive.hbase.HiveHBaseTableSnapshotInputFormat
[ERROR]

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-06-20 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039467#comment-14039467
 ] 

Nick Dimiduk commented on HIVE-6584:


Can you regenerate your patch, rooted in the trunk directory instead of above 
it? That's the reason this patch fails the buildbot.

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, 
 HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch



[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-06-18 Thread Sushanth Sowmyan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14036299#comment-14036299
 ] 

Sushanth Sowmyan commented on HIVE-6584:


[~zjkyly]/[~tenggyut]: Could you please upload a patch for your proposal 
so we can compare and contrast?

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, 
 HIVE-6584.3.patch, HIVE-6584.4.patch



[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-06-17 Thread zjkyly (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033591#comment-14033591
 ] 

zjkyly commented on HIVE-6584:
--

Teng Yutong and I are colleagues. We have a patch for HIVE-6584 and a patch for 
HBASE-11163, and we modified
org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat (line 93)
from: static class TableSnapshotRegionSplit extends InputSplit implements 
Writable
to: public static class TableSnapshotRegionSplit extends InputSplit implements 
Writable 

With these changes we can run MapReduce over a snapshot. Result of a count(1) 
query:

2014-06-17 16:29:34,540 Stage-1 map = 100%,  reduce = 32%, Cumulative CPU 
2467.57 sec
2014-06-17 16:29:35,578 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 
2468.35 sec
MapReduce Total cumulative CPU time: 41 minutes 8 seconds 350 msec
Ended Job = job_1402970116480_0015
MapReduce Jobs Launched: 
Job 0: Map: 64  Reduce: 1   Cumulative CPU: 2468.35 sec   HDFS Read: 18334 HDFS 
Write: 9 SUCCESS
Total MapReduce CPU Time Spent: 41 minutes 8 seconds 350 msec
OK
65497163
Time taken: 429.647 seconds, Fetched: 1 row(s)

hbase count result:
Current count: 6540, row: user987684650651905350


65497163 row(s) in 1446.2310 seconds
= 65497163

However, the HFiles can contain several versions of the same record, and we 
have not been able to solve this problem. As a workaround, we set the number 
of versions of the HBase table to 1 and run a major compaction before taking 
the snapshot.

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, 
 HIVE-6584.3.patch, HIVE-6584.4.patch



[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-06-17 Thread zjkyly (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033626#comment-14033626
]

zjkyly commented on HIVE-6584:
--

Hi Nick Dimiduk and Teng Yutong. What we can do at present is scan all KV
records of a snapshot.
We weren't able to solve the issue of HFiles holding different versions of a
record, so we set the default number of versions of the HBase table to 1 and
run a major compaction before taking the snapshot.
We are not familiar with the open source community: is it necessary to solve
the problem of multiple versions of the same KV? Should we return all KV
versions, or just the latest one?
We will try to solve the problem of multiple versions.
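
To make the "all versions or just the latest one" question concrete, here is a minimal sketch of latest-version-only semantics in plain Java. It does not use the HBase API: VersionedValue is a hypothetical stand-in for a KV record, and latestOnly is an illustrative helper, not code from any patch on this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LatestVersionSketch {
    // Hypothetical stand-in for one stored KV version; not an HBase class.
    static final class VersionedValue {
        final String row;
        final String column;
        final long timestamp;
        final String value;

        VersionedValue(String row, String column, long timestamp, String value) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // For each (row, column) pair, keep only the entry with the highest timestamp.
    static List<VersionedValue> latestOnly(List<VersionedValue> all) {
        Map<String, VersionedValue> newest = new LinkedHashMap<>();
        for (VersionedValue v : all) {
            String key = v.row + "\u0000" + v.column;
            VersionedValue seen = newest.get(key);
            if (seen == null || v.timestamp > seen.timestamp) {
                newest.put(key, v);
            }
        }
        return new ArrayList<>(newest.values());
    }

    public static void main(String[] args) {
        List<VersionedValue> all = Arrays.asList(
            new VersionedValue("r1", "cf:a", 1L, "old"),
            new VersionedValue("r1", "cf:a", 5L, "new"),
            new VersionedValue("r2", "cf:a", 3L, "x"));
        for (VersionedValue v : latestOnly(all)) {
            System.out.println(v.row + " " + v.column + " " + v.value);
        }
    }
}
```

Setting the table to a single version and running a major compaction gets the same effect by ensuring the files only hold one version to begin with; the sketch shows the alternative of filtering at read time.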

Add HiveHBaseTableSnapshotInputFormat
-

Key: HIVE-6584
URL: https://issues.apache.org/jira/browse/HIVE-6584
Project: Hive
Issue Type: Improvement
Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Fix For: 0.14.0

Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch,
HIVE-6584.3.patch, HIVE-6584.4.patch


[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-06-15 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031894#comment-14031894
 ] 

Hive QA commented on HIVE-6584:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12650362/HIVE-6584.4.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5536 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_columnar
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/472/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/472/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-472/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12650362

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, 
 HIVE-6584.3.patch, HIVE-6584.4.patch



[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-06-13 Thread Sushanth Sowmyan (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030369#comment-14030369
]

Sushanth Sowmyan commented on HIVE-6584:

Teng, I'd be interested to see how your patch turns out.

If you mean that at runtime, the HBaseStorageHandler decides to deputize a
subclass of itself to do the work, then that might work. But if you mean that
your approach would lead to the user having to create a separate table (kinda
like a view) that associates with a snapshot, then speaking from the hive side,
I think I would prefer having only one SH to deal with, and having it decide
what to do with various set parameters as opposed to creating separate hive
tables with a different SH in hive. That way, using the same hive table
definition, a query could decide to use a snapshot or not.
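
As a rough illustration of the single-storage-handler idea, the following sketch picks an input format from a session parameter. It is plain Java with no Hive dependency; the property key, the class SnapshotAwareHandlerSketch, and the method chooseInputFormat are assumptions made for this example, and the input format names are returned as strings only to keep it self-contained.

```java
import java.util.HashMap;
import java.util.Map;

class SnapshotAwareHandlerSketch {
    // Assumed property key naming the snapshot to read; hypothetical for this sketch.
    static final String SNAPSHOT_NAME_KEY = "hive.hbase.snapshot.name";

    static final String ONLINE_FORMAT = "HiveHBaseTableInputFormat";
    static final String SNAPSHOT_FORMAT = "HiveHBaseTableSnapshotInputFormat";

    // One handler, one table definition: the choice is made per query
    // from whatever parameters have been set.
    static String chooseInputFormat(Map<String, String> conf) {
        String snapshot = conf.get(SNAPSHOT_NAME_KEY);
        boolean useSnapshot = snapshot != null && !snapshot.trim().isEmpty();
        return useSnapshot ? SNAPSHOT_FORMAT : ONLINE_FORMAT;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(chooseInputFormat(conf));   // no snapshot set: online table
        conf.put(SNAPSHOT_NAME_KEY, "my_snapshot");
        System.out.println(chooseInputFormat(conf));   // snapshot set: snapshot format
    }
}
```

With this shape, the same Hive table definition serves both paths, and unsetting the parameter returns a query to the online table.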

Add HiveHBaseTableSnapshotInputFormat
-

Key: HIVE-6584
URL: https://issues.apache.org/jira/browse/HIVE-6584
Project: Hive
Issue Type: Improvement
Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Fix For: 0.14.0

Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch,
HIVE-6584.3.patch


[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-06-12 Thread Nick Dimiduk (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029497#comment-14029497
]

Nick Dimiduk commented on HIVE-6584:

Thanks for the insightful comments, [~tenggyut].

bq. 1. HBaseStorageHandler.getInputFormatClass(): I am afraid that the returned
input format will always be HiveHBaseTableInputFormat (at least according to my
test)

I was afraid of this in my initial design thinking, but my experiments proved
otherwise. Can you elaborate on your tests? I'd like to reproduce this issue if
I'm able.

I haven't yet looked at the use case of consuming a snapshot for which there is
no table in HBase. I planned to approach this kind of feature in follow-on
work; the goal here is to get just the basics working.

bq. 3, 4 [snip]

These are both true.

bq. So I suggest adding a subclass of HBaseStorageHandler (and other necessary
classes), say HBaseSnapshotStorageHandler, to deal with the hbase snapshot
situation.

A goal of this patch is to be able to query snapshots created from online
tables already registered with Hive using the HBaseStorageHandler. Implementing
HBaseSnapshotStorageHandler requires a separate table registration for the
snapshot. I think that's undesirable. Regarding the hbase snapshot situation,
let's make it better on the HBase side. What do you recommend?

Add HiveHBaseTableSnapshotInputFormat
-

Key: HIVE-6584
URL: https://issues.apache.org/jira/browse/HIVE-6584
Project: Hive
Issue Type: Improvement
Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Fix For: 0.14.0

Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch,
HIVE-6584.3.patch


[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-06-12 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029572#comment-14029572
 ] 

Hive QA commented on HIVE-6584:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12649918/HIVE-6584.3.patch

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 5610 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_external_table_ppd
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_storage_queries
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hadoop.hive.metastore.txn.TestCompactionTxnHandler.testRevokeTimedOutWorkers
org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/446/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/446/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-446/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12649918

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, 
 HIVE-6584.3.patch



[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-06-11 Thread Teng Yutong (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028753#comment-14028753
]

Teng Yutong commented on HIVE-6584:
---

Hi Nick,

I have some concerns about these patches:
1. HBaseStorageHandler.getInputFormatClass(): I am afraid that the returned
input format will always be HiveHBaseTableInputFormat (at least according to my
test).
2. In the method HBaseStorageHandler.preCreateTable, Hive checks whether the
HBase table exists, regardless of whether the external table that Hive is going
to create is based on an actual table or a snapshot.
3. The TableSnapshotRegionSplit used in TableSnapshotInputFormat is a direct
subclass of InputSplit, not a subclass of TableSplit.
4. There is no public setScan method in TableSnapshotInputFormat.RecordReader;
instead, it translates a string into a Scan instance using
mapreduce.TableMapReduceUtil.convertStringToScan.

So I suggest adding a subclass of HBaseStorageHandler (and other necessary
classes), say HBaseSnapshotStorageHandler, to deal with the HBase snapshot
situation.

In fact, I have already finished the necessary code changes and done some
tests. The tests show that my modification works.

I will upload my patch soon.

Add HiveHBaseTableSnapshotInputFormat
-

Key: HIVE-6584
URL: https://issues.apache.org/jira/browse/HIVE-6584
Project: Hive
Issue Type: Improvement
Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Fix For: 0.14.0

Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch,
HIVE-6584.3.patch


[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-05-21 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004558#comment-14004558
 ] 

Nick Dimiduk commented on HIVE-6584:


Thanks [~tenggyut]. Any thoughts regarding how to test this?

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch



[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-03-07 Thread Swarnim Kulkarni (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13924317#comment-13924317
 ] 

Swarnim Kulkarni commented on HIVE-6584:


+1 on this one. This should be a very nice addition to the existing integration.

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk

