[jira] [Commented] (HIVE-12316) Improved integration test for Hive

2018-07-09 Thread Colin Williams (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537383#comment-16537383
 ] 

Colin Williams commented on HIVE-12316:
---

Hi Alan. What I meant was the frameworks mentioned here: 
[https://cwiki.apache.org/confluence/display/Hive/Unit+Testing+Hive+SQL] aside 
from [http://dev.bizo.com/2011/04/hive-unit-testing.html] which I haven't 
investigated yet weren't supporting recent hive versions. However 
[https://github.com/klarna/HiveRunner] just pushed a Hive 2.3 release over the 
weekend.

> Improved integration test for Hive
> --
>
> Key: HIVE-12316
> URL: https://issues.apache.org/jira/browse/HIVE-12316
> Project: Hive
>  Issue Type: New Feature
>  Components: Testing Infrastructure
>Affects Versions: 2.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Major
> Attachments: HIVE-12316.2.patch, HIVE-12316.5.patch, HIVE-12316.patch
>
>
> In working with Hive testing I have found there are several issues that are 
> causing problems for developers, testers, and users:
> * Because Hive has many tunable knobs (file format, security, etc.) we end up 
> with tests that cover the same functionality with different permutations of 
> these features.
> * The Hive integration tests (ie qfiles) cannot be run on a cluster.  This 
> means we cannot run any of those tests at scale.  The HBase community by 
> contrast uses the same test suite locally and on a cluster, and has found 
> that this helps them greatly in testing.
> * Golden files are a grievous evil.  Test writers are forced to eyeball 
> results the first time they run a test and decide whether they look 
> reasonable, which is error prone and makes testing at scale impossible.  And 
> changes to one part of Hive often end up changing the plan (and the output of 
> explain) thus breaking many tests that are not related.  This is particularly 
> an issue for people working on the optimizer.  
> * The lack of ability to run on a cluster means that when people test Hive at 
> scale, they are forced to develop custom frameworks which can't then benefit 
> the community.
> * There is no easy mechanism to bring user queries into the test suite.
> I propose we build a new testing capability with the following requirements:
> * One test should be able to run all reasonable permutations (mr/tez/spark, 
> orc/parquet/text/rcfile, secure/non-secure etc.)  This doesn't mean it would 
> run every permutation every time, but that the tester could choose which 
> permutation to run.
> * The same tests should run locally and on a cluster.  The tests should 
> support scaling of input data from Ks to Ts.
> * Expected results should be auto-generated whenever possible, and this 
> should work with the scaling of inputs.  The dev should be able to provide 
> expected results or custom expected result generation in cases where 
> auto-generation doesn't make sense.
> * Access to the query plan should be available as an API in the tests so that 
> golden files of explain output are not required.
> * This should run in maven, junit, and java so that developers do not need to 
> manage yet another framework.
> * It should be possible to simulate user data (based on schema and 
> statistics) and quickly incorporate user queries so that tests from user 
> scenarios can be quickly incorporated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-12316) Improved integration test for Hive

2018-07-09 Thread Alan Gates (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537269#comment-16537269
 ] 

Alan Gates commented on HIVE-12316:
---

I haven't pushed this patch forward, so yes, it's dead.

But I'm curious what you mean by saying that the current frameworks don't work 
with 3.  Hive's unit tests certainly work, they just take forever.

> Improved integration test for Hive
> --
>
> Key: HIVE-12316
> URL: https://issues.apache.org/jira/browse/HIVE-12316
> Project: Hive
>  Issue Type: New Feature
>  Components: Testing Infrastructure
>Affects Versions: 2.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Major
> Attachments: HIVE-12316.2.patch, HIVE-12316.5.patch, HIVE-12316.patch
>
>
> In working with Hive testing I have found there are several issues that are 
> causing problems for developers, testers, and users:
> * Because Hive has many tunable knobs (file format, security, etc.) we end up 
> with tests that cover the same functionality with different permutations of 
> these features.
> * The Hive integration tests (ie qfiles) cannot be run on a cluster.  This 
> means we cannot run any of those tests at scale.  The HBase community by 
> contrast uses the same test suite locally and on a cluster, and has found 
> that this helps them greatly in testing.
> * Golden files are a grievous evil.  Test writers are forced to eyeball 
> results the first time they run a test and decide whether they look 
> reasonable, which is error prone and makes testing at scale impossible.  And 
> changes to one part of Hive often end up changing the plan (and the output of 
> explain) thus breaking many tests that are not related.  This is particularly 
> an issue for people working on the optimizer.  
> * The lack of ability to run on a cluster means that when people test Hive at 
> scale, they are forced to develop custom frameworks which can't then benefit 
> the community.
> * There is no easy mechanism to bring user queries into the test suite.
> I propose we build a new testing capability with the following requirements:
> * One test should be able to run all reasonable permutations (mr/tez/spark, 
> orc/parquet/text/rcfile, secure/non-secure etc.)  This doesn't mean it would 
> run every permutation every time, but that the tester could choose which 
> permutation to run.
> * The same tests should run locally and on a cluster.  The tests should 
> support scaling of input data from Ks to Ts.
> * Expected results should be auto-generated whenever possible, and this 
> should work with the scaling of inputs.  The dev should be able to provide 
> expected results or custom expected result generation in cases where 
> auto-generation doesn't make sense.
> * Access to the query plan should be available as an API in the tests so that 
> golden files of explain output are not required.
> * This should run in maven, junit, and java so that developers do not need to 
> manage yet another framework.
> * It should be possible to simulate user data (based on schema and 
> statistics) and quickly incorporate user queries so that tests from user 
> scenarios can be quickly incorporated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-12316) Improved integration test for Hive

2018-07-07 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16535964#comment-16535964
 ] 

Hive QA commented on HIVE-12316:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12776619/HIVE-12316.5.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/12458/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/12458/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-12458/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2018-07-08 02:14:21.668
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-12458/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2018-07-08 02:14:21.671
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 406cde9 HIVE-20085 : Druid-Hive (managed) table creation fails 
with strict managed table checks: Table is marked as a managed table but is not 
transactional (Nishant Bangarwa via Ashutosh Chauhan)
+ git clean -f -d
Removing itests/${project.basedir}/
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 406cde9 HIVE-20085 : Druid-Hive (managed) table creation fails 
with strict managed table checks: Table is marked as a managed table but is not 
transactional (Nishant Bangarwa via Ashutosh Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2018-07-08 02:14:23.173
+ rm -rf ../yetus_PreCommit-HIVE-Build-12458
+ mkdir ../yetus_PreCommit-HIVE-Build-12458
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-12458
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-12458/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: patch failed: 
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/HiveEndPoint.java:387
Falling back to three-way merge...
Applied patch to 
'hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/HiveEndPoint.java'
 with conflicts.
error: 
itests/hive-unit/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java: does 
not exist in index
error: patch failed: itests/pom.xml:42
Falling back to three-way merge...
Applied patch to 'itests/pom.xml' with conflicts.
error: streaming/src/java/org/apache/hive/hcatalog/streaming/HiveEndPoint.java: 
does not exist in index
error: hive-unit/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java: does 
not exist in index
error: patch failed: pom.xml:42
Falling back to three-way merge...
Applied patch to 'pom.xml' with conflicts.
fatal: git diff header lacks filename information when removing 2 leading 
pathname components (line 16188)
The patch does not appear to apply with p0, p1, or p2
+ result=1
+ '[' 1 -ne 0 ']'
+ rm -rf yetus_PreCommit-HIVE-Build-12458
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12776619 - PreCommit-HIVE-Build

> Improved integration test for Hive
> --
>
> Key: HIVE-12316
> URL: https://issues.apache.org/jira/browse/HIVE-12316
> Project: Hive
>  Issue Type: New Feature
>  Components: Testing Infrastructure
>Affects Versions: 2.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Major
> Attachments: HIVE-12316.2.patch, HIVE-12316.5.patch, HIVE-12316.patch
>
>
> In working with Hive testing I have found there are several issues that are 
> causing problems for developers, test

[jira] [Commented] (HIVE-12316) Improved integration test for Hive

2018-07-06 Thread Colin Williams (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16535522#comment-16535522
 ] 

Colin Williams commented on HIVE-12316:
---

I was looking at 
[https://cwiki.apache.org/confluence/display/Hive/Unit+Testing+Hive+SQL] and 
stumbled upon this. But it looks like Hive 3 has been released. Is this a dead 
effort? I'm left wondering what's the best way to unit test hive, as the 
mentioned frameworks in the wiki don't seem to support recent versions of Hive.

> Improved integration test for Hive
> --
>
> Key: HIVE-12316
> URL: https://issues.apache.org/jira/browse/HIVE-12316
> Project: Hive
>  Issue Type: New Feature
>  Components: Testing Infrastructure
>Affects Versions: 2.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Major
> Attachments: HIVE-12316.2.patch, HIVE-12316.5.patch, HIVE-12316.patch
>
>
> In working with Hive testing I have found there are several issues that are 
> causing problems for developers, testers, and users:
> * Because Hive has many tunable knobs (file format, security, etc.) we end up 
> with tests that cover the same functionality with different permutations of 
> these features.
> * The Hive integration tests (ie qfiles) cannot be run on a cluster.  This 
> means we cannot run any of those tests at scale.  The HBase community by 
> contrast uses the same test suite locally and on a cluster, and has found 
> that this helps them greatly in testing.
> * Golden files are a grievous evil.  Test writers are forced to eyeball 
> results the first time they run a test and decide whether they look 
> reasonable, which is error prone and makes testing at scale impossible.  And 
> changes to one part of Hive often end up changing the plan (and the output of 
> explain) thus breaking many tests that are not related.  This is particularly 
> an issue for people working on the optimizer.  
> * The lack of ability to run on a cluster means that when people test Hive at 
> scale, they are forced to develop custom frameworks which can't then benefit 
> the community.
> * There is no easy mechanism to bring user queries into the test suite.
> I propose we build a new testing capability with the following requirements:
> * One test should be able to run all reasonable permutations (mr/tez/spark, 
> orc/parquet/text/rcfile, secure/non-secure etc.)  This doesn't mean it would 
> run every permutation every time, but that the tester could choose which 
> permutation to run.
> * The same tests should run locally and on a cluster.  The tests should 
> support scaling of input data from Ks to Ts.
> * Expected results should be auto-generated whenever possible, and this 
> should work with the scaling of inputs.  The dev should be able to provide 
> expected results or custom expected result generation in cases where 
> auto-generation doesn't make sense.
> * Access to the query plan should be available as an API in the tests so that 
> golden files of explain output are not required.
> * This should run in maven, junit, and java so that developers do not need to 
> manage yet another framework.
> * It should be possible to simulate user data (based on schema and 
> statistics) and quickly incorporate user queries so that tests from user 
> scenarios can be quickly incorporated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-12316) Improved integration test for Hive

2017-03-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15942105#comment-15942105
 ] 

Hive QA commented on HIVE-12316:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12776619/HIVE-12316.5.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4380/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4380/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4380/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-03-26 02:13:05.191
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-4380/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-03-26 02:13:05.194
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at b4a8af9 HIVE-16274: Support tuning of NDV of columns using 
lower/upper bounds (Pengcheng Xiong, reviewed by Jason Dere)
+ git clean -f -d
Removing hbase-handler/src/java/org/apache/hadoop/hive/hbase/phoenix/
Removing 
serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObjectBaseWrapper.java
Removing serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java.orig
Removing serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioDate.java
Removing 
serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioTimestamp.java
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at b4a8af9 HIVE-16274: Support tuning of NDV of columns using 
lower/upper bounds (Pengcheng Xiong, reviewed by Jason Dere)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-03-26 02:13:06.294
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: patch failed: 
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/HiveEndPoint.java:387
error: 
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/HiveEndPoint.java:
 patch does not apply
error: 
itests/hive-unit/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java: No 
such file or directory
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12776619 - PreCommit-HIVE-Build

> Improved integration test for Hive
> --
>
> Key: HIVE-12316
> URL: https://issues.apache.org/jira/browse/HIVE-12316
> Project: Hive
>  Issue Type: New Feature
>  Components: Testing Infrastructure
>Affects Versions: 2.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-12316.2.patch, HIVE-12316.5.patch, HIVE-12316.patch
>
>
> In working with Hive testing I have found there are several issues that are 
> causing problems for developers, testers, and users:
> * Because Hive has many tunable knobs (file format, security, etc.) we end up 
> with tests that cover the same functionality with different permutations of 
> these features.
> * The Hive integration tests (ie qfiles) cannot be run on a cluster.  This 
> means we cannot run any of those tests at scale.  The HBase community by 
> contrast uses the same test suite locally and on a cluster, and has found 
> that this helps them greatly in testing.
> * Golden files are a grievous evil.  Test writers are forced to eyeball 
> results the first time they run a test and decide whether they look 
> reasonable, which is error prone and makes testing at scale impossible.  

[jira] [Commented] (HIVE-12316) Improved integration test for Hive

2017-03-25 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15942002#comment-15942002
 ] 

Pengcheng Xiong commented on HIVE-12316:


Hello, I am deferring this to Hive 3.0 as we are going to cut the first RC and 
it is not marked as blocker. Please feel free to commit to the branch if this 
can be resolved before the release.

> Improved integration test for Hive
> --
>
> Key: HIVE-12316
> URL: https://issues.apache.org/jira/browse/HIVE-12316
> Project: Hive
>  Issue Type: New Feature
>  Components: Testing Infrastructure
>Affects Versions: 2.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-12316.2.patch, HIVE-12316.5.patch, HIVE-12316.patch
>
>
> In working with Hive testing I have found there are several issues that are 
> causing problems for developers, testers, and users:
> * Because Hive has many tunable knobs (file format, security, etc.) we end up 
> with tests that cover the same functionality with different permutations of 
> these features.
> * The Hive integration tests (ie qfiles) cannot be run on a cluster.  This 
> means we cannot run any of those tests at scale.  The HBase community by 
> contrast uses the same test suite locally and on a cluster, and has found 
> that this helps them greatly in testing.
> * Golden files are a grievous evil.  Test writers are forced to eyeball 
> results the first time they run a test and decide whether they look 
> reasonable, which is error prone and makes testing at scale impossible.  And 
> changes to one part of Hive often end up changing the plan (and the output of 
> explain) thus breaking many tests that are not related.  This is particularly 
> an issue for people working on the optimizer.  
> * The lack of ability to run on a cluster means that when people test Hive at 
> scale, they are forced to develop custom frameworks which can't then benefit 
> the community.
> * There is no easy mechanism to bring user queries into the test suite.
> I propose we build a new testing capability with the following requirements:
> * One test should be able to run all reasonable permutations (mr/tez/spark, 
> orc/parquet/text/rcfile, secure/non-secure etc.)  This doesn't mean it would 
> run every permutation every time, but that the tester could choose which 
> permutation to run.
> * The same tests should run locally and on a cluster.  The tests should 
> support scaling of input data from Ks to Ts.
> * Expected results should be auto-generated whenever possible, and this 
> should work with the scaling of inputs.  The dev should be able to provide 
> expected results or custom expected result generation in cases where 
> auto-generation doesn't make sense.
> * Access to the query plan should be available as an API in the tests so that 
> golden files of explain output are not required.
> * This should run in maven, junit, and java so that developers do not need to 
> manage yet another framework.
> * It should be possible to simulate user data (based on schema and 
> statistics) and quickly incorporate user queries so that tests from user 
> scenarios can be quickly incorporated.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-12316) Improved integration test for Hive

2016-05-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15305223#comment-15305223
 ] 

Hive QA commented on HIVE-12316:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12776619/HIVE-12316.5.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
http://ec2-54-177-240-2.us-west-1.compute.amazonaws.com/job/PreCommit-HIVE-MASTER-Build/412/testReport
Console output: 
http://ec2-54-177-240-2.us-west-1.compute.amazonaws.com/job/PreCommit-HIVE-MASTER-Build/412/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-412/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.8.0_25 ]]
+ export JAVA_HOME=/usr/java/jdk1.8.0_25
+ JAVA_HOME=/usr/java/jdk1.8.0_25
+ export 
PATH=/usr/java/jdk1.8.0_25/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.8.0_25/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-MASTER-Build-412/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 0a24c88 HIVE-13084: Vectorization add support for PROJECTION 
Multi-AND/OR (Matt McCline, reviewed by Sergey Shelukhin)
+ git clean -f -d
+ git checkout master
Already on 'master'
+ git reset --hard origin/master
HEAD is now at 0a24c88 HIVE-13084: Vectorization add support for PROJECTION 
Multi-AND/OR (Matt McCline, reviewed by Sergey Shelukhin)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12776619 - PreCommit-HIVE-MASTER-Build

> Improved integration test for Hive
> --
>
> Key: HIVE-12316
> URL: https://issues.apache.org/jira/browse/HIVE-12316
> Project: Hive
>  Issue Type: New Feature
>  Components: Testing Infrastructure
>Affects Versions: 2.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-12316.2.patch, HIVE-12316.5.patch, HIVE-12316.patch
>
>
> In working with Hive testing I have found there are several issues that are 
> causing problems for developers, testers, and users:
> * Because Hive has many tunable knobs (file format, security, etc.) we end up 
> with tests that cover the same functionality with different permutations of 
> these features.
> * The Hive integration tests (ie qfiles) cannot be run on a cluster.  This 
> means we cannot run any of those tests at scale.  The HBase community by 
> contrast uses the same test suite locally and on a cluster, and has found 
> that this helps them greatly in testing.
> * Golden files are a grievous evil.  Test writers are forced to eyeball 
> results the first time they run a test and decide whether they look 
> reasonable, which is error prone and makes testing at scale impossible.  And 
> changes to one part of Hive often end up changing the plan (and the output of 
> explain) thus breaking many tests that are not related.  This is particularly 
> an issue for people working on the optimizer.  
> * The lack of ability to run on a cluster means that when pe

[jira] [Commented] (HIVE-12316) Improved integration test for Hive

2016-05-25 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300378#comment-15300378
 ] 

Jesus Camacho Rodriguez commented on HIVE-12316:


Removing 2.1.0 target. Please feel free to commit to branch-2.1 anyway and fix 
for 2.1.0 if this happens before the release.

> Improved integration test for Hive
> --
>
> Key: HIVE-12316
> URL: https://issues.apache.org/jira/browse/HIVE-12316
> Project: Hive
>  Issue Type: New Feature
>  Components: Testing Infrastructure
>Affects Versions: 2.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-12316.2.patch, HIVE-12316.5.patch, HIVE-12316.patch
>
>
> In working with Hive testing I have found there are several issues that are 
> causing problems for developers, testers, and users:
> * Because Hive has many tunable knobs (file format, security, etc.) we end up 
> with tests that cover the same functionality with different permutations of 
> these features.
> * The Hive integration tests (ie qfiles) cannot be run on a cluster.  This 
> means we cannot run any of those tests at scale.  The HBase community by 
> contrast uses the same test suite locally and on a cluster, and has found 
> that this helps them greatly in testing.
> * Golden files are a grievous evil.  Test writers are forced to eyeball 
> results the first time they run a test and decide whether they look 
> reasonable, which is error prone and makes testing at scale impossible.  And 
> changes to one part of Hive often end up changing the plan (and the output of 
> explain) thus breaking many tests that are not related.  This is particularly 
> an issue for people working on the optimizer.  
> * The lack of ability to run on a cluster means that when people test Hive at 
> scale, they are forced to develop custom frameworks which can't then benefit 
> the community.
> * There is no easy mechanism to bring user queries into the test suite.
> I propose we build a new testing capability with the following requirements:
> * One test should be able to run all reasonable permutations (mr/tez/spark, 
> orc/parquet/text/rcfile, secure/non-secure etc.)  This doesn't mean it would 
> run every permutation every time, but that the tester could choose which 
> permutation to run.
> * The same tests should run locally and on a cluster.  The tests should 
> support scaling of input data from Ks to Ts.
> * Expected results should be auto-generated whenever possible, and this 
> should work with the scaling of inputs.  The dev should be able to provide 
> expected results or custom expected result generation in cases where 
> auto-generation doesn't make sense.
> * Access to the query plan should be available as an API in the tests so that 
> golden files of explain output are not required.
> * This should run in maven, junit, and java so that developers do not need to 
> manage yet another framework.
> * It should be possible to simulate user data (based on schema and 
> statistics) and quickly incorporate user queries so that tests from user 
> scenarios can be quickly incorporated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12316) Improved integration test for Hive

2015-12-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050627#comment-15050627
 ] 

Hive QA commented on HIVE-12316:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12776619/HIVE-12316.5.patch

{color:green}SUCCESS:{color} +1 due to 25 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 9985 tests 
executed
*Failed tests:*
{noformat}
TestConf - did not produce a TEST-*.xml file
TestHWISessionManager - did not produce a TEST-*.xml file
TestManager - did not produce a TEST-*.xml file
TestTable - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_max
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_date_1
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.jdbc.miniHS2.TestHs2Metrics.testMetrics
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6301/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6301/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6301/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 18 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12776619 - PreCommit-HIVE-TRUNK-Build

> Improved integration test for Hive
> --
>
> Key: HIVE-12316
> URL: https://issues.apache.org/jira/browse/HIVE-12316
> Project: Hive
>  Issue Type: New Feature
>  Components: Testing Infrastructure
>Affects Versions: 2.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-12316.2.patch, HIVE-12316.5.patch, HIVE-12316.patch
>
>
> In working with Hive testing I have found there are several issues that are 
> causing problems for developers, testers, and users:
> * Because Hive has many tunable knobs (file format, security, etc.) we end up 
> with tests that cover the same functionality with different permutations of 
> these features.
> * The Hive integration tests (ie qfiles) cannot be run on a cluster.  This 
> means we cannot run any of those tests at scale.  The HBase community by 
> contrast uses the same test suite locally and on a cluster, and has found 
> that this helps them greatly in testing.
> * Golden files are a grievous evil.  Test writers are forced to eyeball 
> results the first time they run a test and decide whether they look 
> reasonable, which is error prone and makes testing at scale impossible.  And 
> changes to one part of Hive often end up changing the plan (and the output of 
> explain) thus breaking many tests that are not related.  This is particularly 
> an issue for people working on the optimizer.  
> * The lack of ability to run on a cluster means that when people test Hive at 
> scale, they are forced to develop custom frameworks which can't then benefit 
> the community.
> * There is no easy mechanism to bring user queries into the test suite.
> I propose we build a new testing capability with the following requirements:
> * One test should be able to run all reasonable permutations (mr/tez/spark, 
> orc/parquet/text/rcfile, secure/non-secure etc.)  This doesn't mean it would 
> run every permutation every time, but that the tester could choose which 
> permutation to run.
> * The same tests should run locally and on a cluster.  The tests should 
> support scaling of input data from Ks to Ts.
> * Expected results should be auto-generated whenever possible, and this 
> should w

[jira] [Commented] (HIVE-12316) Improved integration test for Hive

2015-12-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048074#comment-15048074
 ] 

Hive QA commented on HIVE-12316:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12776360/HIVE-12316.4.patch

{color:green}SUCCESS:{color} +1 due to 26 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 21 failed/errored test(s), 9991 tests 
executed
*Failed tests:*
{noformat}
TestConf - did not produce a TEST-*.xml file
TestHWISessionManager - did not produce a TEST-*.xml file
TestManager - did not produce a TEST-*.xml file
TestTable - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_max
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.test.capybara.generated.TestPandora.q1
org.apache.hive.test.capybara.generated.TestPandora.q2
org.apache.hive.test.capybara.generated.TestPandora.q3
org.apache.hive.test.capybara.generated.TestPandora.q4
org.apache.hive.test.capybara.generated.TestPandora.q5
org.apache.hive.test.capybara.generated.TestPandora.q6
org.apache.hive.test.capybara.generated.TestPandora.q7
org.apache.hive.test.capybara.qconverted.TestUnicode.unicodeData
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6287/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6287/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6287/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 21 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12776360 - PreCommit-HIVE-TRUNK-Build

> Improved integration test for Hive
> --
>
> Key: HIVE-12316
> URL: https://issues.apache.org/jira/browse/HIVE-12316
> Project: Hive
>  Issue Type: New Feature
>  Components: Testing Infrastructure
>Affects Versions: 2.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-12316.2.patch, HIVE-12316.3.patch, 
> HIVE-12316.4.patch, HIVE-12316.patch
>
>
> In working with Hive testing I have found there are several issues that are 
> causing problems for developers, testers, and users:
> * Because Hive has many tunable knobs (file format, security, etc.) we end up 
> with tests that cover the same functionality with different permutations of 
> these features.
> * The Hive integration tests (ie qfiles) cannot be run on a cluster.  This 
> means we cannot run any of those tests at scale.  The HBase community by 
> contrast uses the same test suite locally and on a cluster, and has found 
> that this helps them greatly in testing.
> * Golden files are a grievous evil.  Test writers are forced to eyeball 
> results the first time they run a test and decide whether they look 
> reasonable, which is error prone and makes testing at scale impossible.  And 
> changes to one part of Hive often end up changing the plan (and the output of 
> explain) thus breaking many tests that are not related.  This is particularly 
> an issue for people working on the optimizer.  
> * The lack of ability to run on a cluster means that when people test Hive at 
> scale, they are forced to develop custom frameworks which can't then benefit 
> the community.
> * There is no easy mechanism to bring user queries into the test suite.
> I propose we build a new testing capability with the following requirements:
> * One test should be able to run all reasonable permutations (mr/tez/spark, 
> orc/parquet/text/rcfile, secure/non-secure etc.)  This doesn't mean it would 
> run every permutation every time, but that the tester could choose which 
> permutation to run.
> * The same tests should run locally and on a cluster.  The tests should 
> support scaling of input data from Ks to Ts.
> * Expected results should be auto-generated 

[jira] [Commented] (HIVE-12316) Improved integration test for Hive

2015-12-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043676#comment-15043676
 ] 

Hive QA commented on HIVE-12316:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12775892/HIVE-12316.3.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6260/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6260/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6260/

Messages:
{noformat}
 This message was trimmed, see log for full details 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ 
hive-llap-server ---
[INFO] 
[INFO] --- build-helper-maven-plugin:1.8:add-source (add-source) @ 
hive-llap-server ---
[INFO] Source directory: 
/data/hive-ptest/working/apache-github-source-source/llap-server/src/gen/protobuf/gen-java
 added.
[INFO] Source directory: 
/data/hive-ptest/working/apache-github-source-source/llap-server/src/gen/thrift/gen-javabean
 added.
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ 
hive-llap-server ---
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ 
hive-llap-server ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 2 resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-llap-server ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ 
hive-llap-server ---
[INFO] Nothing to compile - all classes are up to date
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ 
hive-llap-server ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 4 resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-llap-server ---
[INFO] Executing tasks

main:
   [delete] Deleting directory 
/data/hive-ptest/working/apache-github-source-source/llap-server/target/tmp
   [delete] Deleting directory 
/data/hive-ptest/working/apache-github-source-source/llap-server/target/warehouse
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/llap-server/target/tmp
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/llap-server/target/warehouse
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/llap-server/target/tmp/conf
 [copy] Copying 14 files to 
/data/hive-ptest/working/apache-github-source-source/llap-server/target/tmp/conf
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ 
hive-llap-server ---
[INFO] Nothing to compile - all classes are up to date
[INFO] 
[INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-llap-server ---
[INFO] 
[INFO] 
[INFO] Building Hive Shims Aggregator 2.1.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ 
hive-shims-aggregator ---
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ 
hive-shims-aggregator ---
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ 
hive-shims-aggregator ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ 
hive-shims-aggregator ---
[INFO] Executing tasks

main:
   [delete] Deleting directory 
/data/hive-ptest/working/apache-github-source-source/shims/target/tmp
   [delete] Deleting directory 
/data/hive-ptest/working/apache-github-source-source/shims/target/warehouse
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/shims/target/tmp
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/shims/target/warehouse
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/shims/target/tmp/conf
 [copy] Copying 14 files to 
/data/hive-ptest/working/apache-github-source-source/shims/target/tmp/conf
[INFO] Executed tasks
[INFO] 
[INFO] 
[INFO] Building Hive TestUtils 2.1.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ 
hive-testutils ---
[INFO] 
[INFO] --- maven-remote-resourc

[jira] [Commented] (HIVE-12316) Improved integration test for Hive

2015-11-04 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14990403#comment-14990403
 ] 

Alan Gates commented on HIVE-12316:
---

Sure.  For an idea of what tests look like take a look in the examples package 
under test.  This contains examples for both queries and explain.  The 
qconverted package has a direct translation of skewjoin_union_remove_[12].q

In terms of similarities with qfiles, this seeks to support tests for explain, 
SQL queries, and and allow developers to set config values, etc.

For differences, there are several that are key:
* A single test can be written and used to test various combinations of Hive 
features (security on/off, access via cli/jdbc, metastore in rdbms/hbase, etc.).
* The tests can be run on a users laptop with a few kilobytes of data or a 
cluster with up to terabytes of data.  Scaling and expected results generation 
are handled.
* This runs all in JUnit and Java.  There's no need for a separate 
infrastructure (QTestUtils et al)
* No more golden files.  For most queries expected results can be generated by 
the framework.  For explain the plan can be accessed programmatically rather 
than relying on string comparison.

Will we need to translate qfiles?  Maybe.  In the long run it won't make sense 
to have both this and the qfile infrastructure, as this aims to do everything 
that qfiles can do and much more.  But given that this is brand new it has to 
mature quite a bit and the community has to adopt it.  It's not time to start a 
whole sale translation.  I've looked at building a qfile translator but I'm not 
convinced it's a great idea.  One, because I think this will allow us to build 
fewer overall tests than we have qfile tests.  Two, qfile tests mix explain 
(which are really tests for the optimizer) and queries.  I'm wondering if it 
doesn't make sense to split these out.

This is all in JUnit, so adding new tests is straight forward.  One important 
thing is that there's a fair amount of setup and teardown for each test file, 
so it's good to combine like tests into a single class (e.g. one might put all 
of the ACID insert/update/delete tests together) rather than have a single test 
per class.

> Improved integration test for Hive
> --
>
> Key: HIVE-12316
> URL: https://issues.apache.org/jira/browse/HIVE-12316
> Project: Hive
>  Issue Type: New Feature
>  Components: Testing Infrastructure
>Affects Versions: 2.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-12316.patch
>
>
> In working with Hive testing I have found there are several issues that are 
> causing problems for developers, testers, and users:
> * Because Hive has many tunable knobs (file format, security, etc.) we end up 
> with tests that cover the same functionality with different permutations of 
> these features.
> * The Hive integration tests (ie qfiles) cannot be run on a cluster.  This 
> means we cannot run any of those tests at scale.  The HBase community by 
> contrast uses the same test suite locally and on a cluster, and has found 
> that this helps them greatly in testing.
> * Golden files are a grievous evil.  Test writers are forced to eyeball 
> results the first time they run a test and decide whether they look 
> reasonable, which is error prone and makes testing at scale impossible.  And 
> changes to one part of Hive often end up changing the plan (and the output of 
> explain) thus breaking many tests that are not related.  This is particularly 
> an issue for people working on the optimizer.  
> * The lack of ability to run on a cluster means that when people test Hive at 
> scale, they are forced to develop custom frameworks which can't then benefit 
> the community.
> * There is no easy mechanism to bring user queries into the test suite.
> I propose we build a new testing capability with the following requirements:
> * One test should be able to run all reasonable permutations (mr/tez/spark, 
> orc/parquet/text/rcfile, secure/non-secure etc.)  This doesn't mean it would 
> run every permutation every time, but that the tester could choose which 
> permutation to run.
> * The same tests should run locally and on a cluster.  The tests should 
> support scaling of input data from Ks to Ts.
> * Expected results should be auto-generated whenever possible, and this 
> should work with the scaling of inputs.  The dev should be able to provide 
> expected results or custom expected result generation in cases where 
> auto-generation doesn't make sense.
> * Access to the query plan should be available as an API in the tests so that 
> golden files of explain output are not required.
> * This should run in maven, junit, and java so that developers do not need to 
> manage yet another framework.
> * It should be possib

[jira] [Commented] (HIVE-12316) Improved integration test for Hive

2015-11-04 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14990226#comment-14990226
 ] 

Sergey Shelukhin commented on HIVE-12316:
-

Is it possible to provide at least some documentation comparing it to q files? 
What are the similarities and differences, do existing tests need translation 
(and how to do it if yes), how to add new tests compared to q files, etc.

> Improved integration test for Hive
> --
>
> Key: HIVE-12316
> URL: https://issues.apache.org/jira/browse/HIVE-12316
> Project: Hive
>  Issue Type: New Feature
>  Components: Testing Infrastructure
>Affects Versions: 2.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-12316.patch
>
>
> In working with Hive testing I have found there are several issues that are 
> causing problems for developers, testers, and users:
> * Because Hive has many tunable knobs (file format, security, etc.) we end up 
> with tests that cover the same functionality with different permutations of 
> these features.
> * The Hive integration tests (ie qfiles) cannot be run on a cluster.  This 
> means we cannot run any of those tests at scale.  The HBase community by 
> contrast uses the same test suite locally and on a cluster, and has found 
> that this helps them greatly in testing.
> * Golden files are a grievous evil.  Test writers are forced to eyeball 
> results the first time they run a test and decide whether they look 
> reasonable, which is error prone and makes testing at scale impossible.  And 
> changes to one part of Hive often end up changing the plan (and the output of 
> explain) thus breaking many tests that are not related.  This is particularly 
> an issue for people working on the optimizer.  
> * The lack of ability to run on a cluster means that when people test Hive at 
> scale, they are forced to develop custom frameworks which can't then benefit 
> the community.
> * There is no easy mechanism to bring user queries into the test suite.
> I propose we build a new testing capability with the following requirements:
> * One test should be able to run all reasonable permutations (mr/tez/spark, 
> orc/parquet/text/rcfile, secure/non-secure etc.)  This doesn't mean it would 
> run every permutation every time, but that the tester could choose which 
> permutation to run.
> * The same tests should run locally and on a cluster.  The tests should 
> support scaling of input data from Ks to Ts.
> * Expected results should be auto-generated whenever possible, and this 
> should work with the scaling of inputs.  The dev should be able to provide 
> expected results or custom expected result generation in cases where 
> auto-generation doesn't make sense.
> * Access to the query plan should be available as an API in the tests so that 
> golden files of explain output are not required.
> * This should run in maven, junit, and java so that developers do not need to 
> manage yet another framework.
> * It should be possible to simulate user data (based on schema and 
> statistics) and quickly incorporate user queries so that tests from user 
> scenarios can be quickly incorporated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)