[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]

2014-08-22 Thread Chinna Rao Lalam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106542#comment-14106542
 ] 

Chinna Rao Lalam commented on HIVE-7702:


Hi [~brocknoland],

Compare against MR most of the times differences are due to sorting order only.





 Start running .q file tests on spark [Spark Branch]
 ---

 Key: HIVE-7702
 URL: https://issues.apache.org/jira/browse/HIVE-7702
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Assignee: Chinna Rao Lalam
 Attachments: HIVE-7702-spark.patch, HIVE-7702.1-spark.patch


 Spark can currently only support a few queries, however there are some .q 
 file tests which will pass today. The basic idea is that we should get some 
 number of these actually working (10-20) so we can actually start testing the 
 project.
 A good starting point might be the udf*, varchar*, or alter* tests:
 https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive
 To generate the output file for test XXX.q, you'd do:
 {noformat}
 mvn clean install -DskipTests -Phadoop-2
 cd itests
 mvn clean install -DskipTests -Phadoop-2
 cd qtest-spark
 mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true 
 -Phadoop-2
 {noformat}
 which would generate XXX.q.out which we can check-in to source control as a 
 golden file.
 Multiple tests can be run at a give time as so:
 {noformat}
 mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true 
 -Phadoop-2
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]

2014-08-22 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107442#comment-14107442
 ] 

Brock Noland commented on HIVE-7702:


Thank you Chinna!! I agree, I used the script below and all of the result 
differences are due to sorting. Thank you!

+1

{noformat}
#!/bin/bash
while read file
do
  mr=$(echo $file | perl -pe s@/spark@@g)
  spark=$file
  mrSorted=/tmp/$(basename $mr)-mr.sorted
  sparkSorted=/tmp/$(basename $spark)-spark.sorted
  sort $mr  $mrSorted
  sort $spark  $sparkSorted
  diff -y -W 150 $mrSorted $sparkSorted
done
{noformat}

 Start running .q file tests on spark [Spark Branch]
 ---

 Key: HIVE-7702
 URL: https://issues.apache.org/jira/browse/HIVE-7702
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Assignee: Chinna Rao Lalam
 Attachments: HIVE-7702-spark.patch, HIVE-7702.1-spark.patch


 Spark can currently only support a few queries, however there are some .q 
 file tests which will pass today. The basic idea is that we should get some 
 number of these actually working (10-20) so we can actually start testing the 
 project.
 A good starting point might be the udf*, varchar*, or alter* tests:
 https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive
 To generate the output file for test XXX.q, you'd do:
 {noformat}
 mvn clean install -DskipTests -Phadoop-2
 cd itests
 mvn clean install -DskipTests -Phadoop-2
 cd qtest-spark
 mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true 
 -Phadoop-2
 {noformat}
 which would generate XXX.q.out which we can check-in to source control as a 
 golden file.
 Multiple tests can be run at a give time as so:
 {noformat}
 mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true 
 -Phadoop-2
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]

2014-08-21 Thread Chinna Rao Lalam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105357#comment-14105357
 ] 

Chinna Rao Lalam commented on HIVE-7702:


Join related query files will handle in this jira HIVE-7816

filter_join_breaktask.q,\
filter_join_breaktask2.q


 Start running .q file tests on spark [Spark Branch]
 ---

 Key: HIVE-7702
 URL: https://issues.apache.org/jira/browse/HIVE-7702
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Assignee: Chinna Rao Lalam
 Attachments: HIVE-7702-spark.patch


 Spark can currently only support a few queries, however there are some .q 
 file tests which will pass today. The basic idea is that we should get some 
 number of these actually working (10-20) so we can actually start testing the 
 project.
 A good starting point might be the udf*, varchar*, or alter* tests:
 https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive
 To generate the output file for test XXX.q, you'd do:
 {noformat}
 mvn clean install -DskipTests -Phadoop-2
 cd itests
 mvn clean install -DskipTests -Phadoop-2
 cd qtest-spark
 mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true 
 -Phadoop-2
 {noformat}
 which would generate XXX.q.out which we can check-in to source control as a 
 golden file.
 Multiple tests can be run at a give time as so:
 {noformat}
 mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true 
 -Phadoop-2
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]

2014-08-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105379#comment-14105379
 ] 

Hive QA commented on HIVE-7702:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12663390/HIVE-7702-spark.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5984 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_insert_into2
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/75/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/75/console
Test logs: 
http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-75/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12663390

 Start running .q file tests on spark [Spark Branch]
 ---

 Key: HIVE-7702
 URL: https://issues.apache.org/jira/browse/HIVE-7702
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Assignee: Chinna Rao Lalam
 Attachments: HIVE-7702-spark.patch


 Spark can currently only support a few queries, however there are some .q 
 file tests which will pass today. The basic idea is that we should get some 
 number of these actually working (10-20) so we can actually start testing the 
 project.
 A good starting point might be the udf*, varchar*, or alter* tests:
 https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive
 To generate the output file for test XXX.q, you'd do:
 {noformat}
 mvn clean install -DskipTests -Phadoop-2
 cd itests
 mvn clean install -DskipTests -Phadoop-2
 cd qtest-spark
 mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true 
 -Phadoop-2
 {noformat}
 which would generate XXX.q.out which we can check-in to source control as a 
 golden file.
 Multiple tests can be run at a give time as so:
 {noformat}
 mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true 
 -Phadoop-2
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]

2014-08-21 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105447#comment-14105447
 ] 

Brock Noland commented on HIVE-7702:


Nice work [~chinnalalam]!! Looks like insert_into2 fails. Looking at the DIFF I 
see a bunch of odd characters at the bottom. Thank you!!

 Start running .q file tests on spark [Spark Branch]
 ---

 Key: HIVE-7702
 URL: https://issues.apache.org/jira/browse/HIVE-7702
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Assignee: Chinna Rao Lalam
 Attachments: HIVE-7702-spark.patch


 Spark can currently only support a few queries, however there are some .q 
 file tests which will pass today. The basic idea is that we should get some 
 number of these actually working (10-20) so we can actually start testing the 
 project.
 A good starting point might be the udf*, varchar*, or alter* tests:
 https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive
 To generate the output file for test XXX.q, you'd do:
 {noformat}
 mvn clean install -DskipTests -Phadoop-2
 cd itests
 mvn clean install -DskipTests -Phadoop-2
 cd qtest-spark
 mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true 
 -Phadoop-2
 {noformat}
 which would generate XXX.q.out which we can check-in to source control as a 
 golden file.
 Multiple tests can be run at a give time as so:
 {noformat}
 mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true 
 -Phadoop-2
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]

2014-08-21 Thread Chinna Rao Lalam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105759#comment-14105759
 ] 

Chinna Rao Lalam commented on HIVE-7702:


insert_into2.q.out  is corrected..

 Start running .q file tests on spark [Spark Branch]
 ---

 Key: HIVE-7702
 URL: https://issues.apache.org/jira/browse/HIVE-7702
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Assignee: Chinna Rao Lalam
 Attachments: HIVE-7702-spark.patch, HIVE-7702.1-spark.patch


 Spark can currently only support a few queries, however there are some .q 
 file tests which will pass today. The basic idea is that we should get some 
 number of these actually working (10-20) so we can actually start testing the 
 project.
 A good starting point might be the udf*, varchar*, or alter* tests:
 https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive
 To generate the output file for test XXX.q, you'd do:
 {noformat}
 mvn clean install -DskipTests -Phadoop-2
 cd itests
 mvn clean install -DskipTests -Phadoop-2
 cd qtest-spark
 mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true 
 -Phadoop-2
 {noformat}
 which would generate XXX.q.out which we can check-in to source control as a 
 golden file.
 Multiple tests can be run at a give time as so:
 {noformat}
 mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true 
 -Phadoop-2
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]

2014-08-21 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105838#comment-14105838
 ] 

Brock Noland commented on HIVE-7702:


Hi Chinna,
Thank you! Using git and the following command I was able to compare the 
results against MR

{noformat}
git status | awk '/new file:/ {print $NF}' | xargs -I {} sh -c 'diff {} $(echo 
{} | perl -pe s@/spark@@g)'
{noformat}

Do you know if the differences are due to sorting order or correctness?

 Start running .q file tests on spark [Spark Branch]
 ---

 Key: HIVE-7702
 URL: https://issues.apache.org/jira/browse/HIVE-7702
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Assignee: Chinna Rao Lalam
 Attachments: HIVE-7702-spark.patch, HIVE-7702.1-spark.patch


 Spark can currently only support a few queries, however there are some .q 
 file tests which will pass today. The basic idea is that we should get some 
 number of these actually working (10-20) so we can actually start testing the 
 project.
 A good starting point might be the udf*, varchar*, or alter* tests:
 https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive
 To generate the output file for test XXX.q, you'd do:
 {noformat}
 mvn clean install -DskipTests -Phadoop-2
 cd itests
 mvn clean install -DskipTests -Phadoop-2
 cd qtest-spark
 mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true 
 -Phadoop-2
 {noformat}
 which would generate XXX.q.out which we can check-in to source control as a 
 golden file.
 Multiple tests can be run at a give time as so:
 {noformat}
 mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true 
 -Phadoop-2
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]

2014-08-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105856#comment-14105856
 ] 

Hive QA commented on HIVE-7702:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12663458/HIVE-7702.1-spark.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5985 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_null
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/76/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/76/console
Test logs: 
http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-76/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12663458

 Start running .q file tests on spark [Spark Branch]
 ---

 Key: HIVE-7702
 URL: https://issues.apache.org/jira/browse/HIVE-7702
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Assignee: Chinna Rao Lalam
 Attachments: HIVE-7702-spark.patch, HIVE-7702.1-spark.patch


 Spark can currently only support a few queries, however there are some .q 
 file tests which will pass today. The basic idea is that we should get some 
 number of these actually working (10-20) so we can actually start testing the 
 project.
 A good starting point might be the udf*, varchar*, or alter* tests:
 https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive
 To generate the output file for test XXX.q, you'd do:
 {noformat}
 mvn clean install -DskipTests -Phadoop-2
 cd itests
 mvn clean install -DskipTests -Phadoop-2
 cd qtest-spark
 mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true 
 -Phadoop-2
 {noformat}
 which would generate XXX.q.out which we can check-in to source control as a 
 golden file.
 Multiple tests can be run at a give time as so:
 {noformat}
 mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true 
 -Phadoop-2
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]

2014-08-19 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102719#comment-14102719
 ] 

Brock Noland commented on HIVE-7702:


Let's try and add the following tests in this JIRA:

{noformat}
  enforce_order.q,\
  filter_join_breaktask.q,\
  filter_join_breaktask2.q,\
  groupby1.q,\
  groupby2.q,\
  groupby3.q,\
  having.q,\
  insert1.q,\
  insert_into1.q,\
  insert_into2.q,\
{noformat}

 Start running .q file tests on spark [Spark Branch]
 ---

 Key: HIVE-7702
 URL: https://issues.apache.org/jira/browse/HIVE-7702
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Assignee: Chinna Rao Lalam

 Spark can currently only support a few queries, however there are some .q 
 file tests which will pass today. The basic idea is that we should get some 
 number of these actually working (10-20) so we can actually start testing the 
 project.
 A good starting point might be the udf*, varchar*, or alter* tests:
 https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive
 To generate the output file for test XXX.q, you'd do:
 {noformat}
 mvn clean install -DskipTests -Phadoop-2
 cd itests
 mvn clean install -DskipTests -Phadoop-2
 cd qtest-spark
 mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true 
 -Phadoop-2
 {noformat}
 which would generate XXX.q.out which we can check-in to source control as a 
 golden file.
 Multiple tests can be run at a give time as so:
 {noformat}
 mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true 
 -Phadoop-2
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)