[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106542#comment-14106542 ]

Chinna Rao Lalam commented on HIVE-7702:

Hi [~brocknoland],

Compared against MR, most of the time the differences are due to sorting order only.

Start running .q file tests on spark [Spark Branch]
---

Key: HIVE-7702
URL: https://issues.apache.org/jira/browse/HIVE-7702
Project: Hive
Issue Type: Sub-task
Components: Spark
Reporter: Brock Noland
Assignee: Chinna Rao Lalam
Attachments: HIVE-7702-spark.patch, HIVE-7702.1-spark.patch

Spark can currently support only a few queries; however, there are some .q file tests which will pass today. The basic idea is that we should get some number of these (10-20) working so we can start testing the project.

A good starting point might be the udf*, varchar*, or alter* tests:
https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive

To generate the output file for test XXX.q, you'd do:
{noformat}
mvn clean install -DskipTests -Phadoop-2
cd itests
mvn clean install -DskipTests -Phadoop-2
cd qtest-spark
mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true -Phadoop-2
{noformat}
which would generate XXX.q.out, which we can check in to source control as a golden file.

Multiple tests can be run at a given time like so:
{noformat}
mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true -Phadoop-2
{noformat}

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107442#comment-14107442 ]

Brock Noland commented on HIVE-7702:

Thank you Chinna!! I agree, I used the script below and all of the result differences are due to sorting. Thank you! +1

{noformat}
#!/bin/bash
# Reads Spark golden file paths from stdin, one per line.
while read -r file
do
  # The MR golden file path is the Spark path with "/spark" removed.
  mr=$(echo "$file" | perl -pe 's@/spark@@g')
  spark=$file
  mrSorted=/tmp/$(basename "$mr")-mr.sorted
  sparkSorted=/tmp/$(basename "$spark")-spark.sorted
  sort "$mr" > "$mrSorted"
  sort "$spark" > "$sparkSorted"
  diff -y -W 150 "$mrSorted" "$sparkSorted"
done
{noformat}
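The sort-then-diff check in the script above can also be reduced to a one-word verdict per file pair. The following is only a sketch: `classify_diff` is a hypothetical helper name, not something from the patch, and it assumes both files exist and fit comfortably in a temporary directory.

```shell
#!/bin/sh
# classify_diff MR_FILE SPARK_FILE   (hypothetical helper, not part of Hive)
# Prints one of:
#   identical   the files match byte for byte
#   order-only  the files differ, but contain the same lines once sorted
#   content     the files differ even after sorting
classify_diff() (
  mr=$1
  spark=$2
  if cmp -s "$mr" "$spark"; then
    echo identical
    exit 0
  fi
  tmpdir=$(mktemp -d) || exit 1
  sort "$mr" > "$tmpdir/mr.sorted"
  sort "$spark" > "$tmpdir/spark.sorted"
  if cmp -s "$tmpdir/mr.sorted" "$tmpdir/spark.sorted"; then
    result=order-only
  else
    result=content
  fi
  rm -rf "$tmpdir"
  echo "$result"
)
```

Note that sorting a whole .q.out file also reorders the query text and any plan output, so "order-only" is a hint that the results probably match, not proof; a "content" verdict still needs a manual look at the diff.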
[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105357#comment-14105357 ]

Chinna Rao Lalam commented on HIVE-7702:

The join-related query files will be handled in HIVE-7816:

filter_join_breaktask.q,\
filter_join_breaktask2.q
[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105379#comment-14105379 ]

Hive QA commented on HIVE-7702:
---

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12663390/HIVE-7702-spark.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5984 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_insert_into2
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/75/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/75/console
Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-75/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12663390
[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105447#comment-14105447 ]

Brock Noland commented on HIVE-7702:

Nice work [~chinnalalam]!! Looks like insert_into2 fails. Looking at the DIFF I see a bunch of odd characters at the bottom. Thank you!!
[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105759#comment-14105759 ]

Chinna Rao Lalam commented on HIVE-7702:

insert_into2.q.out is corrected.
[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105838#comment-14105838 ]

Brock Noland commented on HIVE-7702:

Hi Chinna,

Thank you! Using git and the following command I was able to compare the results against MR:

{noformat}
git status | awk '/new file:/ {print $NF}' | xargs -I {} sh -c 'diff {} $(echo {} | perl -pe s@/spark@@g)'
{noformat}

Do you know if the differences are due to sorting order or correctness?
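The `perl -pe s@/spark@@g` step in the command above derives the MR golden file path from the Spark one by deleting every "/spark" in the path. A minimal sketch of the same mapping as a reusable function follows; `mr_path` is a hypothetical name, and the path in the usage note is an assumed layout used only for illustration.

```shell
#!/bin/sh
# mr_path SPARK_FILE   (hypothetical helper, not part of Hive)
# Prints the path with every "/spark" removed, mirroring the
# perl substitution used in the one-liner.
mr_path() {
  printf '%s\n' "$1" | sed 's@/spark@@g'
}
```

Assuming a Spark golden file at ql/src/test/results/clientpositive/spark/groupby1.q.out, `diff "$f" "$(mr_path "$f")"` would compare it against ql/src/test/results/clientpositive/groupby1.q.out. Like the perl version, the pattern is not anchored to a whole path component, so it would also strip "/spark" from a directory such as "/sparkle".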
[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105856#comment-14105856 ]

Hive QA commented on HIVE-7702:
---

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12663458/HIVE-7702.1-spark.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5985 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_null
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/76/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/76/console
Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-76/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12663458
[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102719#comment-14102719 ]

Brock Noland commented on HIVE-7702:

Let's try and add the following tests in this JIRA:

{noformat}
enforce_order.q,\
filter_join_breaktask.q,\
filter_join_breaktask2.q,\
groupby1.q,\
groupby2.q,\
groupby3.q,\
having.q,\
insert1.q,\
insert_into1.q,\
insert_into2.q,\
{noformat}
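Since the `-Dqfile` option from the issue description takes a comma-separated list, a batch of tests like the one above can be turned into a single Maven invocation. The sketch below only prints the command so it can be reviewed before running; `qfile_list` and `print_mvn_command` are hypothetical helper names, and the Maven flags are copied unchanged from the issue description.

```shell
#!/bin/sh
# qfile_list NAME...   (hypothetical helper)
# Joins the given .q file names with commas.
qfile_list() (
  list=
  for q in "$@"; do
    # Prepend a comma only when the list is already non-empty.
    list="${list:+$list,}$q"
  done
  printf '%s\n' "$list"
)

# print_mvn_command NAME...   (hypothetical helper)
# Prints, without running it, the golden-file generation command
# from the issue description for the given tests.
print_mvn_command() {
  printf 'mvn test -Dtest=TestCliDriver -Dqfile=%s -Dtest.output.overwrite=true -Phadoop-2\n' \
    "$(qfile_list "$@")"
}
```

For example, `print_mvn_command groupby1.q groupby2.q having.q` prints the command with `-Dqfile=groupby1.q,groupby2.q,having.q`, to be run from itests/qtest-spark after the two install steps.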