[jira] [Commented] (HIVE-18684) Race condition in RemoteSparkJobMonitor
[ https://issues.apache.org/jira/browse/HIVE-18684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531844#comment-16531844 ]

Sahil Takiar commented on HIVE-18684:

A better approach here would be to print all the app-specific info (and the {{SparkTask#printConfigInfo}} output) once per query rather than once per job. For any query that has lots of map-joins, the logs otherwise get flooded with duplicate info.

> Race condition in RemoteSparkJobMonitor
> ---
>
> Key: HIVE-18684
> URL: https://issues.apache.org/jira/browse/HIVE-18684
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Priority: Major
> Attachments: HIVE-18684.1.patch, HIVE-18684.2.patch, HIVE-18684.3.patch
>
>
> There is a race condition in {{RemoteSparkJobMonitor}}. Sometimes the info in
> {{RemoteSparkJobMonitor#startMonitor.STARTED}} gets printed out, sometimes it
> doesn't. This can be easily verified by running a qtest on
> {{TestMiniSparkOnYarnCliDriver}} and counting the number of times
> {{Query Hive on Spark job}} is printed vs. the number of times
> {{Finished successfully in}} gets printed.
> The issue is that {{RemoteSparkJobMonitor}} runs once every second and checks
> the state of {{JobHandle}}. Depending on the state, it prints out some
> logging info. The content of the logs contains an implicit assumption that
> logs in the {{STARTED}} state are printed before the logs in the
> {{SUCCEEDED}} state. However, this isn't always the case. The state
> transitions are driven by how long the remote Spark job takes to run, and if
> it finishes within one second, the logs in the {{STARTED}} state are never
> printed.
> This can be confusing to users, and there is key debugging information that
> is printed in the {{STARTED}} state.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
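A minimal sketch of the kind of workaround the description points toward, assuming a simplified, hypothetical `JobState` enum and a hypothetical `PollingMonitorSketch` class with `onPoll` and `printStartedInfo` methods. None of this is Hive's actual `RemoteSparkJobMonitor` or `JobHandle` code, and it is not necessarily what the attached patches do. Because a once-per-second poll can see a job go straight from queued to `SUCCEEDED`, the `SUCCEEDED` branch checks a flag and emits the `STARTED` output first if it was skipped. The log strings are only the message prefixes quoted in the issue.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified poll-based monitor; not Hive's RemoteSparkJobMonitor. */
public class PollingMonitorSketch {

    /** Assumed subset of job states, for illustration only. */
    public enum JobState { QUEUED, STARTED, SUCCEEDED, FAILED }

    private final List<String> log = new ArrayList<>();
    private boolean startedInfoPrinted = false;

    /** Stand-in for the debugging info normally logged in the STARTED branch. */
    private void printStartedInfo() {
        log.add("Query Hive on Spark job");
        startedInfoPrinted = true;
    }

    /** Called once per poll interval with whatever state is reported at that moment. */
    public void onPoll(JobState observed) {
        switch (observed) {
            case STARTED:
                if (!startedInfoPrinted) {
                    printStartedInfo();
                }
                break;
            case SUCCEEDED:
                // If the job started and finished between two polls, STARTED was never
                // observed; emit its info now so the log order stays STARTED -> SUCCEEDED.
                if (!startedInfoPrinted) {
                    printStartedInfo();
                }
                log.add("Finished successfully in");
                break;
            default:
                // QUEUED / FAILED handling is omitted from this sketch.
                break;
        }
    }

    public List<String> log() {
        return log;
    }

    public static void main(String[] args) {
        // Simulates a job shorter than one poll interval: STARTED is never observed.
        PollingMonitorSketch monitor = new PollingMonitorSketch();
        monitor.onPoll(JobState.QUEUED);
        monitor.onPoll(JobState.SUCCEEDED);
        System.out.println(monitor.log());
        // prints [Query Hive on Spark job, Finished successfully in]
    }
}
```

The flag also keeps the `STARTED` output from repeating when several consecutive polls observe `STARTED`; the same check could be applied to other terminal states such as `FAILED`.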
[jira] [Commented] (HIVE-18684) Race condition in RemoteSparkJobMonitor
[ https://issues.apache.org/jira/browse/HIVE-18684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506844#comment-16506844 ] Hive QA commented on HIVE-18684: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12926962/HIVE-18684.3.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:green}SUCCESS:{color} +1 due to 14514 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/11646/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/11646/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-11646/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12926962 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-18684) Race condition in RemoteSparkJobMonitor
[ https://issues.apache.org/jira/browse/HIVE-18684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506828#comment-16506828 ]

Hive QA commented on HIVE-18684:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 6m 45s | master passed |
| +1 | compile | 0m 56s | master passed |
| +1 | checkstyle | 0m 33s | master passed |
| 0 | findbugs | 3m 19s | ql in master has 2284 extant Findbugs warnings. |
| +1 | javadoc | 0m 48s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 15s | the patch passed |
| +1 | compile | 0m 56s | the patch passed |
| +1 | javac | 0m 56s | the patch passed |
| -1 | checkstyle | 0m 31s | ql: The patch generated 15 new + 2 unchanged - 4 fixed = 17 total (was 6) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 3m 31s | the patch passed |
| +1 | javadoc | 0m 48s | the patch passed |
|| || || || Other Tests ||
| +1 | asflicense | 0m 11s | The patch does not generate ASF License warnings. |
| | | 19m 57s | |

|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-11646/dev-support/hive-personality.sh |
| git revision | master / 6454585 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-11646/yetus/diff-checkstyle-ql.txt |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-11646/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HIVE-18684) Race condition in RemoteSparkJobMonitor
[ https://issues.apache.org/jira/browse/HIVE-18684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499007#comment-16499007 ]

Hive QA commented on HIVE-18684:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12925801/HIVE-18684.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 14445 tests executed
*Failed tests:*
{noformat}
org.apache.hive.jdbc.TestTriggersWorkloadManager.testMultipleTriggers1 (batchId=242)
org.apache.hive.jdbc.TestTriggersWorkloadManager.testMultipleTriggers2 (batchId=242)
org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerCustomCreatedDynamicPartitions (batchId=242)
org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerCustomCreatedDynamicPartitionsMultiInsert (batchId=242)
org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerCustomCreatedDynamicPartitionsUnionAll (batchId=242)
org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerCustomNonExistent (batchId=242)
org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerHighBytesRead (batchId=242)
org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerHighShuffleBytes (batchId=242)
org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerShortQueryElapsedTime (batchId=242)
org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerSlowQueryElapsedTime (batchId=242)
org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerSlowQueryExecutionTime (batchId=242)
org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerVertexRawInputSplitsNoKill (batchId=242)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/11435/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/11435/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-11435/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12925801 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-18684) Race condition in RemoteSparkJobMonitor
[ https://issues.apache.org/jira/browse/HIVE-18684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498976#comment-16498976 ]

Hive QA commented on HIVE-18684:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 6m 57s | master passed |
| +1 | compile | 0m 57s | master passed |
| +1 | checkstyle | 0m 34s | master passed |
| 0 | findbugs | 3m 28s | ql in master has 2278 extant Findbugs warnings. |
| +1 | javadoc | 0m 51s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 15s | the patch passed |
| +1 | compile | 0m 55s | the patch passed |
| +1 | javac | 0m 55s | the patch passed |
| -1 | checkstyle | 0m 34s | ql: The patch generated 16 new + 2 unchanged - 4 fixed = 18 total (was 6) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 3m 43s | the patch passed |
| +1 | javadoc | 0m 51s | the patch passed |
|| || || || Other Tests ||
| -1 | asflicense | 0m 11s | The patch generated 1 ASF License warnings. |
| | | 20m 42s | |

|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-11435/dev-support/hive-personality.sh |
| git revision | master / 4463c2b |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-11435/yetus/diff-checkstyle-ql.txt |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-11435/yetus/patch-asflicense-problems.txt |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-11435/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HIVE-18684) Race condition in RemoteSparkJobMonitor
[ https://issues.apache.org/jira/browse/HIVE-18684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16417231#comment-16417231 ]

Hive QA commented on HIVE-18684:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12916316/HIVE-18684.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 81 failed/errored test(s), 13415 tests executed
*Failed tests:*
{noformat}
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=172)
[tez_union_group_by.q,escape2.q,llap_acid.q,vector_udf_character_length.q,tez_dynpart_hashjoin_1.q,correlationoptimizer3.q,vector_groupby_grouping_sets1.q,autoColumnStats_2.q,vector_binary_join_groupby.q,schema_evol_orc_acid_part_llap_io.q,semijoin6.q,vectorization_0.q,orc_merge8.q,orc_merge_incompat2.q,nonmr_fetch_threshold.q,vectorization_decimal_date.q,schema_evol_orc_nonvec_table_llap_io.q,vectorized_casts.q,vector_grouping_sets.q,groupby_groupingset_bug.q,schema_evol_text_vecrow_part_all_complex.q,stats11.q,tez_join_tests.q,join_acid_non_acid.q,vector_groupby_grouping_window.q,auto_join21.q,schema_evol_text_vecrow_part.q,load_dyn_part1.q,schema_evol_orc_nonvec_part_all_complex_llap_io.q,vector_decimal_1.q]
TestMinimrCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=92)
[infer_bucket_sort_num_buckets.q,infer_bucket_sort_reducers_power_two.q,parallel_orderby.q,bucket_num_reducers_acid.q,infer_bucket_sort_map_operators.q,infer_bucket_sort_merge.q,root_dir_external_table.q,infer_bucket_sort_dyn_part.q,udf_using.q,bucket_num_reducers_acid2.q]
TestNegativeCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=95)
[jira] [Commented] (HIVE-18684) Race condition in RemoteSparkJobMonitor
[ https://issues.apache.org/jira/browse/HIVE-18684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16417156#comment-16417156 ]

Hive QA commented on HIVE-18684:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 7m 42s | master passed |
| +1 | compile | 1m 4s | master passed |
| +1 | checkstyle | 0m 40s | master passed |
| +1 | javadoc | 0m 56s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 20s | the patch passed |
| +1 | compile | 1m 1s | the patch passed |
| +1 | javac | 1m 1s | the patch passed |
| -1 | checkstyle | 0m 43s | ql: The patch generated 10 new + 7 unchanged - 1 fixed = 17 total (was 8) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | javadoc | 0m 57s | the patch passed |
|| || || || Other Tests ||
| -1 | asflicense | 0m 13s | The patch generated 50 ASF License warnings. |
| | | 14m 53s | |

|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-9883/dev-support/hive-personality.sh |
| git revision | master / 9b84ed4 |
| Default Java | 1.8.0_111 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-9883/yetus/diff-checkstyle-ql.txt |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-9883/yetus/patch-asflicense-problems.txt |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-9883/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HIVE-18684) Race condition in RemoteSparkJobMonitor
[ https://issues.apache.org/jira/browse/HIVE-18684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361440#comment-16361440 ]

Sahil Takiar commented on HIVE-18684:

Right now, it looks like the code in {{RemoteSparkJobMonitor}} is heavily poll-based: it polls the {{RemoteDriver}} for information every second and displays it. Ideally it would be event-driven, so that whenever the {{SparkClient}} receives an update from the {{RemoteDriver}}, the update is logged immediately. However, implementing an event-driven model would require rewriting a lot of this code. Unless there is a more compelling reason to implement an event-based model, we should probably stick with the current code. There should be a simpler workaround for the bug reported in this JIRA anyway.
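A rough sketch of the event-driven alternative described in the comment, assuming a hypothetical `StateListener` callback and a hypothetical `JobHandleSketch` class that stands in for the client-side handle. These are illustrative names, not Hive's `JobHandle` or `SparkClient` APIs. It also assumes updates are delivered in order on a single thread. Because every transition is pushed to the listener as it arrives, a job that finishes within a second still produces its STARTED output before its SUCCEEDED output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical push-based monitor; not Hive's JobHandle/SparkClient API. */
public class EventDrivenMonitorSketch {

    /** Assumed subset of job states, for illustration only. */
    public enum JobState { QUEUED, STARTED, SUCCEEDED, FAILED }

    /** Hypothetical callback invoked on every state transition. */
    public interface StateListener {
        void onStateChange(JobState newState);
    }

    /** Stand-in for a handle that receives updates from the remote driver. */
    public static class JobHandleSketch {
        // Copy-on-write so listeners can be added while updates are being delivered.
        private final List<StateListener> listeners = new CopyOnWriteArrayList<>();

        public void addListener(StateListener listener) {
            listeners.add(listener);
        }

        // Called when an update message arrives; each transition is delivered,
        // so no state can be skipped the way a one-second poll can skip STARTED.
        public void deliver(JobState state) {
            for (StateListener listener : listeners) {
                listener.onStateChange(state);
            }
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        JobHandleSketch handle = new JobHandleSketch();
        handle.addListener(state -> {
            switch (state) {
                case STARTED:
                    log.add("Query Hive on Spark job");
                    break;
                case SUCCEEDED:
                    log.add("Finished successfully in");
                    break;
                default:
                    break;
            }
        });
        // Even if STARTED and SUCCEEDED arrive milliseconds apart, both are logged in order.
        handle.deliver(JobState.QUEUED);
        handle.deliver(JobState.STARTED);
        handle.deliver(JobState.SUCCEEDED);
        System.out.println(log);
        // prints [Query Hive on Spark job, Finished successfully in]
    }
}
```

The trade-off is the one raised in the comment: routing every update through callbacks like this would touch much of the existing monitoring code, whereas a polling workaround stays a local change.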