[jira] [Commented] (HIVE-23110) Prevent NPE in ReExecDriver if the processing is aborted
[ https://issues.apache.org/jira/browse/HIVE-23110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084227#comment-17084227 ] Miklos Gergely commented on HIVE-23110: --- So what is the conclusion here? Can we close this as not a bug? Or should we change the level of that message to debug? > Prevent NPE in ReExecDriver if the processing is aborted > > > Key: HIVE-23110 > URL: https://issues.apache.org/jira/browse/HIVE-23110 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Miklos Gergely >Assignee: Zoltan Haindrich >Priority: Major > Attachments: HIVE-23110.01.patch > > > In case of abort the context would be null, and thus the planMapper can not > be obtained from it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23110) Prevent NPE in ReExecDriver if the processing is aborted
[ https://issues.apache.org/jira/browse/HIVE-23110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072531#comment-17072531 ] Zoltan Haindrich commented on HIVE-23110: - I think this issue is not connected to the NPE from ReExecDriver; if the NPE had occurred, the catch block would have done a [return here|https://github.com/apache/hive/blob/d2ad5b061706a1d3cd55e59c769ed4f2af01cdbe/service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java#L240]
I don't think we have any guarantee that an exception like that cannot happen after a cancel:
* cleanup is a synchronized method (not that it matters)
* it sets the state as its first step
* then it starts calling the driver's .close and .destroy
* as a result, the driver internally changes to some aborted state
* however, if the job is already near completion, an exception may not happen; when the actual thread (Thread-74 in the logs) starts getting out, it doesn't throw an exception. That is not in line with SQLOperation's expectations, which causes the illegal transition: [SQLOperation.cleanup|https://github.com/apache/hive/blob/d2ad5b061706a1d3cd55e59c769ed4f2af01cdbe/service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java#L396]
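The race described in the list above can be sketched in isolation. This is a minimal illustration, not Hive's code: `SketchState` and `SketchOperation` are hypothetical names, and the real OperationState has more states and different transition rules. It only shows why a worker that finishes normally after a cancel hits an illegal transition, and one possible way to tolerate it.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified states; not Hive's actual OperationState enum.
enum SketchState { RUNNING, CANCELED, FINISHED }

// Hypothetical stand-in for an operation whose state is touched by two threads:
// the cancelling thread and the background worker.
final class SketchOperation {
    private final AtomicReference<SketchState> state =
            new AtomicReference<>(SketchState.RUNNING);

    // Cancelling thread: the state is changed first, before any driver cleanup.
    boolean cancel() {
        return state.compareAndSet(SketchState.RUNNING, SketchState.CANCELED);
    }

    // Strict completion: rejects CANCELED -> FINISHED, like the failure in the thread.
    void finishStrict() {
        if (!state.compareAndSet(SketchState.RUNNING, SketchState.FINISHED)) {
            throw new IllegalStateException(
                    "Illegal Operation state transition from " + state.get() + " to FINISHED");
        }
    }

    // Tolerant completion: a late finish after a cancel is a no-op, not an error.
    boolean finishIfStillRunning() {
        return state.compareAndSet(SketchState.RUNNING, SketchState.FINISHED);
    }

    SketchState state() {
        return state.get();
    }
}
```

With the tolerant variant, a query that completes just after being cancelled leaves the operation in CANCELED rather than raising an error; whether that is the desired behaviour is exactly what this thread is discussing.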
[jira] [Commented] (HIVE-23110) Prevent NPE in ReExecDriver if the processing is aborted
[ https://issues.apache.org/jira/browse/HIVE-23110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072509#comment-17072509 ] Prasanth Jayachandran commented on HIVE-23110: -- I have partial logs
{code:java}
hiveserver2 <14>1 2020-03-31T20:52:24.702Z hiveserver2-0.hiveserver2-service.compute-1585643974-lwrg.svc.cluster.local hiveserver2 1 6ba03ff1-251f-4878-81ea-1ba72d36c465 [mdc@18060 class="ql.Driver" level="INFO" operationLogLevel="EXECUTION" queryId="hive_20200331205007_6397e486-03a9-41ec-a56b-e0c4ff1ff26e" sessionId="94e0ab1a-e5ca-4237-9713-235b5dd2559a" thread="HiveServer2-Background-Pool: Thread-74"] Executing command(queryId=hive_20200331205007_6397e486-03a9-41ec-a56b-e0c4ff1ff26e) has been interrupted after 133.75 seconds
hiveserver2 <14>1 2020-03-31T20:52:24.702Z hiveserver2-0.hiveserver2-service.compute-1585643974-lwrg.svc.cluster.local hiveserver2 1 6ba03ff1-251f-4878-81ea-1ba72d36c465 [mdc@18060 class="ql.Driver" level="INFO" operationLogLevel="EXECUTION" queryId="hive_20200331205007_6397e486-03a9-41ec-a56b-e0c4ff1ff26e" sessionId="94e0ab1a-e5ca-4237-9713-235b5dd2559a" thread="HiveServer2-Background-Pool: Thread-74"] OK
hiveserver2 <15>1 2020-03-31T20:52:24.702Z hiveserver2-0.hiveserver2-service.compute-1585643974-lwrg.svc.cluster.local hiveserver2 1 6ba03ff1-251f-4878-81ea-1ba72d36c465 [mdc@18060 class="log.PerfLogger" level="DEBUG" operationLogLevel="EXECUTION" queryId="hive_20200331205007_6397e486-03a9-41ec-a56b-e0c4ff1ff26e" sessionId="94e0ab1a-e5ca-4237-9713-235b5dd2559a" thread="HiveServer2-Background-Pool: Thread-74"]
hiveserver2 <14>1 2020-03-31T20:52:24.711Z hiveserver2-0.hiveserver2-service.compute-1585643974-lwrg.svc.cluster.local hiveserver2 1 6ba03ff1-251f-4878-81ea-1ba72d36c465 [mdc@18060 class="common.LogUtils" level="INFO" thread="HiveServer2-Background-Pool: Thread-74"] Unregistered logging context.
hiveserver2 <14>1 2020-03-31T20:52:24.702Z hiveserver2-0.hiveserver2-service.compute-1585643974-lwrg.svc.cluster.local hiveserver2 1 6ba03ff1-251f-4878-81ea-1ba72d36c465 [mdc@18060 class="lockmgr.DbLockManager" level="INFO" operationLogLevel="EXECUTION" queryId="hive_20200331205007_6397e486-03a9-41ec-a56b-e0c4ff1ff26e" sessionId="94e0ab1a-e5ca-4237-9713-235b5dd2559a" thread="HiveServer2-Background-Pool: Thread-74"] releaseLocks:
hiveserver2 <15>1 2020-03-31T20:52:24.703Z hiveserver2-0.hiveserver2-service.compute-1585643974-lwrg.svc.cluster.local hiveserver2 1 6ba03ff1-251f-4878-81ea-1ba72d36c465 [mdc@18060 class="log.PerfLogger" level="DEBUG" operationLogLevel="EXECUTION" queryId="hive_20200331205007_6397e486-03a9-41ec-a56b-e0c4ff1ff26e" sessionId="94e0ab1a-e5ca-4237-9713-235b5dd2559a" thread="HiveServer2-Background-Pool: Thread-74"]
hiveserver2 <11>1 2020-03-31T20:52:24.711Z hiveserver2-0.hiveserver2-service.compute-1585643974-lwrg.svc.cluster.local hiveserver2 1 6ba03ff1-251f-4878-81ea-1ba72d36c465 [mdc@18060 class="operation.Operation" level="ERROR" operationLogLevel="EXECUTION" queryId="hive_20200331205007_6397e486-03a9-41ec-a56b-e0c4ff1ff26e" sessionId="94e0ab1a-e5ca-4237-9713-235b5dd2559a" thread="HiveServer2-Background-Pool: Thread-74"] Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Illegal Operation state transition from CANCELED to FINISHED
	at org.apache.hive.service.cli.OperationState.validateTransition(OperationState.java:97)
	at org.apache.hive.service.cli.OperationState.validateTransition(OperationState.java:103)
	at org.apache.hive.service.cli.operation.Operation.setState(Operation.java:161)
	at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:248)
	at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
hiveserver2 2020-03-31 20:52:24,710 Log4j2-TF-1-AsyncLogger[AsyncContext@18b4aac2]-1 ERROR /tmp/hive/operation_logs/94e0ab1a-e5ca-4237-9713-235b5dd2559a/hive_20200331205007_6397e486-03a9-41ec-a56b-e0c4ff1ff26e was closed
[jira] [Commented] (HIVE-23110) Prevent NPE in ReExecDriver if the processing is aborted
[ https://issues.apache.org/jira/browse/HIVE-23110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072499#comment-17072499 ] Miklos Gergely commented on HIVE-23110: --- [~prasanth_j] do you have a log for 2) ? Should we run failure hooks on abort? Right now we don't do that (explicitly not), and I thought it was intentional as if a query is aborted, it is not a failure. Should we run them?
[jira] [Commented] (HIVE-23110) Prevent NPE in ReExecDriver if the processing is aborted
[ https://issues.apache.org/jira/browse/HIVE-23110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072040#comment-17072040 ] Prasanth Jayachandran commented on HIVE-23110: -- I was seeing a couple of issues when aborting a query:
1) NPE in ReExecDriver
2) Invalid state transition (CLOSED to FINISHED)
As long as these are not causing issues for the clients (incorrect exit codes, not running failure hooks on abort), it is OK to ignore these exceptions. I would say handling and ignoring the exception with logging is better than not handling it at all. I was trying to eliminate 1) in my quest to debug why a certain failure hook is not executed.
[jira] [Commented] (HIVE-23110) Prevent NPE in ReExecDriver if the processing is aborted
[ https://issues.apache.org/jira/browse/HIVE-23110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071872#comment-17071872 ] Hive QA commented on HIVE-23110: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12998330/HIVE-23110.01.patch
{color:red}ERROR:{color} -1 due to no test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 18126 tests executed
*Failed tests:*
{noformat}
TestMiniLlapCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=162)
	[unionDistinct_1.q,table_nonprintable.q,file_with_header_footer_aggregation.q,orc_llap_counters1.q,mm_cttas.q,whroot_external1.q,global_limit.q,rcfile_createas1.q,dynamic_partition_pruning_2.q,intersect_merge.q,parquet_struct_type_vectorization.q,results_cache_diff_fs.q,parallel_colstats.q,load_hdfs_file_with_space_in_the_name.q,orc_merge3.q]
org.apache.hadoop.hive.ql.parse.TestReplWithJsonMessageFormat.testIncrementalLoadFailAndRetry (batchId=260)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testIncrementalLoadFailAndRetry (batchId=270)
org.apache.hive.jdbc.TestTriggersNoTezSessionPool.testTriggerVertexTotalTasks (batchId=290)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerDagTotalTasks (batchId=292)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerSlowQueryElapsedTime (batchId=292)
{noformat}
Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/21352/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/21352/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-21352/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12998330 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-23110) Prevent NPE in ReExecDriver if the processing is aborted
[ https://issues.apache.org/jira/browse/HIVE-23110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071830#comment-17071830 ] Miklos Gergely commented on HIVE-23110: --- I've discussed the issue with [~kgyrtkirk] and in fact it would be difficult to ensure that no exception is thrown in case of an abort. By applying the fix that I've attached things will be fine if ReExecDriver is used, but if not then we should handle a lot of exceptions. Probably it would be easier just to assume that there may be exceptions if we abort. [~prasanth_j] what do you think?
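One way to read "assume that there may be exceptions if we abort" is sketched below. This is an illustration under assumptions, not the attached patch: `AbortTolerantRunner` is a hypothetical class, and java.util.logging stands in for Hive's logging. A failure seen after a cancel is logged at a debug-like level and swallowed, while any other failure is logged as an error and rethrown.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical runner; not part of Hive. It treats exceptions raised after a
// cancel as expected noise instead of as query failures.
final class AbortTolerantRunner {
    private static final Logger LOG =
            Logger.getLogger(AbortTolerantRunner.class.getName());

    private final AtomicBoolean canceled = new AtomicBoolean(false);

    void cancel() {
        canceled.set(true);
    }

    // Returns true if the work completed normally, false if it failed after a
    // cancel and the failure was swallowed. Failures without a cancel are rethrown.
    boolean run(Runnable work) {
        try {
            work.run();
            return true;
        } catch (RuntimeException e) {
            if (canceled.get()) {
                // Expected after an abort: keep it out of the error log.
                LOG.log(Level.FINE, "Ignoring exception after cancel", e);
                return false;
            }
            LOG.log(Level.SEVERE, "Error running query", e);
            throw e;
        }
    }
}
```

The design choice here is that the cancel flag, not the exception type, decides how the failure is reported, so the runner does not need to enumerate every exception an aborted driver might throw.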
[jira] [Commented] (HIVE-23110) Prevent NPE in ReExecDriver if the processing is aborted
[ https://issues.apache.org/jira/browse/HIVE-23110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071821#comment-17071821 ] Hive QA commented on HIVE-23110:
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 3m 35s{color} | {color:blue} ql in master has 1529 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 23m 44s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-21352/dev-support/hive-personality.sh |
| git revision | master / 3de2dc1 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-21352/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |
This message was automatically generated.
[jira] [Commented] (HIVE-23110) Prevent NPE in ReExecDriver if the processing is aborted and thus the context is null
[ https://issues.apache.org/jira/browse/HIVE-23110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071717#comment-17071717 ] Zoltan Haindrich commented on HIVE-23110: - I see that you've changed the ticket's summary - the description is the place where it says: "Click to add description"
Wouldn't returning null as CommandProcessorResponse cause any trouble? I think if this method returns without an exception, it means that it has finished correctly...
Shouldn't we lower the log priority to debug or trace here: https://github.com/apache/hive/blob/d2ad5b061706a1d3cd55e59c769ed4f2af01cdbe/service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java#L239
Note that the current solution will change the state of the SQLOperation from canceled to finished. I'm not sure what that could cause (I guess nothing), but for an aborted query I think canceled might be better.
> Prevent NPE in ReExecDriver if the processing is aborted and thus the context > is null > - > > Key: HIVE-23110 > URL: https://issues.apache.org/jira/browse/HIVE-23110 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Attachments: HIVE-23110.01.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23110) Prevent NPE in ReExecDriver if the processing is aborted and thus the context is null
[ https://issues.apache.org/jira/browse/HIVE-23110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071660#comment-17071660 ] Miklos Gergely commented on HIVE-23110: --- [~kgyrtkirk] modified the description. I considered using the driverState, but it is not visible to the ReExecDriver, so it would require a new function in Driver. Since we have to add new code either way, adding a closed field seemed cleaner to me.
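The "closed field" idea can be sketched as follows. This is a minimal illustration assuming a simplified driver: `SketchDriver` and `SketchContext` are hypothetical stand-ins, the plan mapper is reduced to a string, and none of this is the content of HIVE-23110.01.patch. The guard checks the flag and also reads the context once, so an abort that lands between the two checks still cannot cause an NPE.

```java
import java.util.Optional;

// Hypothetical stand-in for the query context that carries the plan mapper.
final class SketchContext {
    private final String planMapper;

    SketchContext(String planMapper) {
        this.planMapper = planMapper;
    }

    String getPlanMapper() {
        return planMapper;
    }
}

// Hypothetical stand-in for a driver whose context is dropped on abort.
final class SketchDriver {
    private volatile SketchContext context = new SketchContext("plan-mapper");
    private volatile boolean closed = false;

    // Abort path: mark the driver closed, then release the context.
    void close() {
        closed = true;
        context = null;
    }

    // Post-run path: skip the plan mapper lookup once the driver is closed.
    Optional<String> planMapperAfterRun() {
        if (closed) {
            return Optional.empty();
        }
        SketchContext ctx = context; // single read, so a concurrent close() is safe
        return ctx == null ? Optional.empty() : Optional.of(ctx.getPlanMapper());
    }
}
```

A caller would then treat an empty result as "the query was aborted, nothing to record" instead of dereferencing a null context.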
[jira] [Commented] (HIVE-23110) Prevent NPE in ReExecDriver if the processing is aborted
[ https://issues.apache.org/jira/browse/HIVE-23110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071644#comment-17071644 ] Zoltan Haindrich commented on HIVE-23110: - [~mgergely] could you please add the exception trace to the description? Which is the case:
* the planMapper is null
* or the context?
Couldn't we use the "driverState" to decide not to do anything, instead of adding a new "closed" boolean?
[jira] [Commented] (HIVE-23110) Prevent NPE in ReExecDriver if the processing is aborted
[ https://issues.apache.org/jira/browse/HIVE-23110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071600#comment-17071600 ] Miklos Gergely commented on HIVE-23110: --- [~prasanth_j] slightly modified, please take a look.
[jira] [Commented] (HIVE-23110) Prevent NPE in ReExecDriver if the processing is aborted
[ https://issues.apache.org/jira/browse/HIVE-23110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071313#comment-17071313 ] Prasanth Jayachandran commented on HIVE-23110: -- +1, pending tests