[jira] [Created] (HIVE-16456) Kill spark job when InterruptedException happens or driverContext.isShutdown is true.

2017-04-14 Thread zhihai xu (JIRA)
zhihai xu created HIVE-16456:


 Summary: Kill spark job when InterruptedException happens or 
driverContext.isShutdown is true.
 Key: HIVE-16456
 URL: https://issues.apache.org/jira/browse/HIVE-16456
 Project: Hive
  Issue Type: Improvement
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Minor


Kill the Spark job when an InterruptedException happens or driverContext.isShutdown is 
true. If an InterruptedException occurs in RemoteSparkJobMonitor or 
LocalSparkJobMonitor, it is better to kill the job. There is also a race 
condition between submitting the Spark job and query/operation cancellation, so 
it is better to check driverContext.isShutdown again right after submitting the 
Spark job. This guarantees the job is killed no matter when shutdown is 
called. It is similar to HIVE-15997.
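
A minimal sketch of the post-submit check (illustrative only; jobRef and the exact submit/cancel API are assumptions, not necessarily the real Hive Spark client interface):
{code}
// Sketch: close the race window between submit and cancel.
SparkJobRef jobRef = sparkSession.submit(driverContext, sparkWork);
if (driverContext.isShutdown()) {
  LOG.warn("Killing Spark job: query was cancelled during submit");
  jobRef.cancelJob();
  return 3;  // illustrative non-zero return code marking the task as failed
}
{code}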



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16433) Not nullify rj to avoid NPE due to race condition in ExecDriver.

2017-04-12 Thread zhihai xu (JIRA)
zhihai xu created HIVE-16433:


 Summary: Not nullify rj to avoid NPE due to race condition in 
ExecDriver.
 Key: HIVE-16433
 URL: https://issues.apache.org/jira/browse/HIVE-16433
 Project: Hive
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Minor


Do not nullify rj, to avoid an NPE due to a race condition in ExecDriver. Currently 
{{rj}} is set to null in ExecDriver.shutdown, which is called from another thread 
for query cancellation and can happen at any time. There is a potential race 
condition: {{rj}} may still be accessed after shutdown is called, for example if 
the following runs right after ExecDriver.shutdown:
{code}
  this.jobID = rj.getJobID();
  updateStatusInQueryDisplay();
  returnVal = jobExecHelper.progress(rj, jc, ctx);
{code}
Currently the main purpose of nullifying {{rj}} is to make sure {{rj.killJob()}} 
is only called once.
I will add a jobKilled flag instead, so {{rj.killJob()}} is still called at most 
once without setting {{rj}} to null.
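
A minimal sketch of the proposed guard, assuming a jobKilled field on ExecDriver (illustrative, not the actual patch):
{code}
// Keep rj non-null; use a flag so killJob() runs at most once.
private transient volatile boolean jobKilled = false;

private synchronized void killJob() {
  if (rj != null && !jobKilled) {
    try {
      rj.killJob();
      jobKilled = true;
    } catch (Exception e) {
      LOG.warn("failed to kill job " + rj.getID(), e);
    }
  }
}
{code}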



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16430) Add log to show the cancelled query id when cancelOperation is called.

2017-04-12 Thread zhihai xu (JIRA)
zhihai xu created HIVE-16430:


 Summary: Add log to show the cancelled query id when 
cancelOperation is called.
 Key: HIVE-16430
 URL: https://issues.apache.org/jira/browse/HIVE-16430
 Project: Hive
  Issue Type: Improvement
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Trivial


Add a log message showing the cancelled query id when cancelOperation is called.
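
A one-line sketch of the intended log, assuming queryId and opHandle are available at the cancellation site (illustrative):
{code}
// Sketch only: variable names are assumptions.
LOG.info("Cancelling query with id: " + queryId + ", operation: " + opHandle);
{code}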



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16429) Should call invokeFailureHooks in handleInterruption to track failed query execution due to interrupted command.

2017-04-12 Thread zhihai xu (JIRA)
zhihai xu created HIVE-16429:


 Summary: Should call invokeFailureHooks in handleInterruption to 
track failed query execution due to interrupted command.
 Key: HIVE-16429
 URL: https://issues.apache.org/jira/browse/HIVE-16429
 Project: Hive
  Issue Type: Improvement
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Minor


handleInterruption should call invokeFailureHooks so that query executions that 
fail because the command was interrupted are also tracked by the failure hooks.
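
A rough sketch of the proposed change in Driver.handleInterruption (the exact invokeFailureHooks signature is an assumption):
{code}
private void handleInterruption(String msg) {
  errorMessage = "FAILED: command has been interrupted: " + msg;
  console.printError(errorMessage);
  // New: report the interrupted query to the failure hooks as well.
  invokeFailureHooks(perfLogger, hookContext, errorMessage, null);
}
{code}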



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16422) Should kill running Spark Jobs when a query is cancelled.

2017-04-11 Thread zhihai xu (JIRA)
zhihai xu created HIVE-16422:


 Summary: Should kill running Spark Jobs when a query is cancelled.
 Key: HIVE-16422
 URL: https://issues.apache.org/jira/browse/HIVE-16422
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: 2.1.0
Reporter: zhihai xu
Assignee: zhihai xu


Running Spark jobs should be killed when a query is cancelled. When a query is 
cancelled, Driver.close calls Driver.releaseDriverContext, which calls 
DriverContext.shutdown, which in turn calls shutdown on every running task.
{code}
  public synchronized void shutdown() {
    LOG.debug("Shutting down query " + ctx.getCmd());
    shutdown = true;
    for (TaskRunner runner : running) {
      if (runner.isRunning()) {
        Task<?> task = runner.getTask();
        LOG.warn("Shutting down task : " + task);
        try {
          task.shutdown();
        } catch (Exception e) {
          console.printError("Exception on shutting down task " + task.getId() + ": " + e);
        }
        Thread thread = runner.getRunner();
        if (thread != null) {
          thread.interrupt();
        }
      }
    }
    running.clear();
  }
{code}
Since SparkTask does not implement a shutdown method that kills the running Spark 
job, the Spark job may still be running after the query is cancelled. It would 
therefore be good to kill the Spark job in SparkTask.shutdown to save cluster 
resources.
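
A minimal sketch of a SparkTask.shutdown override (illustrative; jobRef and jobKilled are assumed fields):
{code}
@Override
public void shutdown() {
  super.shutdown();
  // Kill the submitted Spark job so cluster resources are freed on cancel.
  if (jobRef != null && !jobKilled) {
    try {
      jobRef.cancelJob();
      jobKilled = true;
    } catch (Exception e) {
      LOG.warn("failed to kill Spark job", e);
    }
  }
}
{code}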



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16368) Unexpected java.lang.ArrayIndexOutOfBoundsException from query with LateralView Operation for hive on MR.

2017-04-03 Thread zhihai xu (JIRA)
zhihai xu created HIVE-16368:


 Summary: Unexpected java.lang.ArrayIndexOutOfBoundsException from query with LateralView Operation for hive on MR.
 Key: HIVE-16368
 URL: https://issues.apache.org/jira/browse/HIVE-16368
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Reporter: zhihai xu
Assignee: zhihai xu


An unexpected java.lang.ArrayIndexOutOfBoundsException came from a query with a 
LateralView operation, running Hive on MR. The reason is that column pruning 
changes the column order in the LateralView operation. For back-to-back 
ReduceSink operators on the MR engine, a FileSinkOperator and a TableScanOperator 
are added before the second ReduceSink operator, and the serialization column 
order used by the FileSinkOperator (LazyBinarySerDe) in the previous reducer 
differs from the deserialization column order from the table desc used by the 
MapOperator/TableScanOperator (LazyBinarySerDe) in the current, failing mapper.
The serialization order is decided by the outputObjInspector from 
LateralViewJoinOperator:
{code}
ArrayList<String> fieldNames = conf.getOutputInternalColNames();
outputObjInspector = ObjectInspectorFactory
    .getStandardStructObjectInspector(fieldNames, ois);
{code}
So the column order for serialization is decided by getOutputInternalColNames 
in LateralViewJoinOperator.

The deserialization order is decided by the TableScanOperator that is created in 
GenMapRedUtils.splitTasks:
{code}
TableDesc tt_desc = PlanUtils.getIntermediateFileTableDesc(PlanUtils
    .getFieldSchemasFromRowSchema(parent.getSchema(), "temporarycol"));
// Create the temporary file, its corresponding FileSinkOperator, and
// its corresponding TableScanOperator.
TableScanOperator tableScanOp =
    createTemporaryFile(parent, op, taskTmpDir, tt_desc, parseCtx);
{code}
The column order for deserialization is decided by the rowSchema of 
LateralViewJoinOperator. ColumnPrunerLateralViewJoinProc, however, changes the 
order of outputInternalColNames while keeping the original order of rowSchema, 
which causes the mismatch between serialization and deserialization across the 
two back-to-back MR jobs.
ColumnPrunerLateralViewForwardProc has a similar issue: it changes the column 
order of its child SelectOperator's colList but not the rowSchema.
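
A toy illustration of the failure mode (plain Java, not Hive serde code):
{code}
// Toy illustration only. The writer serializes struct fields in the pruned
// order; the reader still uses the original rowSchema order, so field i is
// decoded with the wrong ObjectInspector.
Object[] written = { "a-uuid-string", 42 };              // writer order: [string, int]
String[] readerSchema = { "_colA:int", "_colB:string" }; // reader expects int first
// Decoding written[0] as an int misreads lengths/offsets in the binary
// format, which is how LazyBinaryStruct ends up throwing
// ArrayIndexOutOfBoundsException (or returning corrupted values).
{code}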



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-15772) set the exception into SparkJobStatus if exception happened in RemoteSparkJobMonitor and LocalSparkJobMonitor

2017-01-31 Thread zhihai xu (JIRA)
zhihai xu created HIVE-15772:


 Summary: set the exception into SparkJobStatus if exception 
happened in RemoteSparkJobMonitor and LocalSparkJobMonitor
 Key: HIVE-15772
 URL: https://issues.apache.org/jira/browse/HIVE-15772
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Affects Versions: 2.2.0
 Environment: Set the exception into SparkJobStatus if an exception 
happens in RemoteSparkJobMonitor or LocalSparkJobMonitor. Add a setError 
function to SparkJobStatus.
Reporter: zhihai xu
Assignee: zhihai xu
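
(The description above ended up in the Environment field.) A minimal sketch of the proposed API change (illustrative):
{code}
// In SparkJobStatus (existing methods elided):
void setError(Throwable error);
Throwable getError();

// In RemoteSparkJobMonitor / LocalSparkJobMonitor, on failure:
} catch (Exception e) {
  sparkJobStatus.setError(e);  // let SparkTask surface the real cause
  rc = 1;
}
{code}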






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-15662) check startTime in SparkTask to make sure startTime is not less than submitTime

2017-01-18 Thread zhihai xu (JIRA)
zhihai xu created HIVE-15662:


 Summary: check startTime in SparkTask to make sure startTime is 
not less than submitTime
 Key: HIVE-15662
 URL: https://issues.apache.org/jira/browse/HIVE-15662
 Project: Hive
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Minor


Check startTime in SparkTask to make sure startTime is not less than 
submitTime. We saw a corner case: when the SparkTask finishes in less than one 
second, startTime may never be set, because RemoteSparkJobMonitor sleeps for one 
second before checking the job state, and right after that sleep the Spark job 
is already completed, so the RUNNING state (which sets startTime) is never 
observed.
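
A minimal sketch of the proposed guard (illustrative):
{code}
// The monitor polls once per second, so a sub-second job can finish before
// the RUNNING state (and thus startTime) is ever observed.
if (startTime < submitTime) {
  startTime = submitTime;  // clamp to a sane value
}
{code}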




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15630) add operation handle before operation.run instead of after operation.run

2017-01-14 Thread zhihai xu (JIRA)
zhihai xu created HIVE-15630:


 Summary: add operation handle before operation.run instead of 
after operation.run
 Key: HIVE-15630
 URL: https://issues.apache.org/jira/browse/HIVE-15630
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.2.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Minor


Add the operation handle before operation.run instead of after operation.run, so 
that when the session is closed, operations still running inside 
{{operation.run}} can also be closed.
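
A minimal sketch of the reordering in HiveSessionImpl (illustrative):
{code}
OperationHandle opHandle = operation.getHandle();
opHandleSet.add(opHandle);          // register before run(), not after
try {
  operation.run();
} catch (HiveSQLException e) {
  opHandleSet.remove(opHandle);     // roll back registration on failure
  throw e;
}
return opHandle;
{code}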




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15629) Set DDLTask’s exception with its subtask’s exception

2017-01-13 Thread zhihai xu (JIRA)
zhihai xu created HIVE-15629:


 Summary: Set DDLTask’s exception with its subtask’s exception
 Key: HIVE-15629
 URL: https://issues.apache.org/jira/browse/HIVE-15629
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Affects Versions: 2.2.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Minor


Set DDLTask’s exception with its subtask’s exception, so the exception from the 
subtask can be propagated to TaskRunner.
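
A minimal sketch (illustrative):
{code}
int ret = subtask.execute(driverContext);
if (ret != 0 && subtask.getException() != null) {
  setException(subtask.getException());  // propagate to TaskRunner/hooks
}
return ret;
{code}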



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15564) set task's jobID with hadoop map reduce job ID for PartialScanTask, MergeFileTask and ColumnTruncateTask.

2017-01-08 Thread zhihai xu (JIRA)
zhihai xu created HIVE-15564:


 Summary: set task's jobID with hadoop map reduce job ID for 
PartialScanTask, MergeFileTask and ColumnTruncateTask.
 Key: HIVE-15564
 URL: https://issues.apache.org/jira/browse/HIVE-15564
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Reporter: zhihai xu
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15563) Ignore Illegal Operation state transition exception in SQLOperation.runQuery to expose real exception.

2017-01-08 Thread zhihai xu (JIRA)
zhihai xu created HIVE-15563:


 Summary: Ignore Illegal Operation state transition exception in 
SQLOperation.runQuery to expose real exception.
 Key: HIVE-15563
 URL: https://issues.apache.org/jira/browse/HIVE-15563
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Minor


Ignore the Illegal Operation state transition exception in SQLOperation.runQuery 
to expose the real exception.
setState may throw an Illegal Operation state transition exception that hides 
the real exception. We saw the following exception thrown from 
{{setState(OperationState.ERROR);}} in SQLOperation.runQuery:
{code}
org.apache.hive.service.cli.operation.Operation: Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Illegal Operation state transition from CLOSED to ERROR
    at org.apache.hive.service.cli.OperationState.validateTransition(OperationState.java:91)
    at org.apache.hive.service.cli.OperationState.validateTransition(OperationState.java:97)
    at org.apache.hive.service.cli.operation.Operation.setState(Operation.java:154)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:241)
    at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:82)
    at org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:288)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:301)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{code}
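
A minimal sketch of the proposed handling in SQLOperation.runQuery (illustrative):
{code}
try {
  setState(OperationState.ERROR);
} catch (HiveSQLException ignored) {
  // Operation was already closed/cancelled; don't let the illegal state
  // transition mask the real query failure being reported below.
  LOG.warn("Ignoring illegal state transition", ignored);
}
{code}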




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15528) Expose Spark job error in SparkTask

2017-01-01 Thread zhihai xu (JIRA)
zhihai xu created HIVE-15528:


 Summary: Expose Spark job error in SparkTask
 Key: HIVE-15528
 URL: https://issues.apache.org/jira/browse/HIVE-15528
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Affects Versions: 2.2.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Minor


Expose the Spark job error in SparkTask by propagating the Spark job error to 
the task exception.
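
A minimal sketch in SparkTask.execute, assuming the monitor records the error on the job status (see HIVE-15772); names are assumptions:
{code}
if (rc != 0) {
  Throwable error = sparkJobStatus.getError();
  if (error != null) {
    setException(error);  // visible to TaskRunner and hooks
  }
}
{code}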



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15494) Create perfLogger in method execute instead of class initialization for SparkTask

2016-12-21 Thread zhihai xu (JIRA)
zhihai xu created HIVE-15494:


 Summary: Create perfLogger in method execute instead of class initialization for SparkTask
 Key: HIVE-15494
 URL: https://issues.apache.org/jira/browse/HIVE-15494
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: 2.2.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Minor


Create the perfLogger in the execute method instead of at class initialization 
for SparkTask, so the perfLogger can be shared with SparkJobMonitor in the same 
thread.
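
A minimal sketch (illustrative): fetch the thread-local logger inside execute rather than in a field initializer.
{code}
@Override
public int execute(DriverContext driverContext) {
  // Same thread as SparkJobMonitor, so both see the same PerfLogger instance.
  PerfLogger perfLogger = SessionState.getPerfLogger();
  return submitAndMonitor(perfLogger, driverContext);  // illustrative helper
}
{code}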



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15470) Catch Throwable instead of Exception in driver.execute.

2016-12-19 Thread zhihai xu (JIRA)
zhihai xu created HIVE-15470:


 Summary: Catch Throwable instead of Exception in driver.execute.
 Key: HIVE-15470
 URL: https://issues.apache.org/jira/browse/HIVE-15470
 Project: Hive
  Issue Type: Improvement
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Minor


Catch Throwable instead of Exception in Driver.execute, so queries that fail 
with a Throwable that is not an Exception will also be logged and reported.
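
A minimal sketch of the widened catch in Driver.execute (illustrative; field names are assumptions):
{code}
try {
  // run the query plan's tasks
} catch (Throwable e) {   // was: catch (Exception e)
  errorMessage = "FAILED: Hive Internal Error: " + e.getMessage();
  downstreamError = e;    // now also covers Errors, e.g. OutOfMemoryError
}
{code}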



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15386) Expose Spark task counts and stage Ids information in SparkTask from SparkJobMonitor

2016-12-07 Thread zhihai xu (JIRA)
zhihai xu created HIVE-15386:


 Summary: Expose Spark task counts and stage Ids information in 
SparkTask from SparkJobMonitor
 Key: HIVE-15386
 URL: https://issues.apache.org/jira/browse/HIVE-15386
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: 2.2.0
Reporter: zhihai xu
Assignee: zhihai xu


Expose Spark task counts and stage ID information in SparkTask from 
SparkJobMonitor, so this information can be used by a Hive hook to monitor Spark 
jobs.
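
A minimal sketch of the accessors (illustrative; field names are assumptions):
{code}
// Populated by SparkJobMonitor while the job runs; read by hooks afterwards.
public int getTotalTaskCount()      { return totalTaskCount; }
public int getFailedTaskCount()     { return failedTaskCount; }
public List<Integer> getStageIds()  { return stageIds; }
{code}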



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15301) Expose SparkStatistics information in SparkTask

2016-11-28 Thread zhihai xu (JIRA)
zhihai xu created HIVE-15301:


 Summary: Expose SparkStatistics information in SparkTask
 Key: HIVE-15301
 URL: https://issues.apache.org/jira/browse/HIVE-15301
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Minor


Expose SparkStatistics information in SparkTask, so we can get SparkStatistics 
in a hook.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15171) set SparkTask's jobID with application id

2016-11-09 Thread zhihai xu (JIRA)
zhihai xu created HIVE-15171:


 Summary: set SparkTask's jobID with application id
 Key: HIVE-15171
 URL: https://issues.apache.org/jira/browse/HIVE-15171
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Affects Versions: 2.1.0
Reporter: zhihai xu
Assignee: zhihai xu


Set SparkTask's jobID with the application id. This information will be useful 
for monitoring the Spark application in a hook.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14564) Column Pruning generates out of order columns in SelectOperator which causes ArrayIndexOutOfBoundsException.

2016-08-17 Thread zhihai xu (JIRA)
zhihai xu created HIVE-14564:


 Summary: Column Pruning generates out of order columns in SelectOperator which causes ArrayIndexOutOfBoundsException.
 Key: HIVE-14564
 URL: https://issues.apache.org/jira/browse/HIVE-14564
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Affects Versions: 2.1.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical


Column pruning generates out-of-order columns in SelectOperator, which causes an 
ArrayIndexOutOfBoundsException.

{code}
2016-07-26 21:49:24,390 FATAL [main] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
{"_col0":null,"_col1":0,"_col2":36,"_col3":"499ec44-6dd2-4709-a019-33d6d484ed90�\u0001U5�\u001c��\t\u001b�\u","_col4":"5264db53-d650-4678-9261-cdd51efab8bb","_col5":"cb5233dd-214a-4b0b-b43e-0f41befb5c5c","_col6":"","_col8":48,"_col9":null,"_col10":"1befb5c5c�\u00192016-06-09T15:31:15+00:00\u0002\u0005Rider\u0011svc-dash","_col11":64,"_col12":null,"_col13":null,"_col14":"ber.com�\u0001U5ߨP�\u0001U5ᷨider)
 - 
1000\u0005Rider\u0011svc-d...@uber.com�\u0001U4�;x�\u0001U5\u0004��\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u","_col15":"","_col16":null}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException
    at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
    ... 9 more
Caused by: java.lang.ArrayIndexOutOfBoundsException
    at java.lang.System.arraycopy(Native Method)
    at org.apache.hadoop.io.Text.set(Text.java:225)
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryString.init(LazyBinaryString.java:48)
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:264)
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:201)
    at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
    at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94)
    at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
    at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
    at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.makeValueWritable(ReduceSinkOperator.java:550)
    at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:377)
    ... 13 more
{code}

The exception occurs because serialization and deserialization don't match: the 
serialization by LazyBinarySerDe in the previous MapReduce job used a different 
column order, so when the current MapReduce job deserialized the intermediate 
sequence file generated by the previous job, LazyBinaryStruct read the fields in 
the wrong order and produced corrupted data. The mismatch between serialization 
and deserialization columns is caused by the SelectOperator's column pruning in 
{{ColumnPrunerSelectProc}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14368) ThriftCLIService.GetOperationStatus should include exception's stack trace to the error message.

2016-07-27 Thread zhihai xu (JIRA)
zhihai xu created HIVE-14368:


 Summary: ThriftCLIService.GetOperationStatus should include 
exception's stack trace to the error message.
 Key: HIVE-14368
 URL: https://issues.apache.org/jira/browse/HIVE-14368
 Project: Hive
  Issue Type: Improvement
  Components: Thrift API
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Minor


ThriftCLIService.GetOperationStatus should include the exception's stack trace 
in the error message. The stack trace will be really helpful for clients 
debugging failed queries.
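
A minimal sketch using java.io.StringWriter/PrintWriter to render the trace (illustrative; opException and status are assumed names):
{code}
StringWriter sw = new StringWriter();
opException.printStackTrace(new PrintWriter(sw, true));
// Return message + stack trace instead of the bare message.
status.setErrorMessage(opException.getMessage() + "\n" + sw);
{code}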



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14331) Task should set exception for failed map reduce job.

2016-07-25 Thread zhihai xu (JIRA)
zhihai xu created HIVE-14331:


 Summary: Task should set exception for failed map reduce job.
 Key: HIVE-14331
 URL: https://issues.apache.org/jira/browse/HIVE-14331
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Affects Versions: 2.1.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Minor


Task should set the exception for a failed MapReduce job, so the exception can 
be seen in HookContext.
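
A minimal sketch (illustrative):
{code}
if (rj != null && !rj.isSuccessful()) {
  setException(new HiveException("Map/Reduce job failed: " + rj.getFailureInfo()));
}
{code}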



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14303) CommonJoinOperator.checkAndGenObject should return directly in CLOSE state to avoid NPE if ExecReducer.close is called twice.

2016-07-20 Thread zhihai xu (JIRA)
zhihai xu created HIVE-14303:


 Summary: CommonJoinOperator.checkAndGenObject should return 
directly in CLOSE state to avoid NPE if ExecReducer.close is called twice.
 Key: HIVE-14303
 URL: https://issues.apache.org/jira/browse/HIVE-14303
 Project: Hive
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Fix For: 2.1.0


CommonJoinOperator.checkAndGenObject should return directly in the CLOSE state 
to avoid an NPE if ExecReducer.close is called twice. ExecReducer implements the 
Closeable interface, so ExecReducer.close can be called multiple times. We saw 
the following NPE, which hid the real exception, due to this bug:
{code}
Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: null
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:296)
    at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:718)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:284)
    ... 8 more
{code}
The code from ReduceTask.runOldReducer:
{code}
      reducer.close(); //line 453
      reducer = null;

      out.close(reporter);
      out = null;
    } finally {
      IOUtils.cleanup(LOG, reducer); // line 459
      closeQuietly(out, reporter);
    }
{code}
Based on the above stack trace and code, reducer.close() is called twice: the 
exception happened the first time reducer.close() was called, at line 453, so 
the code exited before reducer was set to null. The NullPointerException is then 
triggered when reducer.close() is called a second time in IOUtils.cleanup, and 
it hides the real exception from the first call at line 453.
The reason for the NPE: the first reducer.close called CommonJoinOperator.closeOp, 
which clears {{storage}}:
{code}
Arrays.fill(storage, null);
{code}
The second reducer.close then hit the NPE because {{storage[alias]}} had been 
set to null by the first reducer.close.
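
A minimal sketch of the proposed early return (illustrative):
{code}
public void checkAndGenObject() throws HiveException {
  if (state == State.CLOSE) {
    return;  // second close: storage was already cleared, nothing to emit
  }
  // ... existing join-object generation ...
}
{code}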
The following reducer log can give more proof:
{code}
2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.JoinOperator: 0 finished. closing...
2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.JoinOperator: 0 finished. closing...
2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0
2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: 1 finished. closing...
2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: 2 finished. closing...
2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: 3 finished. closing...
2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 finished. closing...
2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[4]: records written - 53466
2016-07-14 22:25:11,555 ERROR [main] ExecReducer: Hit error while closing operators - failing tree
2016-07-14 22:25:11,649 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: Hive Runtime Error while closing operators: null
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:296)
    at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:718)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
{code}

[jira] [Created] (HIVE-14258) Reduce task timed out because CommonJoinOperator.genUniqueJoinObject took too long to finish without reporting progress

2016-07-15 Thread zhihai xu (JIRA)
zhihai xu created HIVE-14258:


 Summary: Reduce task timed out because 
CommonJoinOperator.genUniqueJoinObject took too long to finish without 
reporting progress
 Key: HIVE-14258
 URL: https://issues.apache.org/jira/browse/HIVE-14258
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 2.1.0
Reporter: zhihai xu
Assignee: zhihai xu


The reduce task timed out because CommonJoinOperator.genUniqueJoinObject took 
too long to finish without reporting progress.
The timeout happened when reducer.close() was called in ReduceTask.java. 
CommonJoinOperator.genUniqueJoinObject(), called by reducer.close(), loops over 
every row in the AbstractRowContainer. This can take a long time if there is a 
large number of rows, and during this time it does not report progress. If it 
runs for longer than "mapreduce.task.timeout", the ApplicationMaster kills the 
task for failing to report progress.
We configured "mapreduce.task.timeout" as 10 minutes. I captured stack traces in 
the 10 minutes before the AM killed the reduce task at 2016-07-15 07:19:11.
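
A sketch of one possible mitigation, assuming access to the MR reporter inside the join loop (illustrative, not the actual fix):
{code}
// container is the AbstractRowContainer being drained in genUniqueJoinObject.
int rows = 0;
for (Object row = container.first(); row != null; row = container.next()) {
  // forward the joined row, then periodically tell the AM we are alive
  if (++rows % 10000 == 0 && reporter != null) {
    reporter.progress();
  }
}
{code}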
The following stack traces can prove it:
at 2016-07-15 07:09:42:
{code}
"main" prio=10 tid=0x7f90ec017000 nid=0xd193 runnable [0x7f90f62e5000]
   java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:272)
at 
org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.read(RawLocalFileSystem.java:154)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
- locked <0x0007deecefb0> (a 
org.apache.hadoop.fs.BufferedFSInputStream)
at java.io.DataInputStream.read(DataInputStream.java:149)
at 
org.apache.hadoop.fs.FSInputChecker.readFully(FSInputChecker.java:436)
at 
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:252)
at 
org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:276)
at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:214)
at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:232)
at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:196)
- locked <0x0007deecb978> (a 
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at 
org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:70)
at 
org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:120)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2359)
- locked <0x0007deec8f70> (a 
org.apache.hadoop.io.SequenceFile$Reader)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2491)
- locked <0x0007deec8f70> (a 
org.apache.hadoop.io.SequenceFile$Reader)
at 
org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:82)
- locked <0x0007deec82f0> (a 
org.apache.hadoop.mapred.SequenceFileRecordReader)
at 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
at 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer.next(RowContainer.java:267)
at 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer.next(RowContainer.java:74)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:750)
at 
org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:284)
at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
{code}
at 2016-07-15 07:15:35:
{code}
"main" prio=10 tid=0x7f90ec017000 nid=0xd193 runnable [0x7f90f62e5000]
   java.lang.Thread.State: RUNNABLE
    at java.util.zip.CRC32.updateBytes(Native Method)
    at java.util.zip.CRC32.update(CRC32.java:65)
    at org.apache.hadoop.fs.FSInputChecker.verifySums(FSInputChecker.java:316)
    at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:279)
{code}

[jira] [Created] (HIVE-14094) Remove unused function closeFs from Warehouse.java

2016-06-25 Thread zhihai xu (JIRA)
zhihai xu created HIVE-14094:


 Summary: Remove unused function closeFs from Warehouse.java
 Key: HIVE-14094
 URL: https://issues.apache.org/jira/browse/HIVE-14094
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Trivial


Remove the unused function closeFs from Warehouse.java.
After HIVE-10922, nothing calls Warehouse.closeFs, so it is good to delete the 
function to prevent people from using it. Closing a FileSystem is normally not 
safe, because FileSystem instances are usually shared.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14067) Rename pendingCount to activeCalls in HiveSessionImpl for easier understanding.

2016-06-20 Thread zhihai xu (JIRA)
zhihai xu created HIVE-14067:


 Summary: Rename pendingCount to activeCalls in HiveSessionImpl  
for easier understanding.
 Key: HIVE-14067
 URL: https://issues.apache.org/jira/browse/HIVE-14067
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Trivial


Rename pendingCount to activeCalls in HiveSessionImpl for easier understanding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13960) Session timeout may happen before HIVE_SERVER2_IDLE_SESSION_TIMEOUT for back-to-back synchronous operations.

2016-06-06 Thread zhihai xu (JIRA)
zhihai xu created HIVE-13960:


 Summary: Session timeout may happen before 
HIVE_SERVER2_IDLE_SESSION_TIMEOUT for back-to-back synchronous operations.
 Key: HIVE-13960
 URL: https://issues.apache.org/jira/browse/HIVE-13960
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: zhihai xu
Assignee: zhihai xu


Session timeout may happen before 
HIVE_SERVER2_IDLE_SESSION_TIMEOUT (hive.server2.idle.session.timeout) for 
back-to-back synchronous operations.
This issue can happen with two operations op1 and op2, where op2 is a 
synchronous long-running operation that starts right after op1 is closed:

1. closeOperation(op1) is called: this sets {{lastIdleTime}} to 
System.currentTimeMillis(), because {{opHandleSet}} becomes empty after 
{{closeOperation}} removes op1 from {{opHandleSet}}.

2. op2 then runs for a long time via {{executeStatement}}, right after 
closeOperation(op1) was called. If op2 runs for more than 
HIVE_SERVER2_IDLE_SESSION_TIMEOUT, the session times out even though op2 is 
still running.
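
A minimal sketch of one possible fix (illustrative; activeCalls is an assumed counter of in-flight synchronous calls):
{code}
public long getNoOperationTime() {
  boolean idle = opHandleSet.isEmpty() && activeCalls == 0;
  return idle ? System.currentTimeMillis() - lastIdleTime : 0;
}
{code}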




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13760) Add a HIVE_QUERY_TIMEOUT configuration to kill a query if a query is running for more than the configured timeout value.

2016-05-13 Thread zhihai xu (JIRA)
zhihai xu created HIVE-13760:


 Summary: Add a HIVE_QUERY_TIMEOUT configuration to kill a query if 
a query is running for more than the configured timeout value.
 Key: HIVE-13760
 URL: https://issues.apache.org/jira/browse/HIVE-13760
 Project: Hive
  Issue Type: Improvement
  Components: Configuration
Affects Versions: 2.0.0
Reporter: zhihai xu


Add a HIVE_QUERY_TIMEOUT configuration to kill a query if it has been running 
for more than the configured timeout value. The default value will be -1, which 
means no timeout. This will be useful for users to manage queries with SLAs.
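
A rough sketch of how such a watchdog could look (all names here are assumptions, not existing Hive API):
{code}
long timeoutSec = conf.getLongVar(HiveConf.ConfVars.HIVE_QUERY_TIMEOUT); // assumed var
if (timeoutSec > 0) {
  ScheduledFuture<?> killer = timeoutExecutor.schedule(
      () -> driver.tryCancelQuery("timed out after " + timeoutSec + "s"),  // assumed method
      timeoutSec, TimeUnit.SECONDS);
  // cancel the watchdog when the query finishes normally
}
{code}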



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13629) Expose Merge-File task and Column-Truncate task from DDLTask

2016-04-27 Thread zhihai xu (JIRA)
zhihai xu created HIVE-13629:


 Summary: Expose Merge-File task and Column-Truncate task from 
DDLTask
 Key: HIVE-13629
 URL: https://issues.apache.org/jira/browse/HIVE-13629
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Affects Versions: 2.0.0
Reporter: zhihai xu
Assignee: zhihai xu


DDLTask creates subtasks in mergeFiles and truncateTable to support 
HiveOperation.TRUNCATETABLE, HiveOperation.ALTERTABLE_MERGEFILES and 
HiveOperation.ALTERPARTITION_MERGEFILES.
It would be better to expose the tasks created in the mergeFiles and 
truncateTable functions of DDLTask to users.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)