[jira] [Created] (HIVE-21955) SearchArgumentImpl generates wrong ExpressionTree in some cases which might result in loss of data
Zihao Ye created HIVE-21955:
---

Summary: SearchArgumentImpl generates wrong ExpressionTree in some cases which might result in loss of data
Key: HIVE-21955
URL: https://issues.apache.org/jira/browse/HIVE-21955
Project: Hive
Issue Type: Bug
Components: Hive, ORC
Reporter: Zihao Ye

ExpressionBuilder applies `pushDownNot`, `foldMaybe`, `flatten`, `convertToCNF`, `flatten` and `buildLeafList`, in that order, to turn a non-normalized expression into a CNF expression with unique leaves.

After an expression is converted to CNF, the expression tree may contain non-leaf nodes that are exactly the same object, referenced from more than one place. When this happens, those non-leaf nodes are visited more than once in the `buildLeafList` function. As a result, a wrong ExpressionTree is generated.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
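To illustrate the failure mode described in HIVE-21955, here is a minimal sketch of how a shared node object can be processed twice during a recursive walk, and how an identity-based visited set avoids that. It is not Hive's code: `Node`, `leaf`, `op`, `remapNaive` and `remapOnce` are hypothetical names, and the sketch assumes that the per-node rewrite is not idempotent (here, leaf ids are remapped in place through a lookup table), which is what would make a second visit harmful.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an expression tree; NOT Hive's ExpressionTree.
class SharedNodeDemo {
    static final class Node {
        int leafId;                                  // only meaningful for leaves
        final List<Node> children = new ArrayList<>();
        boolean isLeaf() { return children.isEmpty(); }
    }

    static Node leaf(int id) {
        Node n = new Node();
        n.leafId = id;
        return n;
    }

    static Node op(Node... kids) {
        Node n = new Node();
        n.leafId = -1;                               // non-leaf nodes carry no leaf id
        n.children.addAll(Arrays.asList(kids));
        return n;
    }

    // Naive walk: rewrites leaf ids in place every time a node is reached,
    // so a subtree referenced from two parents is rewritten twice.
    static void remapNaive(Node node, int[] remap) {
        if (node.isLeaf()) {
            node.leafId = remap[node.leafId];
            return;
        }
        for (Node child : node.children) {
            remapNaive(child, remap);
        }
    }

    // Guarded walk: skips any node object that has already been processed.
    static void remapOnce(Node node, int[] remap, Set<Node> seen) {
        if (!seen.add(node)) {
            return;                                  // same object reached again
        }
        if (node.isLeaf()) {
            node.leafId = remap[node.leafId];
            return;
        }
        for (Node child : node.children) {
            remapOnce(child, remap, seen);
        }
    }

    // Set that compares elements by reference (==), not by equals().
    static Set<Node> identitySet() {
        return Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
    }

    public static void main(String[] args) {
        int[] remap = {1, 2, 0};

        Node a = leaf(0), b = leaf(1), c = leaf(2);
        Node shared = op(a, b);
        Node root = op(shared, c, shared);           // 'shared' appears twice
        remapNaive(root, remap);
        System.out.println("naive:   " + a.leafId + " " + b.leafId + " " + c.leafId);

        Node a2 = leaf(0), b2 = leaf(1), c2 = leaf(2);
        Node shared2 = op(a2, b2);
        Node root2 = op(shared2, c2, shared2);
        remapOnce(root2, remap, identitySet());
        System.out.println("guarded: " + a2.leafId + " " + b2.leafId + " " + c2.leafId);
    }
}
```

The guard tracks object identity rather than structural equality, because the report says the duplicates are the same object. Whether the actual fix should deduplicate visits or copy shared subtrees during CNF conversion is a design decision this sketch does not settle.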
[jira] [Created] (HIVE-21486) FinalSelectOps is empty in lineage index if there is a script operator(transform)
Zihao Ye created HIVE-21486:
---

Summary: FinalSelectOps is empty in lineage index if there is a script operator(transform)
Key: HIVE-21486
URL: https://issues.apache.org/jira/browse/HIVE-21486
Project: Hive
Issue Type: Bug
Components: lineage
Affects Versions: 2.3.4, 2.1.1
Reporter: Zihao Ye

SQL pattern:

create table t1 as select transform(c1) using '/bin/python script.py' as (c2) from t2;

The lineage dependencies are correct, but the SelectOperator is not added to finalSelectOps in the lineage index. As a result, index.getDependencies(finalSelOp) returns null in this case.
[jira] [Created] (HIVE-20912) Output data might be duplicated while speculation is enabled
Zihao Ye created HIVE-20912:
---

Summary: Output data might be duplicated while speculation is enabled
Key: HIVE-20912
URL: https://issues.apache.org/jira/browse/HIVE-20912
Project: Hive
Issue Type: Bug
Components: Hive, Operators
Affects Versions: 1.2.1
Environment: Hive 1.2.1, Hadoop 2.7.3, Tez 0.7.0
Reporter: Zihao Ye
Attachments: image-2018-11-14-17-48-59-826.png, image-2018-11-14-17-53-13-191.png, image-2018-11-14-17-53-50-171.png, image-2018-11-14-19-28-18-924.png

The file merge stage had two tasks, which should have created two files, but three files were created.

!image-2018-11-14-19-28-18-924.png!

By tracing the log, we found that two task attempts (one of them speculative) happened to finish within the same second. Although the later attempt received a kill signal from the AM, its rename operation had already completed by then, which caused the data duplication.

The rename operation is done in _AbstractFileMergeOperator.closeOp()_, and the final path name is determined by the task attempt id rather than the task id. In this case, the final paths ended with '00_0' and '00_1' rather than '00'.

IMHO, if the final path name ended with the task id, without the task attempt id, one task could generate at most one file, which could solve this issue. But I don't know what side effects changing the final path name would have.

This issue also affects other operators that rename files, such as JoinOperator and FileSinkOperator.

!image-2018-11-14-17-53-13-191.png!

!image-2018-11-14-17-53-50-171.png!
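The naming idea proposed in HIVE-20912 can be sketched as follows. This is only an illustration, not Hive's implementation: `FinalPathDemo`, `taskIdOf`, `finalPathByAttempt` and `finalPathByTask` are hypothetical helpers, and the sketch assumes attempt-style names of the form `<taskId>_<attemptNumber>` (for example `000000_1`), which matches the '00_0' and '00_1' suffixes mentioned in the report.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helpers; the name format is an assumption, not taken from Hive's code.
class FinalPathDemo {
    // Assumed attempt-style name: <taskId>_<attemptNumber>, e.g. "000000_1".
    private static final Pattern ATTEMPT_NAME = Pattern.compile("^(\\d+)_(\\d+)$");

    // Strips the attempt number; returns the input unchanged if it does not match.
    static String taskIdOf(String attemptName) {
        Matcher m = ATTEMPT_NAME.matcher(attemptName);
        return m.matches() ? m.group(1) : attemptName;
    }

    // Naming described in the report: one destination per task attempt.
    static String finalPathByAttempt(String dir, String attemptName) {
        return dir + "/" + attemptName;
    }

    // Proposed naming: every attempt of a task maps to the same destination.
    static String finalPathByTask(String dir, String attemptName) {
        return dir + "/" + taskIdOf(attemptName);
    }

    public static void main(String[] args) {
        String dir = "/warehouse/t1";                // hypothetical output directory
        for (String attempt : new String[] {"000000_0", "000000_1"}) {
            System.out.println(attempt
                + " -> by attempt: " + finalPathByAttempt(dir, attempt)
                + ", by task: " + finalPathByTask(dir, attempt));
        }
    }
}
```

With task-based names, two attempts of the same task compete for one destination instead of producing two files. Whether the second rename then fails or overwrites the first depends on the file system's rename semantics, which this sketch does not model.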