[jira] [Comment Edited] (HIVE-11394) Enhance EXPLAIN display for vectorization
[ https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15589439#comment-15589439 ]

Sergey Shelukhin edited comment on HIVE-11394 at 10/19/16 6:18 PM:
-------------------------------------------------------------------

I keep merging stuff into a feature branch and every time I do, this patch is in a different state. It's a Schrodinger patch, you never know if it's committed or reverted until you try to merge.

was (Author: sershe):
I keep merging stuff into feature branch and every time this patch is in a different state. It's a Schrodinger patch, you never know if it's committed or reverted until you try to merge.

> Enhance EXPLAIN display for vectorization
> -----------------------------------------
>
>                  Key: HIVE-11394
>                  URL: https://issues.apache.org/jira/browse/HIVE-11394
>              Project: Hive
>           Issue Type: Bug
>           Components: Hive
>             Reporter: Matt McCline
>             Assignee: Matt McCline
>             Priority: Critical
>              Fix For: 2.2.0
>
>          Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch,
> HIVE-11394.03.patch, HIVE-11394.04.patch, HIVE-11394.05.patch,
> HIVE-11394.06.patch, HIVE-11394.07.patch, HIVE-11394.08.patch,
> HIVE-11394.09.patch, HIVE-11394.091.patch, HIVE-11394.092.patch,
> HIVE-11394.093.patch
>
>
> Add detail to the EXPLAIN output showing why Map or Reduce work is not
> vectorized.
> New syntax is: EXPLAIN VECTORIZATION \[ONLY\]
> \[SUMMARY|OPERATOR|EXPRESSION|DETAIL\]
> The ONLY option suppresses most non-vectorization elements.
> SUMMARY shows vectorization information for the PLAN (is vectorization
> enabled) and a summary of Map and Reduce work.
> OPERATOR shows vectorization information for operators. E.g. Filter
> Vectorization. It includes all information of SUMMARY, too.
> EXPRESSION shows vectorization information for expressions. E.g.
> predicateExpression. It includes all information of SUMMARY and OPERATOR,
> too.
> DETAIL shows the most detailed vectorization information.
> It includes all information of SUMMARY, OPERATOR, and EXPRESSION too.
> When the optional clauses are omitted, the defaults are: no ONLY, and
> SUMMARY.
> ---
> Here are some examples:
> EXPLAIN VECTORIZATION example:
> (Note the PLAN VECTORIZATION, Map Vectorization, Reduce Vectorization
> sections)
> Since SUMMARY is the default, it is the output of EXPLAIN VECTORIZATION
> SUMMARY.
> Under Reducer 3’s "Reduce Vectorization:" you’ll see
> notVectorizedReason: Aggregation Function UDF avg parameter expression for
> GROUPBY operator: Data type struct of
> Column\[VALUE._col2\] not supported
> For Reducer 2’s "Reduce Vectorization:" you’ll see "groupByVectorOutput:":
> "false", which says a node has a GROUP BY with an AVG or some other aggregator
> that outputs a non-PRIMITIVE type (e.g. STRUCT) and all downstream operators
> are row-mode, i.e. not vector output.
> If "usesVectorUDFAdaptor:" were "true" instead of "false", it would mean that
> at least one vectorized expression is using VectorUDFAdaptor.
> And "allNative:" is "false" here; it will be "true" only when all operators
> are native.
> Today, GROUP BY and FILE SINK are not native. MAP JOIN and REDUCE SINK are
> conditionally native. FILTER and SELECT are native.
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
> ...
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
>         Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> ...
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: alltypesorc
>                   Statistics: Num rows: 12288 Data size: 36696 Basic stats: COMPLETE Column stats: COMPLETE
>                   Select Operator
>                     expressions: cint (type: int)
>                     outputColumnNames: cint
>                     Statistics: Num rows: 12288 Data size: 36696 Basic stats: COMPLETE Column stats: COMPLETE
>                     Group By Operator
>                       keys: cint (type: int)
>                       mode: hash
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 5775 Data size: 17248 Basic stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: int)
>                         Statistics: Num rows: 5775 Data size: 17248 Basic stats: COMPLETE Column stats: COMPLETE
>             Execution mode: vectorized, llap
>             LLAP IO: all
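
The option syntax in the quoted description, EXPLAIN VECTORIZATION [ONLY] [SUMMARY|OPERATOR|EXPRESSION|DETAIL], can be illustrated with a small statement builder. This is only a sketch and not part of the patch: the helper names `explain_vectorization` and `included_levels` and the example query are hypothetical, and the code just assembles text without talking to Hive. It assumes, as the description states, that omitting the clauses means no ONLY and SUMMARY, and that each level includes the information of the levels before it.

```python
from typing import Optional, Tuple

# Detail levels named in the issue description, from least to most verbose.
LEVELS = ("SUMMARY", "OPERATOR", "EXPRESSION", "DETAIL")


def explain_vectorization(query: str, only: bool = False,
                          level: Optional[str] = None) -> str:
    """Build an EXPLAIN VECTORIZATION statement for `query`.

    Hypothetical helper: it follows the syntax
    EXPLAIN VECTORIZATION [ONLY] [SUMMARY|OPERATOR|EXPRESSION|DETAIL]
    and leaves both optional clauses out unless they are requested.
    """
    parts = ["EXPLAIN", "VECTORIZATION"]
    if only:
        parts.append("ONLY")
    if level is not None:
        level = level.upper()
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level}")
        parts.append(level)
    return " ".join(parts) + " " + query.strip()


def included_levels(level: str = "SUMMARY") -> Tuple[str, ...]:
    """Levels whose information is shown at `level` (SUMMARY is the default)."""
    return LEVELS[: LEVELS.index(level.upper()) + 1]


if __name__ == "__main__":
    # Hypothetical query over the alltypesorc table seen in the plan output.
    print(explain_vectorization(
        "SELECT cint FROM alltypesorc GROUP BY cint", level="OPERATOR"))
    print(included_levels("EXPRESSION"))
```

Calling `explain_vectorization(query)` with no options yields a plain `EXPLAIN VECTORIZATION ...` statement, which per the description behaves like `EXPLAIN VECTORIZATION SUMMARY`.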
[jira] [Comment Edited] (HIVE-11394) Enhance EXPLAIN display for vectorization
[ https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15576789#comment-15576789 ]

Siddharth Seth edited comment on HIVE-11394 at 10/14/16 10:49 PM:
------------------------------------------------------------------

To clarify my last comment:
bq. Matt McCline - the TestMiniLlapLocal failures no longer exist. Likely caused by some changes in the way the tests were running, which I have reverted.
The test failures after the revert were fixed. The test still hangs with this patch.

was (Author: sseth):
To clarify my last comment:
bq. Matt McCline - the TestMiniLlapLocal failures no longer exist. Likely caused by some changes in the way the tests were running, which I have reverted.
The test failures after the revert were fixed. The test still fails with this patch.

> Enhance EXPLAIN display for vectorization
> -----------------------------------------
>
>                  Key: HIVE-11394
>                  URL: https://issues.apache.org/jira/browse/HIVE-11394
>              Project: Hive
>           Issue Type: Bug
>           Components: Hive
>             Reporter: Matt McCline
>             Assignee: Matt McCline
>             Priority: Critical
>               Labels: TODOC2.2
>              Fix For: 2.2.0
>
>          Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch,
> HIVE-11394.03.patch, HIVE-11394.04.patch, HIVE-11394.05.patch,
> HIVE-11394.06.patch, HIVE-11394.07.patch, HIVE-11394.08.patch,
> HIVE-11394.09.patch, HIVE-11394.091.patch, HIVE-11394.092.patch,
> HIVE-11394.093.patch
[jira] [Comment Edited] (HIVE-11394) Enhance EXPLAIN display for vectorization
[ https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15576059#comment-15576059 ]

Siddharth Seth edited comment on HIVE-11394 at 10/14/16 6:10 PM:
-----------------------------------------------------------------

This would indicate that some task is still running. Since everything runs inline, the AM logs show up in the same log file.

was (Author: sseth):
This would indicate that some task is still running.

> Enhance EXPLAIN display for vectorization
> -----------------------------------------
>
>                  Key: HIVE-11394
>                  URL: https://issues.apache.org/jira/browse/HIVE-11394
>              Project: Hive
>           Issue Type: Bug
>           Components: Hive
>             Reporter: Matt McCline
>             Assignee: Matt McCline
>             Priority: Critical
>               Labels: TODOC2.2
>              Fix For: 2.2.0
>
>          Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch,
> HIVE-11394.03.patch, HIVE-11394.04.patch, HIVE-11394.05.patch,
> HIVE-11394.06.patch, HIVE-11394.07.patch, HIVE-11394.08.patch,
> HIVE-11394.09.patch, HIVE-11394.091.patch, HIVE-11394.092.patch,
> HIVE-11394.093.patch
>
>
> Add detail to the EXPLAIN output showing why Map or Reduce work is not
> vectorized.
> New syntax is: EXPLAIN VECTORIZATION \[ONLY\]
> \[SUMMARY|OPERATOR|EXPRESSION|DETAIL\]
> The ONLY option suppresses most non-vectorization elements.
> SUMMARY shows vectorization information for the PLAN (is vectorization
> enabled) and a summary of Map and Reduce work.
> OPERATOR shows vectorization information for operators. E.g. Filter
> Vectorization. It includes all information of SUMMARY, too.
> EXPRESSION shows vectorization information for expressions. E.g.
> predicateExpression. It includes all information of SUMMARY and OPERATOR,
> too.
> DETAIL shows the most detailed vectorization information.
> It includes all information of SUMMARY, OPERATOR, and EXPRESSION too.
> When the optional clauses are omitted, the defaults are: no ONLY, and
> SUMMARY.
> ---
> Here are some examples:
> EXPLAIN VECTORIZATION example:
> (Note the PLAN VECTORIZATION, Map Vectorization, Reduce Vectorization
> sections)
> Since SUMMARY is the default, it is the output of EXPLAIN VECTORIZATION
> SUMMARY.
> Under Reducer 3’s "Reduce Vectorization:" you’ll see
> notVectorizedReason: Aggregation Function UDF avg parameter expression for
> GROUPBY operator: Data type struct of
> Column\[VALUE._col2\] not supported
> For Reducer 2’s "Reduce Vectorization:" you’ll see "groupByVectorOutput:":
> "false", which says a node has a GROUP BY with an AVG or some other aggregator
> that outputs a non-PRIMITIVE type (e.g. STRUCT) and all downstream operators
> are row-mode, i.e. not vector output.
> If "usesVectorUDFAdaptor:" were "true" instead of "false", it would mean that
> at least one vectorized expression is using VectorUDFAdaptor.
> And "allNative:" is "false" here; it will be "true" only when all operators
> are native.
> Today, GROUP BY and FILE SINK are not native. MAP JOIN and REDUCE SINK are
> conditionally native. FILTER and SELECT are native.
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
> ...
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
>         Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> ...
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: alltypesorc
>                   Statistics: Num rows: 12288 Data size: 36696 Basic stats: COMPLETE Column stats: COMPLETE
>                   Select Operator
>                     expressions: cint (type: int)
>                     outputColumnNames: cint
>                     Statistics: Num rows: 12288 Data size: 36696 Basic stats: COMPLETE Column stats: COMPLETE
>                     Group By Operator
>                       keys: cint (type: int)
>                       mode: hash
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 5775 Data size: 17248 Basic stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: int)
>                         Statistics: Num rows: 5775 Data size: 17248 Basic stats: COMPLETE Column stats: COMPLETE
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
>             Map Vectorization:
>                 enabled: true
>                 enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
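
The "Map Vectorization:" and "Reduce Vectorization:" sections described in the quoted issue text are plain indented key/value lines, so they can be pulled out of saved EXPLAIN output with a few lines of code. The following is a rough sketch under stated assumptions, not anything shipped with the patch: `vectorization_info` is a hypothetical helper, `SAMPLE_PLAN` is a hand-abbreviated fragment assembled from lines quoted in this thread (its indentation is assumed), and the parser assumes each attribute sits on a single line indented deeper than its section header.

```python
import re

# Vertex names such as "Map 1" or "Reducer 3", and the section headers
# that the issue description calls out.
VERTEX = re.compile(r"^(Map|Reducer) \d+$")
SECTION = re.compile(r"^(Map|Reduce) Vectorization:$")

# Hand-abbreviated fragment for illustration only; indentation is assumed.
SAMPLE_PLAN = """\
      Vertices:
        Map 1
            Execution mode: vectorized, llap
            Map Vectorization:
                enabled: true
                enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
        Reducer 3
            Reduce Vectorization:
                notVectorizedReason: Aggregation Function UDF avg parameter expression for GROUPBY operator: Data type struct of Column[VALUE._col2] not supported
"""


def vectorization_info(plan_text):
    """Return {vertex: {key: value}} for Map/Reduce Vectorization sections."""
    info = {}
    vertex = None
    section_indent = None  # indentation of the currently open section header
    for raw in plan_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        indent = len(raw) - len(raw.lstrip())
        if section_indent is not None and indent <= section_indent:
            section_indent = None  # a dedent closes the open section
        if VERTEX.match(line):
            vertex = line
        elif SECTION.match(line) and vertex is not None:
            section_indent = indent
            info.setdefault(vertex, {})
        elif section_indent is not None and ":" in line:
            # Split on the first colon only; values may contain colons.
            key, _, value = line.partition(":")
            info[vertex][key.strip()] = value.strip()
    return info


if __name__ == "__main__":
    for name, attrs in vectorization_info(SAMPLE_PLAN).items():
        print(name, attrs)
```

Lines outside a vectorization section, such as "Execution mode:", are ignored, so the result contains only attributes like `enabled` and `notVectorizedReason`.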
[jira] [Comment Edited] (HIVE-11394) Enhance EXPLAIN display for vectorization
[ https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574478#comment-15574478 ]

Siddharth Seth edited comment on HIVE-11394 at 10/14/16 7:06 AM:
-----------------------------------------------------------------

bq. TestMiniLlapLocalCliDriver-orc_llap.q-delete_where_non_partitioned.q-vector_groupby_mapjoin.q-and-27-more - did not produce a TEST-*.xml file

[~mmccline] - this patch causes MiniLlapLocal tests to fail. Digging deeper (diff between batches on subsequent runs) - the specific test that is failing is orc_llap.q. Can you please revert the patch, or provide a fix.

To clarify - messages like these are generated when tests time out. They won't show up on the jenkins test report, since there's no test XML file available to process. orc_llap.q times out after this patch.

was (Author: sseth):
bq. TestMiniLlapLocalCliDriver-orc_llap.q-delete_where_non_partitioned.q-vector_groupby_mapjoin.q-and-27-more - did not produce a TEST-*.xml file

[~mmccline] - this patch causes MiniLlapLocal tests to fail. Digging deeper (diff between batches on subsequent runs) - the specific test that is failing is orc_llap.q. Can you please revert the patch, or provide a fix.

> Enhance EXPLAIN display for vectorization
> -----------------------------------------
>
>                  Key: HIVE-11394
>                  URL: https://issues.apache.org/jira/browse/HIVE-11394
>              Project: Hive
>           Issue Type: Bug
>           Components: Hive
>             Reporter: Matt McCline
>             Assignee: Matt McCline
>             Priority: Critical
>              Fix For: 2.2.0
>
>          Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch,
> HIVE-11394.03.patch, HIVE-11394.04.patch, HIVE-11394.05.patch,
> HIVE-11394.06.patch, HIVE-11394.07.patch, HIVE-11394.08.patch,
> HIVE-11394.09.patch, HIVE-11394.091.patch, HIVE-11394.092.patch,
> HIVE-11394.093.patch
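
Siddharth Seth's comment in this message points out that a batch which times out leaves no TEST-*.xml file, so it never appears in the Jenkins test report. One way to spot such batches by hand is to scan the saved test output for directories without a report file. The sketch below assumes a layout that is purely hypothetical (one sub-directory per batch under a common root, with report files directly inside it); the function name `batches_without_report` is likewise made up for illustration and is not part of the Hive test tooling.

```python
from pathlib import Path


def batches_without_report(root):
    """Names of sub-directories of `root` that contain no TEST-*.xml file.

    Assumed layout (hypothetical): root/<batch-name>/TEST-*.xml.
    Only the top level of each batch directory is checked.
    """
    missing = []
    for batch in sorted(Path(root).iterdir()):
        if batch.is_dir() and not any(batch.glob("TEST-*.xml")):
            missing.append(batch.name)
    return missing


if __name__ == "__main__":
    import sys

    # Usage: python find_missing_reports.py <directory-with-batch-output>
    for name in batches_without_report(sys.argv[1]):
        print(f"{name} - did not produce a TEST-*.xml file")
```

A batch listed by this check is a candidate for a timeout or hang rather than an ordinary test failure, which matches how orc_llap.q is described above.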