[jira] [Commented] (HIVE-13125) Support masking and filtering of rows/columns
[ https://issues.apache.org/jira/browse/HIVE-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197183#comment-15197183 ] Hive QA commented on HIVE-13125: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12793423/HIVE-13125.02.patch {color:green}SUCCESS:{color} +1 due to 9 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 9821 tests executed *Failed tests:* {noformat} TestMiniTezCliDriver-vector_decimal_round.q-cbo_windowing.q-tez_schema_evolution.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_coltype_literals {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7281/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7281/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7281/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12793423 - PreCommit-HIVE-TRUNK-Build > Support masking and filtering of rows/columns > - > > Key: HIVE-13125 > URL: https://issues.apache.org/jira/browse/HIVE-13125 > Project: Hive > Issue Type: New Feature >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-13125.01.patch, HIVE-13125.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10067) LLAP: read file ID when generating splits to avoid extra NN call in the tasks
[ https://issues.apache.org/jira/browse/HIVE-10067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197066#comment-15197066 ] Lefty Leverenz commented on HIVE-10067: --- More doc notes: *hive.orc.splits.include.fileid* was committed to master for release 2.0.0 by HIVE-11542, and a typo in the parameter description was fixed for release 2.1.0 by HIVE-11675. > LLAP: read file ID when generating splits to avoid extra NN call in the tasks > - > > Key: HIVE-10067 > URL: https://issues.apache.org/jira/browse/HIVE-10067 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: llap > > Attachments: HIVE-10067.01.patch, HIVE-10067.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11542) port fileId support on shims and splits from llap branch
[ https://issues.apache.org/jira/browse/HIVE-11542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197061#comment-15197061 ] Lefty Leverenz commented on HIVE-11542: --- HIVE-11675 fixes the typo in the description of *hive.orc.splits.include.fileid* for release 2.1.0. By the way, this parameter was originally introduced in the llap branch by HIVE-10067. > port fileId support on shims and splits from llap branch > > > Key: HIVE-11542 > URL: https://issues.apache.org/jira/browse/HIVE-11542 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: TODOC2.0 > Fix For: 2.0.0 > > Attachments: HIVE-11542.patch > > > This is helpful for any kind of file-based cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11675) make use of file footer PPD API in ETL strategy or separate strategy
[ https://issues.apache.org/jira/browse/HIVE-11675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197056#comment-15197056 ] Lefty Leverenz commented on HIVE-11675: --- Doc note: This adds configuration parameter *hive.orc.splits.ms.footer.cache.ppd.enabled* to HiveConf.java, so it needs to be documented in the ORC section of Configuration Properties for release 2.1.0. * [Configuration Properties -- ORC File Format | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-ORCFileFormat] This also fixes a typo in the description of *hive.orc.splits.include.fileid*, which was added to the llap branch by HIVE-10067 and to master for release 2.0.0 by HIVE-11542. > make use of file footer PPD API in ETL strategy or separate strategy > > > Key: HIVE-11675 > URL: https://issues.apache.org/jira/browse/HIVE-11675 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: TODOC2.1 > Fix For: 2.1.0 > > Attachments: HIVE-11675.01.patch, HIVE-11675.02.patch, > HIVE-11675.03.patch, HIVE-11675.04.patch, HIVE-11675.05.patch, > HIVE-11675.06.patch, HIVE-11675.07.patch, HIVE-11675.08.patch, > HIVE-11675.09.patch, HIVE-11675.10.patch, HIVE-11675.11.patch, > HIVE-11675.12.patch, HIVE-11675.13.patch, HIVE-11675.14.patch, > HIVE-11675.patch, HIVE-11675.premature.opti.patch > > > Need to take a look at the best flow. It won't be much different if we do > filtering metastore call for each partition. So perhaps we'd need the custom > sync point/batching after all. > Or we can make it opportunistic and not fetch any footers unless it can be > pushed down to metastore or fetched from local cache, that way the only slow > threaded op is directory listings -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11527) bypass HiveServer2 thrift interface for query results
[ https://issues.apache.org/jira/browse/HIVE-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197055#comment-15197055 ] Takanobu Asanuma commented on HIVE-11527: - Hi [~sershe], [~vgumashta] Thanks to the discussion with [~jingzhao], I implemented features for handling HA and uploaded a WIP patch to RB. Please review it. I will continue with the rest of the work: * Until now, I assumed that intermediate results are in text format, but we need to make JDBC clients decode other file formats. That is even more important because sequence file is the default format for intermediate results, which was implemented in HIVE-1608. * Handling multiple intermediate files. * Adding some unit tests. HA tests depend on HIVE-13268 (could you also review that?). Thanks. > bypass HiveServer2 thrift interface for query results > - > > Key: HIVE-11527 > URL: https://issues.apache.org/jira/browse/HIVE-11527 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Sergey Shelukhin >Assignee: Takanobu Asanuma > Attachments: HIVE-11527.WIP.patch > > > Right now, HS2 reads query results and returns them to the caller via its > thrift API. > There should be an option for HS2 to return some pointer to results (an HDFS > link?) and for the user to read the results directly off HDFS inside the > cluster, or via something like WebHDFS outside the cluster > Review board link: https://reviews.apache.org/r/40867 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11675) make use of file footer PPD API in ETL strategy or separate strategy
[ https://issues.apache.org/jira/browse/HIVE-11675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-11675: -- Labels: TODOC2.1 (was: ) > make use of file footer PPD API in ETL strategy or separate strategy > > > Key: HIVE-11675 > URL: https://issues.apache.org/jira/browse/HIVE-11675 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: TODOC2.1 > Fix For: 2.1.0 > > Attachments: HIVE-11675.01.patch, HIVE-11675.02.patch, > HIVE-11675.03.patch, HIVE-11675.04.patch, HIVE-11675.05.patch, > HIVE-11675.06.patch, HIVE-11675.07.patch, HIVE-11675.08.patch, > HIVE-11675.09.patch, HIVE-11675.10.patch, HIVE-11675.11.patch, > HIVE-11675.12.patch, HIVE-11675.13.patch, HIVE-11675.14.patch, > HIVE-11675.patch, HIVE-11675.premature.opti.patch > > > Need to take a look at the best flow. It won't be much different if we do > filtering metastore call for each partition. So perhaps we'd need the custom > sync point/batching after all. > Or we can make it opportunistic and not fetch any footers unless it can be > pushed down to metastore or fetched from local cache, that way the only slow > threaded op is directory listings -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7926) long-lived daemons for query fragment execution, I/O and caching
[ https://issues.apache.org/jira/browse/HIVE-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197017#comment-15197017 ] Lefty Leverenz commented on HIVE-7926: -- Nudge: The LLAP design doc still needs to be added to the wiki. * https://issues.apache.org/jira/secure/attachment/12665704/LLAPdesigndocument.pdf > long-lived daemons for query fragment execution, I/O and caching > > > Key: HIVE-7926 > URL: https://issues.apache.org/jira/browse/HIVE-7926 > Project: Hive > Issue Type: New Feature >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: TODOC2.0 > Fix For: 2.0.0 > > Attachments: LLAPdesigndocument.pdf > > > We are proposing a new execution model for Hive that is a combination of > existing process-based tasks and long-lived daemons running on worker nodes. > These nodes can take care of efficient I/O, caching and query fragment > execution, while heavy lifting like most joins, ordering, etc. can be handled > by tasks. > The proposed model is not a 2-system solution for small and large queries; > nor is it a separate execution engine like MR or Tez. It can be used by > any Hive execution engine, if support is added; in the future, even external > products (e.g. Pig) can use it. > The document with the high-level design we are proposing will be attached shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12995) LLAP: Synthetic file ids need collision checks
[ https://issues.apache.org/jira/browse/HIVE-12995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197014#comment-15197014 ] Lefty Leverenz commented on HIVE-12995: --- Doc note: This adds configuration parameter *hive.orc.splits.allow.synthetic.fileid* to HiveConf.java, so it will need to be documented in the ORC section of Configuration Properties for release 2.1.0. * [Configuration Properties -- ORC File Format | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-ORCFileFormat] Should it also be mentioned in the llap documentation? (The parameter description doesn't say anything about llap.) * [LLAP design document | https://issues.apache.org/jira/secure/attachment/12665704/LLAPdesigndocument.pdf] attached to HIVE-7926 > LLAP: Synthetic file ids need collision checks > -- > > Key: HIVE-12995 > URL: https://issues.apache.org/jira/browse/HIVE-12995 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 2.1.0 >Reporter: Gopal V >Assignee: Sergey Shelukhin > Labels: TODOC2.1 > Fix For: 2.1.0 > > Attachments: HIVE-12995.01.patch, HIVE-12995.02.patch, > HIVE-12995.03.patch, HIVE-12995.04.patch, HIVE-12995.patch > > > LLAP synthetic file ids do not have any way of checking whether a collision > occurs other than a data-error. > Synthetic file-ids have only been used with unit tests so far - but they will > be needed to add cache mechanisms to non-HDFS filesystems. > In case of Synthetic file-ids, it is recommended that we track the full-tuple > (path, mtime, len) in the cache so that a cache-hit for the synthetic file-id > can be compared against the parameters & only accepted if those match. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
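The collision check recommended in the HIVE-12995 description above (track the full (path, mtime, len) tuple and accept a cache hit only if it matches) can be sketched roughly as follows. This is a minimal standalone illustration, not Hive's or LLAP's actual cache code: the class name SyntheticFileIdCache, the FileKey type, the String payload and the put/get methods are all hypothetical names chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: a cache that validates hits on a synthetic file ID. */
public class SyntheticFileIdCache {

    /** The (path, mtime, len) tuple that the issue recommends tracking. */
    public static final class FileKey {
        final String path;
        final long mtime;
        final long len;

        public FileKey(String path, long mtime, long len) {
            this.path = path;
            this.mtime = mtime;
            this.len = len;
        }

        boolean matches(FileKey other) {
            return path.equals(other.path) && mtime == other.mtime && len == other.len;
        }
    }

    /** Cached payload plus the tuple it was stored under (hypothetical layout). */
    private static final class Entry {
        final FileKey key;
        final String payload;

        Entry(FileKey key, String payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    private final Map<Long, Entry> entries = new HashMap<>();

    public void put(long syntheticId, FileKey key, String payload) {
        entries.put(syntheticId, new Entry(key, payload));
    }

    /** Returns the payload only if the stored tuple matches; otherwise reports a miss. */
    public String get(long syntheticId, FileKey key) {
        Entry e = entries.get(syntheticId);
        if (e == null || !e.key.matches(key)) {
            return null; // absent, or the same ID was stored for a different file
        }
        return e.payload;
    }

    public static void main(String[] args) {
        SyntheticFileIdCache cache = new SyntheticFileIdCache();
        FileKey a = new FileKey("/warehouse/t/a.orc", 1000L, 10L);
        FileKey b = new FileKey("/warehouse/t/b.orc", 2000L, 20L);
        cache.put(42L, a, "footer-a");
        System.out.println(cache.get(42L, a)); // footer-a
        System.out.println(cache.get(42L, b)); // null: colliding ID is rejected
    }
}
```

With this shape a collision degrades to a cache miss instead of surfacing later as a data error, at the cost of storing the tuple with every entry.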
[jira] [Updated] (HIVE-12995) LLAP: Synthetic file ids need collision checks
[ https://issues.apache.org/jira/browse/HIVE-12995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-12995: -- Labels: TODOC2.1 (was: ) > LLAP: Synthetic file ids need collision checks > -- > > Key: HIVE-12995 > URL: https://issues.apache.org/jira/browse/HIVE-12995 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 2.1.0 >Reporter: Gopal V >Assignee: Sergey Shelukhin > Labels: TODOC2.1 > Fix For: 2.1.0 > > Attachments: HIVE-12995.01.patch, HIVE-12995.02.patch, > HIVE-12995.03.patch, HIVE-12995.04.patch, HIVE-12995.patch > > > LLAP synthetic file ids do not have any way of checking whether a collision > occurs other than a data-error. > Synthetic file-ids have only been used with unit tests so far - but they will > be needed to add cache mechanisms to non-HDFS filesystems. > In case of Synthetic file-ids, it is recommended that we track the full-tuple > (path, mtime, len) in the cache so that a cache-hit for the synthetic file-id > can be compared against the parameters & only accepted if those match. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13235) Insert from select generates incorrect result when hive.optimize.constant.propagation is on
[ https://issues.apache.org/jira/browse/HIVE-13235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196970#comment-15196970 ] Hive QA commented on HIVE-13235: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12793406/HIVE-13235.2.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 9829 tests executed *Failed tests:* {noformat} TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input25 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input26 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_25 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_constprog_semijoin org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_dyn_part14 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_25 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7280/testReport Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7280/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7280/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 15 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12793406 - PreCommit-HIVE-TRUNK-Build > Insert from select generates incorrect result when > hive.optimize.constant.propagation is on > --- > > Key: HIVE-13235 > URL: https://issues.apache.org/jira/browse/HIVE-13235 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.0.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-13235.1.patch, HIVE-13235.2.patch > > > The following query returns incorrect result when constant optimization is > turned on. The subquery happens to have an alias p1 to be the same as the > input partition name. Constant optimizer will optimize it incorrectly as the > constant. > When constant optimizer is turned off, we will get the correct result. > {noformat} > set hive.cbo.enable=false; > set hive.optimize.constant.propagation = true; > create table t1(c1 string, c2 double) partitioned by (p1 string, p2 string); > create table t2(p1 double, c2 string); > insert into table t1 partition(p1='40', p2='p2') values('c1', 0.0); > INSERT OVERWRITE TABLE t2 select if((c2 = 0.0), c2, '0') as p1, 2 as p2 from > t1 where c1 = 'c1' and p1 = '40'; > select * from t2; > 40 2 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13183) More logs in operation logs
[ https://issues.apache.org/jira/browse/HIVE-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196931#comment-15196931 ] Rajat Khandelwal commented on HIVE-13183: - Taking patch from reviewboard and attaching > More logs in operation logs > --- > > Key: HIVE-13183 > URL: https://issues.apache.org/jira/browse/HIVE-13183 > Project: Hive > Issue Type: Improvement >Reporter: Rajat Khandelwal >Assignee: Rajat Khandelwal > Attachments: HIVE-13183.02.patch, HIVE-13183.03.patch, > HIVE-13183.04.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13183) More logs in operation logs
[ https://issues.apache.org/jira/browse/HIVE-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajat Khandelwal updated HIVE-13183: Attachment: HIVE-13183.04.patch > More logs in operation logs > --- > > Key: HIVE-13183 > URL: https://issues.apache.org/jira/browse/HIVE-13183 > Project: Hive > Issue Type: Improvement >Reporter: Rajat Khandelwal >Assignee: Rajat Khandelwal > Attachments: HIVE-13183.02.patch, HIVE-13183.03.patch, > HIVE-13183.04.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13084) Vectorization add support for PROJECTION Multi-AND/OR
[ https://issues.apache.org/jira/browse/HIVE-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-13084: Attachment: HIVE-13084.04.patch > Vectorization add support for PROJECTION Multi-AND/OR > - > > Key: HIVE-13084 > URL: https://issues.apache.org/jira/browse/HIVE-13084 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Rajesh Balamohan >Assignee: Matt McCline > Attachments: HIVE-13084.01.patch, HIVE-13084.02.patch, > HIVE-13084.03.patch, HIVE-13084.04.patch, vector_between_date.q > > > When there is case statement in group by, hive throws unable to vectorize > exception. > e.g query just to demonstrate the problem > {noformat} > explain select l_partkey, case when l_commitdate between '2015-06-30' AND > '2015-07-06' THEN '2015-06-30' END as wk from lineitem_test_l_shipdate_ts > group by l_partkey, case when l_commitdate between '2015-06-30' AND > '2015-07-06' THEN '2015-06-30' END; > org.apache.hadoop.hive.ql.metadata.HiveException: Could not vectorize > expression: org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc > Vertex dependency in root stage > Reducer 2 <- Map 1 (SIMPLE_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Reducer 2 > File Output Operator [FS_7] > Group By Operator [GBY_5] (rows=888777234 width=108) > Output:["_col0","_col1"],keys:KEY._col0, KEY._col1 > <-Map 1 [SIMPLE_EDGE] > SHUFFLE [RS_4] > PartitionCols:_col0, _col1 > Group By Operator [GBY_3] (rows=1777554469 width=108) > Output:["_col0","_col1"],keys:_col0, _col1 > Select Operator [SEL_1] (rows=1777554469 width=108) > Output:["_col0","_col1"] > TableScan [TS_0] (rows=1777554469 width=108) > > rajesh@lineitem_test_l_shipdate_ts,lineitem_test_l_shipdate_ts,Tbl:COMPLETE,Col:NONE,Output:["l_partkey","l_commitdate"] > {noformat} > \cc [~mmccline], [~gopalv] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13084) Vectorization add support for PROJECTION Multi-AND/OR
[ https://issues.apache.org/jira/browse/HIVE-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-13084: Status: Patch Available (was: In Progress) > Vectorization add support for PROJECTION Multi-AND/OR > - > > Key: HIVE-13084 > URL: https://issues.apache.org/jira/browse/HIVE-13084 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Rajesh Balamohan >Assignee: Matt McCline > Attachments: HIVE-13084.01.patch, HIVE-13084.02.patch, > HIVE-13084.03.patch, HIVE-13084.04.patch, vector_between_date.q > > > When there is case statement in group by, hive throws unable to vectorize > exception. > e.g query just to demonstrate the problem > {noformat} > explain select l_partkey, case when l_commitdate between '2015-06-30' AND > '2015-07-06' THEN '2015-06-30' END as wk from lineitem_test_l_shipdate_ts > group by l_partkey, case when l_commitdate between '2015-06-30' AND > '2015-07-06' THEN '2015-06-30' END; > org.apache.hadoop.hive.ql.metadata.HiveException: Could not vectorize > expression: org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc > Vertex dependency in root stage > Reducer 2 <- Map 1 (SIMPLE_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Reducer 2 > File Output Operator [FS_7] > Group By Operator [GBY_5] (rows=888777234 width=108) > Output:["_col0","_col1"],keys:KEY._col0, KEY._col1 > <-Map 1 [SIMPLE_EDGE] > SHUFFLE [RS_4] > PartitionCols:_col0, _col1 > Group By Operator [GBY_3] (rows=1777554469 width=108) > Output:["_col0","_col1"],keys:_col0, _col1 > Select Operator [SEL_1] (rows=1777554469 width=108) > Output:["_col0","_col1"] > TableScan [TS_0] (rows=1777554469 width=108) > > rajesh@lineitem_test_l_shipdate_ts,lineitem_test_l_shipdate_ts,Tbl:COMPLETE,Col:NONE,Output:["l_partkey","l_commitdate"] > {noformat} > \cc [~mmccline], [~gopalv] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13084) Vectorization add support for PROJECTION Multi-AND/OR
[ https://issues.apache.org/jira/browse/HIVE-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-13084: Status: In Progress (was: Patch Available) > Vectorization add support for PROJECTION Multi-AND/OR > - > > Key: HIVE-13084 > URL: https://issues.apache.org/jira/browse/HIVE-13084 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Rajesh Balamohan >Assignee: Matt McCline > Attachments: HIVE-13084.01.patch, HIVE-13084.02.patch, > HIVE-13084.03.patch, vector_between_date.q > > > When there is case statement in group by, hive throws unable to vectorize > exception. > e.g query just to demonstrate the problem > {noformat} > explain select l_partkey, case when l_commitdate between '2015-06-30' AND > '2015-07-06' THEN '2015-06-30' END as wk from lineitem_test_l_shipdate_ts > group by l_partkey, case when l_commitdate between '2015-06-30' AND > '2015-07-06' THEN '2015-06-30' END; > org.apache.hadoop.hive.ql.metadata.HiveException: Could not vectorize > expression: org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc > Vertex dependency in root stage > Reducer 2 <- Map 1 (SIMPLE_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Reducer 2 > File Output Operator [FS_7] > Group By Operator [GBY_5] (rows=888777234 width=108) > Output:["_col0","_col1"],keys:KEY._col0, KEY._col1 > <-Map 1 [SIMPLE_EDGE] > SHUFFLE [RS_4] > PartitionCols:_col0, _col1 > Group By Operator [GBY_3] (rows=1777554469 width=108) > Output:["_col0","_col1"],keys:_col0, _col1 > Select Operator [SEL_1] (rows=1777554469 width=108) > Output:["_col0","_col1"] > TableScan [TS_0] (rows=1777554469 width=108) > > rajesh@lineitem_test_l_shipdate_ts,lineitem_test_l_shipdate_ts,Tbl:COMPLETE,Col:NONE,Output:["l_partkey","l_commitdate"] > {noformat} > \cc [~mmccline], [~gopalv] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13084) Vectorization add support for PROJECTION Multi-AND/OR
[ https://issues.apache.org/jira/browse/HIVE-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196909#comment-15196909 ] Matt McCline commented on HIVE-13084: - OK, the next patch has bug fixes for LongColLongScalar: LongColEqualLongScalar.java, LongColGreaterEqualLongScalar.java, LongColGreaterLongScalar.java, LongColLessEqualLongScalar.java, LongColLessLongScalar.java, LongColNotEqualLongScalar.java. These changes need to be put into a separate JIRA. > Vectorization add support for PROJECTION Multi-AND/OR > - > > Key: HIVE-13084 > URL: https://issues.apache.org/jira/browse/HIVE-13084 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Rajesh Balamohan >Assignee: Matt McCline > Attachments: HIVE-13084.01.patch, HIVE-13084.02.patch, > HIVE-13084.03.patch, vector_between_date.q > > > When there is case statement in group by, hive throws unable to vectorize > exception. > e.g query just to demonstrate the problem > {noformat} > explain select l_partkey, case when l_commitdate between '2015-06-30' AND > '2015-07-06' THEN '2015-06-30' END as wk from lineitem_test_l_shipdate_ts > group by l_partkey, case when l_commitdate between '2015-06-30' AND > '2015-07-06' THEN '2015-06-30' END; > org.apache.hadoop.hive.ql.metadata.HiveException: Could not vectorize > expression: org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc > Vertex dependency in root stage > Reducer 2 <- Map 1 (SIMPLE_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Reducer 2 > File Output Operator [FS_7] > Group By Operator [GBY_5] (rows=888777234 width=108) > Output:["_col0","_col1"],keys:KEY._col0, KEY._col1 > <-Map 1 [SIMPLE_EDGE] > SHUFFLE [RS_4] > PartitionCols:_col0, _col1 > Group By Operator [GBY_3] (rows=1777554469 width=108) > Output:["_col0","_col1"],keys:_col0, _col1 > Select Operator [SEL_1] (rows=1777554469 width=108) > Output:["_col0","_col1"] > TableScan [TS_0] (rows=1777554469 width=108) > 
rajesh@lineitem_test_l_shipdate_ts,lineitem_test_l_shipdate_ts,Tbl:COMPLETE,Col:NONE,Output:["l_partkey","l_commitdate"] > {noformat} > \cc [~mmccline], [~gopalv] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13293) Query occurs performance degradation after enabling parallel order by for Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-13293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lifeng Wang updated HIVE-13293: --- Description: I used TPCx-BB to run some performance tests on the Hive on Spark engine and found that query 10 degrades in performance when parallel order by is enabled. It seems that sampling takes a long time before the real query runs. was:I used TPCx-BB to run some performance tests on the Hive on Spark engine and found that query 10 degrades in performance when parallel order by is enabled. > Query occurs performance degradation after enabling parallel order by for > Hive on Spark > --- > > Key: HIVE-13293 > URL: https://issues.apache.org/jira/browse/HIVE-13293 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 2.0.0 >Reporter: Lifeng Wang > > I used TPCx-BB to run some performance tests on the Hive on Spark engine and found > that query 10 degrades in performance when parallel order by is enabled. > It seems that sampling takes a long time before the real query runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13293) Query occurs performance degradation after enabling parallel order by for Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-13293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lifeng Wang updated HIVE-13293: --- Assignee: (was: Xuefu Zhang) > Query occurs performance degradation after enabling parallel order by for > Hive on Spark > --- > > Key: HIVE-13293 > URL: https://issues.apache.org/jira/browse/HIVE-13293 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 2.0.0 >Reporter: Lifeng Wang > > I used TPCx-BB to run some performance tests on the Hive on Spark engine and found > that query 10 degrades in performance when parallel order by is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13084) Vectorization add support for PROJECTION Multi-AND/OR
[ https://issues.apache.org/jira/browse/HIVE-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196821#comment-15196821 ] Matt McCline commented on HIVE-13084: - For selectedInUse (and !noNulls) it does this to copy the isNull array to the output vector: {code} System.arraycopy(nullPos, 0, outNulls, 0, n); {code} but this is wrong because the selected array could have n=5 entries while its contents are {7,20,21,104,900}, so the copy will not copy the right isNull values. > Vectorization add support for PROJECTION Multi-AND/OR > - > > Key: HIVE-13084 > URL: https://issues.apache.org/jira/browse/HIVE-13084 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Rajesh Balamohan >Assignee: Matt McCline > Attachments: HIVE-13084.01.patch, HIVE-13084.02.patch, > HIVE-13084.03.patch, vector_between_date.q > > > When there is case statement in group by, hive throws unable to vectorize > exception. > e.g query just to demonstrate the problem > {noformat} > explain select l_partkey, case when l_commitdate between '2015-06-30' AND > '2015-07-06' THEN '2015-06-30' END as wk from lineitem_test_l_shipdate_ts > group by l_partkey, case when l_commitdate between '2015-06-30' AND > '2015-07-06' THEN '2015-06-30' END; > org.apache.hadoop.hive.ql.metadata.HiveException: Could not vectorize > expression: org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc > Vertex dependency in root stage > Reducer 2 <- Map 1 (SIMPLE_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Reducer 2 > File Output Operator [FS_7] > Group By Operator [GBY_5] (rows=888777234 width=108) > Output:["_col0","_col1"],keys:KEY._col0, KEY._col1 > <-Map 1 [SIMPLE_EDGE] > SHUFFLE [RS_4] > PartitionCols:_col0, _col1 > Group By Operator [GBY_3] (rows=1777554469 width=108) > Output:["_col0","_col1"],keys:_col0, _col1 > Select Operator [SEL_1] (rows=1777554469 width=108) > Output:["_col0","_col1"] > TableScan [TS_0] (rows=1777554469 width=108) > 
rajesh@lineitem_test_l_shipdate_ts,lineitem_test_l_shipdate_ts,Tbl:COMPLETE,Col:NONE,Output:["l_partkey","l_commitdate"] > {noformat} > \cc [~mmccline], [~gopalv] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
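The isNull copy problem described in the HIVE-13084 comment above can be reproduced in isolation. The sketch below is a standalone illustration, not the Hive expression classes themselves: the class name SelectedNullCopy and both method names are hypothetical, while nullPos, outNulls, n and the selected indices {7,20,21,104,900} are taken from the comment. It assumes the usual convention that, when a selected array is in use, only the first n entries of that array name the live rows.

```java
/** Sketch only: prefix copy versus per-selected-row copy of isNull flags. */
public class SelectedNullCopy {

    /** The pattern from the comment: copies rows 0..n-1, ignoring which rows are selected. */
    public static boolean[] copyNullsPrefix(boolean[] nullPos, int n) {
        boolean[] outNulls = new boolean[nullPos.length];
        System.arraycopy(nullPos, 0, outNulls, 0, n);
        return outNulls;
    }

    /** Copies the flag for each selected row index instead. */
    public static boolean[] copyNullsSelected(boolean[] nullPos, int[] sel, int n) {
        boolean[] outNulls = new boolean[nullPos.length];
        for (int j = 0; j < n; j++) {
            int i = sel[j];
            outNulls[i] = nullPos[i];
        }
        return outNulls;
    }

    public static void main(String[] args) {
        boolean[] nullPos = new boolean[1024];
        nullPos[20] = true;   // selected row 20 is null
        nullPos[900] = true;  // selected row 900 is null
        int[] sel = {7, 20, 21, 104, 900};
        int n = sel.length;

        System.out.println(copyNullsPrefix(nullPos, n)[20]);        // false: flag lost
        System.out.println(copyNullsSelected(nullPos, sel, n)[20]); // true
        System.out.println(copyNullsSelected(nullPos, sel, n)[900]); // true
    }
}
```

The prefix copy only covers indices 0 through 4, so the null flags at rows 20 and 900 never reach the output array.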
[jira] [Commented] (HIVE-13292) Different DOUBLE type precision issue between Spark and MR engine
[ https://issues.apache.org/jira/browse/HIVE-13292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196819#comment-15196819 ] Sergey Shelukhin commented on HIVE-13292: - With double type, it's usually by design > Different DOUBLE type precision issue between Spark and MR engine > - > > Key: HIVE-13292 > URL: https://issues.apache.org/jira/browse/HIVE-13292 > Project: Hive > Issue Type: Bug > Environment: Apache Hive 2.0.0 > Apache Spark 1.6.0 >Reporter: Xin Hao > > Different DOUBLE type precision issue between Spark and MR engine. > Found when executing the TPC-H query5 with scale factor 2 (2GB data size). > More details are as below. > (1)The MR engine output: > MOZAMBIQUE,1.0646195910990009E8 > ETHIOPIA,1.0108856206629996E8 > ALGERIA,9.987582690420012E7 > MOROCCO,9.785484184850013E7 > KENYA,9.412388077690017E7 > (2)The Spark engine output: > MOZAMBIQUE,1.064619591099E8 > ETHIOPIA,1.0108856206630005E8 > ALGERIA,9.987582690419997E7 > MOROCCO,9.785484184850003E7 > KENYA,9.412388077690002E7 > (3)Detail SQL used: > drop table if exists ${env:RESULT_TABLE}; > create table ${env:RESULT_TABLE} ( > pid1 STRING, > pid2 DOUBLE > ) > row format delimited fields terminated by ',' lines terminated by '\n' > stored as ${env:HIVE_DEFAULT_FILEFORMAT_RESULT_TABLE} location > '${env:RESULT_DIR}'; > insert into table ${env:RESULT_TABLE} > select > n_name, > sum(l_extendedprice * (1 - l_discount)) as revenue > from > customer, > orders, > lineitem, > supplier, > nation, > region > where > c_custkey = o_custkey > and l_orderkey = o_orderkey > and l_suppkey = s_suppkey > and c_nationkey = s_nationkey > and s_nationkey = n_nationkey > and n_regionkey = r_regionkey > and r_name = 'AFRICA' > and o_orderdate >= '1993-01-01' > and o_orderdate < '1994-01-01' > group by > n_name > order by > revenue desc; > (4)Similar issue also exists even after we simplified original query to a > simpler one as below: > drop table if exists ${env:RESULT_TABLE}; > create table 
${env:RESULT_TABLE} ( > pid2 DOUBLE > ) > row format delimited fields terminated by ',' lines terminated by '\n' > stored as ${env:HIVE_DEFAULT_FILEFORMAT_RESULT_TABLE} location > '${env:RESULT_DIR}'; > insert into table ${env:RESULT_TABLE} > select > sum(l_extendedprice * (1 - l_discount)) as revenue > from > lineitem > group by > l_orderkey > order by > revenue; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
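One plausible reading of "usually by design" in the HIVE-13292 comment above is that floating-point addition is not associative, so an engine that adds the same DOUBLE values in a different order can produce a slightly different sum. The thread does not confirm that this is the cause of the Spark and MR difference; the sketch below only demonstrates the general effect, and the class name DoubleSumOrder and the input values are made up for illustration.

```java
/** Sketch only: the same doubles summed in two different orders. */
public class DoubleSumOrder {

    /** Plain left-to-right summation. */
    public static double sum(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // Same three values, different order.
        double[] orderA = {1e16, 1.0, -1e16};
        double[] orderB = {1e16, -1e16, 1.0};

        System.out.println(sum(orderA)); // 0.0: the 1.0 is absorbed by 1e16
        System.out.println(sum(orderB)); // 1.0
    }
}
```

If exact, order-independent results are required, a DECIMAL column is the usual alternative to DOUBLE for values such as the revenue sum in the query above.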