[jira] [Commented] (HIVE-13125) Support masking and filtering of rows/columns

2016-03-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197183#comment-15197183
 ] 

Hive QA commented on HIVE-13125:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12793423/HIVE-13125.02.patch

{color:green}SUCCESS:{color} +1 due to 9 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 9821 tests executed
*Failed tests:*
{noformat}
TestMiniTezCliDriver-vector_decimal_round.q-cbo_windowing.q-tez_schema_evolution.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more 
- did not produce a TEST-*.xml file
TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not 
produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_coltype_literals
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7281/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7281/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7281/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12793423 - PreCommit-HIVE-TRUNK-Build

> Support masking and filtering of rows/columns
> -
>
> Key: HIVE-13125
> URL: https://issues.apache.org/jira/browse/HIVE-13125
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13125.01.patch, HIVE-13125.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10067) LLAP: read file ID when generating splits to avoid extra NN call in the tasks

2016-03-16 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197066#comment-15197066
 ] 

Lefty Leverenz commented on HIVE-10067:
---

More doc notes:  *hive.orc.splits.include.fileid* was committed to master for 
release 2.0.0 by HIVE-11542, and a typo in the parameter description was fixed 
for release 2.1.0 by HIVE-11675.

> LLAP: read file ID when generating splits to avoid extra NN call in the tasks
> -
>
> Key: HIVE-10067
> URL: https://issues.apache.org/jira/browse/HIVE-10067
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: llap
>
> Attachments: HIVE-10067.01.patch, HIVE-10067.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11542) port fileId support on shims and splits from llap branch

2016-03-16 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197061#comment-15197061
 ] 

Lefty Leverenz commented on HIVE-11542:
---

HIVE-11675 fixes the typo in the description of 
*hive.orc.splits.include.fileid* for release 2.1.0.

By the way, this parameter was originally introduced in the llap branch by 
HIVE-10067.

> port fileId support on shims and splits from llap branch
> 
>
> Key: HIVE-11542
> URL: https://issues.apache.org/jira/browse/HIVE-11542
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-11542.patch
>
>
> This is helpful for any kind of file-based cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11675) make use of file footer PPD API in ETL strategy or separate strategy

2016-03-16 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197056#comment-15197056
 ] 

Lefty Leverenz commented on HIVE-11675:
---

Doc note:  This adds configuration parameter 
*hive.orc.splits.ms.footer.cache.ppd.enabled* to HiveConf.java, so it needs to 
be documented in the ORC section of Configuration Properties for release 2.1.0.

* [Configuration Properties -- ORC File Format | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-ORCFileFormat]

This also fixes a typo in the description of *hive.orc.splits.include.fileid*, 
which was added to the llap branch by HIVE-10067 and to master for release 
2.0.0 by HIVE-11542.

> make use of file footer PPD API in ETL strategy or separate strategy
> 
>
> Key: HIVE-11675
> URL: https://issues.apache.org/jira/browse/HIVE-11675
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.1
> Fix For: 2.1.0
>
> Attachments: HIVE-11675.01.patch, HIVE-11675.02.patch, 
> HIVE-11675.03.patch, HIVE-11675.04.patch, HIVE-11675.05.patch, 
> HIVE-11675.06.patch, HIVE-11675.07.patch, HIVE-11675.08.patch, 
> HIVE-11675.09.patch, HIVE-11675.10.patch, HIVE-11675.11.patch, 
> HIVE-11675.12.patch, HIVE-11675.13.patch, HIVE-11675.14.patch, 
> HIVE-11675.patch, HIVE-11675.premature.opti.patch
>
>
> Need to take a look at the best flow. It won't be much different if we do 
> filtering metastore call for each partition. So perhaps we'd need the custom 
> sync point/batching after all.
> Or we can make it opportunistic and not fetch any footers unless it can be 
> pushed down to metastore or fetched from local cache, that way the only slow 
> threaded op is directory listings



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11527) bypass HiveServer2 thrift interface for query results

2016-03-16 Thread Takanobu Asanuma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197055#comment-15197055
 ] 

Takanobu Asanuma commented on HIVE-11527:
-

Hi [~sershe], [~vgumashta]

Thanks to the discussion with [~jingzhao], I implemented features for handling 
HA. I uploaded a WIP patch on RB. Please review it.

I will continue the rest of the work.
・Until now, I assumed that intermediate results' format is the text format. But 
we need to make jdbc clients decode other file formats. That is even more 
important since sequence file is the default format for intermediate results, 
which was currently implemented in HIVE-1608.
・Considering multiple intermediate files.
・Adding some unit tests. HA tests depend on HIVE-13268 (Could you also review 
this?). 

Thanks.

> bypass HiveServer2 thrift interface for query results
> -
>
> Key: HIVE-11527
> URL: https://issues.apache.org/jira/browse/HIVE-11527
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Sergey Shelukhin
>Assignee: Takanobu Asanuma
> Attachments: HIVE-11527.WIP.patch
>
>
> Right now, HS2 reads query results and returns them to the caller via its 
> thrift API.
> There should be an option for HS2 to return some pointer to results (an HDFS 
> link?) and for the user to read the results directly off HDFS inside the 
> cluster, or via something like WebHDFS outside the cluster
> Review board link: https://reviews.apache.org/r/40867



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11675) make use of file footer PPD API in ETL strategy or separate strategy

2016-03-16 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-11675:
--
Labels: TODOC2.1  (was: )

> make use of file footer PPD API in ETL strategy or separate strategy
> 
>
> Key: HIVE-11675
> URL: https://issues.apache.org/jira/browse/HIVE-11675
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.1
> Fix For: 2.1.0
>
> Attachments: HIVE-11675.01.patch, HIVE-11675.02.patch, 
> HIVE-11675.03.patch, HIVE-11675.04.patch, HIVE-11675.05.patch, 
> HIVE-11675.06.patch, HIVE-11675.07.patch, HIVE-11675.08.patch, 
> HIVE-11675.09.patch, HIVE-11675.10.patch, HIVE-11675.11.patch, 
> HIVE-11675.12.patch, HIVE-11675.13.patch, HIVE-11675.14.patch, 
> HIVE-11675.patch, HIVE-11675.premature.opti.patch
>
>
> Need to take a look at the best flow. It won't be much different if we do 
> filtering metastore call for each partition. So perhaps we'd need the custom 
> sync point/batching after all.
> Or we can make it opportunistic and not fetch any footers unless it can be 
> pushed down to metastore or fetched from local cache, that way the only slow 
> threaded op is directory listings



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7926) long-lived daemons for query fragment execution, I/O and caching

2016-03-16 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197017#comment-15197017
 ] 

Lefty Leverenz commented on HIVE-7926:
--

Nudge:  The LLAP design doc still needs to be added to the wiki.

* 
https://issues.apache.org/jira/secure/attachment/12665704/LLAPdesigndocument.pdf

> long-lived daemons for query fragment execution, I/O and caching
> 
>
> Key: HIVE-7926
> URL: https://issues.apache.org/jira/browse/HIVE-7926
> Project: Hive
>  Issue Type: New Feature
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: LLAPdesigndocument.pdf
>
>
> We are proposing a new execution model for Hive that is a combination of 
> existing process-based tasks and long-lived daemons running on worker nodes. 
> These nodes can take care of efficient I/O, caching and query fragment 
> execution, while heavy lifting like most joins, ordering, etc. can be handled 
> by tasks.
> The proposed model is not a 2-system solution for small and large queries; 
> neither it is a separate execution engine like MR or Tez. It can be used by 
> any Hive execution engine, if support is added; in future even external 
> products (e.g. Pig) can use it.
> The document with high-level design we are proposing will be attached shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12995) LLAP: Synthetic file ids need collision checks

2016-03-16 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197014#comment-15197014
 ] 

Lefty Leverenz commented on HIVE-12995:
---

Doc note:  This adds configuration parameter 
*hive.orc.splits.allow.synthetic.fileid* to HiveConf.java, so it will need to 
be documented in the ORC section of Configuration Properties for release 2.1.0.

* [Configuration Properties -- ORC File Format | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-ORCFileFormat]

Should it also be mentioned in the llap documentation?  (The parameter 
description doesn't say anything about llap.)

* [LLAP design document | 
https://issues.apache.org/jira/secure/attachment/12665704/LLAPdesigndocument.pdf]
 attached to HIVE-7926

> LLAP: Synthetic file ids need collision checks
> --
>
> Key: HIVE-12995
> URL: https://issues.apache.org/jira/browse/HIVE-12995
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.1.0
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.1
> Fix For: 2.1.0
>
> Attachments: HIVE-12995.01.patch, HIVE-12995.02.patch, 
> HIVE-12995.03.patch, HIVE-12995.04.patch, HIVE-12995.patch
>
>
> LLAP synthetic file ids do not have any way of checking whether a collision 
> occurs other than a data-error.
> Synthetic file-ids have only been used with unit tests so far - but they will 
> be needed to add cache mechanisms to non-HDFS filesystems.
> In case of Synthetic file-ids, it is recommended that we track the full-tuple 
> (path, mtime, len) in the cache so that a cache-hit for the synthetic file-id 
> can be compared against the parameters & only accepted if those match.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12995) LLAP: Synthetic file ids need collision checks

2016-03-16 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-12995:
--
Labels: TODOC2.1  (was: )

> LLAP: Synthetic file ids need collision checks
> --
>
> Key: HIVE-12995
> URL: https://issues.apache.org/jira/browse/HIVE-12995
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.1.0
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.1
> Fix For: 2.1.0
>
> Attachments: HIVE-12995.01.patch, HIVE-12995.02.patch, 
> HIVE-12995.03.patch, HIVE-12995.04.patch, HIVE-12995.patch
>
>
> LLAP synthetic file ids do not have any way of checking whether a collision 
> occurs other than a data-error.
> Synthetic file-ids have only been used with unit tests so far - but they will 
> be needed to add cache mechanisms to non-HDFS filesystems.
> In case of Synthetic file-ids, it is recommended that we track the full-tuple 
> (path, mtime, len) in the cache so that a cache-hit for the synthetic file-id 
> can be compared against the parameters & only accepted if those match.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13235) Insert from select generates incorrect result when hive.optimize.constant.propagation is on

2016-03-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196970#comment-15196970
 ] 

Hive QA commented on HIVE-13235:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12793406/HIVE-13235.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 9829 tests 
executed
*Failed tests:*
{noformat}
TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more 
- did not produce a TEST-*.xml file
TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not 
produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input25
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input26
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_25
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_constprog_semijoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_dyn_part14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_25
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7280/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7280/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7280/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12793406 - PreCommit-HIVE-TRUNK-Build

> Insert from select generates incorrect result when 
> hive.optimize.constant.propagation is on
> ---
>
> Key: HIVE-13235
> URL: https://issues.apache.org/jira/browse/HIVE-13235
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-13235.1.patch, HIVE-13235.2.patch
>
>
> The following query returns incorrect result when constant optimization is 
> turned on. The subquery happens to have an alias p1 to be the same as the 
> input partition name. Constant optimizer will optimize it incorrectly as the 
> constant.
> When constant optimizer is turned off, we will get the correct result.
> {noformat}
> set hive.cbo.enable=false;
> set hive.optimize.constant.propagation = true;
> create table t1(c1 string, c2 double) partitioned by (p1 string, p2 string);
> create table t2(p1 double, c2 string);
> insert into table t1 partition(p1='40', p2='p2') values('c1', 0.0);
> INSERT OVERWRITE TABLE t2  select if((c2 = 0.0), c2, '0') as p1, 2 as p2 from 
> t1 where c1 = 'c1' and p1 = '40';
> select * from t2;
> 40   2
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13183) More logs in operation logs

2016-03-16 Thread Rajat Khandelwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196931#comment-15196931
 ] 

Rajat Khandelwal commented on HIVE-13183:
-

Taking patch from reviewboard and attaching

> More logs in operation logs
> ---
>
> Key: HIVE-13183
> URL: https://issues.apache.org/jira/browse/HIVE-13183
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajat Khandelwal
>Assignee: Rajat Khandelwal
> Attachments: HIVE-13183.02.patch, HIVE-13183.03.patch, 
> HIVE-13183.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13183) More logs in operation logs

2016-03-16 Thread Rajat Khandelwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajat Khandelwal updated HIVE-13183:

Attachment: HIVE-13183.04.patch

> More logs in operation logs
> ---
>
> Key: HIVE-13183
> URL: https://issues.apache.org/jira/browse/HIVE-13183
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajat Khandelwal
>Assignee: Rajat Khandelwal
> Attachments: HIVE-13183.02.patch, HIVE-13183.03.patch, 
> HIVE-13183.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13084) Vectorization add support for PROJECTION Multi-AND/OR

2016-03-16 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-13084:

Attachment: HIVE-13084.04.patch

> Vectorization add support for PROJECTION Multi-AND/OR
> -
>
> Key: HIVE-13084
> URL: https://issues.apache.org/jira/browse/HIVE-13084
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Rajesh Balamohan
>Assignee: Matt McCline
> Attachments: HIVE-13084.01.patch, HIVE-13084.02.patch, 
> HIVE-13084.03.patch, HIVE-13084.04.patch, vector_between_date.q
>
>
> When there is case statement in group by, hive throws unable to vectorize 
> exception.
> e.g query just to demonstrate the problem
> {noformat}
> explain select l_partkey, case when l_commitdate between '2015-06-30' AND 
> '2015-07-06' THEN '2015-06-30' END as wk from lineitem_test_l_shipdate_ts 
> group by l_partkey, case when l_commitdate between '2015-06-30' AND 
> '2015-07-06' THEN '2015-06-30' END;
> org.apache.hadoop.hive.ql.metadata.HiveException: Could not vectorize 
> expression: org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2
>   File Output Operator [FS_7]
> Group By Operator [GBY_5] (rows=888777234 width=108)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE]
>   SHUFFLE [RS_4]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_3] (rows=1777554469 width=108)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_1] (rows=1777554469 width=108)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=1777554469 width=108)
>   
> rajesh@lineitem_test_l_shipdate_ts,lineitem_test_l_shipdate_ts,Tbl:COMPLETE,Col:NONE,Output:["l_partkey","l_commitdate"]
> {noformat}
> \cc [~mmccline], [~gopalv]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13084) Vectorization add support for PROJECTION Multi-AND/OR

2016-03-16 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-13084:

Status: Patch Available  (was: In Progress)

> Vectorization add support for PROJECTION Multi-AND/OR
> -
>
> Key: HIVE-13084
> URL: https://issues.apache.org/jira/browse/HIVE-13084
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Rajesh Balamohan
>Assignee: Matt McCline
> Attachments: HIVE-13084.01.patch, HIVE-13084.02.patch, 
> HIVE-13084.03.patch, HIVE-13084.04.patch, vector_between_date.q
>
>
> When there is case statement in group by, hive throws unable to vectorize 
> exception.
> e.g query just to demonstrate the problem
> {noformat}
> explain select l_partkey, case when l_commitdate between '2015-06-30' AND 
> '2015-07-06' THEN '2015-06-30' END as wk from lineitem_test_l_shipdate_ts 
> group by l_partkey, case when l_commitdate between '2015-06-30' AND 
> '2015-07-06' THEN '2015-06-30' END;
> org.apache.hadoop.hive.ql.metadata.HiveException: Could not vectorize 
> expression: org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2
>   File Output Operator [FS_7]
> Group By Operator [GBY_5] (rows=888777234 width=108)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE]
>   SHUFFLE [RS_4]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_3] (rows=1777554469 width=108)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_1] (rows=1777554469 width=108)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=1777554469 width=108)
>   
> rajesh@lineitem_test_l_shipdate_ts,lineitem_test_l_shipdate_ts,Tbl:COMPLETE,Col:NONE,Output:["l_partkey","l_commitdate"]
> {noformat}
> \cc [~mmccline], [~gopalv]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13084) Vectorization add support for PROJECTION Multi-AND/OR

2016-03-16 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-13084:

Status: In Progress  (was: Patch Available)

> Vectorization add support for PROJECTION Multi-AND/OR
> -
>
> Key: HIVE-13084
> URL: https://issues.apache.org/jira/browse/HIVE-13084
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Rajesh Balamohan
>Assignee: Matt McCline
> Attachments: HIVE-13084.01.patch, HIVE-13084.02.patch, 
> HIVE-13084.03.patch, vector_between_date.q
>
>
> When there is case statement in group by, hive throws unable to vectorize 
> exception.
> e.g query just to demonstrate the problem
> {noformat}
> explain select l_partkey, case when l_commitdate between '2015-06-30' AND 
> '2015-07-06' THEN '2015-06-30' END as wk from lineitem_test_l_shipdate_ts 
> group by l_partkey, case when l_commitdate between '2015-06-30' AND 
> '2015-07-06' THEN '2015-06-30' END;
> org.apache.hadoop.hive.ql.metadata.HiveException: Could not vectorize 
> expression: org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2
>   File Output Operator [FS_7]
> Group By Operator [GBY_5] (rows=888777234 width=108)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE]
>   SHUFFLE [RS_4]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_3] (rows=1777554469 width=108)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_1] (rows=1777554469 width=108)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=1777554469 width=108)
>   
> rajesh@lineitem_test_l_shipdate_ts,lineitem_test_l_shipdate_ts,Tbl:COMPLETE,Col:NONE,Output:["l_partkey","l_commitdate"]
> {noformat}
> \cc [~mmccline], [~gopalv]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13084) Vectorization add support for PROJECTION Multi-AND/OR

2016-03-16 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196909#comment-15196909
 ] 

Matt McCline commented on HIVE-13084:
-

Ok, next patch has bug fixes for LongColLongScalar

LongColEqualLongScalar.java
LongColGreaterEqualLongScalar.java
LongColGreaterLongScalar.java
LongColLessEqualLongScalar.java
LongColLessLongScalar.java
LongColNotEqualLongScalar.java

These change need to be put into a separate JIRA.

> Vectorization add support for PROJECTION Multi-AND/OR
> -
>
> Key: HIVE-13084
> URL: https://issues.apache.org/jira/browse/HIVE-13084
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Rajesh Balamohan
>Assignee: Matt McCline
> Attachments: HIVE-13084.01.patch, HIVE-13084.02.patch, 
> HIVE-13084.03.patch, vector_between_date.q
>
>
> When there is case statement in group by, hive throws unable to vectorize 
> exception.
> e.g query just to demonstrate the problem
> {noformat}
> explain select l_partkey, case when l_commitdate between '2015-06-30' AND 
> '2015-07-06' THEN '2015-06-30' END as wk from lineitem_test_l_shipdate_ts 
> group by l_partkey, case when l_commitdate between '2015-06-30' AND 
> '2015-07-06' THEN '2015-06-30' END;
> org.apache.hadoop.hive.ql.metadata.HiveException: Could not vectorize 
> expression: org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2
>   File Output Operator [FS_7]
> Group By Operator [GBY_5] (rows=888777234 width=108)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE]
>   SHUFFLE [RS_4]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_3] (rows=1777554469 width=108)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_1] (rows=1777554469 width=108)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=1777554469 width=108)
>   
> rajesh@lineitem_test_l_shipdate_ts,lineitem_test_l_shipdate_ts,Tbl:COMPLETE,Col:NONE,Output:["l_partkey","l_commitdate"]
> {noformat}
> \cc [~mmccline], [~gopalv]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13293) Query occurs performance degradation after enabling parallel order by for Hive on sprak

2016-03-16 Thread Lifeng Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lifeng Wang updated HIVE-13293:
---
Description: 
I use TPCx-BB to do some performance test on Hive on Spark engine. And found 
query 10 has performance degradation when enabling parallel order by.
It seems that sampling cost much time before running the real query.

  was:I use TPCx-BB to do some performance test on Hive on Spark engine. And 
found query 10 has performance degradation when enabling parallel order by.


> Query occurs performance degradation after enabling parallel order by for 
> Hive on sprak
> ---
>
> Key: HIVE-13293
> URL: https://issues.apache.org/jira/browse/HIVE-13293
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.0.0
>Reporter: Lifeng Wang
>
> I use TPCx-BB to do some performance test on Hive on Spark engine. And found 
> query 10 has performance degradation when enabling parallel order by.
> It seems that sampling cost much time before running the real query.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13293) Query occurs performance degradation after enabling parallel order by for Hive on sprak

2016-03-16 Thread Lifeng Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lifeng Wang updated HIVE-13293:
---
Assignee: (was: Xuefu Zhang)

> Query occurs performance degradation after enabling parallel order by for 
> Hive on sprak
> ---
>
> Key: HIVE-13293
> URL: https://issues.apache.org/jira/browse/HIVE-13293
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.0.0
>Reporter: Lifeng Wang
>
> I use TPCx-BB to do some performance test on Hive on Spark engine. And found 
> query 10 has performance degradation when enabling parallel order by.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13084) Vectorization add support for PROJECTION Multi-AND/OR

2016-03-16 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196821#comment-15196821
 ] 

Matt McCline commented on HIVE-13084:
-

For selectedInUse (and !noNulls) it does this to copy the isNull array to the 
output vector:

{code}
System.arraycopy(nullPos, 0, outNulls, 0, n);
{code}

but this is wrong because the selected array could by n=5, but it contents 
could be {7,20,21,104,900} and the copy will not copy the right isNull 
values.

> Vectorization add support for PROJECTION Multi-AND/OR
> -
>
> Key: HIVE-13084
> URL: https://issues.apache.org/jira/browse/HIVE-13084
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Rajesh Balamohan
>Assignee: Matt McCline
> Attachments: HIVE-13084.01.patch, HIVE-13084.02.patch, 
> HIVE-13084.03.patch, vector_between_date.q
>
>
> When there is case statement in group by, hive throws unable to vectorize 
> exception.
> e.g query just to demonstrate the problem
> {noformat}
> explain select l_partkey, case when l_commitdate between '2015-06-30' AND 
> '2015-07-06' THEN '2015-06-30' END as wk from lineitem_test_l_shipdate_ts 
> group by l_partkey, case when l_commitdate between '2015-06-30' AND 
> '2015-07-06' THEN '2015-06-30' END;
> org.apache.hadoop.hive.ql.metadata.HiveException: Could not vectorize 
> expression: org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2
>   File Output Operator [FS_7]
> Group By Operator [GBY_5] (rows=888777234 width=108)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE]
>   SHUFFLE [RS_4]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_3] (rows=1777554469 width=108)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_1] (rows=1777554469 width=108)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=1777554469 width=108)
>   
> rajesh@lineitem_test_l_shipdate_ts,lineitem_test_l_shipdate_ts,Tbl:COMPLETE,Col:NONE,Output:["l_partkey","l_commitdate"]
> {noformat}
> \cc [~mmccline], [~gopalv]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13292) Different DOUBLE type precision issue between Spark and MR engine

2016-03-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196819#comment-15196819
 ] 

Sergey Shelukhin commented on HIVE-13292:
-

With double type, it's usually by design

> Different DOUBLE type precision issue between Spark and MR engine
> -
>
> Key: HIVE-13292
> URL: https://issues.apache.org/jira/browse/HIVE-13292
> Project: Hive
>  Issue Type: Bug
> Environment: Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>
> Different DOUBLE type precision issue between Spark and MR engine.
> Found when executing the TPC-H query5 with scale factor 2 (2GB data size). 
> More details are as below.
> (1)The MR engine output:
> MOZAMBIQUE,1.0646195910990009E8
> ETHIOPIA,1.0108856206629996E8
> ALGERIA,9.987582690420012E7
> MOROCCO,9.785484184850013E7
> KENYA,9.412388077690017E7
> (2)The Spark engine output:
> MOZAMBIQUE,1.064619591099E8
> ETHIOPIA,1.0108856206630005E8
> ALGERIA,9.987582690419997E7
> MOROCCO,9.785484184850003E7
> KENYA,9.412388077690002E7
> (3)Detail SQL used:
> drop table if exists ${env:RESULT_TABLE};
> create table ${env:RESULT_TABLE} (
>   pid1 STRING,
>   pid2 DOUBLE
> )
> row format delimited fields terminated by ',' lines terminated by '\n'
> stored as ${env:HIVE_DEFAULT_FILEFORMAT_RESULT_TABLE} location 
> '${env:RESULT_DIR}';
> insert into table ${env:RESULT_TABLE}
> select
> n_name,
> sum(l_extendedprice * (1 - l_discount)) as revenue
> from
> customer,
> orders,
> lineitem,
> supplier,
> nation,
> region
> where
> c_custkey = o_custkey
> and l_orderkey = o_orderkey
> and l_suppkey = s_suppkey
> and c_nationkey = s_nationkey
> and s_nationkey = n_nationkey
> and n_regionkey = r_regionkey
> and r_name = 'AFRICA'
> and o_orderdate >= '1993-01-01'
> and o_orderdate < '1994-01-01'
> group by
> n_name
> order by
> revenue desc;
> (4)Similar issue also exists even after we simplified original query to a 
> simpler one as below:
> drop table if exists ${env:RESULT_TABLE};
> create table ${env:RESULT_TABLE} (
>   pid2 DOUBLE
> )
> row format delimited fields terminated by ',' lines terminated by '\n'
> stored as ${env:HIVE_DEFAULT_FILEFORMAT_RESULT_TABLE} location 
> '${env:RESULT_DIR}';
> insert into table ${env:RESULT_TABLE}
> select
> sum(l_extendedprice * (1 - l_discount)) as revenue
> from
> lineitem
> group by
> l_orderkey
> order by
> revenue;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)