date:20161009

[jira] [Commented] (HIVE-14919) Improve the performance of Hive on Spark 2.0.0

2016-10-09 Thread Ferdinand Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15561385#comment-15561385
 ] 

Ferdinand Xu commented on HIVE-14919:
-

cc [~kellyzly] [~dapengsun]

> Improve the performance of Hive on Spark 2.0.0
> --
>
> Key: HIVE-14919
> URL: https://issues.apache.org/jira/browse/HIVE-14919
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: benchmark.xlsx
>
>
> In HIVE-14029, we updated the Spark dependency to 2.0.0. We used Intel 
> BigBench[1] to benchmark Spark 2.0 against Spark 1.6 on a 10 GB data set 
> and saw performance degradation for most of the BigBench queries. See the 
> attached file for details. This JIRA is the umbrella ticket for addressing 
> those performance issues.
> [1] https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14919) Improve the performance of Hive on Spark 2.0.0

2016-10-09 Thread Ferdinand Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-14919:

Description: 
In HIVE-14029, we updated the Spark dependency to 2.0.0. We used Intel 
BigBench[1] to benchmark Spark 2.0 against Spark 1.6 on a 10 GB data set and 
saw performance degradation for most of the BigBench queries. See the 
attached file for details. This JIRA is the umbrella ticket for addressing 
those performance issues.

[1] https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench

  was:
In HIVE-14029, we have updated Spark dependency to 2.0.0. We use Intel 
BigBench[1] to run benchmark over 10 GB data set comparing with Spark 1.6. We 
can see quite some performance degradations for all the queries of BigBench. 
For detailed information, please see the attached files. This JIRA is the 
umbrella ticket addressing those performance issues.

[1] https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench


> Improve the performance of Hive on Spark 2.0.0
> --
>
> Key: HIVE-14919
> URL: https://issues.apache.org/jira/browse/HIVE-14919
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: benchmark.xlsx
>
>
> In HIVE-14029, we updated the Spark dependency to 2.0.0. We used Intel 
> BigBench[1] to benchmark Spark 2.0 against Spark 1.6 on a 10 GB data set 
> and saw performance degradation for most of the BigBench queries. See the 
> attached file for details. This JIRA is the umbrella ticket for addressing 
> those performance issues.
> [1] https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench




[jira] [Updated] (HIVE-14919) Improve the performance of Hive on Spark 2.0.0

2016-10-09 Thread Ferdinand Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-14919:

Attachment: benchmark.xlsx

> Improve the performance of Hive on Spark 2.0.0
> --
>
> Key: HIVE-14919
> URL: https://issues.apache.org/jira/browse/HIVE-14919
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: benchmark.xlsx
>
>
> In HIVE-14029, we updated the Spark dependency to 2.0.0. We used Intel 
> BigBench[1] to run a benchmark over a 10 GB data set, comparing with Spark 
> 1.6. We saw performance degradation for all of the BigBench queries. 
> See the attached files for details. This JIRA is the umbrella ticket 
> for addressing those performance issues.
> [1] https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench




[jira] [Commented] (HIVE-13873) Column pruning for nested fields

2016-10-09 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15561282#comment-15561282
 ] 

Hive QA commented on HIVE-13873:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12832398/HIVE-13873.1.patch

{color:green}SUCCESS:{color} +1 due to 5 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 612 failed/errored test(s), 10668 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_globallimit]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_join]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_vectorization]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_vectorization_partition]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_vectorization_project]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[authorization_create_temp_table]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[authorization_view_1]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[authorization_view_3]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[authorization_view_4]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[authorization_view_disable_cbo_1]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[authorization_view_disable_cbo_3]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[authorization_view_disable_cbo_4]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_1]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_2]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_3]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_4]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_7]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_8]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_9]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join0]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join15]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join18]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join18_multi_distinct]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join20]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join25]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join27]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join30]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join31]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join_filters]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join_nulls]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join_reordering_values]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join_stats]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join_without_localtask]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_smb_mapjoin_14]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_10]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_6]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_9]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ba_table3]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ba_table_union]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_groupby]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin1]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin2]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin3]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin4]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin5]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_const]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_gby]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_gby_empty]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_input26]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_join]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_limit]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join1]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_gby]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_gby_empty]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_join1]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_join]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_limit]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_lineage2]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_semijoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_simple_select]
{noformat}

[jira] [Updated] (HIVE-14917) explainanalyze_2.q fails after HIVE-14861

2016-10-09 Thread Pengcheng Xiong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14917:
---
Status: Open  (was: Patch Available)

> explainanalyze_2.q fails after HIVE-14861
> -
>
> Key: HIVE-14917
> URL: https://issues.apache.org/jira/browse/HIVE-14917
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14917.01.patch, HIVE-14917.02.patch, 
> HIVE-14917.03.patch
>
>





[jira] [Updated] (HIVE-14917) explainanalyze_2.q fails after HIVE-14861

2016-10-09 Thread Pengcheng Xiong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14917:
---
Status: Patch Available  (was: Open)

> explainanalyze_2.q fails after HIVE-14861
> -
>
> Key: HIVE-14917
> URL: https://issues.apache.org/jira/browse/HIVE-14917
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14917.01.patch, HIVE-14917.02.patch, 
> HIVE-14917.03.patch
>
>





[jira] [Updated] (HIVE-14917) explainanalyze_2.q fails after HIVE-14861

2016-10-09 Thread Pengcheng Xiong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14917:
---
Attachment: HIVE-14917.03.patch

> explainanalyze_2.q fails after HIVE-14861
> -
>
> Key: HIVE-14917
> URL: https://issues.apache.org/jira/browse/HIVE-14917
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14917.01.patch, HIVE-14917.02.patch, 
> HIVE-14917.03.patch
>
>





[jira] [Commented] (HIVE-14918) Function concat_ws get a wrong value

2016-10-09 Thread Pengcheng Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15561219#comment-15561219
 ] 

Pengcheng Xiong commented on HIVE-14918:


[~wisgood] If you like, you are always welcome to create your own UDF to treat 
it differently.

> Function concat_ws get a wrong value  
> --
>
> Key: HIVE-14918
> URL: https://issues.apache.org/jira/browse/HIVE-14918
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.1.1, 2.0.0, 2.1.0, 2.0.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
>Priority: Critical
> Fix For: 2.1.0
>
> Attachments: HIVE-14918.0.patch
>
>
> FROM src INSERT OVERWRITE TABLE dest1 SELECT 'abc', 'xyz', '8675309'  WHERE 
> src.key = 86; 
> SELECT concat_ws('.',NULL)  FROM dest1 ;
> The result is an empty string "", but I think it should return NULL.




[jira] [Commented] (HIVE-14815) Implement Parquet vectorization reader

2016-10-09 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15561193#comment-15561193
 ] 

Hive QA commented on HIVE-14815:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12832387/HIVE-14815.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10671 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_parquet]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_parquet_types]
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_parquet]
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_parquet_types]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2]
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1450/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1450/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1450/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12832387 - PreCommit-HIVE-Build

> Implement Parquet vectorization reader 
> ---
>
> Key: HIVE-14815
> URL: https://issues.apache.org/jira/browse/HIVE-14815
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14815.patch
>
>
> Parquet doesn't provide a vectorized reader that Hive can use directly. 
> Also, a decimal column batch consists of a batch of HiveDecimal values, a 
> Hive type that is unknown to Parquet. To support Hive's vectorized 
> execution engine, we have to implement the vectorized Parquet reader on 
> the Hive side. To limit the performance impact, we need to implement a 
> page-level vectorized reader.




[jira] [Commented] (HIVE-14918) Function concat_ws get a wrong value

2016-10-09 Thread Xiaowei Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15561156#comment-15561156
 ] 

Xiaowei Wang commented on HIVE-14918:
-

Yes, it is not a bug in MySQL. I will close this issue. Thanks!


> Function concat_ws get a wrong value  
> --
>
> Key: HIVE-14918
> URL: https://issues.apache.org/jira/browse/HIVE-14918
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.1.1, 2.0.0, 2.1.0, 2.0.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
>Priority: Critical
> Fix For: 2.1.0
>
> Attachments: HIVE-14918.0.patch
>
>
> FROM src INSERT OVERWRITE TABLE dest1 SELECT 'abc', 'xyz', '8675309'  WHERE 
> src.key = 86; 
> SELECT concat_ws('.',NULL)  FROM dest1 ;
> The result is an empty string "", but I think it should return NULL.




[jira] [Commented] (HIVE-14797) reducer number estimating may lead to data skew

2016-10-09 Thread Xuefu Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15561154#comment-15561154
 ] 

Xuefu Zhang commented on HIVE-14797:


+1

> reducer number estimating may lead to data skew
> ---
>
> Key: HIVE-14797
> URL: https://issues.apache.org/jira/browse/HIVE-14797
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: roncenzhao
>Assignee: roncenzhao
> Attachments: HIVE-14797.2.patch, HIVE-14797.3.patch, HIVE-14797.patch
>
>
> HiveKey's hash code is generated by multiplying by 31 key by key, as 
> implemented in the method `ObjectInspectorUtils.getBucketHashCode()`:
> for (int i = 0; i < bucketFields.length; i++) {
>   int fieldHash = ObjectInspectorUtils.hashCode(bucketFields[i], 
> bucketFieldInspectors[i]);
>   hashCode = 31 * hashCode + fieldHash;
> }
> The following example leads to data skew:
> I have two tables, tbl1 and tbl2, with the same columns: a int, b 
> string. The values of column 'a' in both tables are not skewed, but the 
> values of column 'b' in both tables are skewed.
> When my SQL is "select * from tbl1 join tbl2 on tbl1.a=tbl2.a and 
> tbl1.b=tbl2.b" and the estimated number of reducers is 31, the data will be 
> skewed.
> As we know, the HiveKey's hash code is generated by `hash(a)*31 + hash(b)`. 
> When the number of reducers is 31, the reducer number for each row is 
> `hash(b)%31`, because `hash(a)*31` is a multiple of 31 (ignoring integer 
> overflow). As a result, the job will be skewed.
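
The arithmetic in this description can be checked with a small sketch. The class and method names are hypothetical, and only the `31 * hashCode + fieldHash` recurrence is taken from the description. The sketch assumes the reducer is chosen as a non-negative hash modulo the reducer count, and it uses small hash values so that integer overflow does not come into play.

```java
// Hypothetical demo class; only the 31 * hashCode + fieldHash recurrence
// comes from the issue description.
final class ReducerSkewDemo {

    // Combines per-field hashes the way the description states.
    static int combinedHash(int... fieldHashes) {
        int hashCode = 0;
        for (int fieldHash : fieldHashes) {
            hashCode = 31 * hashCode + fieldHash;
        }
        return hashCode;
    }

    // Assumption: reducer = non-negative hash modulo the number of reducers.
    static int reducerFor(int hash, int numReducers) {
        return (hash & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int skewedB = 7; // every row shares this hash for column b
        for (int a = 0; a < 5; a++) {
            int h = combinedHash(a, skewedB);
            System.out.println("hash(a)=" + a
                    + " reducers=31 -> " + reducerFor(h, 31)
                    + ", reducers=32 -> " + reducerFor(h, 32));
        }
    }
}
```

With 31 reducers every row lands on reducer 7 regardless of hash(a), while with 32 reducers the rows spread out, which is the effect the reporter describes.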




[jira] [Commented] (HIVE-14918) Function concat_ws get a wrong value

2016-10-09 Thread Pengcheng Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15561148#comment-15561148
 ] 

Pengcheng Xiong commented on HIVE-14918:


[~wisgood] I just took a look at the link. It was not marked as a bug in MySQL:
{code}
[19 Nov 2004 14:21] Sergei Golubchik
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.mysql.com/documentation/ and the instructions on
how to report a bug at http://bugs.mysql.com/how-to-report.php

Additional info:

According to the manual, CONCAT_WS skips any `NULL' values after the separator 
argument.
Thus CONCAT_WS(' ', NULL, NULL) has zero strings to concat, and the result, 
quite naturally, is empty string. It does not depend on the separator:

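mysql> select concat('>', concat_ws('=', NULL, NULL), '<');
+----------------------------------------------+
| concat('>', concat_ws('|', NULL, NULL), '<') |
+----------------------------------------------+
| ><                                           |
+----------------------------------------------+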
{code}
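
The rule quoted from the MySQL manual can be sketched in Java as follows. This is not Hive's UDF implementation; `ConcatWsSketch` and `concatWs` are hypothetical names. It assumes MySQL semantics: a NULL separator yields NULL, and NULL values after the separator are skipped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that mimics the MySQL rule quoted above.
final class ConcatWsSketch {

    static String concatWs(String separator, String... values) {
        if (separator == null) {
            return null; // MySQL: a NULL separator gives a NULL result
        }
        List<String> kept = new ArrayList<>();
        if (values != null) {
            for (String v : values) {
                if (v != null) {
                    kept.add(v); // NULL values are skipped, not propagated
                }
            }
        }
        // Joining zero strings gives the empty string, not NULL.
        return String.join(separator, kept);
    }

    public static void main(String[] args) {
        System.out.println("[" + concatWs(".", (String) null) + "]");
        System.out.println("[" + concatWs(".", "a", null, "b") + "]");
    }
}
```

A variant that returns NULL when every value is NULL would only need one extra check on `kept.isEmpty()`, which is the kind of behavior a custom UDF could choose.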

> Function concat_ws get a wrong value  
> --
>
> Key: HIVE-14918
> URL: https://issues.apache.org/jira/browse/HIVE-14918
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.1.1, 2.0.0, 2.1.0, 2.0.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
>Priority: Critical
> Fix For: 2.1.0
>
> Attachments: HIVE-14918.0.patch
>
>
> FROM src INSERT OVERWRITE TABLE dest1 SELECT 'abc', 'xyz', '8675309'  WHERE 
> src.key = 86; 
> SELECT concat_ws('.',NULL)  FROM dest1 ;
> The result is an empty string "", but I think it should return NULL.




[jira] [Updated] (HIVE-13873) Column pruning for nested fields

2016-10-09 Thread Ferdinand Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-13873:

Attachment: HIVE-13873.1.patch

Fix NPE

> Column pruning for nested fields
> 
>
> Key: HIVE-13873
> URL: https://issues.apache.org/jira/browse/HIVE-13873
> Project: Hive
>  Issue Type: New Feature
>  Components: Logical Optimizer
>Reporter: Xuefu Zhang
>Assignee: Ferdinand Xu
> Attachments: HIVE-13873.1.patch, HIVE-13873.patch, 
> HIVE-13873.wip.patch
>
>
> Some columnar file formats such as Parquet store fields of struct type 
> column by column, using the encoding described in the Google Dremel paper. It 
> is very common in big data for data to be stored in structs while queries 
> need only a subset of the fields in those structs. However, Hive presently 
> still needs to read the whole struct regardless of whether all fields are 
> selected. Therefore, pruning unwanted sub-fields in structs or nested fields 
> at file-reading time would be a big performance boost for such scenarios.
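
The idea of pruning a nested schema down to the requested sub-fields can be illustrated with a toy model. Everything here is an assumption made for illustration: the schema is a plain `Map`, the paths are dotted strings, and `NestedPruneSketch` is a hypothetical name. It does not use Hive's or Parquet's APIs and is not the attached patch.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical schema model: a struct is a Map from field name to either a
// type name (String) or a nested struct (Map).
final class NestedPruneSketch {

    // Returns a new map that keeps only the fields named by the dotted paths.
    @SuppressWarnings("unchecked")
    static Map<String, Object> prune(Map<String, Object> schema, List<String> paths) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (String path : paths) {
            Map<String, Object> src = schema;
            Map<String, Object> dst = result;
            String[] parts = path.split("\\.");
            for (int i = 0; i < parts.length; i++) {
                Object node = src.get(parts[i]);
                if (node == null) {
                    break; // field not present in the schema: stop following this path
                }
                Object existing = dst.get(parts[i]);
                if (existing == node) {
                    break; // this field was already kept whole by an earlier path
                }
                if (i == parts.length - 1 || !(node instanceof Map)) {
                    dst.put(parts[i], node); // keep the leaf, or the whole sub-struct
                    break;
                }
                if (!(existing instanceof Map)) {
                    existing = new LinkedHashMap<String, Object>();
                    dst.put(parts[i], existing);
                }
                src = (Map<String, Object>) node;
                dst = (Map<String, Object>) existing;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("city", "string");
        address.put("zip", "string");
        Map<String, Object> schema = new LinkedHashMap<>();
        schema.put("id", "int");
        schema.put("address", address);
        // prints {address={city=string}}
        System.out.println(prune(schema, Arrays.asList("address.city")));
    }
}
```

A reader handed the pruned schema would only need to materialize `address.city`, which is where the performance gain described in the issue would come from.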




[jira] [Updated] (HIVE-14916) Reduce the memory requirements for Spark tests

2016-10-09 Thread Dapeng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dapeng Sun updated HIVE-14916:
--
Attachment: HIVE-14916.002.patch

> Reduce the memory requirements for Spark tests
> --
>
> Key: HIVE-14916
> URL: https://issues.apache.org/jira/browse/HIVE-14916
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ferdinand Xu
>Assignee: Dapeng Sun
> Attachments: HIVE-14916.001.patch, HIVE-14916.002.patch
>
>
> As in HIVE-14887, we need to reduce the memory requirements for Spark tests.




[jira] [Commented] (HIVE-14799) Query operation are not thread safe during its cancellation

2016-10-09 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15561106#comment-15561106
 ] 

Hive QA commented on HIVE-14799:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12832386/HIVE-14799.4.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10663 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_3]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5]
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1449/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1449/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1449/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12832386 - PreCommit-HIVE-Build

> Query operation are not thread safe during its cancellation
> ---
>
> Key: HIVE-14799
> URL: https://issues.apache.org/jira/browse/HIVE-14799
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-14799.1.patch, HIVE-14799.2.patch, 
> HIVE-14799.3.patch, HIVE-14799.4.patch, HIVE-14799.patch
>
>
> When a query is cancelled, either via Beeline (Ctrl-C) or the API call 
> TCLIService.Client.CancelOperation, SQLOperation.cancel is invoked in a 
> different thread from the one running the query to close/destroy its 
> encapsulated Driver object. Neither SQLOperation nor Driver is thread-safe, 
> which can sometimes result in runtime exceptions such as NPEs. The errors 
> from the running query are not handled properly, so some resources (files, 
> locks, etc.) are probably not cleaned up after the query terminates.
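
One common way to make cancellation safe when it arrives on a different thread is to guard the state transition with an atomic compare-and-set, so that cleanup runs exactly once no matter which thread gets there first. The sketch below only illustrates that general technique. It is not the patch attached to this issue, and `CancellableOperation` and its states are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for an operation that owns a resource such as a driver.
final class CancellableOperation {
    enum State { RUNNING, CANCELED, FINISHED }

    private final AtomicReference<State> state = new AtomicReference<>(State.RUNNING);
    private final AtomicInteger cleanups = new AtomicInteger();

    // May be called from any thread; only the winner of the CAS cleans up.
    boolean cancel() {
        if (state.compareAndSet(State.RUNNING, State.CANCELED)) {
            cleanup();
            return true;
        }
        return false;
    }

    // Called by the query thread when the query completes normally.
    boolean finish() {
        if (state.compareAndSet(State.RUNNING, State.FINISHED)) {
            cleanup();
            return true;
        }
        return false;
    }

    private void cleanup() {
        cleanups.incrementAndGet(); // placeholder for releasing files, locks, etc.
    }

    State state() { return state.get(); }
    int cleanupCount() { return cleanups.get(); }

    public static void main(String[] args) throws InterruptedException {
        CancellableOperation op = new CancellableOperation();
        Thread canceller = new Thread(op::cancel);
        canceller.start();
        op.finish();      // races with cancel(); exactly one of them wins
        canceller.join();
        System.out.println(op.state() + ", cleanups=" + op.cleanupCount());
    }
}
```

The final state depends on which thread wins the race, but the cleanup counter is always 1, which is the property the issue is asking for.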




[jira] [Commented] (HIVE-14918) Function concat_ws get a wrong value

2016-10-09 Thread Xiaowei Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15561056#comment-15561056
 ] 

Xiaowei Wang commented on HIVE-14918:
-

It is true that concat_ws('.',NULL) in MySQL returns an empty string: 
https://bugs.mysql.com/bug.php?id=6719 
But most of my colleagues and I don't understand why.
Setting MySQL aside, which behavior do you think is more reasonable?
Thanks for your explanation.

> Function concat_ws get a wrong value  
> --
>
> Key: HIVE-14918
> URL: https://issues.apache.org/jira/browse/HIVE-14918
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.1.1, 2.0.0, 2.1.0, 2.0.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
>Priority: Critical
> Fix For: 2.1.0
>
> Attachments: HIVE-14918.0.patch
>
>
> FROM src INSERT OVERWRITE TABLE dest1 SELECT 'abc', 'xyz', '8675309'  WHERE 
> src.key = 86; 
> SELECT concat_ws('.',NULL)  FROM dest1 ;
> The result is an empty string "", but I think it should return NULL.




[jira] [Commented] (HIVE-14799) Query operation are not thread safe during its cancellation

2016-10-09 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15561022#comment-15561022
 ] 

Hive QA commented on HIVE-14799:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12832386/HIVE-14799.4.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10663 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_3]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5]
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1448/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1448/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1448/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12832386 - PreCommit-HIVE-Build

> Query operation are not thread safe during its cancellation
> ---
>
> Key: HIVE-14799
> URL: https://issues.apache.org/jira/browse/HIVE-14799
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-14799.1.patch, HIVE-14799.2.patch, 
> HIVE-14799.3.patch, HIVE-14799.4.patch, HIVE-14799.patch
>
>
> When a query is cancelled, either via Beeline (Ctrl-C) or the API call 
> TCLIService.Client.CancelOperation, SQLOperation.cancel is invoked in a 
> different thread from the one running the query to close/destroy its 
> encapsulated Driver object. Neither SQLOperation nor Driver is thread-safe, 
> which can sometimes result in runtime exceptions such as NPEs. The errors 
> from the running query are not handled properly, so some resources (files, 
> locks, etc.) are probably not cleaned up after the query terminates.




[jira] [Commented] (HIVE-13589) beeline - support prompt for password with '-u' option

2016-10-09 Thread Ferdinand Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-13589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560992#comment-15560992
 ] 

Ferdinand Xu commented on HIVE-13589:
-

I left some comments on review board. Also please attach your patch to trigger 
the precommit. Thanks!

> beeline - support prompt for password with '-u' option
> --
>
> Key: HIVE-13589
> URL: https://issues.apache.org/jira/browse/HIVE-13589
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Thejas M Nair
>Assignee: Vihang Karajgaonkar
> Fix For: 2.2.0
>
> Attachments: HIVE-13589.1.patch, HIVE-13589.2.patch, 
> HIVE-13589.3.patch, HIVE-13589.4.patch, HIVE-13589.5.patch, 
> HIVE-13589.6.patch, HIVE-13589.7.patch, HIVE-13589.8.patch, HIVE-13589.9.patch
>
>
> Specifying the connection string with command-line options in Beeline is 
> convenient, as it gets saved in the shell command history and is easy to 
> retrieve from there.
> However, specifying the password on the command line is not secure, as it 
> gets displayed on screen and saved in the history.
> It should be possible to specify '-p' without an argument to make Beeline 
> prompt for the password.
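
The requested behavior can be sketched with the standard `java.io.Console` API, which reads a password without echoing it. The argument handling below is hypothetical and is not Beeline's actual option parser. It assumes that a `-p` followed by nothing, or by another option starting with `-`, means "prompt", so a password that itself begins with `-` would need different handling.

```java
import java.io.Console;

// Hypothetical argument handling, not Beeline's actual option parser.
final class PasswordOptionSketch {

    // Returns the value following -p, or null when -p is absent or has no value.
    static String explicitPassword(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if ("-p".equals(args[i]) && i + 1 < args.length && !args[i + 1].startsWith("-")) {
                return args[i + 1];
            }
        }
        return null;
    }

    // True when -p appears without a value, meaning the user should be prompted.
    static boolean needsPrompt(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if ("-p".equals(args[i])) {
                return i + 1 >= args.length || args[i + 1].startsWith("-");
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String password = explicitPassword(args);
        if (needsPrompt(args)) {
            Console console = System.console(); // null when no terminal is attached
            if (console != null) {
                char[] typed = console.readPassword("Enter password: ");
                password = (typed == null) ? null : new String(typed);
            }
        }
        System.out.println(password == null ? "no password supplied" : "password received");
    }
}
```

Because the prompted value never appears in the argument list, it is not echoed to the screen or written to the shell history.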




[jira] [Updated] (HIVE-14815) Implement Parquet vectorization reader

2016-10-09 Thread Ferdinand Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-14815:

Status: Patch Available  (was: Open)

> Implement Parquet vectorization reader 
> ---
>
> Key: HIVE-14815
> URL: https://issues.apache.org/jira/browse/HIVE-14815
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14815.patch
>
>
> Parquet doesn't provide a vectorized reader that Hive can use directly. 
> Also, a decimal column batch consists of a batch of HiveDecimal values, a 
> Hive type that is unknown to Parquet. To support Hive's vectorized 
> execution engine, we have to implement the vectorized Parquet reader on 
> the Hive side. To limit the performance impact, we need to implement a 
> page-level vectorized reader.




[jira] [Updated] (HIVE-14815) Implement Parquet vectorization reader

2016-10-09 Thread Ferdinand Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-14815:

Attachment: HIVE-14815.patch

> Implement Parquet vectorization reader 
> ---
>
> Key: HIVE-14815
> URL: https://issues.apache.org/jira/browse/HIVE-14815
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14815.patch
>
>
> Parquet doesn't provide a vectorized reader that Hive can use directly. 
> Also, a decimal column batch consists of a batch of HiveDecimal values, a 
> Hive type that is unknown to Parquet. To support Hive's vectorized 
> execution engine, we have to implement the vectorized Parquet reader on 
> the Hive side. To limit the performance impact, we need to implement a 
> page-level vectorized reader.




[jira] [Commented] (HIVE-14913) Add new unit tests

2016-10-09 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560939#comment-15560939
 ] 

Hive QA commented on HIVE-14913:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12832382/HIVE-14913.3.patch

{color:green}SUCCESS:{color} +1 due to 11 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10663 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_bulk]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic]
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[alter_merge_orc]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_0]
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1447/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1447/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1447/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12832382 - PreCommit-HIVE-Build

> Add new unit tests
> --
>
> Key: HIVE-14913
> URL: https://issues.apache.org/jira/browse/HIVE-14913
> Project: Hive
>  Issue Type: Task
>  Components: Tests
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14913.1.patch, HIVE-14913.2.patch, 
> HIVE-14913.3.patch
>
>
> Move a bunch of tests from system tests to Hive unit tests to reduce testing 
> overhead.




[jira] [Updated] (HIVE-14799) Query operation are not thread safe during its cancellation

2016-10-09 Thread Chaoyu Tang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-14799:
---
Attachment: HIVE-14799.4.patch

> Query operation are not thread safe during its cancellation
> ---
>
> Key: HIVE-14799
> URL: https://issues.apache.org/jira/browse/HIVE-14799
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-14799.1.patch, HIVE-14799.2.patch, 
> HIVE-14799.3.patch, HIVE-14799.4.patch, HIVE-14799.patch
>
>
> When a query is cancelled either via Beeline (Ctrl-C) or API call 
> TCLIService.Client.CancelOperation, SQLOperation.cancel is invoked in a 
> different thread from that running the query to close/destroy its 
> encapsulated Driver object. Neither SQLOperation nor Driver is thread-safe, 
> which can sometimes result in runtime exceptions such as NPE. Errors from 
> the running query are not handled properly, so some resources (files, locks, 
> etc.) may not be cleaned up after the query terminates.




[jira] [Updated] (HIVE-14799) Query operation are not thread safe during its cancellation

2016-10-09 Thread Chaoyu Tang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-14799:
---
Attachment: (was: HIVE-14799.4.patch)

> Query operation are not thread safe during its cancellation
> ---
>
> Key: HIVE-14799
> URL: https://issues.apache.org/jira/browse/HIVE-14799
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-14799.1.patch, HIVE-14799.2.patch, 
> HIVE-14799.3.patch, HIVE-14799.4.patch, HIVE-14799.patch
>
>
> When a query is cancelled either via Beeline (Ctrl-C) or API call 
> TCLIService.Client.CancelOperation, SQLOperation.cancel is invoked in a 
> different thread from that running the query to close/destroy its 
> encapsulated Driver object. Neither SQLOperation nor Driver is thread-safe, 
> which can sometimes result in runtime exceptions such as NPE. Errors from 
> the running query are not handled properly, so some resources (files, locks, 
> etc.) may not be cleaned up after the query terminates.




[jira] [Updated] (HIVE-14799) Query operation are not thread safe during its cancellation

2016-10-09 Thread Chaoyu Tang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-14799:
---
Attachment: HIVE-14799.4.patch

The tests failed in my local env even without the patch, so I wonder whether 
the failures are related to the patch. Reattaching it for another run.

> Query operation are not thread safe during its cancellation
> ---
>
> Key: HIVE-14799
> URL: https://issues.apache.org/jira/browse/HIVE-14799
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-14799.1.patch, HIVE-14799.2.patch, 
> HIVE-14799.3.patch, HIVE-14799.4.patch, HIVE-14799.patch
>
>
> When a query is cancelled either via Beeline (Ctrl-C) or API call 
> TCLIService.Client.CancelOperation, SQLOperation.cancel is invoked in a 
> different thread from that running the query to close/destroy its 
> encapsulated Driver object. Neither SQLOperation nor Driver is thread-safe, 
> which can sometimes result in runtime exceptions such as NPE. Errors from 
> the running query are not handled properly, so some resources (files, locks, 
> etc.) may not be cleaned up after the query terminates.




[jira] [Commented] (HIVE-14917) explainanalyze_2.q fails after HIVE-14861

2016-10-09 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560859#comment-15560859
 ] 

Hive QA commented on HIVE-14917:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12832380/HIVE-14917.02.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10663 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2]
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1446/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1446/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1446/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12832380 - PreCommit-HIVE-Build

> explainanalyze_2.q fails after HIVE-14861
> -
>
> Key: HIVE-14917
> URL: https://issues.apache.org/jira/browse/HIVE-14917
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14917.01.patch, HIVE-14917.02.patch
>
>





[jira] [Commented] (HIVE-14803) S3: Stats gathering for insert queries can be expensive for partitioned dataset

2016-10-09 Thread Pengcheng Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560855#comment-15560855
 ] 

Pengcheng Xiong commented on HIVE-14803:


LGTM +1, thanks for the patch!

> S3: Stats gathering for insert queries can be expensive for partitioned 
> dataset
> ---
>
> Key: HIVE-14803
> URL: https://issues.apache.org/jira/browse/HIVE-14803
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-14803.1.patch
>
>
> StatsTask's aggregateStats populates stats details for all partitions by 
> checking file sizes, which turns out to be expensive when a large number of 
> partitions are inserted. 




[jira] [Commented] (HIVE-14358) Add metrics for number of queries executed for each execution engine (mr, spark, tez)

2016-10-09 Thread Lefty Leverenz (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560851#comment-15560851
 ] 

Lefty Leverenz commented on HIVE-14358:
---

[~zsombor.klara], will you have time to work on the metrics documentation?

Or should I create a new JIRA issue for documenting metrics?

cc: [~szehon]

> Add metrics for number of queries executed for each execution engine (mr, 
> spark, tez)
> -
>
> Key: HIVE-14358
> URL: https://issues.apache.org/jira/browse/HIVE-14358
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2
>Affects Versions: 2.1.0
>Reporter: Lenni Kuff
>Assignee: Barna Zsombor Klara
> Fix For: 2.2.0
>
> Attachments: HIVE-14358.patch
>
>
> HiveServer2 currently has a metric for the total number of queries run since 
> the last restart, but it would be useful to also have metrics for the number 
> of queries run on each execution engine. This would improve supportability by 
> giving users a high-level understanding of what workloads have been running 
> on the server. 
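
A minimal sketch of what per-engine counters could look like, in Python for brevity. The class `QueryMetrics` and its methods are hypothetical and are not HiveServer2's metrics API; only the engine names (mr, spark, tez) come from the issue title.

```python
import threading
from collections import Counter


class QueryMetrics:
    """Hypothetical thread-safe counters: one per engine, plus a total."""

    ENGINES = ("mr", "spark", "tez")

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._per_engine = Counter()

    def record_query(self, engine: str) -> None:
        if engine not in self.ENGINES:
            raise ValueError(f"unknown execution engine: {engine!r}")
        with self._lock:
            self._per_engine[engine] += 1

    def snapshot(self) -> dict:
        """Return counts since start-up, including the overall total."""
        with self._lock:
            counts = {e: self._per_engine[e] for e in self.ENGINES}
        counts["total"] = sum(counts.values())
        return counts
```

Deriving the total from the per-engine counts, as done here, keeps the two figures consistent without a second counter.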




[jira] [Commented] (HIVE-14918) Function concat_ws get a wrong value

2016-10-09 Thread Pengcheng Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560808#comment-15560808
 ] 

Pengcheng Xiong commented on HIVE-14918:


When it was initially developed, it followed MySQL's behavior.
{code}
mysql> SELECT concat_ws('.',NULL) FROM (SELECT 'abc', 'xyz', '8675309' from t 
WHERE t.test = 86)subq;
+---------------------+
| concat_ws('.',NULL) |
+---------------------+
|                     |
+---------------------+
1 row in set (0.13 sec)
{code}
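
The MySQL result above follows from how CONCAT_WS treats NULL: NULL arguments after the separator are skipped, and the result is NULL only when the separator itself is NULL. A minimal Python model of that rule, with `None` standing in for SQL NULL (the function is illustrative only and is not Hive's UDF implementation):

```python
from typing import Optional


def concat_ws(separator: Optional[str], *args: Optional[str]) -> Optional[str]:
    """Model MySQL's CONCAT_WS, using None for SQL NULL."""
    if separator is None:
        return None  # a NULL separator makes the whole result NULL
    # NULL arguments are skipped; empty strings are kept.
    return separator.join(a for a in args if a is not None)
```

Under this rule `concat_ws('.', None)` has nothing left to join and yields an empty string, which matches the query output above.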

> Function concat_ws get a wrong value  
> --
>
> Key: HIVE-14918
> URL: https://issues.apache.org/jira/browse/HIVE-14918
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.1.1, 2.0.0, 2.1.0, 2.0.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
>Priority: Critical
> Fix For: 2.1.0
>
> Attachments: HIVE-14918.0.patch
>
>
> FROM src INSERT OVERWRITE TABLE dest1 SELECT 'abc', 'xyz', '8675309'  WHERE 
> src.key = 86; 
> SELECT concat_ws('.',NULL)  FROM dest1 ;
> The result is an empty string "", but I think it should return NULL.




[jira] [Commented] (HIVE-14918) Function concat_ws get a wrong value

2016-10-09 Thread Pengcheng Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560809#comment-15560809
 ] 

Pengcheng Xiong commented on HIVE-14918:


When it was initially developed, it followed MySQL's behavior.
{code}
mysql> SELECT concat_ws('.',NULL) FROM (SELECT 'abc', 'xyz', '8675309' from t 
WHERE t.test = 86)subq;
+---------------------+
| concat_ws('.',NULL) |
+---------------------+
|                     |
+---------------------+
1 row in set (0.13 sec)
{code}

> Function concat_ws get a wrong value  
> --
>
> Key: HIVE-14918
> URL: https://issues.apache.org/jira/browse/HIVE-14918
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.1.1, 2.0.0, 2.1.0, 2.0.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
>Priority: Critical
> Fix For: 2.1.0
>
> Attachments: HIVE-14918.0.patch
>
>
> FROM src INSERT OVERWRITE TABLE dest1 SELECT 'abc', 'xyz', '8675309'  WHERE 
> src.key = 86; 
> SELECT concat_ws('.',NULL)  FROM dest1 ;
> The result is an empty string "", but I think it should return NULL.




[jira] [Commented] (HIVE-14803) S3: Stats gathering for insert queries can be expensive for partitioned dataset

2016-10-09 Thread Rajesh Balamohan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560780#comment-15560780
 ] 

Rajesh Balamohan commented on HIVE-14803:
-

Thanks [~pxiong]. RB link: https://reviews.apache.org/r/52670/

> S3: Stats gathering for insert queries can be expensive for partitioned 
> dataset
> ---
>
> Key: HIVE-14803
> URL: https://issues.apache.org/jira/browse/HIVE-14803
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-14803.1.patch
>
>
> StatsTask's aggregateStats populates stats details for all partitions by 
> checking file sizes, which turns out to be expensive when a large number of 
> partitions are inserted. 




[jira] [Commented] (HIVE-14799) Query operation are not thread safe during its cancellation

2016-10-09 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560773#comment-15560773
 ] 

Hive QA commented on HIVE-14799:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12832379/HIVE-14799.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10663 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_3]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5]
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1445/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1445/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1445/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12832379 - PreCommit-HIVE-Build

> Query operation are not thread safe during its cancellation
> ---
>
> Key: HIVE-14799
> URL: https://issues.apache.org/jira/browse/HIVE-14799
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-14799.1.patch, HIVE-14799.2.patch, 
> HIVE-14799.3.patch, HIVE-14799.patch
>
>
> When a query is cancelled either via Beeline (Ctrl-C) or API call 
> TCLIService.Client.CancelOperation, SQLOperation.cancel is invoked in a 
> different thread from that running the query to close/destroy its 
> encapsulated Driver object. Neither SQLOperation nor Driver is thread-safe, 
> which can sometimes result in runtime exceptions such as NPE. Errors from 
> the running query are not handled properly, so some resources (files, locks, 
> etc.) may not be cleaned up after the query terminates.




[jira] [Commented] (HIVE-14918) Function concat_ws get a wrong value

2016-10-09 Thread Xiaowei Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560739#comment-15560739
 ] 

Xiaowei Wang commented on HIVE-14918:
-

I mean, concat_ws('.',NULL) should return NULL, not an empty string "". What do 
you think?



> Function concat_ws get a wrong value  
> --
>
> Key: HIVE-14918
> URL: https://issues.apache.org/jira/browse/HIVE-14918
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.1.1, 2.0.0, 2.1.0, 2.0.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
>Priority: Critical
> Fix For: 2.1.0
>
> Attachments: HIVE-14918.0.patch
>
>
> FROM src INSERT OVERWRITE TABLE dest1 SELECT 'abc', 'xyz', '8675309'  WHERE 
> src.key = 86; 
> SELECT concat_ws('.',NULL)  FROM dest1 ;
> The result is an empty string "", but I think it should return NULL.




[jira] [Updated] (HIVE-14913) Add new unit tests

2016-10-09 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14913:
---
Status: Patch Available  (was: Open)

Took care of test failures

> Add new unit tests
> --
>
> Key: HIVE-14913
> URL: https://issues.apache.org/jira/browse/HIVE-14913
> Project: Hive
>  Issue Type: Task
>  Components: Tests
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14913.1.patch, HIVE-14913.2.patch, 
> HIVE-14913.3.patch
>
>
> Move a bunch of tests from system tests to Hive unit tests to reduce testing 
> overhead.




[jira] [Updated] (HIVE-14913) Add new unit tests

2016-10-09 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14913:
---
Attachment: HIVE-14913.3.patch

> Add new unit tests
> --
>
> Key: HIVE-14913
> URL: https://issues.apache.org/jira/browse/HIVE-14913
> Project: Hive
>  Issue Type: Task
>  Components: Tests
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14913.1.patch, HIVE-14913.2.patch, 
> HIVE-14913.3.patch
>
>
> Move a bunch of tests from system tests to Hive unit tests to reduce testing 
> overhead.




[jira] [Commented] (HIVE-11394) Enhance EXPLAIN display for vectorization

2016-10-09 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560692#comment-15560692
 ] 

Hive QA commented on HIVE-11394:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12832378/HIVE-11394.07.patch

{color:green}SUCCESS:{color} +1 due to 126 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10633 tests 
executed
*Failed tests:*
{noformat}
TestMiniLlapLocalCliDriver-orc_llap.q-union5.q-delete_where_non_partitioned.q-and-27-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_udf]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_udf1]
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_udf]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2]
org.apache.hadoop.hive.ql.exec.vector.TestVectorSelectOperator.testSelectOperator
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1444/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1444/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1444/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12832378 - PreCommit-HIVE-Build

> Enhance EXPLAIN display for vectorization
> -
>
> Key: HIVE-11394
> URL: https://issues.apache.org/jira/browse/HIVE-11394
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch, 
> HIVE-11394.03.patch, HIVE-11394.04.patch, HIVE-11394.05.patch, 
> HIVE-11394.06.patch, HIVE-11394.07.patch
>
>
> Add detail to the EXPLAIN output showing why Map and Reduce work is not 
> vectorized.
> New syntax is: EXPLAIN VECTORIZATION \[ONLY\] \[SUMMARY|DETAIL\]
> The ONLY option suppresses most non-vectorization elements.
> SUMMARY shows vectorization information for the PLAN (is vectorization 
> enabled) and a summary of Map and Reduce work.
> The defaults for the optional clauses are: ONLY off, and SUMMARY.
> Here are some examples:
> EXPLAIN VECTORIZATION example:
> (Note the PLAN VECTORIZATION, Map Vectorization, Reduce Vectorization 
> sections)
> It is the same as EXPLAIN VECTORIZATION SUMMARY.
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
> …
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> …
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: decimal_date_test
>   Statistics: Num rows: 12288 Data size: 2467616 Basic stats: 
> COMPLETE Column stats: NONE
>   Filter Operator
> predicate: cdate BETWEEN 1969-12-30 AND 1970-01-02 (type: 
> boolean)
> Statistics: Num rows: 6144 Data size: 1233808 Basic 
> stats: COMPLETE Column stats: NONE
> Select Operator
>   expressions: cdate (type: date)
>   outputColumnNames: _col0
>   Statistics: Num rows: 6144 Data size: 1233808 Basic 
> stats: COMPLETE Column stats: NONE
>   Reduce Output Operator
> key expressions: _col0 (type: date)
> sort order: +
> Statistics: Num rows: 6144 Data size: 1233808 Basic 
> stats: COMPLETE Column stats: NONE
> Execution mode: vectorized, llap
> LLAP IO: all inputs
> Map Vectorization:
> enabled: true
> enabledConditionsMet: 
> hive.vectorized.use.vectorized.input.format IS true
> groupByVectorOutput: true
> inputFileFormats: 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> allNative: false
> usesVectorUDFAdaptor: false
> vectorized: true
> Reducer 2 
> Execution mode: vectorized, llap
> Reduce Vectorization:
> enabled: true
> enableConditionsMet:

[jira] [Updated] (HIVE-14917) explainanalyze_2.q fails after HIVE-14861

2016-10-09 Thread Pengcheng Xiong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14917:
---
Status: Patch Available  (was: Open)

> explainanalyze_2.q fails after HIVE-14861
> -
>
> Key: HIVE-14917
> URL: https://issues.apache.org/jira/browse/HIVE-14917
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14917.01.patch, HIVE-14917.02.patch
>
>





[jira] [Updated] (HIVE-14917) explainanalyze_2.q fails after HIVE-14861

2016-10-09 Thread Pengcheng Xiong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14917:
---
Attachment: HIVE-14917.02.patch

> explainanalyze_2.q fails after HIVE-14861
> -
>
> Key: HIVE-14917
> URL: https://issues.apache.org/jira/browse/HIVE-14917
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14917.01.patch, HIVE-14917.02.patch
>
>





[jira] [Updated] (HIVE-14917) explainanalyze_2.q fails after HIVE-14861

2016-10-09 Thread Pengcheng Xiong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14917:
---
Status: Open  (was: Patch Available)

> explainanalyze_2.q fails after HIVE-14861
> -
>
> Key: HIVE-14917
> URL: https://issues.apache.org/jira/browse/HIVE-14917
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14917.01.patch, HIVE-14917.02.patch
>
>





[jira] [Updated] (HIVE-14799) Query operation are not thread safe during its cancellation

2016-10-09 Thread Chaoyu Tang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-14799:
---
Attachment: HIVE-14799.3.patch

Revised the patch to use the model [~sershe] suggested. The close operation 
defers the resource release to the query process if the driver is running 
(compiling/executing) the query; the resources are released once the query 
finishes (or is interrupted). Otherwise, close releases the driver resources 
itself. Either way, the close (or cancel) operation never has to wait. 
[~sershe], could you review it? I have also uploaded the new patch to RB: 
https://reviews.apache.org/r/52559/
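
The hand-off described in this comment can be sketched roughly as follows, in Python for brevity. This illustrates the idea only and is not the code in the patch: the class `QueryDriver`, its flags, and `release_resources` are all hypothetical.

```python
import threading


class QueryDriver:
    """Hypothetical driver whose close() never waits for a running query."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._running = False
        self._close_requested = False
        self.released = False

    def release_resources(self) -> None:
        # Placeholder for cleaning up files, locks, and so on.
        self.released = True

    def run(self, work) -> None:
        with self._lock:
            if self._close_requested:
                return  # closed before the query started
            self._running = True
        try:
            work()
        finally:
            with self._lock:
                self._running = False
                deferred = self._close_requested
            if deferred:
                # close() arrived mid-query, so the query thread cleans up.
                self.release_resources()

    def close(self) -> None:
        with self._lock:
            self._close_requested = True
            defer = self._running
        if not defer:
            # Nothing is running: the closing thread releases directly.
            self.release_resources()
```

Both flags are read and written under the same lock, so exactly one of the two threads ends up releasing the resources, and the closing thread returns immediately in either case.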

> Query operation are not thread safe during its cancellation
> ---
>
> Key: HIVE-14799
> URL: https://issues.apache.org/jira/browse/HIVE-14799
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-14799.1.patch, HIVE-14799.2.patch, 
> HIVE-14799.3.patch, HIVE-14799.patch
>
>
> When a query is cancelled either via Beeline (Ctrl-C) or API call 
> TCLIService.Client.CancelOperation, SQLOperation.cancel is invoked in a 
> different thread from that running the query to close/destroy its 
> encapsulated Driver object. Neither SQLOperation nor Driver is thread-safe, 
> which can sometimes result in runtime exceptions such as NPE. Errors from 
> the running query are not handled properly, so some resources (files, locks, 
> etc.) may not be cleaned up after the query terminates.




[jira] [Updated] (HIVE-14913) Add new unit tests

2016-10-09 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14913:
---
Status: Open  (was: Patch Available)

> Add new unit tests
> --
>
> Key: HIVE-14913
> URL: https://issues.apache.org/jira/browse/HIVE-14913
> Project: Hive
>  Issue Type: Task
>  Components: Tests
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14913.1.patch, HIVE-14913.2.patch
>
>
> Move a bunch of tests from system tests to Hive unit tests to reduce testing 
> overhead.




[jira] [Updated] (HIVE-11394) Enhance EXPLAIN display for vectorization

2016-10-09 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-11394:

Attachment: HIVE-11394.07.patch

> Enhance EXPLAIN display for vectorization
> -
>
> Key: HIVE-11394
> URL: https://issues.apache.org/jira/browse/HIVE-11394
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch, 
> HIVE-11394.03.patch, HIVE-11394.04.patch, HIVE-11394.05.patch, 
> HIVE-11394.06.patch, HIVE-11394.07.patch
>
>
> Add detail to the EXPLAIN output showing why Map and Reduce work is not 
> vectorized.
> New syntax is: EXPLAIN VECTORIZATION \[ONLY\] \[SUMMARY|DETAIL\]
> The ONLY option suppresses most non-vectorization elements.
> SUMMARY shows vectorization information for the PLAN (is vectorization 
> enabled) and a summary of Map and Reduce work.
> The defaults for the optional clauses are: ONLY off, and SUMMARY.
> Here are some examples:
> EXPLAIN VECTORIZATION example:
> (Note the PLAN VECTORIZATION, Map Vectorization, Reduce Vectorization 
> sections)
> It is the same as EXPLAIN VECTORIZATION SUMMARY.
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
> …
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> …
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: decimal_date_test
>   Statistics: Num rows: 12288 Data size: 2467616 Basic stats: 
> COMPLETE Column stats: NONE
>   Filter Operator
> predicate: cdate BETWEEN 1969-12-30 AND 1970-01-02 (type: 
> boolean)
> Statistics: Num rows: 6144 Data size: 1233808 Basic 
> stats: COMPLETE Column stats: NONE
> Select Operator
>   expressions: cdate (type: date)
>   outputColumnNames: _col0
>   Statistics: Num rows: 6144 Data size: 1233808 Basic 
> stats: COMPLETE Column stats: NONE
>   Reduce Output Operator
> key expressions: _col0 (type: date)
> sort order: +
> Statistics: Num rows: 6144 Data size: 1233808 Basic 
> stats: COMPLETE Column stats: NONE
> Execution mode: vectorized, llap
> LLAP IO: all inputs
> Map Vectorization:
> enabled: true
> enabledConditionsMet: 
> hive.vectorized.use.vectorized.input.format IS true
> groupByVectorOutput: true
> inputFileFormats: 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> allNative: false
> usesVectorUDFAdaptor: false
> vectorized: true
> Reducer 2 
> Execution mode: vectorized, llap
> Reduce Vectorization:
> enabled: true
> enableConditionsMet: hive.vectorized.execution.reduce.enabled 
> IS true, hive.execution.engine tez IN [tez, spark] IS true
> groupByVectorOutput: true
> allNative: false
> usesVectorUDFAdaptor: false
> vectorized: true
> Reduce Operator Tree:
>   Select Operator
> expressions: KEY.reducesinkkey0 (type: date)
> outputColumnNames: _col0
> Statistics: Num rows: 6144 Data size: 1233808 Basic stats: 
> COMPLETE Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 6144 Data size: 1233808 Basic stats: 
> COMPLETE Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>   serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
> EXPLAIN VECTORIZATION DETAIL
> (Note the added  Select Vectorization, Group By Vectorization, Reduce Sink 
> Vectorization sections in this example)
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
> …
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> …
>   Vertices:
> Map 1 
> Map

[jira] [Updated] (HIVE-11394) Enhance EXPLAIN display for vectorization

2016-10-09 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-11394:

Status: In Progress  (was: Patch Available)

> Enhance EXPLAIN display for vectorization
> -
>
> Key: HIVE-11394
> URL: https://issues.apache.org/jira/browse/HIVE-11394
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch, 
> HIVE-11394.03.patch, HIVE-11394.04.patch, HIVE-11394.05.patch, 
> HIVE-11394.06.patch, HIVE-11394.07.patch
>
>
> Add detail to the EXPLAIN output showing why Map and Reduce work is not 
> vectorized.
> New syntax is: EXPLAIN VECTORIZATION \[ONLY\] \[SUMMARY|DETAIL\]
> The ONLY option suppresses most non-vectorization elements.
> SUMMARY shows vectorization information for the PLAN (is vectorization 
> enabled) and a summary of Map and Reduce work.
> The defaults for the optional clauses are: ONLY off, and SUMMARY.
> Here are some examples:
> EXPLAIN VECTORIZATION example:
> (Note the PLAN VECTORIZATION, Map Vectorization, Reduce Vectorization 
> sections)
> It is the same as EXPLAIN VECTORIZATION SUMMARY.
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
> …
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> …
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: decimal_date_test
>   Statistics: Num rows: 12288 Data size: 2467616 Basic stats: 
> COMPLETE Column stats: NONE
>   Filter Operator
> predicate: cdate BETWEEN 1969-12-30 AND 1970-01-02 (type: 
> boolean)
> Statistics: Num rows: 6144 Data size: 1233808 Basic 
> stats: COMPLETE Column stats: NONE
> Select Operator
>   expressions: cdate (type: date)
>   outputColumnNames: _col0
>   Statistics: Num rows: 6144 Data size: 1233808 Basic 
> stats: COMPLETE Column stats: NONE
>   Reduce Output Operator
> key expressions: _col0 (type: date)
> sort order: +
> Statistics: Num rows: 6144 Data size: 1233808 Basic 
> stats: COMPLETE Column stats: NONE
> Execution mode: vectorized, llap
> LLAP IO: all inputs
> Map Vectorization:
> enabled: true
> enabledConditionsMet: 
> hive.vectorized.use.vectorized.input.format IS true
> groupByVectorOutput: true
> inputFileFormats: 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> allNative: false
> usesVectorUDFAdaptor: false
> vectorized: true
> Reducer 2 
> Execution mode: vectorized, llap
> Reduce Vectorization:
> enabled: true
> enableConditionsMet: hive.vectorized.execution.reduce.enabled 
> IS true, hive.execution.engine tez IN [tez, spark] IS true
> groupByVectorOutput: true
> allNative: false
> usesVectorUDFAdaptor: false
> vectorized: true
> Reduce Operator Tree:
>   Select Operator
> expressions: KEY.reducesinkkey0 (type: date)
> outputColumnNames: _col0
> Statistics: Num rows: 6144 Data size: 1233808 Basic stats: 
> COMPLETE Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 6144 Data size: 1233808 Basic stats: 
> COMPLETE Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>   serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
> EXPLAIN VECTORIZATION DETAIL
> (Note the added  Select Vectorization, Group By Vectorization, Reduce Sink 
> Vectorization sections in this example)
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
> …
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> …
>   Vertices:
> Map 1 
>

[jira] [Updated] (HIVE-11394) Enhance EXPLAIN display for vectorization

2016-10-09 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-11394:

Status: Patch Available  (was: In Progress)


[jira] [Commented] (HIVE-14917) explainanalyze_2.q fails after HIVE-14861

2016-10-09 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560409#comment-15560409
 ] 

Hive QA commented on HIVE-14917:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12832370/HIVE-14917.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10663 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2]
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1443/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1443/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1443/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12832370 - PreCommit-HIVE-Build

> explainanalyze_2.q fails after HIVE-14861
> -
>
> Key: HIVE-14917
> URL: https://issues.apache.org/jira/browse/HIVE-14917
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14917.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-14918) Function concat_ws get a wrong value

2016-10-09 Thread Pengcheng Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560316#comment-15560316
 ] 

Pengcheng Xiong commented on HIVE-14918:


[~wisgood], I tried this on current master and it works fine. If you see the 
problem on 2.1, maybe we should backport some patches.
{code}
hive> create table d (c1 string, c2 string, c3 string);
OK
hive> FROM src INSERT OVERWRITE TABLE d SELECT 'abc', 'xyz', '8675309' WHERE 
src.key = 86;
Query ID = pxiong_20161009101753_605845db-aeb9-4dab-87d6-5ad51fab1f79
Total jobs = 1
hive> select * from d;
OK
abc xyz 8675309
Time taken: 0.127 seconds, Fetched: 1 row(s)
hive> SELECT concat_ws('.',NULL) FROM d;
OK

Time taken: 0.096 seconds, Fetched: 1 row(s)
{code}

A rewritten query also works fine:
{code}
hive> SELECT concat_ws('.',NULL) FROM (SELECT 'abc', 'xyz', '8675309' from src 
WHERE src.key = 86)subq;
OK

Time taken: 0.272 seconds, Fetched: 1 row(s)
{code}
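
The question in this issue is whether a NULL argument should be skipped or should make the whole result NULL. The two candidate semantics can be sketched in Python as below. The function names are hypothetical, and this models only the behaviors under discussion, not Hive's UDF code.

```python
from typing import Optional


def concat_ws_skip_nulls(sep: Optional[str], *args: Optional[str]) -> Optional[str]:
    """Model in which NULL arguments are skipped; only a NULL separator yields NULL."""
    if sep is None:
        return None
    return sep.join(a for a in args if a is not None)


def concat_ws_null_propagating(sep: Optional[str], *args: Optional[str]) -> Optional[str]:
    """Model in which any NULL input makes the whole result NULL."""
    if sep is None or any(a is None for a in args):
        return None
    return sep.join(args)


# concat_ws('.', NULL) under each model:
print(repr(concat_ws_skip_nulls(".", None)))        # ''   (matches the empty row above)
print(repr(concat_ws_null_propagating(".", None)))  # None (what the reporter expects)
```

Under the skip-NULLs model the empty result row shown above is expected behavior, so the fix requested by the reporter would amount to a change in semantics rather than a simple bug fix.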


> Function concat_ws get a wrong value  
> --
>
> Key: HIVE-14918
> URL: https://issues.apache.org/jira/browse/HIVE-14918
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.1.1, 2.0.0, 2.1.0, 2.0.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
>Priority: Critical
> Fix For: 2.1.0
>
> Attachments: HIVE-14918.0.patch
>
>
> FROM src INSERT OVERWRITE TABLE dest1 SELECT 'abc', 'xyz', '8675309'  WHERE 
> src.key = 86; 
> SELECT concat_ws('.',NULL)  FROM dest1 ;
> The result is an empty string "", but I think it should return NULL.




[jira] [Updated] (HIVE-14917) explainanalyze_2.q fails after HIVE-14861

2016-10-09 Thread Pengcheng Xiong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14917:
---
Status: Open  (was: Patch Available)


[jira] [Updated] (HIVE-14917) explainanalyze_2.q fails after HIVE-14861

2016-10-09 Thread Pengcheng Xiong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14917:
---
Status: Patch Available  (was: Open)


[jira] [Updated] (HIVE-14917) explainanalyze_2.q fails after HIVE-14861

2016-10-09 Thread Pengcheng Xiong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14917:
---
Attachment: HIVE-14917.01.patch


[jira] [Updated] (HIVE-14917) explainanalyze_2.q fails after HIVE-14861

2016-10-09 Thread Pengcheng Xiong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14917:
---
Attachment: (was: HIVE-14917.01.patch)

> explainanalyze_2.q fails after HIVE-14861
> -
>
> Key: HIVE-14917
> URL: https://issues.apache.org/jira/browse/HIVE-14917
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>





[jira] [Commented] (HIVE-14662) Wrong Class Instance When Using Custom SERDE

2016-10-09 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15559867#comment-15559867
 ] 

Hive QA commented on HIVE-14662:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12832341/HIVE-14662.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10663 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2]
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
org.apache.hive.service.cli.TestEmbeddedThriftBinaryCLIService.testTaskStatus
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1442/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1442/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1442/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12832341 - PreCommit-HIVE-Build

> Wrong Class Instance When Using Custom SERDE
> 
>
> Key: HIVE-14662
> URL: https://issues.apache.org/jira/browse/HIVE-14662
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Nemon Lou
>Assignee: Nemon Lou
> Attachments: HIVE-14662.patch
>
>
> Using  [SERDE for 
> mongoDB|https://github.com/mongodb/mongo-hadoop/blob/master/hive/src/main/java/com/mongodb/hadoop/hive/BSONSerDe.java]
> DDL
> {noformat}
> create external table mytable (ID STRING..) 
> ROW FORMAT SERDE  'com.mongodb.hadoop.hive.BSONSerDe' 
> WITH SERDEPROPERTIES('mongo.columns.mapping'='{"ID":"_id",.. }')
> STORED AS INPUTFORMAT 'com.mongodb.hadoop.mapred.BSONFileInputFormat'
> OUTPUTFORMAT 'com.mongodb.hadoop.hive.output.HiveBSONFileOutputFormat'
> LOCATION 'hdfs:///mypath'; 
> {noformat}
> Open beeline and run the following query, then open another beeline session 
> and run it again. The query then fails.
> {noformat}
> add jar hdfs:///tmp/mongo-hadoop-hive-1.4.2_new.jar;
> add jar hdfs:///tmp/mongo-java-driver-3.0.4.jar;
> add jar hdfs:///tmp/mongo-hadoop-core-1.4.2_new.jar;
> select * from mytable limit 1;
> {noformat}
> Error log :
> {noformat}
> 2016-08-25 09:30:34,475 | WARN  | HiveServer2-Handler-Pool: Thread-11972 | 
> Error fetching results:  | 
> org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:1058)
> org.apache.hive.service.cli.HiveSQLException: java.io.IOException: 
> org.apache.hadoop.hive.serde2.SerDeException: class 
> com.mongodb.hadoop.hive.BSONSerDerequires a BSONWritable object, notclass 
> com.mongodb.hadoop.io.BSONWritable
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:366)
> at 
> org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:251)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:710)
> at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1673)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
> at com.sun.proxy.$Proxy20.fetchResults(Unknown Source)
> at 
> org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:451)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:1049)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at

[jira] [Updated] (HIVE-14662) Wrong Class Instance When Using Custom SERDE

2016-10-09 Thread Nemon Lou (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-14662:
-
Attachment: HIVE-14662.patch

> Wrong Class Instance When Using Custom SERDE
> 
>
> Key: HIVE-14662
> URL: https://issues.apache.org/jira/browse/HIVE-14662
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Nemon Lou
>Assignee: Nemon Lou
> Attachments: HIVE-14662.patch
>
>
> Using  [SERDE for 
> mongoDB|https://github.com/mongodb/mongo-hadoop/blob/master/hive/src/main/java/com/mongodb/hadoop/hive/BSONSerDe.java]
> DDL
> {noformat}
> create external table mytable (ID STRING..) 
> ROW FORMAT SERDE  'com.mongodb.hadoop.hive.BSONSerDe' 
> WITH SERDEPROPERTIES('mongo.columns.mapping'='{"ID":"_id",.. }')
> STORED AS INPUTFORMAT 'com.mongodb.hadoop.mapred.BSONFileInputFormat'
> OUTPUTFORMAT 'com.mongodb.hadoop.hive.output.HiveBSONFileOutputFormat'
> LOCATION 'hdfs:///mypath'; 
> {noformat}
> Open beeline and run the following query, then open another beeline session 
> and run it again. The query then fails.
> {noformat}
> add jar hdfs:///tmp/mongo-hadoop-hive-1.4.2_new.jar;
> add jar hdfs:///tmp/mongo-java-driver-3.0.4.jar;
> add jar hdfs:///tmp/mongo-hadoop-core-1.4.2_new.jar;
> select * from mytable limit 1;
> {noformat}
> Error log :
> {noformat}
> 2016-08-25 09:30:34,475 | WARN  | HiveServer2-Handler-Pool: Thread-11972 | 
> Error fetching results:  | 
> org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:1058)
> org.apache.hive.service.cli.HiveSQLException: java.io.IOException: 
> org.apache.hadoop.hive.serde2.SerDeException: class 
> com.mongodb.hadoop.hive.BSONSerDerequires a BSONWritable object, notclass 
> com.mongodb.hadoop.io.BSONWritable
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:366)
> at 
> org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:251)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:710)
> at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1673)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
> at com.sun.proxy.$Proxy20.fetchResults(Unknown Source)
> at 
> org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:451)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:1049)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: 
> class com.mongodb.hadoop.hive.BSONSerDerequires a BSONWritable object, 
> notclass com.mongodb.hadoop.io.BSONWritable
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:507)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414)
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1756)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:361)
> ... 24 more
> Caused by:

[jira] [Updated] (HIVE-14662) Wrong Class Instance When Using Custom SERDE

2016-10-09 Thread Nemon Lou (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-14662:
-
Status: Patch Available  (was: Open)

> Wrong Class Instance When Using Custom SERDE
> 
>
> Key: HIVE-14662
> URL: https://issues.apache.org/jira/browse/HIVE-14662
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Nemon Lou
>Assignee: Nemon Lou
> Attachments: HIVE-14662.patch
>
>
> Using  [SERDE for 
> mongoDB|https://github.com/mongodb/mongo-hadoop/blob/master/hive/src/main/java/com/mongodb/hadoop/hive/BSONSerDe.java]
> DDL
> {noformat}
> create external table mytable (ID STRING..) 
> ROW FORMAT SERDE  'com.mongodb.hadoop.hive.BSONSerDe' 
> WITH SERDEPROPERTIES('mongo.columns.mapping'='{"ID":"_id",.. }')
> STORED AS INPUTFORMAT 'com.mongodb.hadoop.mapred.BSONFileInputFormat'
> OUTPUTFORMAT 'com.mongodb.hadoop.hive.output.HiveBSONFileOutputFormat'
> LOCATION 'hdfs:///mypath'; 
> {noformat}
> Open beeline and run the following query, then open another beeline session 
> and run it again. The query then fails.
> {noformat}
> add jar hdfs:///tmp/mongo-hadoop-hive-1.4.2_new.jar;
> add jar hdfs:///tmp/mongo-java-driver-3.0.4.jar;
> add jar hdfs:///tmp/mongo-hadoop-core-1.4.2_new.jar;
> select * from mytable limit 1;
> {noformat}
> Error log :
> {noformat}
> 2016-08-25 09:30:34,475 | WARN  | HiveServer2-Handler-Pool: Thread-11972 | 
> Error fetching results:  | 
> org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:1058)
> org.apache.hive.service.cli.HiveSQLException: java.io.IOException: 
> org.apache.hadoop.hive.serde2.SerDeException: class 
> com.mongodb.hadoop.hive.BSONSerDerequires a BSONWritable object, notclass 
> com.mongodb.hadoop.io.BSONWritable
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:366)
> at 
> org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:251)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:710)
> at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1673)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
> at com.sun.proxy.$Proxy20.fetchResults(Unknown Source)
> at 
> org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:451)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:1049)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: 
> class com.mongodb.hadoop.hive.BSONSerDerequires a BSONWritable object, 
> notclass com.mongodb.hadoop.io.BSONWritable
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:507)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414)
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1756)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:361)
> ... 24 more
>

[jira] [Commented] (HIVE-14916) Reduce the memory requirements for Spark tests

2016-10-09 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15559690#comment-15559690
 ] 

Hive QA commented on HIVE-14916:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12832331/HIVE-14916.001.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10663 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[constprog_semijoin]
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[index_bitmap3]
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[index_bitmap_auto]
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[infer_bucket_sort_map_operators]
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[infer_bucket_sort_reducers_power_two]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2]
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1441/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1441/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1441/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12832331 - PreCommit-HIVE-Build

> Reduce the memory requirements for Spark tests
> --
>
> Key: HIVE-14916
> URL: https://issues.apache.org/jira/browse/HIVE-14916
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ferdinand Xu
>Assignee: Dapeng Sun
> Attachments: HIVE-14916.001.patch
>
>
> As with HIVE-14887, we need to reduce the memory requirements for Spark tests.




[jira] [Issue Comment Deleted] (HIVE-5867) JDBC driver and beeline should support executing an initial SQL script

2016-10-09 Thread Jianguo Tian (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianguo Tian updated HIVE-5867:
---
Comment: was deleted

(was: Hive JDBC client)

> JDBC driver and beeline should support executing an initial SQL script
> --
>
> Key: HIVE-5867
> URL: https://issues.apache.org/jira/browse/HIVE-5867
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients, JDBC
>Reporter: Prasad Mujumdar
>Assignee: Jianguo Tian
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-5867.1.patch, HIVE-5867.2.patch, HIVE-5867.3 .patch
>
>
> HiveCLI supports the .hiverc script, which is executed at the start of the 
> session. This is helpful for things like registering UDFs, session-specific 
> configs, etc.
> This functionality is missing for beeline and JDBC clients. It would be 
> useful for the JDBC driver to support an init script with SQL statements that 
> is automatically executed after connection. The script path can be specified 
> via the JDBC connection URL. For example:
> {noformat}
> jdbc:hive2://localhost:1/default;initScript=/home/user1/scripts/init.sql
> {noformat}
> This can also be exposed as a Beeline command-line option such as "-i 
> /home/user1/scripts/init.sql".
> To help the transition from HiveCLI to Beeline, we can keep the default init 
> script as $HOME/.hiverc.




[jira] [Issue Comment Deleted] (HIVE-5867) JDBC driver and beeline should support executing an initial SQL script

2016-10-09 Thread Jianguo Tian (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianguo Tian updated HIVE-5867:
---
Comment: was deleted

(was: The "initFile" option in JDBC URL could be seen on the wiki.)

> JDBC driver and beeline should support executing an initial SQL script
> --
>
> Key: HIVE-5867
> URL: https://issues.apache.org/jira/browse/HIVE-5867
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients, JDBC
>Reporter: Prasad Mujumdar
>Assignee: Jianguo Tian
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-5867.1.patch, HIVE-5867.2.patch, HIVE-5867.3 .patch
>
>
> HiveCLI supports the .hiverc script, which is executed at the start of the 
> session. This is helpful for things like registering UDFs, session-specific 
> configs, etc.
> This functionality is missing for Beeline and JDBC clients. It would be 
> useful for the JDBC driver to support an init script with SQL statements that is 
> automatically executed after connection. The script path can be specified via 
> the JDBC connection URL. For example:
> {noformat}
> jdbc:hive2://localhost:1/default;initScript=/home/user1/scripts/init.sql
> {noformat}
> This can be added to Beeline's command-line options, like "-i 
> /home/user1/scripts/init.sql".
> To help the transition from HiveCLI to Beeline, we can keep the default init 
> script as $HOME/.hiverc.




[jira] [Commented] (HIVE-5867) JDBC driver and beeline should support executing an initial SQL script

2016-10-09 Thread Jianguo Tian (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15559594#comment-15559594
 ] 

Jianguo Tian commented on HIVE-5867:


I have added the "initFile=" option to the JDBC URL; you can now see the 
changes under "Connection URL Format" and "Connection URL for Remote or 
Embedded Mode".

> JDBC driver and beeline should support executing an initial SQL script
> --
>
> Key: HIVE-5867
> URL: https://issues.apache.org/jira/browse/HIVE-5867
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients, JDBC
>Reporter: Prasad Mujumdar
>Assignee: Jianguo Tian
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-5867.1.patch, HIVE-5867.2.patch, HIVE-5867.3 .patch
>
>
> HiveCLI supports the .hiverc script, which is executed at the start of the 
> session. This is helpful for things like registering UDFs, session-specific 
> configs, etc.
> This functionality is missing for Beeline and JDBC clients. It would be 
> useful for the JDBC driver to support an init script with SQL statements that is 
> automatically executed after connection. The script path can be specified via 
> the JDBC connection URL. For example:
> {noformat}
> jdbc:hive2://localhost:1/default;initScript=/home/user1/scripts/init.sql
> {noformat}
> This can be added to Beeline's command-line options, like "-i 
> /home/user1/scripts/init.sql".
> To help the transition from HiveCLI to Beeline, we can keep the default init 
> script as $HOME/.hiverc.




[jira] [Comment Edited] (HIVE-5867) JDBC driver and beeline should support executing an initial SQL script

2016-10-09 Thread Jianguo Tian (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15559594#comment-15559594
 ] 

Jianguo Tian edited comment on HIVE-5867 at 10/9/16 8:35 AM:
-

I have added the "initFile=" option to the JDBC URL; you can now see the 
changes on the wiki under "Connection URL Format" and "Connection URL for Remote or 
Embedded Mode".


was (Author: jonnyr):
I have added the "initFile=" option to the JDBC URL; you can now see the 
changes under "Connection URL Format" and "Connection URL for Remote or 
Embedded Mode".

> JDBC driver and beeline should support executing an initial SQL script
> --
>
> Key: HIVE-5867
> URL: https://issues.apache.org/jira/browse/HIVE-5867
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients, JDBC
>Reporter: Prasad Mujumdar
>Assignee: Jianguo Tian
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-5867.1.patch, HIVE-5867.2.patch, HIVE-5867.3 .patch
>
>
> HiveCLI supports the .hiverc script, which is executed at the start of the 
> session. This is helpful for things like registering UDFs, session-specific 
> configs, etc.
> This functionality is missing for Beeline and JDBC clients. It would be 
> useful for the JDBC driver to support an init script with SQL statements that is 
> automatically executed after connection. The script path can be specified via 
> the JDBC connection URL. For example:
> {noformat}
> jdbc:hive2://localhost:1/default;initScript=/home/user1/scripts/init.sql
> {noformat}
> This can be added to Beeline's command-line options, like "-i 
> /home/user1/scripts/init.sql".
> To help the transition from HiveCLI to Beeline, we can keep the default init 
> script as $HOME/.hiverc.




[jira] [Commented] (HIVE-14918) Function concat_ws get a wrong value

2016-10-09 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15559558#comment-15559558
 ] 

Hive QA commented on HIVE-14918:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12832319/HIVE-14918.0.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10663 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2]
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1440/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1440/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1440/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12832319 - PreCommit-HIVE-Build

> Function concat_ws get a wrong value  
> --
>
> Key: HIVE-14918
> URL: https://issues.apache.org/jira/browse/HIVE-14918
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.1.1, 2.0.0, 2.1.0, 2.0.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
>Priority: Critical
> Fix For: 2.1.0
>
> Attachments: HIVE-14918.0.patch
>
>
> FROM src INSERT OVERWRITE TABLE dest1 SELECT 'abc', 'xyz', '8675309' WHERE 
> src.key = 86;
> SELECT concat_ws('.', NULL) FROM dest1;
> The result is an empty string "", but I think it should return NULL.
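
To make the two behaviours in the description concrete, here is a small Python
model. It is only a sketch and not Hive's implementation: None stands in for
SQL NULL, the function names are hypothetical, and treating a NULL separator as
yielding NULL is an assumption of the model.

```python
def concat_ws_observed(sep, *args):
    """Model of the reported behaviour: NULL arguments are skipped, so a
    call with only NULL arguments yields an empty string."""
    if sep is None:  # assumption: NULL separator gives NULL
        return None
    return sep.join(a for a in args if a is not None)


def concat_ws_proposed(sep, *args):
    """Model of the reporter's suggestion: return NULL when no non-NULL
    argument is left to join."""
    if sep is None:  # assumption: NULL separator gives NULL
        return None
    parts = [a for a in args if a is not None]
    return sep.join(parts) if parts else None


print(repr(concat_ws_observed(".", None)))   # ''
print(repr(concat_ws_proposed(".", None)))   # None
print(concat_ws_observed(".", "a", None, "b"))  # a.b
```

Both models agree whenever at least one argument is non-NULL; they differ only
in the all-NULL case that the issue is about.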




[jira] [Updated] (HIVE-14916) Reduce the memory requirements for Spark tests

2016-10-09 Thread Dapeng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dapeng Sun updated HIVE-14916:
--
Status: Patch Available  (was: Open)

Uploaded an initial patch.

> Reduce the memory requirements for Spark tests
> --
>
> Key: HIVE-14916
> URL: https://issues.apache.org/jira/browse/HIVE-14916
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ferdinand Xu
>Assignee: Dapeng Sun
> Attachments: HIVE-14916.001.patch
>
>
> As in HIVE-14887, we need to reduce the memory requirements for Spark tests.




[jira] [Updated] (HIVE-14916) Reduce the memory requirements for Spark tests

2016-10-09 Thread Dapeng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dapeng Sun updated HIVE-14916:
--
Attachment: HIVE-14916.001.patch

> Reduce the memory requirements for Spark tests
> --
>
> Key: HIVE-14916
> URL: https://issues.apache.org/jira/browse/HIVE-14916
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ferdinand Xu
>Assignee: Dapeng Sun
> Attachments: HIVE-14916.001.patch
>
>
> As in HIVE-14887, we need to reduce the memory requirements for Spark tests.




[jira] [Assigned] (HIVE-14916) Reduce the memory requirements for Spark tests

2016-10-09 Thread Dapeng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dapeng Sun reassigned HIVE-14916:
-

Assignee: Dapeng Sun

> Reduce the memory requirements for Spark tests
> --
>
> Key: HIVE-14916
> URL: https://issues.apache.org/jira/browse/HIVE-14916
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ferdinand Xu
>Assignee: Dapeng Sun
>
> As in HIVE-14887, we need to reduce the memory requirements for Spark tests.




[jira] [Commented] (HIVE-14918) Function concat_ws get a wrong value

2016-10-09 Thread Xiaowei Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15559471#comment-15559471
 ] 

Xiaowei Wang commented on HIVE-14918:
-

Is this a problem?
[~pxiong] [~speleato] [~ashutoshc] [~prasanth_j] [~thejas]

> Function concat_ws get a wrong value  
> --
>
> Key: HIVE-14918
> URL: https://issues.apache.org/jira/browse/HIVE-14918
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.1.1, 2.0.0, 2.1.0, 2.0.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
>Priority: Critical
> Fix For: 2.1.0
>
> Attachments: HIVE-14918.0.patch
>
>
> FROM src INSERT OVERWRITE TABLE dest1 SELECT 'abc', 'xyz', '8675309' WHERE 
> src.key = 86;
> SELECT concat_ws('.', NULL) FROM dest1;
> The result is an empty string "", but I think it should return NULL.




[jira] [Updated] (HIVE-14918) Function concat_ws get a wrong value

2016-10-09 Thread Xiaowei Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang updated HIVE-14918:

Status: Patch Available  (was: Open)

> Function concat_ws get a wrong value  
> --
>
> Key: HIVE-14918
> URL: https://issues.apache.org/jira/browse/HIVE-14918
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 2.0.1, 2.1.0, 2.0.0, 1.1.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
>Priority: Critical
> Fix For: 2.1.0
>
> Attachments: HIVE-14918.0.patch
>
>
> FROM src INSERT OVERWRITE TABLE dest1 SELECT 'abc', 'xyz', '8675309' WHERE 
> src.key = 86;
> SELECT concat_ws('.', NULL) FROM dest1;
> The result is an empty string "", but I think it should return NULL.




[jira] [Updated] (HIVE-14918) Function concat_ws get a wrong value

2016-10-09 Thread Xiaowei Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang updated HIVE-14918:

Attachment: HIVE-14918.0.patch

> Function concat_ws get a wrong value  
> --
>
> Key: HIVE-14918
> URL: https://issues.apache.org/jira/browse/HIVE-14918
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.1.1, 2.0.0, 2.1.0, 2.0.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
>Priority: Critical
> Fix For: 2.1.0
>
> Attachments: HIVE-14918.0.patch
>
>
> FROM src INSERT OVERWRITE TABLE dest1 SELECT 'abc', 'xyz', '8675309' WHERE 
> src.key = 86;
> SELECT concat_ws('.', NULL) FROM dest1;
> The result is an empty string "", but I think it should return NULL.




[jira] [Comment Edited] (HIVE-14632) beeline outputformat needs better documentation

2016-10-09 Thread Lefty Leverenz (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15559375#comment-15559375
 ] 

Lefty Leverenz edited comment on HIVE-14632 at 10/9/16 6:19 AM:


Good documentation, thanks [~kuczoram]!  I made some minor edits.  Very cool 
expandable examples -- I hadn't realized we can do that.

One question:  is the misalignment of the 'comment' column in the tsv example 
accurate?  I assume it's due to the tab stops because the 'value' column has 
values longer than the column name, but just wanted to check.

+1 but a review by [~michaelthoward] would also be good, as well as a technical 
review by [~szehon] or [~thejas].


was (Author: le...@hortonworks.com):
Good documentation, thanks [~kuczoram]!  I made some minor edits.  Very cool 
expandable examples -- I hadn't realized we can do that.

One question:  is the misalignment of the 'comment' column in the tsv example 
accurate?  I assume it's due to the tab stops because the 'value' column has 
values longer than the column name, but just wanted to check.

+1 but a technical review by [~michaelthoward] or [~thejas] would also be good.

> beeline outputformat needs better documentation
> ---
>
> Key: HIVE-14632
> URL: https://issues.apache.org/jira/browse/HIVE-14632
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Affects Versions: 0.14.0
> Environment: Hive HiveServer2 wiki
>Reporter: Michael Howard
>Assignee: Marta Kuczora
>
> SUMMARY
> * need better wiki page doc for beeline outputformat option
> * should explicitly say that "double quote characters" are used to enclose 
> fields which need enclosing. 
> * Should describe the treatment of embedded double quote chars as "doubled"
> DETAIL
> The page at:
> https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-Separated-ValueOutputFormats
> describes separated value outputformats csv/tsv/csv2/tsv2, etc. 
> I found doc to be inadequate and terminology to be confusing. 
> > These conform better to standard CSV convention, which adds quotes around a 
> > cell value 
> What kind of quotes? The only reference to quotes in this section refers to 
> single quotes for the deprecated csv/tsv format. 
> The JIRA at 
> https://issues.apache.org/jira/browse/HIVE-8615
> clarifies a bit:
> - Old format quoted every field. New format quotes only fields that contain a 
> delimiter or the quoting char. 
> - Old format quoted using single quotes, new format quotes using double 
> quotes 
> - Old format didn't escape quotes in a field (a bug). New format does escape 
> the quotes
> However, neither this JIRA page nor the wiki page doc define what is meant by 
> "escaping the quotes". 
> Q: In this context, does escaping mean "backslash escaping" or "double 
> embedded double quotes" or something else? 
> Investigation of source code reveals that this is using SuperCSV. 
> SuperCSV does not support backslash-escape of embedded quotes. See last line 
> of:
> https://super-csv.github.io/super-csv/csv_specification.html
> THE END
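
The "doubled" treatment of embedded double quotes that the description asks the
wiki to spell out can be sketched with Python's standard csv module. This is
not SuperCSV and not Beeline's code; it only shows the same convention, and the
helper names write_row and read_row are hypothetical.

```python
import csv
import io


def write_row(fields):
    """Serialize one row, quoting only the fields that need it."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        quotechar='"',
        quoting=csv.QUOTE_MINIMAL,  # quote only fields with delimiter/quote
        doublequote=True,           # an embedded " is written as ""
        lineterminator="\n",
    )
    writer.writerow(fields)
    return buf.getvalue().rstrip("\n")


def read_row(line):
    """Parse one line back, undoing the doubled quotes."""
    return next(csv.reader([line], quotechar='"', doublequote=True))


row = ["plain", "has,comma", 'say "hi"']
line = write_row(row)
print(line)                   # plain,"has,comma","say ""hi"""
print(read_row(line) == row)  # True
```

Because the embedded quotes are doubled rather than backslash-escaped, a reader
that follows the same convention recovers the original values unchanged.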




[jira] [Commented] (HIVE-8615) beeline csv,tsv outputformat needs backward compatibility mode

2016-10-09 Thread Lefty Leverenz (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15559379#comment-15559379
 ] 

Lefty Leverenz commented on HIVE-8615:
--

[~kuczoram] improved the documentation (for HIVE-14632).

* [HiveServer2 Clients -- Beeline -- Output Formats | 
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-OutputFormats]

> beeline csv,tsv outputformat needs backward compatibility mode
> --
>
> Key: HIVE-8615
> URL: https://issues.apache.org/jira/browse/HIVE-8615
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 0.14.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8615.1.patch, HIVE-8615.2.patch
>
>
> Changes in HIVE-7390 break backward compatibility for the beeline csv and tsv 
> formats.
> This can cause problems for users upgrading to Hive 0.14 if they have code 
> for parsing the old output format. Instead of removing the old format in this 
> release, we should consider it deprecated and support it for a few releases 
> before removing it completely.
> Incompatible changes in the tsv and csv formats:
> - Old format quoted every field. New format quotes only fields that contain a 
> delimiter or the quoting char.
> - Old format quoted using single quotes; new format quotes using double quotes.
> - Old format didn't escape quotes in a field (a bug). New format does escape 
> the quotes.
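
The three differences listed above can be put side by side in a short Python
sketch. Both functions are hypothetical models written from the bullet points,
not Beeline's code; escaping by doubling the quote character is an assumption
taken from the HIVE-14632 discussion, and fields containing newlines are not
handled.

```python
def old_format_line(fields, delimiter=","):
    """Old style: every field wrapped in single quotes, with embedded
    quotes left unescaped (the bug noted in the description)."""
    return delimiter.join("'" + f + "'" for f in fields)


def new_format_line(fields, delimiter=",", quote='"'):
    """New style: only fields containing the delimiter or the quote
    character are wrapped in double quotes; embedded quotes are doubled."""
    out = []
    for f in fields:
        if delimiter in f or quote in f:
            f = quote + f.replace(quote, quote * 2) + quote
        out.append(f)
    return delimiter.join(out)


row = ["1", "it's", "a,b"]
print(old_format_line(row))  # '1','it's','a,b'
print(new_format_line(row))  # 1,it's,"a,b"
```

The old-style line shows why a parser written for it can break: the
apostrophe in the second field is indistinguishable from a closing quote.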




[jira] [Commented] (HIVE-14632) beeline outputformat needs better documentation

2016-10-09 Thread Lefty Leverenz (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15559375#comment-15559375
 ] 

Lefty Leverenz commented on HIVE-14632:
---

Good documentation, thanks [~kuczoram]!  I made some minor edits.  Very cool 
expandable examples -- I hadn't realized we can do that.

One question:  is the misalignment of the 'comment' column in the tsv example 
accurate?  I assume it's due to the tab stops because the 'value' column has 
values longer than the column name, but just wanted to check.

+1 but a technical review by [~michaelthoward] or [~thejas] would also be good.

> beeline outputformat needs better documentation
> ---
>
> Key: HIVE-14632
> URL: https://issues.apache.org/jira/browse/HIVE-14632
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Affects Versions: 0.14.0
> Environment: Hive HiveServer2 wiki
>Reporter: Michael Howard
>Assignee: Marta Kuczora
>
> SUMMARY
> * need better wiki page doc for beeline outputformat option
> * should explicitly say that "double quote characters" are used to enclose 
> fields which need enclosing. 
> * Should describe the treatment of embedded double quote chars as "doubled"
> DETAIL
> The page at:
> https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-Separated-ValueOutputFormats
> describes separated value outputformats csv/tsv/csv2/tsv2, etc. 
> I found doc to be inadequate and terminology to be confusing. 
> > These conform better to standard CSV convention, which adds quotes around a 
> > cell value 
> What kind of quotes? The only reference to quotes in this section refers to 
> single quotes for the deprecated csv/tsv format. 
> The JIRA at 
> https://issues.apache.org/jira/browse/HIVE-8615
> clarifies a bit:
> - Old format quoted every field. New format quotes only fields that contain a 
> delimiter or the quoting char. 
> - Old format quoted using single quotes, new format quotes using double 
> quotes 
> - Old format didn't escape quotes in a field (a bug). New format does escape 
> the quotes
> However, neither this JIRA page nor the wiki page doc define what is meant by 
> "escaping the quotes". 
> Q: In this context, does escaping mean "backslash escaping" or "double 
> embedded double quotes" or something else? 
> Investigation of source code reveals that this is using SuperCSV. 
> SuperCSV does not support backslash-escape of embedded quotes. See last line 
> of:
> https://super-csv.github.io/super-csv/csv_specification.html
> THE END



