[jira] [Comment Edited] (HIVE-11995) Remove repetitively setting permissions in insert/load overwrite partition

2020-04-22 Thread Jianguo Tian (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-11995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17089575#comment-17089575
 ] 

Jianguo Tian edited comment on HIVE-11995 at 4/22/20, 11:29 AM:


{code:java}
// first call FileUtils.mkdir to make sure that the destf directory exists;
// if not, it creates destf with inherited permissions
boolean destfExist = FileUtils.mkdir(destFs, destf, true, conf);
{code}
 

Why did you set the inheritPerms parameter to true on this line? It prevents 
hive.warehouse.subdir.inherit.perms from taking effect.

If the table's HDFS directory permission is 770 and I want the partition 
directory permission to be 755 (using the default umask 022), I need to set 
hive.warehouse.subdir.inherit.perms=false.

But since the flag is hard-coded to true, setting 
hive.warehouse.subdir.inherit.perms=false has no effect and the partition 
directory permission also ends up as 770.

What do you think about this scenario? Thanks!
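For reference, the 755 above comes from applying the default umask to the full directory mode. A minimal sketch of that arithmetic (UmaskDemo and effectivePerm are hypothetical names for illustration, not Hive code):

```java
public class UmaskDemo {
    // Effective permission of a newly created directory:
    // full mode bits (0777) minus the bits masked out by the umask.
    public static int effectivePerm(int umask) {
        return 0777 & ~umask;
    }

    public static void main(String[] args) {
        // Default umask 022 -> directories are created as 755.
        System.out.println(Integer.toOctalString(effectivePerm(022))); // 755
        // With inheritPerms hard-coded to true, the new directory instead
        // copies the parent's 770, regardless of the umask.
    }
}
```

With inherit.perms forced on, this umask-based default is never consulted, which is the complaint above.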


was (Author: jonnyr):
{code:java}
// first call FileUtils.mkdir to make sure that the destf directory exists;
// if not, it creates destf with inherited permissions
boolean destfExist = FileUtils.mkdir(destFs, destf, true, conf);
{code}
 

Why did you set the inheritPerms parameter to true on this line? It prevents 
hive.warehouse.subdir.inherit.perms from taking effect.

If the table's HDFS directory permission is 770 and I want the partition 
directory permission to be 755, I need to set 
hive.warehouse.subdir.inherit.perms=false.

But since the flag is hard-coded to true, setting 
hive.warehouse.subdir.inherit.perms=false has no effect.

What do you think about this scenario? Thanks!

> Remove repetitively setting permissions in insert/load overwrite partition
> --
>
> Key: HIVE-11995
> URL: https://issues.apache.org/jira/browse/HIVE-11995
> Project: Hive
>  Issue Type: Bug
>  Components: Security
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HIVE-11995.patch
>
>
> When hive.warehouse.subdir.inherit.perms is set to true, insert/load 
> overwrite .. partition set table and partition permissions repetitively which 
> is not necessary and causing performance issue especially in the cases where 
> there are multiple levels of partitions involved.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-11995) Remove repetitively setting permissions in insert/load overwrite partition

2020-04-22 Thread Jianguo Tian (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-11995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17089575#comment-17089575
 ] 

Jianguo Tian commented on HIVE-11995:
-

{code:java}
// first call FileUtils.mkdir to make sure that the destf directory exists;
// if not, it creates destf with inherited permissions
boolean destfExist = FileUtils.mkdir(destFs, destf, true, conf);
{code}
 

Why did you set the inheritPerms parameter to true on this line? It prevents 
hive.warehouse.subdir.inherit.perms from taking effect.

If the table's HDFS directory permission is 770 and I want the partition 
directory permission to be 755, I need to set 
hive.warehouse.subdir.inherit.perms=false.

But since the flag is hard-coded to true, setting 
hive.warehouse.subdir.inherit.perms=false has no effect.

What do you think about this scenario? Thanks!

> Remove repetitively setting permissions in insert/load overwrite partition
> --
>
> Key: HIVE-11995
> URL: https://issues.apache.org/jira/browse/HIVE-11995
> Project: Hive
>  Issue Type: Bug
>  Components: Security
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HIVE-11995.patch
>
>
> When hive.warehouse.subdir.inherit.perms is set to true, insert/load 
> overwrite .. partition set table and partition permissions repetitively which 
> is not necessary and causing performance issue especially in the cases where 
> there are multiple levels of partitions involved.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-18685) Add catalogs to Hive

2019-04-12 Thread Jianguo Tian (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16816074#comment-16816074
 ] 

Jianguo Tian commented on HIVE-18685:
-

How do you specify a catalog in the Hive CLI or Beeline?

> Add catalogs to Hive
> 
>
> Key: HIVE-18685
> URL: https://issues.apache.org/jira/browse/HIVE-18685
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore, Parser, Security, SQL
>Affects Versions: 3.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Major
> Attachments: HMS Catalog Design Doc.pdf
>
>
> SQL supports two levels of namespaces, called in the spec catalogs and 
> schemas (with schema being equivalent to Hive's database).  I propose to add 
> the upper level of catalog.  The attached design doc covers the use cases, 
> requirements, and brief discussion of how it will be implemented in a 
> backwards compatible way.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HIVE-21166) Keyword as column name in DBS table of Hive metastore

2019-04-01 Thread Jianguo Tian (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807359#comment-16807359
 ] 

Jianguo Tian edited comment on HIVE-21166 at 4/2/19 2:22 AM:
-

You can query the DESC column like this: 
{code:sql}
select `DESC` from DBS limit 10;
{code}


was (Author: jonnyr):
You can query DESC column like this: 
{code:java}
// select `DESC` from DBS limit 10;
{code}

> Keyword as column name in DBS table of Hive metastore
> -
>
> Key: HIVE-21166
> URL: https://issues.apache.org/jira/browse/HIVE-21166
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Vamsi UCSS
>Priority: Blocker
>
> The table "DBS" in hive schema (metastore) has a column called "DESC" which 
> is a Hive keyword. This is causing any queries on this table to result in a 
> syntax error.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21166) Keyword as column name in DBS table of Hive metastore

2019-04-01 Thread Jianguo Tian (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807359#comment-16807359
 ] 

Jianguo Tian commented on HIVE-21166:
-

You can query the DESC column like this: 
{code:sql}
select `DESC` from DBS limit 10;
{code}

> Keyword as column name in DBS table of Hive metastore
> -
>
> Key: HIVE-21166
> URL: https://issues.apache.org/jira/browse/HIVE-21166
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Vamsi UCSS
>Priority: Blocker
>
> The table "DBS" in hive schema (metastore) has a column called "DESC" which 
> is a Hive keyword. This is causing any queries on this table to result in a 
> syntax error.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20304) When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, and the execution engine is mr, same stage may launch twice due to the wrong generated plan

2018-08-13 Thread Jianguo Tian (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianguo Tian updated HIVE-20304:

Description: 
When hive.optimize.skewjoin and hive.auto.convert.join are both set to true 
and the execution engine is set to mr, the same stage of a query may launch 
twice because of the wrongly generated plan. If hive.exec.parallel is also 
true, the two launches run at the same time and the job fails because the 
first stage to complete deletes the map.xml/reduce.xml files stored in HDFS.

Use the following SQL to reproduce the issue:
{code:java}
CREATE TABLE `tbl1`(
  `fence` string);

CREATE TABLE `tbl2`(
  `order_id` string,
  `phone` string,
  `search_id` string
)
PARTITIONED BY (
  `dt` string);


CREATE TABLE `tbl3`(
  `order_id` string,
  `platform` string)
PARTITIONED BY (
  `dt` string);


CREATE TABLE `tbl4`(
  `groupname` string,
  `phone` string)
PARTITIONED BY (
  `dt` string);


CREATE TABLE `tbl5`(
  `search_id` string,
  `fence` string)
PARTITIONED BY (
  `dt` string);

SET hive.exec.parallel = TRUE;

SET hive.auto.convert.join = TRUE;

SET hive.optimize.skewjoin = TRUE;


SELECT dt,
   platform,
   groupname,
   count(1) as cnt
FROM
(SELECT dt,
platform,
groupname
 FROM
 (SELECT fence
  FROM tbl1)ta
   JOIN
   (SELECT a0.dt,
   a1.platform,
   a2.groupname,
   a3.fence
FROM
(SELECT dt,
order_id,
phone,
search_id
 FROM tbl2
 WHERE dt =20180703 )a0
  JOIN
  (SELECT order_id,
  platform,
  dt
   FROM tbl3
   WHERE dt =20180703 )a1 ON a0.order_id = a1.order_id
  INNER JOIN
  (SELECT groupname,
  phone,
  dt
   FROM tbl4
   WHERE dt =20180703 )a2 ON a0.phone = a2.phone
  LEFT JOIN
  (SELECT search_id,
  fence,
  dt
   FROM tbl5
   WHERE dt =20180703)a3 ON a0.search_id = a3.search_id)t0 ON 
ta.fence = t0.fence)t11
GROUP BY dt,
 platform,
 groupname;

DROP TABLE tbl1;
DROP TABLE tbl2;
DROP TABLE tbl3;
DROP TABLE tbl4;
DROP TABLE tbl5;

{code}
We get an error message like this:

Examining task ID: task_1531284442065_3637_m_00 (and more) from job 
job_1531284442065_3637

Task with the most failures(4):

Task ID:
 task_1531284442065_3637_m_00

URL:
 
[http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1531284442065_3637=task_1531284442065_3637_m_00]

Diagnostic Messages for this Task:
 File does not exist: 
hdfs://test/tmp/hive-hadoop/hadoop/fe5efa94-abb1-420f-b6ba-ec782e7b79ad/hive_2018-08-03_17-00-17_707_592882314975289971-5/-mr-10045/757eb1f7-7a37-4a7e-abc0-4a3b8b06510c/reduce.xml
 java.io.FileNotFoundException: File does not exist: 
hdfs://test/tmp/hive-hadoop/hadoop/fe5efa94-abb1-420f-b6ba-ec782e7b79ad/hive_2018-08-03_17-00-17_707_592882314975289971-5/-mr-10045/757eb1f7-7a37-4a7e-abc0-4a3b8b06510c/reduce.xml

Looking into the plan produced by EXPLAIN, I found that Stage-4 and Stage-5 
can be reached from multiple root tasks.
{code:java}
Explain
STAGE DEPENDENCIES:
  Stage-21 is a root stage , consists of Stage-34, Stage-5
  Stage-34 has a backup stage: Stage-5
  Stage-20 depends on stages: Stage-34
  Stage-17 depends on stages: Stage-5, Stage-18, Stage-20 , consists of 
Stage-32, Stage-33, Stage-1
  Stage-32 has a backup stage: Stage-1
  Stage-15 depends on stages: Stage-32
  Stage-10 depends on stages: Stage-1, Stage-15, Stage-16 , consists of 
Stage-31, Stage-2
  Stage-31
  Stage-9 depends on stages: Stage-31
  Stage-2 depends on stages: Stage-9
  Stage-33 has a backup stage: Stage-1
  Stage-16 depends on stages: Stage-33
  Stage-1
  Stage-5
  Stage-27 is a root stage , consists of Stage-37, Stage-38, Stage-4
  Stage-37 has a backup stage: Stage-4
  Stage-25 depends on stages: Stage-37
  Stage-12 depends on stages: Stage-4, Stage-22, Stage-23, Stage-25, Stage-26 , 
consists of Stage-36, Stage-5
  Stage-36
  Stage-11 depends on stages: Stage-36
  Stage-19 depends on stages: Stage-11 , consists of Stage-35, Stage-5
  Stage-35 has a backup stage: Stage-5
  Stage-18 depends on stages: Stage-35
  Stage-38 has a backup stage: Stage-4
  Stage-26 depends on stages: Stage-38
  Stage-4
  Stage-30 is a root stage , consists of Stage-42, Stage-43, Stage-3
  Stage-42 has a backup stage: Stage-3
  Stage-28 depends on stages: Stage-42
  Stage-14 depends on stages: Stage-3, Stage-28, Stage-29 , consists of 
Stage-41, Stage-4
  Stage-41
  Stage-13 depends on stages: Stage-41
  Stage-24 depends on stages: Stage-13 , 

[jira] [Assigned] (HIVE-20304) When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, and the execution engine is mr, same stage may launch twice due to the wrong generated plan

2018-08-04 Thread Jianguo Tian (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianguo Tian reassigned HIVE-20304:
---

Assignee: Hui Huang

> When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, 
> and the execution engine is mr, same stage may launch twice due to the wrong 
> generated plan
> 
>
> Key: HIVE-20304
> URL: https://issues.apache.org/jira/browse/HIVE-20304
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 1.2.1, 2.3.3
>Reporter: Hui Huang
>Assignee: Hui Huang
>Priority: Major
> Fix For: 1.2.1, 2.3.3
>
> Attachments: HIVE-20304.1.patch, HIVE-20304.patch
>
>
> When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, 
> and the execution engine is set to mr, same stage of a query may launch twice 
> due to the wrong generated plan. If hive.exec.parallel is also true, the same 
> stage will launch at the same time and the job will failed due to the first 
> completed stage clear the map.xml/reduce.xml file stored in the hdfs.
> use following sql to reproduce the issue:
> {code:java}
> CREATE TABLE `tbl1`(
>   `fence` string);
> CREATE TABLE `tbl2`(
>   `order_id` string,
>   `phone` string,
>   `search_id` string
> )
> PARTITIONED BY (
>   `dt` string);
> CREATE TABLE `tbl3`(
>   `order_id` string,
>   `platform` string)
> PARTITIONED BY (
>   `dt` string);
> CREATE TABLE `tbl4`(
>   `groupname` string,
>   `phone` string)
> PARTITIONED BY (
>   `dt` string);
> CREATE TABLE `tbl5`(
>   `search_id` string,
>   `fence` string)
> PARTITIONED BY (
>   `dt` string);
> SET hive.exec.parallel = TRUE;
> SET hive.auto.convert.join = TRUE;
> SET hive.optimize.skewjoin = TRUE;
> SELECT dt,
>platform,
>groupname,
>count(1) as cnt
> FROM
> (SELECT dt,
> platform,
> groupname
>  FROM
>  (SELECT fence
>   FROM tbl1)ta
>JOIN
>(SELECT a0.dt,
>a1.platform,
>a2.groupname,
>a3.fence
> FROM
> (SELECT dt,
> order_id,
> phone,
> search_id
>  FROM tbl2
>  WHERE dt =20180703 )a0
>   JOIN
>   (SELECT order_id,
>   platform,
>   dt
>FROM tbl3
>WHERE dt =20180703 )a1 ON a0.order_id = a1.order_id
>   INNER JOIN
>   (SELECT groupname,
>   phone,
>   dt
>FROM tbl4
>WHERE dt =20180703 )a2 ON a0.phone = a2.phone
>   LEFT JOIN
>   (SELECT search_id,
>   fence,
>   dt
>FROM tbl5
>WHERE dt =20180703)a3 ON a0.search_id = a3.search_id)t0 ON 
> ta.fence = t0.fence)t11
> GROUP BY dt,
>  platform,
>  groupname;
> DROP TABLE tbl1;
> DROP TABLE tbl2;
> DROP TABLE tbl3;
> DROP TABLE tbl4;
> DROP TABLE tbl5;
> {code}
> We will get some error message like this:
> Examining task ID: task_1531284442065_3637_m_00 (and more) from job 
> job_1531284442065_3637
> Task with the most failures(4):
> -
> Task ID:
>   task_1531284442065_3637_m_00
> URL:
>   
> http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1531284442065_3637=task_1531284442065_3637_m_00
> -
> Diagnostic Messages for this Task:
> File does not exist: 
> hdfs://test/tmp/hive-hadoop/hadoop/fe5efa94-abb1-420f-b6ba-ec782e7b79ad/hive_2018-08-03_17-00-17_707_592882314975289971-5/-mr-10045/757eb1f7-7a37-4a7e-abc0-4a3b8b06510c/reduce.xml
> java.io.FileNotFoundException: File does not exist: 
> hdfs://test/tmp/hive-hadoop/hadoop/fe5efa94-abb1-420f-b6ba-ec782e7b79ad/hive_2018-08-03_17-00-17_707_592882314975289971-5/-mr-10045/757eb1f7-7a37-4a7e-abc0-4a3b8b06510c/reduce.xml
> Looking into the plan by executing explain, I found that the Stage-4 and 
> Stage-5 can reached from multi root tasks.
> {code:java}
> Explain
> STAGE DEPENDENCIES:
>   Stage-21 is a root stage , consists of Stage-34, Stage-5
>   Stage-34 has a backup stage: Stage-5
>   Stage-20 depends on stages: Stage-34
>   Stage-17 depends on stages: Stage-5, Stage-18, Stage-20 , consists of 
> Stage-32, Stage-33, Stage-1
>   Stage-32 has a backup stage: Stage-1
>   Stage-15 depends on stages: Stage-32
>   Stage-10 depends on stages: Stage-1, Stage-15, Stage-16 , consists of 
> Stage-31, Stage-2
>   Stage-31
>   Stage-9 depends on stages: Stage-31
>   Stage-2 

[jira] [Assigned] (HIVE-20304) When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, and the execution engine is mr, same stage may launch twice due to the wrong generated plan

2018-08-04 Thread Jianguo Tian (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianguo Tian reassigned HIVE-20304:
---

Assignee: Jianguo Tian  (was: Hui Huang)

> When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, 
> and the execution engine is mr, same stage may launch twice due to the wrong 
> generated plan
> 
>
> Key: HIVE-20304
> URL: https://issues.apache.org/jira/browse/HIVE-20304
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 1.2.1, 2.3.3
>Reporter: Hui Huang
>Assignee: Jianguo Tian
>Priority: Major
> Fix For: 1.2.1, 2.3.3
>
> Attachments: HIVE-20304.1.patch, HIVE-20304.patch
>
>
> When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, 
> and the execution engine is set to mr, same stage of a query may launch twice 
> due to the wrong generated plan. If hive.exec.parallel is also true, the same 
> stage will launch at the same time and the job will failed due to the first 
> completed stage clear the map.xml/reduce.xml file stored in the hdfs.
> use following sql to reproduce the issue:
> {code:java}
> CREATE TABLE `tbl1`(
>   `fence` string);
> CREATE TABLE `tbl2`(
>   `order_id` string,
>   `phone` string,
>   `search_id` string
> )
> PARTITIONED BY (
>   `dt` string);
> CREATE TABLE `tbl3`(
>   `order_id` string,
>   `platform` string)
> PARTITIONED BY (
>   `dt` string);
> CREATE TABLE `tbl4`(
>   `groupname` string,
>   `phone` string)
> PARTITIONED BY (
>   `dt` string);
> CREATE TABLE `tbl5`(
>   `search_id` string,
>   `fence` string)
> PARTITIONED BY (
>   `dt` string);
> SET hive.exec.parallel = TRUE;
> SET hive.auto.convert.join = TRUE;
> SET hive.optimize.skewjoin = TRUE;
> SELECT dt,
>platform,
>groupname,
>count(1) as cnt
> FROM
> (SELECT dt,
> platform,
> groupname
>  FROM
>  (SELECT fence
>   FROM tbl1)ta
>JOIN
>(SELECT a0.dt,
>a1.platform,
>a2.groupname,
>a3.fence
> FROM
> (SELECT dt,
> order_id,
> phone,
> search_id
>  FROM tbl2
>  WHERE dt =20180703 )a0
>   JOIN
>   (SELECT order_id,
>   platform,
>   dt
>FROM tbl3
>WHERE dt =20180703 )a1 ON a0.order_id = a1.order_id
>   INNER JOIN
>   (SELECT groupname,
>   phone,
>   dt
>FROM tbl4
>WHERE dt =20180703 )a2 ON a0.phone = a2.phone
>   LEFT JOIN
>   (SELECT search_id,
>   fence,
>   dt
>FROM tbl5
>WHERE dt =20180703)a3 ON a0.search_id = a3.search_id)t0 ON 
> ta.fence = t0.fence)t11
> GROUP BY dt,
>  platform,
>  groupname;
> DROP TABLE tbl1;
> DROP TABLE tbl2;
> DROP TABLE tbl3;
> DROP TABLE tbl4;
> DROP TABLE tbl5;
> {code}
> We will get some error message like this:
> Examining task ID: task_1531284442065_3637_m_00 (and more) from job 
> job_1531284442065_3637
> Task with the most failures(4):
> -
> Task ID:
>   task_1531284442065_3637_m_00
> URL:
>   
> http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1531284442065_3637=task_1531284442065_3637_m_00
> -
> Diagnostic Messages for this Task:
> File does not exist: 
> hdfs://test/tmp/hive-hadoop/hadoop/fe5efa94-abb1-420f-b6ba-ec782e7b79ad/hive_2018-08-03_17-00-17_707_592882314975289971-5/-mr-10045/757eb1f7-7a37-4a7e-abc0-4a3b8b06510c/reduce.xml
> java.io.FileNotFoundException: File does not exist: 
> hdfs://test/tmp/hive-hadoop/hadoop/fe5efa94-abb1-420f-b6ba-ec782e7b79ad/hive_2018-08-03_17-00-17_707_592882314975289971-5/-mr-10045/757eb1f7-7a37-4a7e-abc0-4a3b8b06510c/reduce.xml
> Looking into the plan by executing explain, I found that the Stage-4 and 
> Stage-5 can reached from multi root tasks.
> {code:java}
> Explain
> STAGE DEPENDENCIES:
>   Stage-21 is a root stage , consists of Stage-34, Stage-5
>   Stage-34 has a backup stage: Stage-5
>   Stage-20 depends on stages: Stage-34
>   Stage-17 depends on stages: Stage-5, Stage-18, Stage-20 , consists of 
> Stage-32, Stage-33, Stage-1
>   Stage-32 has a backup stage: Stage-1
>   Stage-15 depends on stages: Stage-32
>   Stage-10 depends on stages: Stage-1, Stage-15, Stage-16 , consists of 
> Stage-31, Stage-2
>   Stage-31
>   Stage-9 depends on 

[jira] [Assigned] (HIVE-20304) When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, and the execution engine is mr, same stage may launch twice due to the wrong generated plan

2018-08-04 Thread Jianguo Tian (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianguo Tian reassigned HIVE-20304:
---

Assignee: (was: Jianguo Tian)

> When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, 
> and the execution engine is mr, same stage may launch twice due to the wrong 
> generated plan
> 
>
> Key: HIVE-20304
> URL: https://issues.apache.org/jira/browse/HIVE-20304
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 1.2.1, 2.3.3
>Reporter: Hui Huang
>Priority: Major
> Fix For: 1.2.1, 2.3.3
>
> Attachments: HIVE-20304.1.patch, HIVE-20304.patch
>
>
> When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, 
> and the execution engine is set to mr, same stage of a query may launch twice 
> due to the wrong generated plan. If hive.exec.parallel is also true, the same 
> stage will launch at the same time and the job will failed due to the first 
> completed stage clear the map.xml/reduce.xml file stored in the hdfs.
> use following sql to reproduce the issue:
> {code:java}
> CREATE TABLE `tbl1`(
>   `fence` string);
> CREATE TABLE `tbl2`(
>   `order_id` string,
>   `phone` string,
>   `search_id` string
> )
> PARTITIONED BY (
>   `dt` string);
> CREATE TABLE `tbl3`(
>   `order_id` string,
>   `platform` string)
> PARTITIONED BY (
>   `dt` string);
> CREATE TABLE `tbl4`(
>   `groupname` string,
>   `phone` string)
> PARTITIONED BY (
>   `dt` string);
> CREATE TABLE `tbl5`(
>   `search_id` string,
>   `fence` string)
> PARTITIONED BY (
>   `dt` string);
> SET hive.exec.parallel = TRUE;
> SET hive.auto.convert.join = TRUE;
> SET hive.optimize.skewjoin = TRUE;
> SELECT dt,
>platform,
>groupname,
>count(1) as cnt
> FROM
> (SELECT dt,
> platform,
> groupname
>  FROM
>  (SELECT fence
>   FROM tbl1)ta
>JOIN
>(SELECT a0.dt,
>a1.platform,
>a2.groupname,
>a3.fence
> FROM
> (SELECT dt,
> order_id,
> phone,
> search_id
>  FROM tbl2
>  WHERE dt =20180703 )a0
>   JOIN
>   (SELECT order_id,
>   platform,
>   dt
>FROM tbl3
>WHERE dt =20180703 )a1 ON a0.order_id = a1.order_id
>   INNER JOIN
>   (SELECT groupname,
>   phone,
>   dt
>FROM tbl4
>WHERE dt =20180703 )a2 ON a0.phone = a2.phone
>   LEFT JOIN
>   (SELECT search_id,
>   fence,
>   dt
>FROM tbl5
>WHERE dt =20180703)a3 ON a0.search_id = a3.search_id)t0 ON 
> ta.fence = t0.fence)t11
> GROUP BY dt,
>  platform,
>  groupname;
> DROP TABLE tbl1;
> DROP TABLE tbl2;
> DROP TABLE tbl3;
> DROP TABLE tbl4;
> DROP TABLE tbl5;
> {code}
> We will get some error message like this:
> Examining task ID: task_1531284442065_3637_m_00 (and more) from job 
> job_1531284442065_3637
> Task with the most failures(4):
> -
> Task ID:
>   task_1531284442065_3637_m_00
> URL:
>   
> http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1531284442065_3637=task_1531284442065_3637_m_00
> -
> Diagnostic Messages for this Task:
> File does not exist: 
> hdfs://test/tmp/hive-hadoop/hadoop/fe5efa94-abb1-420f-b6ba-ec782e7b79ad/hive_2018-08-03_17-00-17_707_592882314975289971-5/-mr-10045/757eb1f7-7a37-4a7e-abc0-4a3b8b06510c/reduce.xml
> java.io.FileNotFoundException: File does not exist: 
> hdfs://test/tmp/hive-hadoop/hadoop/fe5efa94-abb1-420f-b6ba-ec782e7b79ad/hive_2018-08-03_17-00-17_707_592882314975289971-5/-mr-10045/757eb1f7-7a37-4a7e-abc0-4a3b8b06510c/reduce.xml
> Looking into the plan by executing explain, I found that the Stage-4 and 
> Stage-5 can reached from multi root tasks.
> {code:java}
> Explain
> STAGE DEPENDENCIES:
>   Stage-21 is a root stage , consists of Stage-34, Stage-5
>   Stage-34 has a backup stage: Stage-5
>   Stage-20 depends on stages: Stage-34
>   Stage-17 depends on stages: Stage-5, Stage-18, Stage-20 , consists of 
> Stage-32, Stage-33, Stage-1
>   Stage-32 has a backup stage: Stage-1
>   Stage-15 depends on stages: Stage-32
>   Stage-10 depends on stages: Stage-1, Stage-15, Stage-16 , consists of 
> Stage-31, Stage-2
>   Stage-31
>   Stage-9 depends on stages: Stage-31
>   Stage-2 depends on stages: 

[jira] [Updated] (HIVE-17010) Fix the overflow problem of Long type in SetSparkReducerParallelism

2017-07-02 Thread Jianguo Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianguo Tian updated HIVE-17010:

Description: 
We use 
[numberOfBytes|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L129]
 to collect the numberOfBytes of the siblings of a specified RS. It is a Long, 
and the addition overflows when the data is too big. When that happens and 
spark.dynamic.allocation.enabled is true, the parallelism is decided by 
[sparkMemoryAndCores.getSecond()|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L184],
 a dynamic value determined by the Spark runtime; for example, it may be 5 or 
15 at random, and it may even be 1. The main problem here is the overflow of 
Long addition. You can reproduce the overflow with the following code:
{code:java}
import java.math.BigInteger;

public static void main(String[] args) {
  long a1 = 9223372036854775807L; // Long.MAX_VALUE
  long a2 = 1022672L;

  long res = a1 + a2;
  System.out.println(res);  // -9223372036853753137 (wraps around)

  BigInteger b1 = BigInteger.valueOf(a1);
  BigInteger b2 = BigInteger.valueOf(a2);
  BigInteger bigRes = b1.add(b2);

  System.out.println(bigRes); // 9223372036855798479 (correct sum)
}
{code}
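One way to avoid the wrap-around would be to saturate at Long.MAX_VALUE rather than switch to BigInteger. This is a hedged sketch of my own (saturatingAdd is a hypothetical helper, not the HIVE-17010 patch; it assumes the byte counts being summed are non-negative):

```java
public class SafeAdd {
    // Add two non-negative longs, clamping to Long.MAX_VALUE instead of
    // wrapping on overflow. Math.addExact throws on overflow, which we catch.
    public static long saturatingAdd(long a, long b) {
        try {
            return Math.addExact(a, b);
        } catch (ArithmeticException overflow) {
            return Long.MAX_VALUE;
        }
    }

    public static void main(String[] args) {
        // Same operands as the repro above: clamps instead of going negative.
        System.out.println(saturatingAdd(9223372036854775807L, 1022672L)); // 9223372036854775807
    }
}
```

For deciding reducer parallelism, a clamped total is sufficient, since any value near Long.MAX_VALUE already maps to the maximum parallelism.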

  was:
 We use 
[numberOfByteshttps://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L129]
 to collect the numberOfBytes of sibling of specified RS. We use Long type and 
it happens overflow when the data is too big. After happening this situation, 
the parallelism is decided by 
[sparkMemoryAndCores.getSecond()|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L184]
 if spark.dynamic.allocation.enabled is true, sparkMemoryAndCores.getSecond is 
a dymamic value which is decided by spark runtime. For example, the value of 
sparkMemoryAndCores.getSecond is 5 or 15 randomly. There is possibility that 
the value may be 1. The may problem here is the overflow of addition of Long 
type.  You can reproduce the overflow problem by following code
{code}
public static void main(String[] args) {
  long a1= 9223372036854775807L;
  long a2=1022672;

  long res = a1+a2;
  System.out.println(res);  //-9223372036853753137

  BigInteger b1= BigInteger.valueOf(a1);
  BigInteger b2 = BigInteger.valueOf(a2);

  BigInteger bigRes = b1.add(b2);

  System.out.println(bigRes); //9223372036855798479

}
{code}


> Fix the overflow problem of Long type in SetSparkReducerParallelism
> ---
>
> Key: HIVE-17010
> URL: https://issues.apache.org/jira/browse/HIVE-17010
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
>
> [link title|http://example.com] We use 
> [numberOfByteshttps://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L129]
>  to collect the numberOfBytes of sibling of specified RS. We use Long type 
> and it happens overflow when the data is too big. After happening this 
> situation, the parallelism is decided by 
> [sparkMemoryAndCores.getSecond()|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L184]
>  if spark.dynamic.allocation.enabled is true, sparkMemoryAndCores.getSecond 
> is a dymamic value which is decided by spark runtime. For example, the value 
> of sparkMemoryAndCores.getSecond is 5 or 15 randomly. There is possibility 
> that the value may be 1. The may problem here is the overflow of addition of 
> Long type.  You can reproduce the overflow problem by following code
> {code}
> public static void main(String[] args) {
>   long a1= 9223372036854775807L;
>   long a2=1022672;
>   long res = a1+a2;
>   System.out.println(res);  //-9223372036853753137
>   BigInteger b1= BigInteger.valueOf(a1);
>   BigInteger b2 = BigInteger.valueOf(a2);
>   BigInteger bigRes = b1.add(b2);
>   System.out.println(bigRes); //9223372036855798479
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16823) "ArrayIndexOutOfBoundsException" in spark_vectorized_dynamic_partition_pruning.q

2017-06-06 Thread Jianguo Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianguo Tian updated HIVE-16823:

Description: 
spark_vectorized_dynamic_partition_pruning.q
{code}
set hive.optimize.ppd=true;
set hive.ppd.remove.duplicatefilters=true;
set hive.spark.dynamic.partition.pruning=true;
set hive.optimize.metadataonly=false;
set hive.optimize.index.filter=true;
set hive.vectorized.execution.enabled=true;
set hive.strict.checks.cartesian.product=false;

-- parent is reduce tasks
select count(*) from srcpart join (select ds as ds, ds as `date` from srcpart 
group by ds) s on (srcpart.ds = s.ds) where s.`date` = '2008-04-08';
{code}

The exceptions are as follows:
{code}
2017-06-05T09:20:31,468 ERROR [Executor task launch worker-0] 
spark.SparkReduceRecordHandler: Fatal error: 
org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing vector 
batch (tag=0) Column vector types: 0:BYTES, 1:BYTES
["2008-04-08", "2008-04-08"]
org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing vector 
batch (tag=0) Column vector types: 0:BYTES, 1:BYTES
["2008-04-08", "2008-04-08"]
at 
org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processVectors(SparkReduceRecordHandler.java:413)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:301)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:54)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) 
~[scala-library-2.11.8.jar:?]
at scala.collection.Iterator$class.foreach(Iterator.scala:893) 
~[scala-library-2.11.8.jar:?]
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) 
~[scala-library-2.11.8.jar:?]
at 
org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
 ~[spark-core_2.11-2.0.0.jar:2.0.0]
at 
org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
 ~[spark-core_2.11-2.0.0.jar:2.0.0]
at 
org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974) 
~[spark-core_2.11-2.0.0.jar:2.0.0]
at 
org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974) 
~[spark-core_2.11-2.0.0.jar:2.0.0]
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) 
~[spark-core_2.11-2.0.0.jar:2.0.0]
at org.apache.spark.scheduler.Task.run(Task.scala:85) 
~[spark-core_2.11-2.0.0.jar:2.0.0]
at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) 
~[spark-core_2.11-2.0.0.jar:2.0.0]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[?:1.8.0_112]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[?:1.8.0_112]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at 
org.apache.hadoop.hive.ql.exec.vector.VectorGroupKeyHelper.copyGroupKey(VectorGroupKeyHelper.java:107)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeReduceMergePartial.doProcessBatch(VectorGroupByOperator.java:832)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeBase.processBatch(VectorGroupByOperator.java:179)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.process(VectorGroupByOperator.java:1035)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processVectors(SparkReduceRecordHandler.java:400)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
... 17 more
2017-06-05T09:20:31,472 ERROR [Executor task launch worker-0] 
executor.Executor: Exception in task 2.0 in stage 1.0 (TID 8)
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Error while processing vector batch (tag=0) Column vector types: 0:BYTES, 
1:BYTES
["2008-04-08", "2008-04-08"]
at 
org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:315)
 

[jira] [Comment Edited] (HIVE-16823) "ArrayIndexOutOfBoundsException" in spark_vectorized_dynamic_partition_pruning.q

2017-06-06 Thread Jianguo Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16039995#comment-16039995
 ] 

Jianguo Tian edited comment on HIVE-16823 at 6/7/17 1:58 AM:
-

Hi, [~mmccline]. This exception was indeed triggered by 
[-HIVE-16273-|https://issues.apache.org/jira/browse/HIVE-16273]: if I build 
Hive from the code before this patch, the exception does not occur. Any 
comments and suggestions would be appreciated. Thanks!
In my opinion, it would also be better to add a more detailed description to 
[-HIVE-16273-|https://issues.apache.org/jira/browse/HIVE-16273].
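As a standalone sketch of the failure mode (the class and field names below are invented for illustration; this is not Hive's actual code), the {{ArrayIndexOutOfBoundsException: 1}} in {{VectorGroupKeyHelper.copyGroupKey}} matches the classic pattern of a key copier indexing more column vectors than the output batch actually holds:

```java
// Toy illustration of the AIOOBE pattern reported in this issue: the key
// copier assumes the output batch has one column vector per group-by key
// (here, ds and `date`) and indexes past the end when it does not.
// ToyBatch and copyGroupKey are invented names, not Hive classes.
public class GroupKeyCopyDemo {
    static final class ToyBatch {
        final String[][] cols;                 // one array per column vector
        ToyBatch(int numCols, int size) {
            cols = new String[numCols][size];
        }
    }

    /** Copy row 0 of `numKeys` key columns from input to output. */
    static void copyGroupKey(ToyBatch in, ToyBatch out, int numKeys) {
        for (int k = 0; k < numKeys; k++) {
            // Throws ArrayIndexOutOfBoundsException when out has fewer columns.
            out.cols[k][0] = in.cols[k][0];
        }
    }

    public static void main(String[] args) {
        ToyBatch in = new ToyBatch(2, 1);      // two key columns, like (ds, date)
        in.cols[0][0] = "2008-04-08";
        in.cols[1][0] = "2008-04-08";

        ToyBatch ok = new ToyBatch(2, 1);
        copyGroupKey(in, ok, 2);               // fine: column counts match

        ToyBatch bad = new ToyBatch(1, 1);     // mis-sized output batch
        try {
            copyGroupKey(in, bad, 2);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("ArrayIndexOutOfBoundsException at column 1");
        }
    }
}
```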


was (Author: jonnyr):
Hi, [~mmccline]. This exception was indeed triggered by 
[-HIVE-16273-|https://issues.apache.org/jira/browse/HIVE-16273], if I build 
Hive with code before this patch, this exception won't occur. Any comments and 
suggestion will be appreciated. Thx!

> "ArrayIndexOutOfBoundsException" in 
> spark_vectorized_dynamic_partition_pruning.q
> 
>
> Key: HIVE-16823
> URL: https://issues.apache.org/jira/browse/HIVE-16823
> Project: Hive
>  Issue Type: Bug
>Reporter: Jianguo Tian
>
> script.q
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> -- set hive.optimize.index.filter=true;
> set hive.vectorized.execution.enabled=true;
> set hive.strict.checks.cartesian.product=false;
> -- parent is reduce tasks
> select count(*) from srcpart join (select ds as ds, ds as `date` from srcpart 
> group by ds) s on (srcpart.ds = s.ds) where s.`date` = '2008-04-08';
> {code}
> The exceptions are as follows:
> {code}
> 2017-06-05T09:20:31,468 ERROR [Executor task launch worker-0] 
> spark.SparkReduceRecordHandler: Fatal error: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing 
> vector batch (tag=0) Column vector types: 0:BYTES, 1:BYTES
> ["2008-04-08", "2008-04-08"]
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing 
> vector batch (tag=0) Column vector types: 0:BYTES, 1:BYTES
> ["2008-04-08", "2008-04-08"]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processVectors(SparkReduceRecordHandler.java:413)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:301)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:54)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) 
> ~[scala-library-2.11.8.jar:?]
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893) 
> ~[scala-library-2.11.8.jar:?]
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) 
> ~[scala-library-2.11.8.jar:?]
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>  ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>  ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at org.apache.spark.scheduler.Task.run(Task.scala:85) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [?:1.8.0_112]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [?:1.8.0_112]
>   at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupKeyHelper.copyGroupKey(VectorGroupKeyHelper.java:107)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> 

[jira] [Commented] (HIVE-16823) "ArrayIndexOutOfBoundsException" in spark_vectorized_dynamic_partition_pruning.q

2017-06-06 Thread Jianguo Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16039995#comment-16039995
 ] 

Jianguo Tian commented on HIVE-16823:
-

Hi, [~mmccline]. This exception was indeed triggered by 
[-HIVE-16273-|https://issues.apache.org/jira/browse/HIVE-16273]: if I build 
Hive from the code before this patch, the exception does not occur. Any 
comments and suggestions would be appreciated. Thanks!

> "ArrayIndexOutOfBoundsException" in 
> spark_vectorized_dynamic_partition_pruning.q
> 
>
> Key: HIVE-16823
> URL: https://issues.apache.org/jira/browse/HIVE-16823
> Project: Hive
>  Issue Type: Bug
>Reporter: Jianguo Tian
>
> script.q
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> -- set hive.optimize.index.filter=true;
> set hive.vectorized.execution.enabled=true;
> set hive.strict.checks.cartesian.product=false;
> -- parent is reduce tasks
> select count(*) from srcpart join (select ds as ds, ds as `date` from srcpart 
> group by ds) s on (srcpart.ds = s.ds) where s.`date` = '2008-04-08';
> {code}
> The exceptions are as follows:
> {code}
> 2017-06-05T09:20:31,468 ERROR [Executor task launch worker-0] 
> spark.SparkReduceRecordHandler: Fatal error: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing 
> vector batch (tag=0) Column vector types: 0:BYTES, 1:BYTES
> ["2008-04-08", "2008-04-08"]
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing 
> vector batch (tag=0) Column vector types: 0:BYTES, 1:BYTES
> ["2008-04-08", "2008-04-08"]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processVectors(SparkReduceRecordHandler.java:413)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:301)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:54)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) 
> ~[scala-library-2.11.8.jar:?]
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893) 
> ~[scala-library-2.11.8.jar:?]
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) 
> ~[scala-library-2.11.8.jar:?]
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>  ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>  ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at org.apache.spark.scheduler.Task.run(Task.scala:85) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [?:1.8.0_112]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [?:1.8.0_112]
>   at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupKeyHelper.copyGroupKey(VectorGroupKeyHelper.java:107)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeReduceMergePartial.doProcessBatch(VectorGroupByOperator.java:832)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeBase.processBatch(VectorGroupByOperator.java:179)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> 

[jira] [Issue Comment Deleted] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]

2017-06-04 Thread Jianguo Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianguo Tian updated HIVE-11297:

Comment: was deleted

(was: [~csun]: thanks for the review; I replied to you on the review board.)

> Combine op trees for partition info generating tasks [Spark branch]
> ---
>
> Key: HIVE-11297
> URL: https://issues.apache.org/jira/browse/HIVE-11297
> Project: Hive
>  Issue Type: Bug
>Affects Versions: spark-branch
>Reporter: Chao Sun
>Assignee: liyunzhang_intel
> Attachments: HIVE-11297.1.patch
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates 
> partition info for more than one partition columns, multiple operator trees 
> are created, which all start from the same table scan op, but have different 
> spark partition pruning sinks.
> As an optimization, we can combine these op trees and so don't have to do 
> table scan multiple times.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]

2017-06-04 Thread Jianguo Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036540#comment-16036540
 ] 

Jianguo Tian commented on HIVE-11297:
-

[~csun]: thanks for the review; I replied to you on the review board.

> Combine op trees for partition info generating tasks [Spark branch]
> ---
>
> Key: HIVE-11297
> URL: https://issues.apache.org/jira/browse/HIVE-11297
> Project: Hive
>  Issue Type: Bug
>Affects Versions: spark-branch
>Reporter: Chao Sun
>Assignee: liyunzhang_intel
> Attachments: HIVE-11297.1.patch
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates 
> partition info for more than one partition columns, multiple operator trees 
> are created, which all start from the same table scan op, but have different 
> spark partition pruning sinks.
> As an optimization, we can combine these op trees and so don't have to do 
> table scan multiple times.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]

2017-05-25 Thread Jianguo Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianguo Tian updated HIVE-11297:

Attachment: (was: HIVE-11297.1.patch)

> Combine op trees for partition info generating tasks [Spark branch]
> ---
>
> Key: HIVE-11297
> URL: https://issues.apache.org/jira/browse/HIVE-11297
> Project: Hive
>  Issue Type: Bug
>Affects Versions: spark-branch
>Reporter: Chao Sun
>Assignee: Jianguo Tian
>
> Currently, for dynamic partition pruning in Spark, if a small table generates 
> partition info for more than one partition columns, multiple operator trees 
> are created, which all start from the same table scan op, but have different 
> spark partition pruning sinks.
> As an optimization, we can combine these op trees and so don't have to do 
> table scan multiple times.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]

2017-05-25 Thread Jianguo Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianguo Tian updated HIVE-11297:

Attachment: HIVE-11297.1.patch

> Combine op trees for partition info generating tasks [Spark branch]
> ---
>
> Key: HIVE-11297
> URL: https://issues.apache.org/jira/browse/HIVE-11297
> Project: Hive
>  Issue Type: Bug
>Affects Versions: spark-branch
>Reporter: Chao Sun
>Assignee: Jianguo Tian
> Attachments: HIVE-11297.1.patch
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates 
> partition info for more than one partition columns, multiple operator trees 
> are created, which all start from the same table scan op, but have different 
> spark partition pruning sinks.
> As an optimization, we can combine these op trees and so don't have to do 
> table scan multiple times.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]

2017-05-25 Thread Jianguo Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianguo Tian reassigned HIVE-11297:
---

Assignee: Jianguo Tian  (was: liyunzhang_intel)

> Combine op trees for partition info generating tasks [Spark branch]
> ---
>
> Key: HIVE-11297
> URL: https://issues.apache.org/jira/browse/HIVE-11297
> Project: Hive
>  Issue Type: Bug
>Affects Versions: spark-branch
>Reporter: Chao Sun
>Assignee: Jianguo Tian
>
> Currently, for dynamic partition pruning in Spark, if a small table generates 
> partition info for more than one partition columns, multiple operator trees 
> are created, which all start from the same table scan op, but have different 
> spark partition pruning sinks.
> As an optimization, we can combine these op trees and so don't have to do 
> table scan multiple times.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-14679) csv2/tsv2 output format disables quoting by default and it's difficult to enable

2016-10-23 Thread Jianguo Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianguo Tian updated HIVE-14679:

Attachment: HIVE-14769.2 .patch

> csv2/tsv2 output format disables quoting by default and it's difficult to 
> enable
> 
>
> Key: HIVE-14679
> URL: https://issues.apache.org/jira/browse/HIVE-14679
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Jianguo Tian
> Attachments: HIVE-14769.1.patch, HIVE-14769.2 .patch
>
>
> Over in HIVE-9788 we made quoting optional for csv2/tsv2.
> However I see the following issues:
> * JIRA doc doesn't mention it's disabled by default, this should be there an 
> in the output of beeline help.
> * The JIRA says the property is {{--disableQuotingForSV}} but it's actually a 
> system property. We should not use a system property as it's non-standard so 
> extremely hard for users to set. For example I must do: {{env 
> HADOOP_CLIENT_OPTS="-Ddisable.quoting.for.sv=false" beeline ...}}
> * The arg {{--disableQuotingForSV}} should be documented in beeline help.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14679) csv2/tsv2 output format disables quoting by default and it's difficult to enable

2016-10-20 Thread Jianguo Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15593556#comment-15593556
 ] 

Jianguo Tian edited comment on HIVE-14679 at 10/21/16 1:05 AM:
---

I have posted the latest patch on the [Review 
Board|https://reviews.apache.org/r/52981/]. [~brocknoland], [~kennethmac2000], 
[~ngangam], could you please help review it? Looking forward to your 
feedback. Thanks a lot!


was (Author: jonnyr):
I have updated latest patch on the [Review 
Board|https://reviews.apache.org/r/52981/], [~brocknoland], [~kennethmac2000], 
[~ngangam], could you please help me review this latest patch? Looking forward 
to your precious opinion. Thanks a lot!

> csv2/tsv2 output format disables quoting by default and it's difficult to 
> enable
> 
>
> Key: HIVE-14679
> URL: https://issues.apache.org/jira/browse/HIVE-14679
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Jianguo Tian
> Attachments: HIVE-14769.1.patch
>
>
> Over in HIVE-9788 we made quoting optional for csv2/tsv2.
> However I see the following issues:
> * JIRA doc doesn't mention it's disabled by default, this should be there an 
> in the output of beeline help.
> * The JIRA says the property is {{--disableQuotingForSV}} but it's actually a 
> system property. We should not use a system property as it's non-standard so 
> extremely hard for users to set. For example I must do: {{env 
> HADOOP_CLIENT_OPTS="-Ddisable.quoting.for.sv=false" beeline ...}}
> * The arg {{--disableQuotingForSV}} should be documented in beeline help.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14679) csv2/tsv2 output format disables quoting by default and it's difficult to enable

2016-10-20 Thread Jianguo Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15593556#comment-15593556
 ] 

Jianguo Tian edited comment on HIVE-14679 at 10/21/16 1:05 AM:
---

I have posted the latest patch on the [Review 
Board|https://reviews.apache.org/r/52981/]. [~brocknoland], [~kennethmac2000], 
[~ngangam], could you please help review it? Looking forward to your 
feedback. Thanks a lot!


was (Author: jonnyr):
I have updated latest patch on the Review Board, [~brocknoland], 
[~kennethmac2000], [~ngangam], could you please help me review this latest 
patch? Looking forward to your precious opinion. Thanks a lot!

> csv2/tsv2 output format disables quoting by default and it's difficult to 
> enable
> 
>
> Key: HIVE-14679
> URL: https://issues.apache.org/jira/browse/HIVE-14679
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Jianguo Tian
> Attachments: HIVE-14769.1.patch
>
>
> Over in HIVE-9788 we made quoting optional for csv2/tsv2.
> However I see the following issues:
> * JIRA doc doesn't mention it's disabled by default, this should be there an 
> in the output of beeline help.
> * The JIRA says the property is {{--disableQuotingForSV}} but it's actually a 
> system property. We should not use a system property as it's non-standard so 
> extremely hard for users to set. For example I must do: {{env 
> HADOOP_CLIENT_OPTS="-Ddisable.quoting.for.sv=false" beeline ...}}
> * The arg {{--disableQuotingForSV}} should be documented in beeline help.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14679) csv2/tsv2 output format disables quoting by default and it's difficult to enable

2016-10-20 Thread Jianguo Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15593556#comment-15593556
 ] 

Jianguo Tian commented on HIVE-14679:
-

I have posted the latest patch on the Review Board. [~brocknoland], 
[~kennethmac2000], [~ngangam], could you please help review it? Looking 
forward to your feedback. Thanks a lot!

> csv2/tsv2 output format disables quoting by default and it's difficult to 
> enable
> 
>
> Key: HIVE-14679
> URL: https://issues.apache.org/jira/browse/HIVE-14679
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Jianguo Tian
> Attachments: HIVE-14769.1.patch
>
>
> Over in HIVE-9788 we made quoting optional for csv2/tsv2.
> However I see the following issues:
> * JIRA doc doesn't mention it's disabled by default, this should be there an 
> in the output of beeline help.
> * The JIRA says the property is {{--disableQuotingForSV}} but it's actually a 
> system property. We should not use a system property as it's non-standard so 
> extremely hard for users to set. For example I must do: {{env 
> HADOOP_CLIENT_OPTS="-Ddisable.quoting.for.sv=false" beeline ...}}
> * The arg {{--disableQuotingForSV}} should be documented in beeline help.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (HIVE-14679) csv2/tsv2 output format disables quoting by default and it's difficult to enable

2016-10-20 Thread Jianguo Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianguo Tian updated HIVE-14679:

Comment: was deleted

(was: I have fixed this issue; you can check the code below:
{code:borderStyle=solid}
unquotedCsvPreference = new CsvPreference.Builder('\u0020', separator, 
"").surroundingSpacesNeedQuotes(true).build();
{code}
According to the API of *CsvPreference.Builder*, the 
*surroundingSpacesNeedQuotes* parameter indicates "whether spaces at the 
beginning or end of a cell should be ignored if they're not surrounded by 
quotes".
)

> csv2/tsv2 output format disables quoting by default and it's difficult to 
> enable
> 
>
> Key: HIVE-14679
> URL: https://issues.apache.org/jira/browse/HIVE-14679
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Jianguo Tian
> Attachments: HIVE-14769.1.patch
>
>
> Over in HIVE-9788 we made quoting optional for csv2/tsv2.
> However I see the following issues:
> * JIRA doc doesn't mention it's disabled by default, this should be there an 
> in the output of beeline help.
> * The JIRA says the property is {{--disableQuotingForSV}} but it's actually a 
> system property. We should not use a system property as it's non-standard so 
> extremely hard for users to set. For example I must do: {{env 
> HADOOP_CLIENT_OPTS="-Ddisable.quoting.for.sv=false" beeline ...}}
> * The arg {{--disableQuotingForSV}} should be documented in beeline help.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14679) csv2/tsv2 output format disables quoting by default and it's difficult to enable

2016-10-19 Thread Jianguo Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianguo Tian updated HIVE-14679:

Attachment: HIVE-14769.1.patch

> csv2/tsv2 output format disables quoting by default and it's difficult to 
> enable
> 
>
> Key: HIVE-14679
> URL: https://issues.apache.org/jira/browse/HIVE-14679
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Jianguo Tian
> Attachments: HIVE-14769.1.patch
>
>
> Over in HIVE-9788 we made quoting optional for csv2/tsv2.
> However I see the following issues:
> * JIRA doc doesn't mention it's disabled by default, this should be there an 
> in the output of beeline help.
> * The JIRA says the property is {{--disableQuotingForSV}} but it's actually a 
> system property. We should not use a system property as it's non-standard so 
> extremely hard for users to set. For example I must do: {{env 
> HADOOP_CLIENT_OPTS="-Ddisable.quoting.for.sv=false" beeline ...}}
> * The arg {{--disableQuotingForSV}} should be documented in beeline help.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14679) csv2/tsv2 output format disables quoting by default and it's difficult to enable

2016-10-19 Thread Jianguo Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15588106#comment-15588106
 ] 

Jianguo Tian commented on HIVE-14679:
-

I have fixed this issue; you can check the code below:
{code:borderStyle=solid}
unquotedCsvPreference = new CsvPreference.Builder('\u0020', separator, 
"").surroundingSpacesNeedQuotes(true).build();
{code}
According to the API of *CsvPreference.Builder*, the 
*surroundingSpacesNeedQuotes* parameter indicates "whether spaces at the 
beginning or end of a cell should be ignored if they're not surrounded by 
quotes".
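A simplified standalone model of that write-side behavior (this is not SuperCSV itself; {{write}} below is an invented helper that only mimics the semantics described above): when the option is enabled, a cell with leading or trailing spaces is quoted so those spaces survive, and otherwise it is emitted as-is.

```java
// Simplified model (not SuperCSV code) of surroundingSpacesNeedQuotes on
// write: a cell whose edges carry spaces gets quoted so the spaces are
// preserved; without the option it is written unchanged.
public class SurroundingSpacesDemo {
    static String write(String cell, char quote, boolean spacesNeedQuotes) {
        boolean hasEdgeSpace = !cell.equals(cell.trim());
        if (spacesNeedQuotes && hasEdgeSpace) {
            return quote + cell + quote;   // preserve edge spaces via quoting
        }
        return cell;
    }

    public static void main(String[] args) {
        System.out.println(write(" a ", '"', true));   // quoted: " a " survives
        System.out.println(write(" a ", '"', false));  // emitted as-is
        System.out.println(write("a", '"', true));     // no edge spaces, no quotes
    }
}
```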


> csv2/tsv2 output format disables quoting by default and it's difficult to 
> enable
> 
>
> Key: HIVE-14679
> URL: https://issues.apache.org/jira/browse/HIVE-14679
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Jianguo Tian
>
> Over in HIVE-9788 we made quoting optional for csv2/tsv2.
> However I see the following issues:
> * JIRA doc doesn't mention it's disabled by default, this should be there an 
> in the output of beeline help.
> * The JIRA says the property is {{--disableQuotingForSV}} but it's actually a 
> system property. We should not use a system property as it's non-standard so 
> extremely hard for users to set. For example I must do: {{env 
> HADOOP_CLIENT_OPTS="-Ddisable.quoting.for.sv=false" beeline ...}}
> * The arg {{--disableQuotingForSV}} should be documented in beeline help.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14679) csv2/tsv2 output format disables quoting by default and it's difficult to enable

2016-10-18 Thread Jianguo Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587185#comment-15587185
 ] 

Jianguo Tian commented on HIVE-14679:
-

Agreed. It really looks confusing and strange with the null character. Let me 
find a more suitable solution. Thanks.

> csv2/tsv2 output format disables quoting by default and it's difficult to 
> enable
> 
>
> Key: HIVE-14679
> URL: https://issues.apache.org/jira/browse/HIVE-14679
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Jianguo Tian
>
> Over in HIVE-9788 we made quoting optional for csv2/tsv2.
> However I see the following issues:
> * JIRA doc doesn't mention it's disabled by default, this should be there an 
> in the output of beeline help.
> * The JIRA says the property is {{--disableQuotingForSV}} but it's actually a 
> system property. We should not use a system property as it's non-standard so 
> extremely hard for users to set. For example I must do: {{env 
> HADOOP_CLIENT_OPTS="-Ddisable.quoting.for.sv=false" beeline ...}}
> * The arg {{--disableQuotingForSV}} should be documented in beeline help.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14679) csv2/tsv2 output format disables quoting by default and it's difficult to enable

2016-10-18 Thread Jianguo Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584710#comment-15584710
 ] 

Jianguo Tian commented on HIVE-14679:
-

What you said about "not affect the csv2/tsv2 formats" is correct, and that is 
exactly what I'm working toward. Thanks for your input! Please wait for my 
updated patch.

> csv2/tsv2 output format disables quoting by default and it's difficult to 
> enable
> 
>
> Key: HIVE-14679
> URL: https://issues.apache.org/jira/browse/HIVE-14679
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Jianguo Tian
>
> Over in HIVE-9788 we made quoting optional for csv2/tsv2.
> However I see the following issues:
> * JIRA doc doesn't mention it's disabled by default, this should be there an 
> in the output of beeline help.
> * The JIRA says the property is {{--disableQuotingForSV}} but it's actually a 
> system property. We should not use a system property as it's non-standard so 
> extremely hard for users to set. For example I must do: {{env 
> HADOOP_CLIENT_OPTS="-Ddisable.quoting.for.sv=false" beeline ...}}
> * The arg {{--disableQuotingForSV}} should be documented in beeline help.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14679) csv2/tsv2 output format disables quoting by default and it's difficult to enable

2016-10-18 Thread Jianguo Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584702#comment-15584702
 ] 

Jianguo Tian edited comment on HIVE-14679 at 10/18/16 7:13 AM:
---

Hi, Kenneth MacArthur. It looks difficult to implement "there should simply be 
no quote character at all when quoting is disabled". As we can see from the 
code below, the first parameter of the *Builder* method is a character, but 
unfortunately Java has no empty character literal analogous to the empty 
String *""*.
{code:borderStyle=solid}
unquotedCsvPreference = new CsvPreference.Builder('\0', separator, "").build();
{code}
What do you think about this?


was (Author: jonnyr):
Hi, [~Kenneth MacArthur]. It looks difficult to implement "there should simply 
be no quote character at all when quoting is disabled". As we can see from the 
below code, the first parameter of *Builder* method is a character, but 
unfortunately we can't implement an empty character in java as *""* in String.
{code:borderStyle=solid}
unquotedCsvPreference = new CsvPreference.Builder('\0', separator, "").build();
{code}
How do you think about this above?

> csv2/tsv2 output format disables quoting by default and it's difficult to 
> enable
> 
>
> Key: HIVE-14679
> URL: https://issues.apache.org/jira/browse/HIVE-14679
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Jianguo Tian
>


[jira] [Commented] (HIVE-14679) csv2/tsv2 output format disables quoting by default and it's difficult to enable

2016-10-18 Thread Jianguo Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584702#comment-15584702
 ] 

Jianguo Tian commented on HIVE-14679:
-

Hi, [~Kenneth MacArthur]. It looks difficult to implement "there should simply 
be no quote character at all when quoting is disabled". As the code below shows, 
the first parameter of the *Builder* constructor is a character, and 
unfortunately Java has no empty character literal analogous to the empty String 
*""*.
{code:borderStyle=solid}
unquotedCsvPreference = new CsvPreference.Builder('\0', separator, "").build();
{code}
What do you think about this?



[jira] [Comment Edited] (HIVE-14679) csv2/tsv2 output format disables quoting by default and it's difficult to enable

2016-10-18 Thread Jianguo Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15580939#comment-15580939
 ] 

Jianguo Tian edited comment on HIVE-14679 at 10/18/16 6:57 AM:
---

Thanks for your suggestions. I have finished the "Disabling quoting should be 
possible using a beeline argument" part. Next, I'll resolve your 3rd 
suggestion.



was (Author: jonnyr):
Thanks for your suggestions. I have finished the part of "Disabling quoting 
should be possible using a beeline argument". Next, I'll resolved your 3rd 
suggestion.



[jira] [Commented] (HIVE-14679) csv2/tsv2 output format disables quoting by default and it's difficult to enable

2016-10-16 Thread Jianguo Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15580939#comment-15580939
 ] 

Jianguo Tian commented on HIVE-14679:
-

Thanks for your suggestions. I have finished the "Disabling quoting should be 
possible using a beeline argument" part. Next, I'll resolve your 3rd 
suggestion.



[jira] [Issue Comment Deleted] (HIVE-5867) JDBC driver and beeline should support executing an initial SQL script

2016-10-09 Thread Jianguo Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianguo Tian updated HIVE-5867:
---
Comment: was deleted

(was: Hive JDBC client)

> JDBC driver and beeline should support executing an initial SQL script
> --
>
> Key: HIVE-5867
> URL: https://issues.apache.org/jira/browse/HIVE-5867
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients, JDBC
>Reporter: Prasad Mujumdar
>Assignee: Jianguo Tian
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-5867.1.patch, HIVE-5867.2.patch, HIVE-5867.3 .patch
>
>
> HiveCLI supports the .hiverc script that is executed at the start of the 
> session. This is helpful for things like registering UDFs, session-specific 
> configs, etc.
> This functionality is missing for beeline and JDBC clients. It would be 
> useful for the JDBC driver to support an init script with SQL statements that 
> is automatically executed after connecting. The script path can be specified 
> via the JDBC connection URL. For example: 
> {noformat}
> jdbc:hive2://localhost:1/default;initScript=/home/user1/scripts/init.sql
> {noformat}
> This can be added as a Beeline command line option like "-i 
> /home/user1/scripts/init.sql".
> To help the transition from HiveCLI to Beeline, we can keep the default init 
> script as $HOME/.hiverc.
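For illustration only, building such a connection URL can be sketched in self-contained Java (hypothetical helper name; the proposal above calls the parameter {{initScript}}, while later comments on this JIRA say the committed option was named {{initFile}}). In a {{jdbc:hive2}} URL, session settings are appended as semicolon-separated key=value pairs after the database name:

```java
// Hypothetical sketch: append an init-script parameter to a HiveServer2
// JDBC URL. Uses "initFile", the name reported in later comments here.
public class InitScriptUrl {
    static String withInitFile(String baseUrl, String scriptPath) {
        // e.g. jdbc:hive2://host:10000/default;initFile=/path/to/init.sql
        return baseUrl + ";initFile=" + scriptPath;
    }

    public static void main(String[] args) {
        // 10000 is the conventional default HiveServer2 port
        System.out.println(withInitFile(
            "jdbc:hive2://localhost:10000/default",
            "/home/user1/scripts/init.sql"));
    }
}
```

With the Hive JDBC driver on the classpath, the resulting string would be passed to {{DriverManager.getConnection(...)}}; the driver would then run the script's statements right after the connection is established.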





[jira] [Issue Comment Deleted] (HIVE-5867) JDBC driver and beeline should support executing an initial SQL script

2016-10-09 Thread Jianguo Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianguo Tian updated HIVE-5867:
---
Comment: was deleted

(was: The "initFile" option in JDBC URL could be seen on the wiki.)

> JDBC driver and beeline should support executing an initial SQL script
> --
>
> Key: HIVE-5867
> URL: https://issues.apache.org/jira/browse/HIVE-5867
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients, JDBC
>Reporter: Prasad Mujumdar
>Assignee: Jianguo Tian
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-5867.1.patch, HIVE-5867.2.patch, HIVE-5867.3 .patch
>
>


[jira] [Commented] (HIVE-5867) JDBC driver and beeline should support executing an initial SQL script

2016-10-09 Thread Jianguo Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15559594#comment-15559594
 ] 

Jianguo Tian commented on HIVE-5867:


I have added the "initFile=" option to the JDBC URL; you can now see some 
changes to the "Connection URL Format" and "Connection URL for Remote or 
Embedded Mode" sections.

> JDBC driver and beeline should support executing an initial SQL script
> --
>
> Key: HIVE-5867
> URL: https://issues.apache.org/jira/browse/HIVE-5867
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients, JDBC
>Reporter: Prasad Mujumdar
>Assignee: Jianguo Tian
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-5867.1.patch, HIVE-5867.2.patch, HIVE-5867.3 .patch
>
>


[jira] [Comment Edited] (HIVE-5867) JDBC driver and beeline should support executing an initial SQL script

2016-10-09 Thread Jianguo Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15559594#comment-15559594
 ] 

Jianguo Tian edited comment on HIVE-5867 at 10/9/16 8:35 AM:
-

I have added the "initFile=" option to the JDBC URL; you can now see some 
changes on the wiki to the "Connection URL Format" and "Connection URL for 
Remote or Embedded Mode" sections.


was (Author: jonnyr):
I have added "initFile=" option in the JDBC URL, now you can see some 
changes about "Connection URL Format" and "Connection URL for Remote or 
Embedded Mode".

> JDBC driver and beeline should support executing an initial SQL script
> --
>
> Key: HIVE-5867
> URL: https://issues.apache.org/jira/browse/HIVE-5867
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients, JDBC
>Reporter: Prasad Mujumdar
>Assignee: Jianguo Tian
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-5867.1.patch, HIVE-5867.2.patch, HIVE-5867.3 .patch
>
>


[jira] [Commented] (HIVE-5867) JDBC driver and beeline should support executing an initial SQL script

2016-09-29 Thread Jianguo Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532015#comment-15532015
 ] 

Jianguo Tian commented on HIVE-5867:


Thanks for the reminder. I don't have wiki edit privileges at the moment; could 
you please help me update this section of the wiki? Of course, I'll provide you 
a draft. Or maybe I should request wiki write access. What do you think?

> JDBC driver and beeline should support executing an initial SQL script
> --
>
> Key: HIVE-5867
> URL: https://issues.apache.org/jira/browse/HIVE-5867
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients, JDBC
>Reporter: Prasad Mujumdar
>Assignee: Jianguo Tian
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-5867.1.patch, HIVE-5867.2.patch, HIVE-5867.3 .patch
>
>


[jira] [Commented] (HIVE-5867) JDBC driver and beeline should support executing an initial SQL script

2016-09-28 Thread Jianguo Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15528822#comment-15528822
 ] 

Jianguo Tian commented on HIVE-5867:


OK, I'll update the wiki for this JIRA ASAP.

> JDBC driver and beeline should support executing an initial SQL script
> --
>
> Key: HIVE-5867
> URL: https://issues.apache.org/jira/browse/HIVE-5867
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients, JDBC
>Reporter: Prasad Mujumdar
>Assignee: Jianguo Tian
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-5867.1.patch, HIVE-5867.2.patch, HIVE-5867.3 .patch
>
>


[jira] [Updated] (HIVE-5867) JDBC driver and beeline should support executing an initial SQL script

2016-09-26 Thread Jianguo Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianguo Tian updated HIVE-5867:
---
Attachment: HIVE-5867.3 .patch

> JDBC driver and beeline should support executing an initial SQL script
> --
>
> Key: HIVE-5867
> URL: https://issues.apache.org/jira/browse/HIVE-5867
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients, JDBC
>Reporter: Prasad Mujumdar
>Assignee: Jianguo Tian
> Attachments: HIVE-5867.1.patch, HIVE-5867.2.patch, HIVE-5867.3 .patch
>
>


[jira] [Updated] (HIVE-5867) JDBC driver and beeline should support executing an initial SQL script

2016-09-23 Thread Jianguo Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianguo Tian updated HIVE-5867:
---
Attachment: HIVE-5867.2.patch

> JDBC driver and beeline should support executing an initial SQL script
> --
>
> Key: HIVE-5867
> URL: https://issues.apache.org/jira/browse/HIVE-5867
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients, JDBC
>Reporter: Prasad Mujumdar
>Assignee: Jianguo Tian
> Attachments: HIVE-5867.1.patch, HIVE-5867.2.patch
>
>


[jira] [Assigned] (HIVE-14679) csv2/tsv2 output format disables quoting by default and it's difficult to enable

2016-09-12 Thread Jianguo Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianguo Tian reassigned HIVE-14679:
---

Assignee: Jianguo Tian



[jira] [Updated] (HIVE-5867) JDBC driver and beeline should support executing an initial SQL script

2016-09-07 Thread Jianguo Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianguo Tian updated HIVE-5867:
---
Status: Patch Available  (was: Open)

> JDBC driver and beeline should support executing an initial SQL script
> --
>
> Key: HIVE-5867
> URL: https://issues.apache.org/jira/browse/HIVE-5867
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients, JDBC
>Reporter: Prasad Mujumdar
>Assignee: Jianguo Tian
> Attachments: HIVE-5867.1.patch
>
>


[jira] [Updated] (HIVE-5867) JDBC driver and beeline should support executing an initial SQL script

2016-09-07 Thread Jianguo Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianguo Tian updated HIVE-5867:
---
Attachment: HIVE-5867.1.patch

> JDBC driver and beeline should support executing an initial SQL script
> --
>
> Key: HIVE-5867
> URL: https://issues.apache.org/jira/browse/HIVE-5867
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients, JDBC
>Reporter: Prasad Mujumdar
>Assignee: Jianguo Tian
> Attachments: HIVE-5867.1.patch
>
>


[jira] [Updated] (HIVE-5867) JDBC driver and beeline should support executing an initial SQL script

2016-09-07 Thread Jianguo Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianguo Tian updated HIVE-5867:
---
Attachment: (was: HIVE-5867.1.patch)

> JDBC driver and beeline should support executing an initial SQL script
> --
>
> Key: HIVE-5867
> URL: https://issues.apache.org/jira/browse/HIVE-5867
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients, JDBC
>Reporter: Prasad Mujumdar
>Assignee: Jianguo Tian
>


[jira] [Updated] (HIVE-5867) JDBC driver and beeline should support executing an initial SQL script

2016-09-07 Thread Jianguo Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianguo Tian updated HIVE-5867:
---
Attachment: HIVE-5867.1.patch

> JDBC driver and beeline should support executing an initial SQL script
> --
>
> Key: HIVE-5867
> URL: https://issues.apache.org/jira/browse/HIVE-5867
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients, JDBC
>Reporter: Prasad Mujumdar
>Assignee: Jianguo Tian
> Attachments: HIVE-5867.1.patch
>
>


[jira] [Assigned] (HIVE-5867) JDBC driver and beeline should support executing an initial SQL script

2016-08-31 Thread Jianguo Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianguo Tian reassigned HIVE-5867:
--

Assignee: Jianguo Tian

> JDBC driver and beeline should support executing an initial SQL script
> --
>
> Key: HIVE-5867
> URL: https://issues.apache.org/jira/browse/HIVE-5867
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients, JDBC
>Reporter: Prasad Mujumdar
>Assignee: Jianguo Tian
>