[jira] [Assigned] (HIVE-18047) Support dynamic service discovery for HiveMetaStore

2017-11-12 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-18047:
--


> Support dynamic service discovery for HiveMetaStore
> ---
>
> Key: HIVE-18047
> URL: https://issues.apache.org/jira/browse/HIVE-18047
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Bing Li
>Assignee: Bing Li
>
> Similar to what Hive does for HiveServer2 (HIVE-7935), a HiveMetaStore 
> client can dynamically resolve a HiveMetaStore service to connect to via 
> ZooKeeper.
> *High Level Design:*
> Whether dynamic service discovery is supported or not can be configured by 
> setting
> HIVE_METASTORE_SUPPORT_DYNAMIC_SERVICE_DISCOVERY.  
> * This property should ONLY work when the HiveMetaStore service is in remote 
> mode.
> * When an instance of HiveMetaStore comes up, it adds itself as a znode to 
> Zookeeper under a configurable namespace (HIVE_METASTORE_ZOOKEEPER_NAMESPACE, 
> e.g. hivemetastore).
> * A thrift client specifies the ZooKeeper ensemble in its connection string, 
> instead of pointing to a specific HiveMetaStore instance. The ZooKeeper 
> ensemble will pick an instance of HiveMetaStore to connect to for the session.
> * When an instance is removed from ZooKeeper, the existing client sessions 
> continue till completion. When the last client session completes, the 
> instance shuts down.
> * All new client connections pick one of the available HiveMetaStore URIs from 
> ZooKeeper (a minimal registration/lookup sketch follows below).
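
A minimal sketch of the registration/lookup flow described above, using the plain 
ZooKeeper client API. This is illustrative only, not the HIVE-18047 patch; the class 
name, the /hivemetastore namespace and the "thrift://host:port" payload are assumptions 
standing in for the configurable values mentioned in the design.

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Random;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class MetastoreDiscoverySketch {
  // Stand-in for HIVE_METASTORE_ZOOKEEPER_NAMESPACE.
  private static final String NAMESPACE = "/hivemetastore";

  // Server side: a HiveMetaStore instance registers itself under the namespace.
  static String register(ZooKeeper zk, String uri)
      throws KeeperException, InterruptedException {
    if (zk.exists(NAMESPACE, false) == null) {
      zk.create(NAMESPACE, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    }
    // Ephemeral: the znode disappears automatically when the instance goes away.
    return zk.create(NAMESPACE + "/server-", uri.getBytes(StandardCharsets.UTF_8),
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
  }

  // Client side: pick one of the currently registered metastore URIs.
  static String resolveOne(ZooKeeper zk) throws KeeperException, InterruptedException {
    List<String> children = zk.getChildren(NAMESPACE, false);
    if (children.isEmpty()) {
      throw new IllegalStateException("No HiveMetaStore registered under " + NAMESPACE);
    }
    String picked = children.get(new Random().nextInt(children.size()));
    byte[] data = zk.getData(NAMESPACE + "/" + picked, false, null);
    return new String(data, StandardCharsets.UTF_8); // e.g. "thrift://host1:9083"
  }
}
{code}

Ephemeral znodes match the removal semantics above: when an instance is taken out of 
ZooKeeper only new connections stop seeing it, while sessions that already resolved a 
URI keep their existing connection.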



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-10567) partial scan for rcfile table doesn't work for dynamic partition

2017-09-05 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16154766#comment-16154766
 ] 

Bing Li commented on HIVE-10567:


[~ashutoshc] I found that the previous patch could work on the latest master 
branch. So I just added the test case without any other code changes.

> partial scan for rcfile table doesn't work for dynamic partition
> 
>
> Key: HIVE-10567
> URL: https://issues.apache.org/jira/browse/HIVE-10567
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.14.0, 1.0.0
>Reporter: Thomas Friedrich
>Assignee: Bing Li
>Priority: Minor
>  Labels: rcfile
> Attachments: HIVE-10567.1.patch, HIVE-10567.2.patch
>
>
> HIVE-3958 added support for partial scan for RCFile. This works fine for 
> static partitions (for example: analyze table analyze_srcpart_partial_scan 
> PARTITION(ds='2008-04-08',hr=11) compute statistics partialscan).
> For dynamic partitions, the analyze fails with an IOException 
> "java.io.IOException: No input paths specified in job":
> hive> ANALYZE TABLE testtable PARTITION(col_varchar) COMPUTE STATISTICS 
> PARTIALSCAN;
> java.io.IOException: No input paths specified in job
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getInputPaths(HiveInputFormat.java:318)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:459)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:624)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:616)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-10567) partial scan for rcfile table doesn't work for dynamic partition

2017-09-05 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-10567:
---
Attachment: HIVE-10567.2.patch

Add the test case based on the HIVE-10567.1.patch

> partial scan for rcfile table doesn't work for dynamic partition
> 
>
> Key: HIVE-10567
> URL: https://issues.apache.org/jira/browse/HIVE-10567
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.14.0, 1.0.0
>Reporter: Thomas Friedrich
>Assignee: Bing Li
>Priority: Minor
>  Labels: rcfile
> Attachments: HIVE-10567.1.patch, HIVE-10567.2.patch
>
>
> HIVE-3958 added support for partial scan for RCFile. This works fine for 
> static partitions (for example: analyze table analyze_srcpart_partial_scan 
> PARTITION(ds='2008-04-08',hr=11) compute statistics partialscan).
> For dynamic partitions, the analyze fails with an IOException 
> "java.io.IOException: No input paths specified in job":
> hive> ANALYZE TABLE testtable PARTITION(col_varchar) COMPUTE STATISTICS 
> PARTIALSCAN;
> java.io.IOException: No input paths specified in job
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getInputPaths(HiveInputFormat.java:318)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:459)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:624)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:616)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-10567) partial scan for rcfile table doesn't work for dynamic partition

2017-07-20 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-10567:
--

Assignee: Bing Li  (was: Thomas Friedrich)

> partial scan for rcfile table doesn't work for dynamic partition
> 
>
> Key: HIVE-10567
> URL: https://issues.apache.org/jira/browse/HIVE-10567
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.14.0, 1.0.0
>Reporter: Thomas Friedrich
>Assignee: Bing Li
>Priority: Minor
>  Labels: rcfile
> Attachments: HIVE-10567.1.patch
>
>
> HIVE-3958 added support for partial scan for RCFile. This works fine for 
> static partitions (for example: analyze table analyze_srcpart_partial_scan 
> PARTITION(ds='2008-04-08',hr=11) compute statistics partialscan).
> For dynamic partitions, the analyze fails with an IOException 
> "java.io.IOException: No input paths specified in job":
> hive> ANALYZE TABLE testtable PARTITION(col_varchar) COMPUTE STATISTICS 
> PARTIALSCAN;
> java.io.IOException: No input paths specified in job
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getInputPaths(HiveInputFormat.java:318)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:459)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:624)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:616)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-10567) partial scan for rcfile table doesn't work for dynamic partition

2017-07-20 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094519#comment-16094519
 ] 

Bing Li commented on HIVE-10567:


Hi, [~ashutoshc]
The fix doesn't take effect because the variable partitions is empty.

{code:java}
      case DYNAMIC_PARTITION:
        for (Partition dynPart :
            tableScanOp.getConf().getTableMetadata().getTableSpec().partitions) {
          inputPaths.add(dynPart.getDataLocation());
        }
        break;
{code}

The "partitions" is set in Hive.getPartitionsByNames().

{code:java}
  public List<Partition> getPartitionsByNames(Table tbl,
      Map<String, String> partialPartSpec)
      throws HiveException {

    if (!tbl.isPartitioned()) {
      throw new HiveException(ErrorMsg.TABLE_NOT_PARTITIONED, tbl.getTableName());
    }

    // the size of names is 0;
    List<String> names = getPartitionNames(tbl.getDbName(), tbl.getTableName(),
        partialPartSpec, (short) -1);

    List<Partition> partitions = getPartitionsByNames(tbl, names);
    return partitions;
  }
{code}
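
For illustration only (this is not the attached patch), a fallback along the following 
lines would make the failure mode above concrete: when the name lookup for the 
dynamic-partition spec comes back empty, the data locations could be taken from the 
table's existing partitions instead. Hive, Partition and Table are the 
org.apache.hadoop.hive.ql.metadata classes; the helper itself is hypothetical.

{code:java}
import java.util.List;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.metadata.Hive;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.metadata.Partition;
import org.apache.hadoop.hive.ql.metadata.Table;

public class DynamicPartitionPathsSketch {
  // Collect the input paths the DYNAMIC_PARTITION branch expects, falling back to
  // all existing partitions when the partial-spec lookup returned nothing.
  static void addInputPaths(Table tbl, List<Partition> resolved, List<Path> inputPaths)
      throws HiveException {
    List<Partition> parts = resolved;
    if (parts.isEmpty() && tbl.isPartitioned()) {
      parts = Hive.get().getPartitions(tbl); // every existing partition of the table
    }
    for (Partition p : parts) {
      inputPaths.add(p.getDataLocation());
    }
  }
}
{code}

A real change would probably still need to honor the partial partition spec rather than 
listing every partition; the sketch only shows where the empty list comes from.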



 

> partial scan for rcfile table doesn't work for dynamic partition
> 
>
> Key: HIVE-10567
> URL: https://issues.apache.org/jira/browse/HIVE-10567
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.14.0, 1.0.0
>Reporter: Thomas Friedrich
>Assignee: Thomas Friedrich
>Priority: Minor
>  Labels: rcfile
> Attachments: HIVE-10567.1.patch
>
>
> HIVE-3958 added support for partial scan for RCFile. This works fine for 
> static partitions (for example: analyze table analyze_srcpart_partial_scan 
> PARTITION(ds='2008-04-08',hr=11) compute statistics partialscan).
> For dynamic partitions, the analyze fails with an IOException 
> "java.io.IOException: No input paths specified in job":
> hive> ANALYZE TABLE testtable PARTITION(col_varchar) COMPUTE STATISTICS 
> PARTIALSCAN;
> java.io.IOException: No input paths specified in job
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getInputPaths(HiveInputFormat.java:318)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:459)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:624)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:616)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-10567) partial scan for rcfile table doesn't work for dynamic partition

2017-07-19 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16093552#comment-16093552
 ] 

Bing Li commented on HIVE-10567:


[~ashutoshc] and [~tfriedr] I tried to apply the current patch to master and 
ran the following query:

analyze table analyze_srcpart_partial_scan partition(ds, hr) compute statistics 
partialscan;

It still reported the ERROR:

{code:java}
java.io.IOException: No input paths specified in job
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getInputPaths(HiveInputFormat.java:472)
at 
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:502)
at 
org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
at 
org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:320)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
{code}

I will take a look at the reason later.


> partial scan for rcfile table doesn't work for dynamic partition
> 
>
> Key: HIVE-10567
> URL: https://issues.apache.org/jira/browse/HIVE-10567
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.14.0, 1.0.0
>Reporter: Thomas Friedrich
>Assignee: Thomas Friedrich
>Priority: Minor
>  Labels: rcfile
> Attachments: HIVE-10567.1.patch
>
>
> HIVE-3958 added support for partial scan for RCFile. This works fine for 
> static partitions (for example: analyze table analyze_srcpart_partial_scan 
> PARTITION(ds='2008-04-08',hr=11) compute statistics partialscan).
> For dynamic partitions, the analyze fails with an IOException 
> "java.io.IOException: No input paths specified in job":
> hive> ANALYZE TABLE testtable PARTITION(col_varchar) COMPUTE STATISTICS 
> PARTIALSCAN;
> java.io.IOException: No input paths specified in job
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getInputPaths(HiveInputFormat.java:318)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:459)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:624)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:616)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-07-17 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089608#comment-16089608
 ] 

Bing Li commented on HIVE-16922:


Thank you for the review, [~lirui]. The latest test failures should not be 
caused by patch #4.

> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>Assignee: Bing Li
>  Labels: breaking_change
> Attachments: HIVE-16922.1.patch, HIVE-16922.2.patch, 
> HIVE-16922.3.patch, HIVE-16922.4.patch
>
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)
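
Purely as an illustration of why this is tagged breaking_change (it is not part of the 
attached patches): tables created while the misspelled key was in use keep that key in 
their SerDe parameters, so a reader that only knows the corrected key would miss it. A 
compatibility lookup could check both spellings; the class and method names below are 
made up for the example.

{code:java}
import java.util.Map;

public class CollectionDelimSketch {
  // The historical misspelling from serde.thrift and the corrected form.
  static final String OLD_KEY = "colelction.delim";
  static final String NEW_KEY = "collection.delim";

  // Prefer the corrected key but fall back to the old one so that tables
  // created before the fix keep their collection delimiter.
  static String getCollectionDelim(Map<String, String> serdeParams, String defaultDelim) {
    String v = serdeParams.get(NEW_KEY);
    if (v == null) {
      v = serdeParams.get(OLD_KEY);
    }
    return v != null ? v : defaultDelim;
  }
}
{code}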



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-07-16 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16922:
---
Attachment: HIVE-16922.4.patch

Update scripts for MSSQL.

> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>Assignee: Bing Li
>  Labels: breaking_change
> Attachments: HIVE-16922.1.patch, HIVE-16922.2.patch, 
> HIVE-16922.3.patch, HIVE-16922.4.patch
>
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-07-16 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088935#comment-16088935
 ] 

Bing Li commented on HIVE-16922:


[~lirui] I checked the latest build results; the failures should NOT be caused by 
this patch.
Thank you.

> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>Assignee: Bing Li
> Attachments: HIVE-16922.1.patch, HIVE-16922.2.patch, 
> HIVE-16922.3.patch
>
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-07-14 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16922:
---
Attachment: HIVE-16922.3.patch

[~lirui] Thank you for your comments. This property will be written into the 
SERDE_PARAMS table when the user specifies it. I updated the patch as version 
#3.

> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>Assignee: Bing Li
> Attachments: HIVE-16922.1.patch, HIVE-16922.2.patch, 
> HIVE-16922.3.patch
>
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2017-07-13 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086900#comment-16086900
 ] 

Bing Li commented on HIVE-4577:
---

[~vgumashta] the failures in build#6010 should not be caused by this patch. 
Thanks.

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch, HIVE-4577.5.patch, HIVE-4577.6.patch
>
>
> By design, Hive supports the hadoop dfs command in the hive shell, like 
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from hadoop when the path contains spaces or quotes
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work started] (HIVE-14156) Problem with Chinese characters as partition value when using MySQL

2017-07-13 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-14156 started by Bing Li.
--
> Problem with Chinese characters as partition value when using MySQL
> ---
>
> Key: HIVE-14156
> URL: https://issues.apache.org/jira/browse/HIVE-14156
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Bing Li
>Assignee: Bing Li
>
> Steps to reproduce:
> create table t1 (name string, age int) partitioned by (city string) row 
> format delimited fields terminated by ',';
> load data local inpath '/tmp/chn-partition.txt' overwrite into table t1 
> partition (city='北京');
> The content of /tmp/chn-partition.txt:
> 小明,20
> 小红,15
> 张三,36
> 李四,50
> When checking the partition value in MySQL, it shows ?? instead of "北京".
> When running "drop table t1", it hangs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2017-07-13 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085757#comment-16085757
 ] 

Bing Li edited comment on HIVE-4577 at 7/13/17 2:30 PM:


[~vgumashta], thank you for reviewing it.
The failure of HIVE-4577.7.patch is caused by the changes in query14.q.out.
I removed the previous patches #6 and #7 (because their contents are similar), and 
re-generated one based on the latest master branch, naming it patch #6.




was (Author: libing):
The failure of HIVE-4577.7.patch is caused by the changes in query14.q.out.
I removed previous patch #6 and #7, and re-generated one based on the latest 
master branch, named it as patch #6.

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch, HIVE-4577.5.patch, HIVE-4577.6.patch
>
>
> By design, Hive supports the hadoop dfs command in the hive shell, like 
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from hadoop when the path contains spaces or quotes
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2017-07-13 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-4577:
--
Attachment: HIVE-4577.6.patch

The failure of HIVE-4577.7.patch is caused by the changes in query14.q.out.
I removed the previous patches #6 and #7, and re-generated one based on the latest 
master branch, naming it patch #6.

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch, HIVE-4577.5.patch, HIVE-4577.6.patch
>
>
> By design, Hive supports the hadoop dfs command in the hive shell, like 
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from hadoop when the path contains spaces or quotes
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2017-07-13 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-4577:
--
Attachment: (was: HIVE-4577.6.patch)

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch, HIVE-4577.5.patch
>
>
> By design, Hive supports the hadoop dfs command in the hive shell, like 
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from hadoop when the path contains spaces or quotes
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2017-07-13 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-4577:
--
Attachment: (was: HIVE-4577.7.patch)

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch, HIVE-4577.5.patch, HIVE-4577.6.patch
>
>
> By design, Hive supports the hadoop dfs command in the hive shell, like 
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from hadoop when the path contains spaces or quotes
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16907) "INSERT INTO" overwrite old data when destination table encapsulated by backquote

2017-07-13 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085461#comment-16085461
 ] 

Bing Li commented on HIVE-16907:


Sure, [~pxiong], I will work on it. Thank you.

>  "INSERT INTO"  overwrite old data when destination table encapsulated by 
> backquote 
> 
>
> Key: HIVE-16907
> URL: https://issues.apache.org/jira/browse/HIVE-16907
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.1.0, 2.1.1
>Reporter: Nemon Lou
>Assignee: Bing Li
> Attachments: HIVE-16907.1.patch
>
>
> A way to reproduce:
> {noformat}
> create database tdb;
> use tdb;
> create table t1(id int);
> create table t2(id int);
> explain insert into `tdb.t1` select * from t2;
> {noformat}
> {noformat}
> +---+
> |  
> Explain  |
> +---+
> | STAGE DEPENDENCIES: 
>   |
> |   Stage-1 is a root stage   
>   |
> |   Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, 
> Stage-4  |
> |   Stage-3   
>   |
> |   Stage-0 depends on stages: Stage-3, Stage-2, Stage-5  
>   |
> |   Stage-2   
>   |
> |   Stage-4   
>   |
> |   Stage-5 depends on stages: Stage-4
>   |
> | 
>   |
> | STAGE PLANS:
>   |
> |   Stage: Stage-1
>   |
> | Map Reduce  
>   |
> |   Map Operator Tree:
>   |
> |   TableScan 
>   |
> | alias: t2   
>   |
> | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE |
> | Select Operator 
>   |
> |   expressions: id (type: int)   
>   |
> |   outputColumnNames: _col0  
>   |
> |   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE   |
> |   File Output Operator  
>   |
> | compressed: false   
>   |
> | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
> Column stats: NONE   

[jira] [Comment Edited] (HIVE-16907) "INSERT INTO" overwrite old data when destination table encapsulated by backquote

2017-07-13 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16084245#comment-16084245
 ] 

Bing Li edited comment on HIVE-16907 at 7/13/17 9:42 AM:
-

[~pxiong] and [~lirui], thank you for your comments.
I tried the CREATE TABLE statement in MySQL and found that it treats `db.tbl` 
as the table name, and a "dot" is allowed in the table name. 
e.g.

{code:java}
mysql> create table xxx (col int);
mysql> create table test.yyy (col int);
mysql> create table `test.zzz` (col int);
mysql> create table `test.test.tbl` (col int);
mysql> create table test.test.ooo (col int);
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that 
corresponds to your MySQL server version for the right syntax to use near ‘.ooo 
(col int)’ at line 1

mysql> show tables;
++
| Tables_in_test |
++
| test.test.tbl  |
| test.zzz   |
| xxx|
| yyy|
++
{code}

Back to Hive, if we would like it to have the same behavior as MySQL, we 
should change the logic of processing it.
My previous patch is NOT enough and can't handle `db.db.tbl` either.


was (Author: libing):
[~pxiong] and [~lirui], thank you for your comments.
I tried CREATE TABLE statement in MySQL, and found that it treats the `db.tbl` 
as the table name. And "dot" is allowed in the table name. 
e.g.

{code:java}
mysql> create table xxx (col int);
mysql> create table test.yyy (col int);
mysql> create table `test.zzz` (col int);
mysql> create table `test.test.tbl` (col int);

mysql> show tables;
++
| Tables_in_test |
++
| test.test.tbl  |
| test.zzz   |
| xxx|
| yyy|
++
{code}

Back to Hive, if we would like to make it having the same behavior as MySQL, we 
should change the logic of processing it.
My previous patch is NOT enough and can't handle `db.db.tbl` neither.

>  "INSERT INTO"  overwrite old data when destination table encapsulated by 
> backquote 
> 
>
> Key: HIVE-16907
> URL: https://issues.apache.org/jira/browse/HIVE-16907
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.1.0, 2.1.1
>Reporter: Nemon Lou
>Assignee: Bing Li
> Attachments: HIVE-16907.1.patch
>
>
> A way to reproduce:
> {noformat}
> create database tdb;
> use tdb;
> create table t1(id int);
> create table t2(id int);
> explain insert into `tdb.t1` select * from t2;
> {noformat}
> {noformat}
> +---+
> |  
> Explain  |
> +---+
> | STAGE DEPENDENCIES: 
>   |
> |   Stage-1 is a root stage   
>   |
> |   Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, 
> Stage-4  |
> |   Stage-3   
>   |
> |   Stage-0 depends on stages: Stage-3, Stage-2, Stage-5  
>   |
> |   Stage-2   
>   |
> |   Stage-4   
>   |
> |   Stage-5 depends on stages: Stage-4
>   |
> | 
>   |
> | STAGE PLANS:
>   |
> |   Stage: Stage-1
>   |
> | Map Reduce  
>  

[jira] [Commented] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-07-12 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085069#comment-16085069
 ] 

Bing Li commented on HIVE-16922:


[~lirui], I can't reproduce TestMiniLlapLocalCliDriver[vector_if_expr] in my 
env; I don't think it is caused by this patch.

> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>Assignee: Bing Li
> Attachments: HIVE-16922.1.patch, HIVE-16922.2.patch
>
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2017-07-12 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-4577:
--
Attachment: HIVE-4577.7.patch

Add the golden file for dfscmd.q

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch, HIVE-4577.5.patch, 
> HIVE-4577.6.patch, HIVE-4577.7.patch
>
>
> By design, Hive supports the hadoop dfs command in the hive shell, like 
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from hadoop when the path contains spaces or quotes
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16999) Performance bottleneck in the ADD FILE/ARCHIVE commands for an HDFS resource

2017-07-12 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-16999:
--

Assignee: Bing Li

> Performance bottleneck in the ADD FILE/ARCHIVE commands for an HDFS resource
> 
>
> Key: HIVE-16999
> URL: https://issues.apache.org/jira/browse/HIVE-16999
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sailee Jain
>Assignee: Bing Li
>Priority: Critical
>
> A performance bottleneck is found when adding a resource [which lives on HDFS] to 
> the distributed cache. 
> Commands used are :-
> {code:java}
> 1. ADD ARCHIVE "hdfs://some_dir/archive.tar"
> 2. ADD FILE "hdfs://some_dir/file.txt"
> {code}
> Here is the log corresponding to the archive adding operation:-
> {noformat}
>  converting to local hdfs://some_dir/archive.tar
>  Added resources: [hdfs://some_dir/archive.tar
> {noformat}
> Hive is downloading the resource to the local filesystem [shown in the log by 
> "converting to local"]. 
> {color:#d04437}Ideally there is no need to bring the file to the local 
> filesystem when this operation is all about copying the file from one 
> location on HDFS to another location on HDFS [the distributed cache].{color}
> This adds a lot of overhead when the resource is a big file 
> and all commands need the same resource.
> After debugging, the impacted piece of code is found to be:
> {code:java}
> public List<String> add_resources(ResourceType t, Collection<String> values,
>     boolean convertToUnix) throws RuntimeException {
>   Set<String> resourceSet = resourceMaps.getResourceSet(t);
>   Map resourcePathMap = resourceMaps.getResourcePathMap(t);
>   Map reverseResourcePathMap = resourceMaps.getReverseResourcePathMap(t);
>   List<String> localized = new ArrayList<String>();
>   try {
>     for (String value : values) {
>       String key;
>       {color:#d04437}// get the local path of downloaded jars{color}
>       List<URI> downloadedURLs = resolveAndDownload(t, value, convertToUnix);
>       ...
> {code}
> {code:java}
>   List<URI> resolveAndDownload(ResourceType t, String value, boolean convertToUnix)
>       throws URISyntaxException, IOException {
>     URI uri = createURI(value);
>     if (getURLType(value).equals("file")) {
>       return Arrays.asList(uri);
>     } else if (getURLType(value).equals("ivy")) {
>       return dependencyResolver.downloadDependencies(uri);
>     } else { // goes here for HDFS
>       // Here when the resource is not local it will download it to the local machine.
>       return Arrays.asList(createURI(downloadResource(value, convertToUnix)));
>     }
>   }
> {code}
> Here, the function resolveAndDownload() always calls the downloadResource() 
> API in the case of an external filesystem. It should take into consideration the 
> fact that when the resource is on the same HDFS, bringing it to the local 
> machine is not a needed step and can be skipped for better performance (a sketch 
> of such a check follows below).
> Thanks,
> Sailee
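
A sketch of the kind of check suggested above, using the Hadoop FileSystem API; it is 
editorial, not a patch, and the class/method names are made up. If the resource already 
lives on the default filesystem, resolveAndDownload() could hand back the original URI 
instead of calling downloadResource().

{code:java}
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class SameFsCheckSketch {
  // True when the resource already lives on the default (job) filesystem,
  // in which case localizing it first brings no benefit.
  static boolean onDefaultFs(URI resource, Configuration conf) throws IOException {
    FileSystem resourceFs = FileSystem.get(resource, conf); // e.g. hdfs://some_dir/...
    FileSystem defaultFs = FileSystem.get(conf);            // fs.defaultFS
    return resourceFs.getUri().equals(defaultFs.getUri());
  }
}
{code}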



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16907) "INSERT INTO" overwrite old data when destination table encapsulated by backquote

2017-07-12 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16084245#comment-16084245
 ] 

Bing Li commented on HIVE-16907:


[~pxiong] and [~lirui], thank you for your comments.
I tried the CREATE TABLE statement in MySQL and found that it treats `db.tbl` 
as the table name, and a "dot" is allowed in the table name. 
e.g.

{code:java}
mysql> create table xxx (col int);
mysql> create table test.yyy (col int);
mysql> create table `test.zzz` (col int);
mysql> create table `test.test.tbl` (col int);

mysql> show tables;
++
| Tables_in_test |
++
| test.test.tbl  |
| test.zzz   |
| xxx|
| yyy|
++
{code}

Back to Hive, if we would like it to have the same behavior as MySQL, we 
should change the logic of processing it.
My previous patch is NOT enough and can't handle `db.db.tbl` either.
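
To make the MySQL comparison concrete, here is a toy resolver (editorial; not how Hive's 
parser works and not the attached patch) that treats a fully backquoted name as a single 
table name with a literal dot, while an unquoted db.tbl still splits into database and 
table.

{code:java}
public class TableNameSketch {
  // Returns { database, table } for a possibly backquoted name.
  static String[] resolve(String raw, String currentDb) {
    if (raw.length() >= 2 && raw.startsWith("`") && raw.endsWith("`")) {
      // Backquoted as a whole: the dot is part of the table name (MySQL-like).
      return new String[] { currentDb, raw.substring(1, raw.length() - 1) };
    }
    int dot = raw.indexOf('.');
    if (dot >= 0) {
      return new String[] { raw.substring(0, dot), raw.substring(dot + 1) };
    }
    return new String[] { currentDb, raw };
  }
}
{code}

Under that rule, `tdb.t1` would name a single table called tdb.t1 in the current 
database, matching the MySQL output shown above.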

>  "INSERT INTO"  overwrite old data when destination table encapsulated by 
> backquote 
> 
>
> Key: HIVE-16907
> URL: https://issues.apache.org/jira/browse/HIVE-16907
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.1.0, 2.1.1
>Reporter: Nemon Lou
>Assignee: Bing Li
> Attachments: HIVE-16907.1.patch
>
>
> A way to reproduce:
> {noformat}
> create database tdb;
> use tdb;
> create table t1(id int);
> create table t2(id int);
> explain insert into `tdb.t1` select * from t2;
> {noformat}
> {noformat}
> +---+
> |  
> Explain  |
> +---+
> | STAGE DEPENDENCIES: 
>   |
> |   Stage-1 is a root stage   
>   |
> |   Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, 
> Stage-4  |
> |   Stage-3   
>   |
> |   Stage-0 depends on stages: Stage-3, Stage-2, Stage-5  
>   |
> |   Stage-2   
>   |
> |   Stage-4   
>   |
> |   Stage-5 depends on stages: Stage-4
>   |
> | 
>   |
> | STAGE PLANS:
>   |
> |   Stage: Stage-1
>   |
> | Map Reduce  
>   |
> |   Map Operator Tree:
>   |
> |   TableScan 
>   |
> | alias: t2   
>   |
> | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE |
> | Select Operator 
>   |
> |   expressions: id (type: int)   
>   |
> |   outputColumnNames: _col0   

[jira] [Updated] (HIVE-13384) Failed to create HiveMetaStoreClient object with proxy user when Kerberos enabled

2017-07-12 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-13384:
---
Description: 
I wrote a Java client to talk with HiveMetaStore. (Hive 1.2.0)
But I found that it can't create a new HiveMetaStoreClient object successfully via a 
proxy user in a Kerberos env.

===
15/10/13 00:14:38 ERROR transport.TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)]
at 
com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
at 
org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
at 
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
==

When I was debugging Hive, I found that the error came from the open() method in 
the HiveMetaStoreClient class.

Around line 406,
 transport = UserGroupInformation.getCurrentUser().doAs(new 
PrivilegedExceptionAction() {  //FAILED, because the current user 
doesn't have the credential

But it will work if I change the above line to
 transport = UserGroupInformation.getCurrentUser().getRealUser().doAs(new 
PrivilegedExceptionAction() {  //PASS

I found DRILL-3413 fixes this error on the Drill side as a workaround. But if I 
submit a mapreduce job via Pig/HCatalog, it runs into the same issue again when 
initializing the object via HCatalog.

It would be better to fix this issue on the Hive side.

  was:
I wrote a Java client to talk with HiveMetaStore. (Hive 1.2.0)
But found that it can't new a HiveMetaStoreClient object successfully via a 
proxy using in Kerberos env.

===
15/10/13 00:14:38 ERROR transport.TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)]
at 
com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
at 
org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
at 
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
==

When I debugging on Hive, I found that the error came from open() method in 
HiveMetaStoreClient class.

Around line 406,
 transport = UserGroupInformation.getCurrentUser().doAs(new 
PrivilegedExceptionAction() {  //FAILED, because the current user 
doesn't have the cridential

But it will work if I change above line to
 transport = UserGroupInformation.getCurrentUser().getRealUser().doAs(new 
PrivilegedExceptionAction() {  //PASS

I found DRILL-3413 fixes this error in Drill side as a workaround. But if I 
submit a mapreduce job via Pig/HCatalog, it runs into the same issue again when 
initialize the object via HCatalog.

It would be better to fix this issue in Hive side.
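
A minimal sketch of the two doAs variants discussed in the description (editorial; the 
TransportOpener callback is a placeholder, not the real HiveMetaStoreClient code). 
getRealUser() is only non-null for a proxy UGI, so a guarded fallback keeps the normal 
login path working.

{code:java}
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.security.UserGroupInformation;
import org.apache.thrift.transport.TTransport;

public class ProxyUserOpenSketch {
  // Placeholder for the real "open the thrift transport" logic.
  interface TransportOpener {
    TTransport open() throws Exception;
  }

  static TTransport openTransport(final TransportOpener opener) throws Exception {
    UserGroupInformation current = UserGroupInformation.getCurrentUser();
    UserGroupInformation real = current.getRealUser();
    // For a proxy user the Kerberos credential lives with the real user; for a plain
    // login user getRealUser() is null and the current UGI is used as before.
    UserGroupInformation ugi = (real != null) ? real : current;
    return ugi.doAs(new PrivilegedExceptionAction<TTransport>() {
      @Override
      public TTransport run() throws Exception {
        return opener.open();
      }
    });
  }
}
{code}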


> Failed to create HiveMetaStoreClient object with proxy user when Kerberos 
> enabled
> -
>
> Key: HIVE-13384
> URL: https://issues.apache.org/jira/browse/HIVE-13384
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Bing Li
>Assignee: Bing Li
>
> I wrote a Java client to talk with HiveMetaStore. (Hive 1.2.0)
> But I found that it can't create a new HiveMetaStoreClient object successfully via a 
> proxy user in a Kerberos env.
> ===
> 15/10/13 00:14:38 ERROR transport.TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
> at 
> org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
> at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
> ==
> When I was debugging Hive, I found that the error came from the open() method in 
> the HiveMetaStoreClient class.
> Around line 406,
>  transport = UserGroupInformation.getCurrentUser().doAs(new 
> PrivilegedExceptionAction() {  //FAILED, because the current user 
> doesn't have the credential
> But it will work if I change the above line to 
>  transport = UserGroupInformation.getCurrentUser().getRealUser().doAs(new 
> PrivilegedExceptionAction() {  //PASS
> I found DRILL-3413 fixes this error on the Drill side as a workaround. But if I 
> submit a mapreduce job via Pig/HCatalog, it runs into the same issue again 
> when initializing the 

[jira] [Commented] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-07-12 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16084183#comment-16084183
 ] 

Bing Li commented on HIVE-16922:


Thank you, [~lirui]. It seems that the result page has expired. I just 
re-submitted the patch to check again.

> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>Assignee: Bing Li
> Attachments: HIVE-16922.1.patch, HIVE-16922.2.patch
>
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-07-12 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16922:
---
Attachment: HIVE-16922.2.patch

> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>Assignee: Bing Li
> Attachments: HIVE-16922.1.patch, HIVE-16922.2.patch
>
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-07-12 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16922:
---
Attachment: (was: HIVE-16922.2.patch)

> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>Assignee: Bing Li
> Attachments: HIVE-16922.1.patch
>
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2017-07-12 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16084174#comment-16084174
 ] 

Bing Li commented on HIVE-4577:
---

Thank you, [~vgumashta]. I could reproduce TestPerfCliDriver [query14] in my 
env, and updated its golden file. The failures of 
TestMiniLlapLocalCliDriver[vector_if_expr] and 
TestBeeLineDriver[materialized_view_create_rewrite] should not be caused by this 
patch.

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch, HIVE-4577.5.patch, HIVE-4577.6.patch
>
>
> By design, Hive supports the hadoop dfs command in the hive shell, like 
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from hadoop when the path contains spaces or quotes
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2017-07-12 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-4577:
--
Attachment: HIVE-4577.6.patch

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch, HIVE-4577.5.patch, HIVE-4577.6.patch
>
>
> By design, Hive supports the hadoop dfs command in the hive shell, like 
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from hadoop when the path contains spaces or quotes
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16907) "INSERT INTO" overwrite old data when destination table encapsulated by backquote

2017-07-11 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082427#comment-16082427
 ] 

Bing Li commented on HIVE-16907:


[~pxiong] and [~ashutoshc] Could I get your comments on the patch? Thank you.

>  "INSERT INTO"  overwrite old data when destination table encapsulated by 
> backquote 
> 
>
> Key: HIVE-16907
> URL: https://issues.apache.org/jira/browse/HIVE-16907
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.1.0, 2.1.1
>Reporter: Nemon Lou
>Assignee: Bing Li
> Attachments: HIVE-16907.1.patch
>
>
> A way to reproduce:
> {noformat}
> create database tdb;
> use tdb;
> create table t1(id int);
> create table t2(id int);
> explain insert into `tdb.t1` select * from t2;
> {noformat}
> {noformat}
> +---+
> |  
> Explain  |
> +---+
> | STAGE DEPENDENCIES: 
>   |
> |   Stage-1 is a root stage   
>   |
> |   Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, 
> Stage-4  |
> |   Stage-3   
>   |
> |   Stage-0 depends on stages: Stage-3, Stage-2, Stage-5  
>   |
> |   Stage-2   
>   |
> |   Stage-4   
>   |
> |   Stage-5 depends on stages: Stage-4
>   |
> | 
>   |
> | STAGE PLANS:
>   |
> |   Stage: Stage-1
>   |
> | Map Reduce  
>   |
> |   Map Operator Tree:
>   |
> |   TableScan 
>   |
> | alias: t2   
>   |
> | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE |
> | Select Operator 
>   |
> |   expressions: id (type: int)   
>   |
> |   outputColumnNames: _col0  
>   |
> |   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE   |
> |   File Output Operator  
>   |
> | compressed: false   
>   |
> | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
> Column stats: NONE

[jira] [Updated] (HIVE-16907) "INSERT INTO" overwrite old data when destination table encapsulated by backquote

2017-07-11 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16907:
---
Status: Patch Available  (was: In Progress)

>  "INSERT INTO"  overwrite old data when destination table encapsulated by 
> backquote 
> 
>
> Key: HIVE-16907
> URL: https://issues.apache.org/jira/browse/HIVE-16907
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.1.1, 1.1.0
>Reporter: Nemon Lou
>Assignee: Bing Li
> Attachments: HIVE-16907.1.patch
>
>
> A way to reproduce:
> {noformat}
> create database tdb;
> use tdb;
> create table t1(id int);
> create table t2(id int);
> explain insert into `tdb.t1` select * from t2;
> {noformat}
> {noformat}
> +---+
> |  
> Explain  |
> +---+
> | STAGE DEPENDENCIES: 
>   |
> |   Stage-1 is a root stage   
>   |
> |   Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, 
> Stage-4  |
> |   Stage-3   
>   |
> |   Stage-0 depends on stages: Stage-3, Stage-2, Stage-5  
>   |
> |   Stage-2   
>   |
> |   Stage-4   
>   |
> |   Stage-5 depends on stages: Stage-4
>   |
> | 
>   |
> | STAGE PLANS:
>   |
> |   Stage: Stage-1
>   |
> | Map Reduce  
>   |
> |   Map Operator Tree:
>   |
> |   TableScan 
>   |
> | alias: t2   
>   |
> | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE |
> | Select Operator 
>   |
> |   expressions: id (type: int)   
>   |
> |   outputColumnNames: _col0  
>   |
> |   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE   |
> |   File Output Operator  
>   |
> | compressed: false   
>   |
> | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
> Column stats: NONE |
> | table:  

[jira] [Updated] (HIVE-16907) "INSERT INTO" overwrite old data when destination table encapsulated by backquote

2017-07-11 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16907:
---
Attachment: HIVE-16907.1.patch

The patch is created based on the latest master branch.

>  "INSERT INTO"  overwrite old data when destination table encapsulated by 
> backquote 
> 
>
> Key: HIVE-16907
> URL: https://issues.apache.org/jira/browse/HIVE-16907
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.1.0, 2.1.1
>Reporter: Nemon Lou
>Assignee: Bing Li
> Attachments: HIVE-16907.1.patch
>
>
> A way to reproduce:
> {noformat}
> create database tdb;
> use tdb;
> create table t1(id int);
> create table t2(id int);
> explain insert into `tdb.t1` select * from t2;
> {noformat}
> {noformat}
> +---+
> |  
> Explain  |
> +---+
> | STAGE DEPENDENCIES: 
>   |
> |   Stage-1 is a root stage   
>   |
> |   Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, 
> Stage-4  |
> |   Stage-3   
>   |
> |   Stage-0 depends on stages: Stage-3, Stage-2, Stage-5  
>   |
> |   Stage-2   
>   |
> |   Stage-4   
>   |
> |   Stage-5 depends on stages: Stage-4
>   |
> | 
>   |
> | STAGE PLANS:
>   |
> |   Stage: Stage-1
>   |
> | Map Reduce  
>   |
> |   Map Operator Tree:
>   |
> |   TableScan 
>   |
> | alias: t2   
>   |
> | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE |
> | Select Operator 
>   |
> |   expressions: id (type: int)   
>   |
> |   outputColumnNames: _col0  
>   |
> |   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE   |
> |   File Output Operator  
>   |
> | compressed: false   
>   |
> | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
> Column stats: NONE   

[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2017-07-11 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081730#comment-16081730
 ] 

Bing Li commented on HIVE-4577:
---

[~vgumashta] HIVE-4577.5.patch is generated based on the latest master branch.

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch, HIVE-4577.5.patch
>
>
> By design, Hive supports the hadoop dfs command in the hive shell, like 
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from hadoop when the path contains spaces or quotes
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2017-07-11 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-4577:
--
Attachment: HIVE-4577.5.patch

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch, HIVE-4577.5.patch
>
>
> By design, Hive supports the hadoop dfs command in the hive shell, like 
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from hadoop when the path contains spaces or quotes
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2017-07-10 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081599#comment-16081599
 ] 

Bing Li commented on HIVE-4577:
---

Hi, [~vgumashta]
Yes, sure. I will rebase the patch with the latest master.

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch
>
>
> As design, hive could support hadoop dfs command in hive shell, like 
> hive> dfs -mkdir /user/biadmin/mydir;
> but has different behavior with hadoop if the path contains space and quotes
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-07-09 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079841#comment-16079841
 ] 

Bing Li commented on HIVE-16922:


Hi, [~ashutoshc]
I updated the patch based on the testing results.
Could you review it when you're available?

Thanks a lot!

> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>Assignee: Bing Li
> Attachments: HIVE-16922.1.patch, HIVE-16922.2.patch
>
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)
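One compatibility wrinkle worth keeping in mind (my assumption, not necessarily what the attached patch does): tables created before the fix may already carry the misspelled key in their properties, so a reader could accept both spellings while preferring the corrected one.
{code}
import java.util.Properties;

public class CollectionDelimCompat {
  static final String COLLECTION_DELIM = "collection.delim";
  // Misspelled key that pre-fix table properties may still contain.
  static final String LEGACY_COLLECTION_DELIM = "colelction.delim";

  // Prefer the corrected key, fall back to the legacy spelling, then to a default.
  static String collectionDelim(Properties tblProps, String defaultDelim) {
    String v = tblProps.getProperty(COLLECTION_DELIM);
    return v != null ? v : tblProps.getProperty(LEGACY_COLLECTION_DELIM, defaultDelim);
  }
}
{code}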



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-07-09 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16922:
---
Attachment: HIVE-16922.2.patch

> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>Assignee: Bing Li
> Attachments: HIVE-16922.1.patch, HIVE-16922.2.patch
>
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-07-06 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076574#comment-16076574
 ] 

Bing Li commented on HIVE-16922:


There is one failure caused by the fix; I will take a look at it tomorrow.

https://builds.apache.org/job/PreCommit-HIVE-Build/5905/testReport/org.apache.hadoop.hive.cli/TestMiniLlapLocalCliDriver/testCliDriver_orc_create_/
 

> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>Assignee: Bing Li
> Attachments: HIVE-16922.1.patch
>
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-07-06 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076039#comment-16076039
 ] 

Bing Li edited comment on HIVE-16922 at 7/6/17 7:02 AM:


This patch file is created based on the master branch.
The Thrift code is generated by thrift-0.9.3.


was (Author: libing):
This patch file is created based on master branch.

> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>Assignee: Bing Li
> Attachments: HIVE-16922.1.patch
>
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Issue Comment Deleted] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-07-06 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16922:
---
Comment: was deleted

(was: The patch is based on master branch.)

> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>Assignee: Bing Li
> Attachments: HIVE-16922.1.patch
>
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-07-06 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16922:
---
Status: Patch Available  (was: Open)

This patch file is created based on master branch.

> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>Assignee: Bing Li
> Attachments: HIVE-16922.1.patch
>
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-07-05 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16922:
---
Attachment: (was: HIVE-16922.1.patch)

> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>Assignee: Bing Li
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-07-05 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16922:
---
Status: Open  (was: Patch Available)

> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>Assignee: Bing Li
> Attachments: HIVE-16922.1.patch
>
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-07-05 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16922:
---
Status: Patch Available  (was: In Progress)

> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>Assignee: Bing Li
> Attachments: HIVE-16922.1.patch
>
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-07-05 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16922:
---
Attachment: HIVE-16922.1.patch

The patch is based on master branch.

> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>Assignee: Bing Li
> Attachments: HIVE-16922.1.patch
>
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work started] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-07-05 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16922 started by Bing Li.
--
> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>Assignee: Bing Li
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16883) HBaseStorageHandler Ignores Case for HBase Table Name

2017-07-05 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-16883:
--

Assignee: Bing Li

> HBaseStorageHandler Ignores Case for HBase Table Name
> -
>
> Key: HIVE-16883
> URL: https://issues.apache.org/jira/browse/HIVE-16883
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 1.2.1
> Environment: Hortonworks HDP 2.6.0.3, CentOS 7.0, VMWare ESXI
>Reporter: Shawn Weeks
>Assignee: Bing Li
>Priority: Minor
>
> Currently the HBaseStorageHandler lower-cases the HBase table name. This 
> prevents use of the storage handler with existing HBase tables that are not 
> all lower case. Looking at the source, this was done intentionally, but I 
> haven't found any documentation on the wiki about why. To avoid changing the 
> default behavior, I'd suggest adding an additional property to the serde; see 
> the sketch after the example below. 
> {code}
> create 'TestTable', 'd'
> create external table `TestTable` (
> id bigint,
> hash String,
> location String,
> name String
> )
> stored by "org.apache.hadoop.hive.hbase.HBaseStorageHandler"
> with serdeproperties (
> "hbase.columns.mapping" = ":key,d:hash,d:location,d:name",
> "hbase.table.name" = "TestTable"
> );
> {code}
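A minimal sketch of the suggested behavior, assuming the handler resolves the target table name in one place (the class and method names here are made up): honor an explicit hbase.table.name verbatim and only fall back to lower-casing when it is absent, so the default behavior is unchanged.
{code}
import java.util.Properties;

public class HBaseTableNameResolver {
  static final String HBASE_TABLE_NAME = "hbase.table.name";

  // If the user supplied hbase.table.name, trust its case as-is; otherwise keep
  // today's behavior of lower-casing the Hive table name.
  static String resolve(Properties serdeProps, String hiveTableName) {
    String explicit = serdeProps.getProperty(HBASE_TABLE_NAME);
    return explicit != null ? explicit : hiveTableName.toLowerCase();
  }
}
{code}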



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-07-05 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-16922:
--

Assignee: Bing Li

> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>Assignee: Bing Li
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle

2017-07-04 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16659:
---
Attachment: HIVE-16659.3.patch

Refine GenSparkUtils.java based on Rui's comments on RB.

> Query plan should reflect hive.spark.use.groupby.shuffle
> 
>
> Key: HIVE-16659
> URL: https://issues.apache.org/jira/browse/HIVE-16659
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
> Attachments: HIVE-16659.1.patch, HIVE-16659.2.patch, 
> HIVE-16659.3.patch
>
>
> It's useful to show the shuffle type used in the query plan. Currently it 
> shows "GROUP" no matter what we set for hive.spark.use.groupby.shuffle.
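In essence the request is that the edge label printed by EXPLAIN be derived from the setting rather than hard-coded. A rough sketch of that decision, illustrative only (the class and method names are not Hive's actual code):
{code}
public class ShuffleEdgeLabel {
  // Label for the Spark shuffle edge shown in EXPLAIN, e.g. "GROUP, 2" when
  // group-by shuffle is enabled and "GROUP PARTITION-LEVEL SORT, 2" otherwise.
  static String label(boolean useGroupByShuffle, int numPartitions) {
    String kind = useGroupByShuffle ? "GROUP" : "GROUP PARTITION-LEVEL SORT";
    return kind + ", " + numPartitions;
  }
}
{code}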



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle

2017-07-04 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074223#comment-16074223
 ] 

Bing Li edited comment on HIVE-16659 at 7/5/17 4:09 AM:


Refine GenSparkUtils.java based on Rui's comments on RB.
Thank you, [~ruili]


was (Author: libing):
refine GenSparkUtils.java based on Rui's comments on RB.

> Query plan should reflect hive.spark.use.groupby.shuffle
> 
>
> Key: HIVE-16659
> URL: https://issues.apache.org/jira/browse/HIVE-16659
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
> Attachments: HIVE-16659.1.patch, HIVE-16659.2.patch, 
> HIVE-16659.3.patch
>
>
> It's useful to show the shuffle type used in the query plan. Currently it 
> shows "GROUP" no matter what we set for hive.spark.use.groupby.shuffle.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work started] (HIVE-16907) "INSERT INTO" overwrite old data when destination table encapsulated by backquote

2017-07-04 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16907 started by Bing Li.
--
>  "INSERT INTO"  overwrite old data when destination table encapsulated by 
> backquote 
> 
>
> Key: HIVE-16907
> URL: https://issues.apache.org/jira/browse/HIVE-16907
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.1.0, 2.1.1
>Reporter: Nemon Lou
>Assignee: Bing Li
>
> A way to reproduce:
> {noformat}
> create database tdb;
> use tdb;
> create table t1(id int);
> create table t2(id int);
> explain insert into `tdb.t1` select * from t2;
> {noformat}
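The heart of the problem is how the back-quoted target is resolved: in `tdb.t1` the dot is part of a single quoted identifier, so it should not be treated as a database separator, and the insert should not be silently redirected. A self-contained sketch of that distinction, purely illustrative and not Hive's actual parser:
{code}
public class InsertTargetResolver {
  // `tdb.t1`              -> one identifier named "tdb.t1" in the current database
  // `tdb`.`t1` or tdb.t1  -> database tdb, table t1
  static String[] resolve(String target, String currentDb) {
    if (target.startsWith("`") && target.endsWith("`") && !target.contains("`.`")) {
      return new String[] { currentDb, target.substring(1, target.length() - 1) };
    }
    String[] parts = target.replace("`", "").split("\\.", 2);
    return parts.length == 2 ? parts : new String[] { currentDb, parts[0] };
  }
}
{code}
The reporter's EXPLAIN output below shows the plan that the current parser produces for this statement: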
> {noformat}
> +---+
> |  
> Explain  |
> +---+
> | STAGE DEPENDENCIES: 
>   |
> |   Stage-1 is a root stage   
>   |
> |   Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, 
> Stage-4  |
> |   Stage-3   
>   |
> |   Stage-0 depends on stages: Stage-3, Stage-2, Stage-5  
>   |
> |   Stage-2   
>   |
> |   Stage-4   
>   |
> |   Stage-5 depends on stages: Stage-4
>   |
> | 
>   |
> | STAGE PLANS:
>   |
> |   Stage: Stage-1
>   |
> | Map Reduce  
>   |
> |   Map Operator Tree:
>   |
> |   TableScan 
>   |
> | alias: t2   
>   |
> | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE |
> | Select Operator 
>   |
> |   expressions: id (type: int)   
>   |
> |   outputColumnNames: _col0  
>   |
> |   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE   |
> |   File Output Operator  
>   |
> | compressed: false   
>   |
> | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
> Column stats: NONE |
> | table:  
> 

[jira] [Comment Edited] (HIVE-16766) Hive query with space as filter does not give proper result

2017-07-04 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16071054#comment-16071054
 ] 

Bing Li edited comment on HIVE-16766 at 7/4/17 8:52 AM:


Hi, [~subashprabanantham]
Which Hive version did you use? Could you post the reproduce queries as well? 

I tried it on a Hive package built from branch-2.3, and it worked for me.

My Testing
==
*hive> describe test;*
OK
col1                    string
col2                    string
Time taken: 0.057 seconds, Fetched: 2 row(s)
*hive> select * from test;*
OK
a1  a2
b1  b2
c1  c2
D
Time taken: 0.22 seconds, Fetched: 4 row(s)

*hive> select count(1) as cnt from test where col1="" and col2="D";*
Query ID = root_20170630235239_b58b7dbc-14ef-4126-b56b-fdcf187acc09
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Spark Job = f25577ce-2ed6-4c5c-a64a-6ff7419ab778
--
  STAGES   ATTEMPTSTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
FAILED
--
Stage-5  0  FINISHED  1  100
   0
Stage-6  0  FINISHED  1  100
   0
--
STAGES: 02/02[==>>] 100%  ELAPSED TIME: 1.01 s
--
Status: Finished successfully in 1.01 seconds
OK
1
Time taken: 1.436 seconds, Fetched: 1 row(s)


was (Author: libing):
Hi, Subash
Which Hive version did you use? Could you post the reproduce queries as well? 

I tried it on a Hive package built from branch-2.3, and it worked for me.

My Testing
==
*hive> describe test;*
OK
col1                    string
col2                    string
Time taken: 0.057 seconds, Fetched: 2 row(s)
*hive> select * from test;*
OK
a1  a2
b1  b2
c1  c2
D
Time taken: 0.22 seconds, Fetched: 4 row(s)

*hive> select count(1) as cnt from test where col1="" and col2="D";*
Query ID = root_20170630235239_b58b7dbc-14ef-4126-b56b-fdcf187acc09
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Spark Job = f25577ce-2ed6-4c5c-a64a-6ff7419ab778
--
  STAGES   ATTEMPTSTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
FAILED
--
Stage-5  0  FINISHED  1  100
   0
Stage-6  0  FINISHED  1  100
   0
--
STAGES: 02/02[==>>] 100%  ELAPSED TIME: 1.01 s
--
Status: Finished successfully in 1.01 seconds
OK
1
Time taken: 1.436 seconds, Fetched: 1 row(s)

> Hive query with space as filter does not give proper result
> ---
>
> Key: HIVE-16766
> URL: https://issues.apache.org/jira/browse/HIVE-16766
> Project: Hive
>  Issue Type: Bug
>Reporter: Subash
>Assignee: Bing Li
>Priority: Critical
>
> Hi Team,
> I have used the query as below format and it does not give proper results. 
> Since there is a split by \s+ in ExecuteStatementOperation class in line 48, 
> I feel something goes wrong there. Could help me with this, if i am wrong ? 
> I am using Hive JDBC version 1.1.0
> The sample query is as follows,
> select count(1) as cnt from table where col1=" " and col2="D";
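For what it's worth, here is what a whitespace split like the one the reporter points at would do to such a statement if it were applied to the raw SQL text; this is only a stand-alone illustration of the suspicion, not the actual ExecuteStatementOperation code:
{code}
import java.util.Arrays;

public class WhitespaceSplitDemo {
  public static void main(String[] args) {
    String sql = "select count(1) as cnt from t where col1=\" \" and col2=\"D\"";
    // A naive split cannot tell the quoted blank literal from a token separator:
    System.out.println(Arrays.toString(sql.split("\\s+")));
    // -> [select, count(1), as, cnt, from, t, where, col1=", ", and, col2="D"]
  }
}
{code}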



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle

2017-07-04 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16073343#comment-16073343
 ] 

Bing Li commented on HIVE-16659:


[~ruili], I updated the patch based on your comment and added the link to the 
review request.
Thank you!

> Query plan should reflect hive.spark.use.groupby.shuffle
> 
>
> Key: HIVE-16659
> URL: https://issues.apache.org/jira/browse/HIVE-16659
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
> Attachments: HIVE-16659.1.patch, HIVE-16659.2.patch
>
>
> It's useful to show the shuffle type used in the query plan. Currently it 
> shows "GROUP" no matter what we set for hive.spark.use.groupby.shuffle.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle

2017-07-04 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16659:
---
Attachment: HIVE-16659.2.patch

Refined the patch with a test case.

> Query plan should reflect hive.spark.use.groupby.shuffle
> 
>
> Key: HIVE-16659
> URL: https://issues.apache.org/jira/browse/HIVE-16659
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
> Attachments: HIVE-16659.1.patch, HIVE-16659.2.patch
>
>
> It's useful to show the shuffle type used in the query plan. Currently it 
> shows "GROUP" no matter what we set for hive.spark.use.groupby.shuffle.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work started] (HIVE-16950) Dropping hive database/table which was created explicitly in default database location, deletes all databases data from default database location

2017-07-03 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16950 started by Bing Li.
--
> Dropping hive database/table which was created explicitly in default database 
> location, deletes all databases data from default database location
> -
>
> Key: HIVE-16950
> URL: https://issues.apache.org/jira/browse/HIVE-16950
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Rahul Kalgunde
>Assignee: Bing Li
>Priority: Minor
>
> When database/table is created explicitly pointing to the default location, 
> dropping the database/table deletes all the data associated with the all 
> databases/tables.
> Steps to replicate: 
> in below e.g. dropping table test_db2 also deletes data of test_db1 where as 
> metastore still contains test_db1
> hive> create database test_db1;
> OK
> Time taken: 4.858 seconds
> hive> describe database test_db1;
> OK
> test_db1
> hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/test_db1.db root  
>   USER
> Time taken: 0.599 seconds, Fetched: 1 row(s)
> hive> create database test_db2 location '/apps/hive/warehouse' ;
> OK
> Time taken: 1.457 seconds
> hive> describe database test_db2;
> OK
> test_db2
> hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse rootUSER
> Time taken: 0.582 seconds, Fetched: 1 row(s)
> hive> drop database test_db2;
> OK
> Time taken: 1.317 seconds
> hive> dfs -ls /apps/hive/warehouse;
> ls: `/apps/hive/warehouse': No such file or directory
> Command failed with exit code = 1
> Query returned non-zero code: 1, cause: null
> hive> describe database test_db1;
> OK
> test_db1
> hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/test_db1.db root  
>   USER
> Time taken: 0.629 seconds, Fetched: 1 row(s)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16950) Dropping hive database/table which was created explicitly in default database location, deletes all databases data from default database location

2017-07-03 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16071999#comment-16071999
 ] 

Bing Li commented on HIVE-16950:


From the description, the requirement is closer to an EXTERNAL database, which 
Hive does not support yet.

But I think we could add a check when creating/dropping a database to avoid this 
issue. There are two ways to do this:
1. Throw an error when the target location on HDFS already exists.
An existing empty directory should be rejected as well, because Hive currently 
allows creating two databases with the same location.
2. Only drop the tables that belong to the target database.
For this, we would have to list all the tables under the database when DROP 
DATABASE is invoked, which would hurt the performance of the DROP statement.

I prefer #1; a rough sketch of that check follows below. [~ashutoshc], any 
comments on this? Thank you.
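
Sketch of the option #1 check, assuming it would run before the metastore creates 
the database directory (the exact hook point and error type are assumptions):
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreateDatabaseLocationCheck {
  // Refuse to create a database whose location already exists on HDFS,
  // including an existing empty directory, so two databases can never share it.
  static void checkLocation(FileSystem fs, Path dbLocation) throws IOException {
    if (fs.exists(dbLocation)) {
      throw new IOException("Database location already exists: " + dbLocation
          + "; refusing to create a database over it");
    }
  }
}
{code}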


> Dropping hive database/table which was created explicitly in default database 
> location, deletes all databases data from default database location
> -
>
> Key: HIVE-16950
> URL: https://issues.apache.org/jira/browse/HIVE-16950
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Rahul Kalgunde
>Assignee: Bing Li
>Priority: Minor
>
> When database/table is created explicitly pointing to the default location, 
> dropping the database/table deletes all the data associated with the all 
> databases/tables.
> Steps to replicate: 
> in below e.g. dropping table test_db2 also deletes data of test_db1 where as 
> metastore still contains test_db1
> hive> create database test_db1;
> OK
> Time taken: 4.858 seconds
> hive> describe database test_db1;
> OK
> test_db1
> hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/test_db1.db root  
>   USER
> Time taken: 0.599 seconds, Fetched: 1 row(s)
> hive> create database test_db2 location '/apps/hive/warehouse' ;
> OK
> Time taken: 1.457 seconds
> hive> describe database test_db2;
> OK
> test_db2
> hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse rootUSER
> Time taken: 0.582 seconds, Fetched: 1 row(s)
> hive> drop database test_db2;
> OK
> Time taken: 1.317 seconds
> hive> dfs -ls /apps/hive/warehouse;
> ls: `/apps/hive/warehouse': No such file or directory
> Command failed with exit code = 1
> Query returned non-zero code: 1, cause: null
> hive> describe database test_db1;
> OK
> test_db1
> hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/test_db1.db root  
>   USER
> Time taken: 0.629 seconds, Fetched: 1 row(s)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16906) Hive ATSHook should check for yarn.timeline-service.enabled before connecting to ATS

2017-07-02 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-16906:
--

Assignee: Bing Li

> Hive ATSHook should check for yarn.timeline-service.enabled before connecting 
> to ATS
> 
>
> Key: HIVE-16906
> URL: https://issues.apache.org/jira/browse/HIVE-16906
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.2
>Reporter: Prabhu Joseph
>Assignee: Bing Li
>
> Hive ATSHook has to check yarn.timeline-service.enabled (which indicates to 
> clients whether the timeline service is enabled; if enabled, clients will put 
> entities and events to the timeline server) before creating the TimelineClient; 
> see the sketch below. 
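A minimal sketch of that guard, using the standard YARN configuration key; where exactly it would sit inside ATSHook is an assumption here:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AtsHookGuard {
  // Only build (and start) a TimelineClient when the timeline service is enabled.
  static TimelineClient maybeCreateClient(Configuration conf) {
    if (!conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED,
        YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) {
      return null; // skip ATS entirely
    }
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(conf);
    client.start();
    return client;
  }
}
{code}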



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HIVE-11019) Can't create an Avro table with uniontype column correctly

2017-07-02 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li resolved HIVE-11019.

Resolution: Resolved

> Can't create an Avro table with uniontype column correctly
> --
>
> Key: HIVE-11019
> URL: https://issues.apache.org/jira/browse/HIVE-11019
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Bing Li
>Assignee: Bing Li
>
> I tried the example in 
> https://cwiki.apache.org/confluence/display/Hive/AvroSerDe
> And found that it can't create an AVRO table correctly with uniontype
> hive> create table avro_union(union1 uniontype)STORED 
> AS AVRO;
> OK
> Time taken: 0.083 seconds
> hive> describe avro_union;
> OK
> union1  uniontype  
>   
> Time taken: 0.058 seconds, Fetched: 1 row(s)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16950) Dropping hive database/table which was created explicitly in default database location, deletes all databases data from default database location

2017-07-01 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-16950:
--

Assignee: Bing Li

> Dropping hive database/table which was created explicitly in default database 
> location, deletes all databases data from default database location
> -
>
> Key: HIVE-16950
> URL: https://issues.apache.org/jira/browse/HIVE-16950
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Rahul Kalgunde
>Assignee: Bing Li
>Priority: Minor
>
> When database/table is created explicitly pointing to the default location, 
> dropping the database/table deletes all the data associated with the all 
> databases/tables.
> Steps to replicate: 
> in below e.g. dropping table test_db2 also deletes data of test_db1 where as 
> metastore still contains test_db1
> hive> create database test_db1;
> OK
> Time taken: 4.858 seconds
> hive> describe database test_db1;
> OK
> test_db1
> hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/test_db1.db root  
>   USER
> Time taken: 0.599 seconds, Fetched: 1 row(s)
> hive> create database test_db2 location '/apps/hive/warehouse' ;
> OK
> Time taken: 1.457 seconds
> hive> describe database test_db2;
> OK
> test_db2
> hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse rootUSER
> Time taken: 0.582 seconds, Fetched: 1 row(s)
> hive> drop database test_db2;
> OK
> Time taken: 1.317 seconds
> hive> dfs -ls /apps/hive/warehouse;
> ls: `/apps/hive/warehouse': No such file or directory
> Command failed with exit code = 1
> Query returned non-zero code: 1, cause: null
> hive> describe database test_db1;
> OK
> test_db1
> hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/test_db1.db root  
>   USER
> Time taken: 0.629 seconds, Fetched: 1 row(s)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-11019) Can't create an Avro table with uniontype column correctly

2017-07-01 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16071516#comment-16071516
 ] 

Bing Li commented on HIVE-11019:


This works on branch-2.3, so I closed the issue.

hive> create table avro_union(union1 uniontype)STORED 
AS AVRO;
OK
Time taken: 2.04 seconds
hive> describe avro_union;
OK
union1  uniontype
Time taken: 0.165 seconds, Fetched: 1 row(s)

> Can't create an Avro table with uniontype column correctly
> --
>
> Key: HIVE-11019
> URL: https://issues.apache.org/jira/browse/HIVE-11019
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Bing Li
>Assignee: Bing Li
>
> I tried the example in 
> https://cwiki.apache.org/confluence/display/Hive/AvroSerDe
> And found that it can't create an AVRO table correctly with uniontype
> hive> create table avro_union(union1 uniontype)STORED 
> AS AVRO;
> OK
> Time taken: 0.083 seconds
> hive> describe avro_union;
> OK
> union1  uniontype  
>   
> Time taken: 0.058 seconds, Fetched: 1 row(s)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle

2017-07-01 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16071514#comment-16071514
 ] 

Bing Li commented on HIVE-16659:


Hi, [~ruili]
Thank you for the review!
I checked the latest code on the master branch; the current patch can be applied 
to it directly, so I won't create a new patch file for this Jira.
I will pay attention to this in the future.

> Query plan should reflect hive.spark.use.groupby.shuffle
> 
>
> Key: HIVE-16659
> URL: https://issues.apache.org/jira/browse/HIVE-16659
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
> Attachments: HIVE-16659.1.patch
>
>
> It's useful to show the shuffle type used in the query plan. Currently it 
> shows "GROUP" no matter what we set for hive.spark.use.groupby.shuffle.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work started] (HIVE-16766) Hive query with space as filter does not give proper result

2017-07-01 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16766 started by Bing Li.
--
> Hive query with space as filter does not give proper result
> ---
>
> Key: HIVE-16766
> URL: https://issues.apache.org/jira/browse/HIVE-16766
> Project: Hive
>  Issue Type: Bug
>Reporter: Subash
>Assignee: Bing Li
>Priority: Critical
>
> Hi Team,
> I have used the query as below format and it does not give proper results. 
> Since there is a split by \s+ in ExecuteStatementOperation class in line 48, 
> I feel something goes wrong there. Could help me with this, if i am wrong ? 
> I am using Hive JDBC version 1.1.0
> The sample query is as follows,
> select count(1) as cnt from table where col1=" " and col2="D";



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17004) Calculating Number Of Reducers Looks At All Files

2017-07-01 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-17004:
--

Assignee: Bing Li

> Calculating Number Of Reducers Looks At All Files
> -
>
> Key: HIVE-17004
> URL: https://issues.apache.org/jira/browse/HIVE-17004
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 2.1.1
>Reporter: BELUGA BEHR
>Assignee: Bing Li
>
> When calculating the number of Mappers and Reducers, the two algorithms look 
> at different data sets. The number of Mappers is calculated from the number of 
> splits, while the number of Reducers is based on the number of files within 
> the HDFS directory. As a result, if I add files to a sub-directory of the HDFS 
> directory, the number of splits remains the same (since I did not tell Hive to 
> search recursively) but the number of Reducers increases. Please improve this 
> so that Reducers look at the same files that are considered for splits and not 
> at files within sub-directories (unless configured to do so); see the sketch 
> below.
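For reference, the size-based heuristic behind "Estimated from input data size" amounts to something like the sketch below; the request here is about which files contribute to the total, not the formula itself (the method name is made up):
{code}
public class ReducerEstimate {
  // totalInputBytes should be summed over the same (non-recursive) file set that
  // produced the splits, not over every file found under the table directory.
  static int estimateReducers(long totalInputBytes, long bytesPerReducer, int maxReducers) {
    int reducers = (int) ((totalInputBytes + bytesPerReducer - 1) / bytesPerReducer);
    return Math.max(1, Math.min(reducers, maxReducers));
  }
}
{code}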
> {code}
> CREATE EXTERNAL TABLE Complaints (
>   a string,
>   b string,
>   c string,
>   d string,
>   e string,
>   f string,
>   g string
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> LOCATION '/user/admin/complaints';
> {code}
> {code}
> [root@host ~]# sudo -u hdfs hdfs dfs -ls -R /user/admin/complaints
> -rwxr-xr-x   2 admin admin  122607137 2017-05-02 14:12 
> /user/admin/complaints/Consumer_Complaints.1.csv
> -rwxr-xr-x   2 admin admin  122607137 2017-05-02 14:12 
> /user/admin/complaints/Consumer_Complaints.2.csv
> -rwxr-xr-x   2 admin admin  122607137 2017-05-02 14:12 
> /user/admin/complaints/Consumer_Complaints.3.csv
> -rwxr-xr-x   2 admin admin  122607137 2017-05-02 14:12 
> /user/admin/complaints/Consumer_Complaints.4.csv
> -rwxr-xr-x   2 admin admin  122607137 2017-05-02 14:12 
> /user/admin/complaints/Consumer_Complaints.5.csv
> -rwxr-xr-x   2 admin admin  122607137 2017-05-02 14:12 
> /user/admin/complaints/Consumer_Complaints.csv
> {code}
> {code}
> INFO  : Compiling 
> command(queryId=hive_20170502142020_dfcf77ef-56b7-4544-ab90-6e9726ea86ae): 
> select a, count(1) from complaints group by a limit 10
> INFO  : Semantic Analysis Completed
> INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:a, 
> type:string, comment:null), FieldSchema(name:_c1, type:bigint, 
> comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20170502142020_dfcf77ef-56b7-4544-ab90-6e9726ea86ae); 
> Time taken: 0.077 seconds
> INFO  : Executing 
> command(queryId=hive_20170502142020_dfcf77ef-56b7-4544-ab90-6e9726ea86ae): 
> select a, count(1) from complaints group by a limit 10
> INFO  : Query ID = hive_20170502142020_dfcf77ef-56b7-4544-ab90-6e9726ea86ae
> INFO  : Total jobs = 1
> INFO  : Launching Job 1 out of 1
> INFO  : Starting task [Stage-1:MAPRED] in serial mode
> INFO  : Number of reduce tasks not specified. Estimated from input data size: 
> 11
> INFO  : In order to change the average load for a reducer (in bytes):
> INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
> INFO  : In order to limit the maximum number of reducers:
> INFO  :   set hive.exec.reducers.max=<number>
> INFO  : In order to set a constant number of reducers:
> INFO  :   set mapreduce.job.reduces=<number>
> INFO  : number of splits:2
> INFO  : Submitting tokens for job: job_1493729203063_0003
> INFO  : The url to track the job: 
> http://host:8088/proxy/application_1493729203063_0003/
> INFO  : Starting Job = job_1493729203063_0003, Tracking URL = 
> http://host:8088/proxy/application_1493729203063_0003/
> INFO  : Kill Command = 
> /opt/cloudera/parcels/CDH-5.8.4-1.cdh5.8.4.p0.5/lib/hadoop/bin/hadoop job  
> -kill job_1493729203063_0003
> INFO  : Hadoop job information for Stage-1: number of mappers: 2; number of 
> reducers: 11
> INFO  : 2017-05-02 14:20:14,206 Stage-1 map = 0%,  reduce = 0%
> INFO  : 2017-05-02 14:20:22,520 Stage-1 map = 100%,  reduce = 0%, Cumulative 
> CPU 4.48 sec
> INFO  : 2017-05-02 14:20:34,029 Stage-1 map = 100%,  reduce = 27%, Cumulative 
> CPU 15.72 sec
> INFO  : 2017-05-02 14:20:35,069 Stage-1 map = 100%,  reduce = 55%, Cumulative 
> CPU 21.94 sec
> INFO  : 2017-05-02 14:20:36,110 Stage-1 map = 100%,  reduce = 64%, Cumulative 
> CPU 23.97 sec
> INFO  : 2017-05-02 14:20:39,233 Stage-1 map = 100%,  reduce = 73%, Cumulative 
> CPU 25.26 sec
> INFO  : 2017-05-02 14:20:43,392 Stage-1 map = 100%,  reduce = 100%, 
> Cumulative CPU 30.9 sec
> INFO  : MapReduce Total cumulative CPU time: 30 seconds 900 msec
> INFO  : Ended Job = job_1493729203063_0003
> INFO  : MapReduce Jobs Launched: 
> INFO  : Stage-Stage-1: Map: 2  Reduce: 11   Cumulative CPU: 30.9 sec   HDFS 
> Read: 735691149 HDFS Write: 153 SUCCESS
> INFO  : Total 

[jira] [Commented] (HIVE-16766) Hive query with space as filter does not give proper result

2017-07-01 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16071054#comment-16071054
 ] 

Bing Li commented on HIVE-16766:


Hi, Subash
Which Hive version did you use? Could you post the reproduce queries as well? 

I tried it on a Hive package built from branch-2.3, and it worked for me.

My Testing
==
*hive> describe test;*
OK
col1                    string
col2                    string
Time taken: 0.057 seconds, Fetched: 2 row(s)
*hive> select * from test;*
OK
a1  a2
b1  b2
c1  c2
D
Time taken: 0.22 seconds, Fetched: 4 row(s)

*hive> select count(1) as cnt from test where col1="" and col2="D";*
Query ID = root_20170630235239_b58b7dbc-14ef-4126-b56b-fdcf187acc09
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Spark Job = f25577ce-2ed6-4c5c-a64a-6ff7419ab778
--
  STAGES   ATTEMPTSTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
FAILED
--
Stage-5  0  FINISHED  1  100
   0
Stage-6  0  FINISHED  1  100
   0
--
STAGES: 02/02[==>>] 100%  ELAPSED TIME: 1.01 s
--
Status: Finished successfully in 1.01 seconds
OK
1
Time taken: 1.436 seconds, Fetched: 1 row(s)

> Hive query with space as filter does not give proper result
> ---
>
> Key: HIVE-16766
> URL: https://issues.apache.org/jira/browse/HIVE-16766
> Project: Hive
>  Issue Type: Bug
>Reporter: Subash
>Assignee: Bing Li
>Priority: Critical
>
> Hi Team,
> I have used the query as below format and it does not give proper results. 
> Since there is a split by \s+ in ExecuteStatementOperation class in line 48, 
> I feel something goes wrong there. Could help me with this, if i am wrong ? 
> I am using Hive JDBC version 1.1.0
> The sample query is as follows,
> select count(1) as cnt from table where col1=" " and col2="D";



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle

2017-06-30 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16659:
---
Status: Patch Available  (was: In Progress)

> Query plan should reflect hive.spark.use.groupby.shuffle
> 
>
> Key: HIVE-16659
> URL: https://issues.apache.org/jira/browse/HIVE-16659
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
> Attachments: HIVE-16659.1.patch
>
>
> It's useful to show the shuffle type used in the query plan. Currently it 
> shows "GROUP" no matter what we set for hive.spark.use.groupby.shuffle.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle

2017-06-30 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070235#comment-16070235
 ] 

Bing Li edited comment on HIVE-16659 at 6/30/17 3:11 PM:
-

This patch is based on branch-2.3.
With the above changes, I could get the explain result as below.

hive> {color:red}set hive.spark.use.groupby.shuffle=true;{color}
hive> explain select key, count(val) from t1 group by key;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
Spark
  Edges:
{color:red}Reducer 2 <- Map 1 (GROUP, 2){color}
  DagName: root_20170630080539_565b5a00-822e-46e9-a146-be84723ae7f6:2
  Vertices:
Map 1
Map Operator Tree:
TableScan
  alias: t1
  Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE 
Column stats: NONE
  Select Operator
expressions: key (type: int), val (type: string)
outputColumnNames: key, val
Statistics: Num rows: 20 Data size: 140 Basic stats: 
COMPLETE Column stats: NONE
Group By Operator
  aggregations: count(val)
  keys: key (type: int)
  mode: hash
  outputColumnNames: _col0, _col1
  Statistics: Num rows: 20 Data size: 140 Basic stats: 
COMPLETE Column stats: NONE
  Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 20 Data size: 140 Basic stats: 
COMPLETE Column stats: NONE
value expressions: _col1 (type: bigint)
Reducer 2
Reduce Operator Tree:
  Group By Operator
aggregations: count(VALUE._col0)
keys: KEY._col0 (type: int)
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE 
Column stats: NONE
File Output Operator
  compressed: false
  Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE 
Column stats: NONE
  table:
  input format: 
org.apache.hadoop.mapred.SequenceFileInputFormat
  output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
ListSink

Time taken: 51.289 seconds, Fetched: 54 row(s)

hive> {color:red}set hive.spark.use.groupby.shuffle=false;{color}
hive> explain select key, count(val) from t1 group by key;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
Spark
  Edges:
{color:red}Reducer 2 <- Map 1 (GROUP PARTITION-LEVEL SORT, 2){color}
  DagName: root_20170630075518_b84add65-57db-466f-9521-3f1b14de6826:1
  Vertices:
Map 1
Map Operator Tree:
TableScan
  alias: t1
  Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE 
Column stats: NONE
  Select Operator
expressions: key (type: int), val (type: string)
outputColumnNames: key, val
Statistics: Num rows: 20 Data size: 140 Basic stats: 
COMPLETE Column stats: NONE
Group By Operator
  aggregations: count(val)
  keys: key (type: int)
  mode: hash
  outputColumnNames: _col0, _col1
  Statistics: Num rows: 20 Data size: 140 Basic stats: 
COMPLETE Column stats: NONE
  Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 20 Data size: 140 Basic stats: 
COMPLETE Column stats: NONE
value expressions: _col1 (type: bigint)
Reducer 2
Reduce Operator Tree:
  Group By Operator
aggregations: count(VALUE._col0)
keys: KEY._col0 (type: int)
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE 
Column stats: NONE
File Output Operator
  compressed: false
  Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE 

[jira] [Comment Edited] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle

2017-06-30 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070235#comment-16070235
 ] 

Bing Li edited comment on HIVE-16659 at 6/30/17 3:10 PM:
-

This patch is based on branch-2.3.
With the above changes, I could get the explain result as below.

hive> {color:#d04437}set hive.spark.use.groupby.shuffle=true;{color}
hive> explain select key, count(val) from t1 group by key;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
Spark
  Edges:
{color:red}Reducer 2 <- Map 1 (GROUP, 2){color}
  DagName: root_20170630080539_565b5a00-822e-46e9-a146-be84723ae7f6:2
  Vertices:
Map 1
Map Operator Tree:
TableScan
  alias: t1
  Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE 
Column stats: NONE
  Select Operator
expressions: key (type: int), val (type: string)
outputColumnNames: key, val
Statistics: Num rows: 20 Data size: 140 Basic stats: 
COMPLETE Column stats: NONE
Group By Operator
  aggregations: count(val)
  keys: key (type: int)
  mode: hash
  outputColumnNames: _col0, _col1
  Statistics: Num rows: 20 Data size: 140 Basic stats: 
COMPLETE Column stats: NONE
  Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 20 Data size: 140 Basic stats: 
COMPLETE Column stats: NONE
value expressions: _col1 (type: bigint)
Reducer 2
Reduce Operator Tree:
  Group By Operator
aggregations: count(VALUE._col0)
keys: KEY._col0 (type: int)
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE 
Column stats: NONE
File Output Operator
  compressed: false
  Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE 
Column stats: NONE
  table:
  input format: 
org.apache.hadoop.mapred.SequenceFileInputFormat
  output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
ListSink

Time taken: 51.289 seconds, Fetched: 54 row(s)

hive> {color:#d04437}set hive.spark.use.groupby.shuffle=false;{color}
hive> explain select key, count(val) from t1 group by key;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
Spark
  Edges:
{color:#d04437}Reducer 2 <- Map 1 (GROUP PARTITION-LEVEL SORT, 2){color}
  DagName: root_20170630075518_b84add65-57db-466f-9521-3f1b14de6826:1
  Vertices:
Map 1
Map Operator Tree:
TableScan
  alias: t1
  Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE 
Column stats: NONE
  Select Operator
expressions: key (type: int), val (type: string)
outputColumnNames: key, val
Statistics: Num rows: 20 Data size: 140 Basic stats: 
COMPLETE Column stats: NONE
Group By Operator
  aggregations: count(val)
  keys: key (type: int)
  mode: hash
  outputColumnNames: _col0, _col1
  Statistics: Num rows: 20 Data size: 140 Basic stats: 
COMPLETE Column stats: NONE
  Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 20 Data size: 140 Basic stats: 
COMPLETE Column stats: NONE
value expressions: _col1 (type: bigint)
Reducer 2
Reduce Operator Tree:
  Group By Operator
aggregations: count(VALUE._col0)
keys: KEY._col0 (type: int)
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE 
Column stats: NONE
File Output Operator
  compressed: false
  Statistics: Num 

[jira] [Updated] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle

2017-06-30 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16659:
---
Attachment: HIVE-16659.1.patch

This patch is based on branch-2.3.
With the above changes, I could get the explain result as below.

_hive> {color:#205081}set hive.spark.use.groupby.shuffle=true;{color}
hive> explain select key, count(val) from t1 group by key;_
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
Spark
  Edges:
{color:red}Reducer 2 <- Map 1 (GROUP, 2){color}
  DagName: root_20170630080539_565b5a00-822e-46e9-a146-be84723ae7f6:2
  Vertices:
Map 1
Map Operator Tree:
TableScan
  alias: t1
  Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE 
Column stats: NONE
  Select Operator
expressions: key (type: int), val (type: string)
outputColumnNames: key, val
Statistics: Num rows: 20 Data size: 140 Basic stats: 
COMPLETE Column stats: NONE
Group By Operator
  aggregations: count(val)
  keys: key (type: int)
  mode: hash
  outputColumnNames: _col0, _col1
  Statistics: Num rows: 20 Data size: 140 Basic stats: 
COMPLETE Column stats: NONE
  Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 20 Data size: 140 Basic stats: 
COMPLETE Column stats: NONE
value expressions: _col1 (type: bigint)
Reducer 2
Reduce Operator Tree:
  Group By Operator
aggregations: count(VALUE._col0)
keys: KEY._col0 (type: int)
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE 
Column stats: NONE
File Output Operator
  compressed: false
  Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE 
Column stats: NONE
  table:
  input format: 
org.apache.hadoop.mapred.SequenceFileInputFormat
  output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
ListSink

Time taken: 51.289 seconds, Fetched: 54 row(s)

_hive> {color:#205081}set hive.spark.use.groupby.shuffle=false{color};
hive> explain select key, count(val) from t1 group by key;_
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
Spark
  Edges:
{color:#205081}Reducer 2 <- Map 1 (GROUP PARTITION-LEVEL SORT, 2){color}
  DagName: root_20170630075518_b84add65-57db-466f-9521-3f1b14de6826:1
  Vertices:
Map 1
Map Operator Tree:
TableScan
  alias: t1
  Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE 
Column stats: NONE
  Select Operator
expressions: key (type: int), val (type: string)
outputColumnNames: key, val
Statistics: Num rows: 20 Data size: 140 Basic stats: 
COMPLETE Column stats: NONE
Group By Operator
  aggregations: count(val)
  keys: key (type: int)
  mode: hash
  outputColumnNames: _col0, _col1
  Statistics: Num rows: 20 Data size: 140 Basic stats: 
COMPLETE Column stats: NONE
  Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 20 Data size: 140 Basic stats: 
COMPLETE Column stats: NONE
value expressions: _col1 (type: bigint)
Reducer 2
Reduce Operator Tree:
  Group By Operator
aggregations: count(VALUE._col0)
keys: KEY._col0 (type: int)
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE 
Column stats: NONE
File Output Operator
  compressed: false
  Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE 
Column stats: NONE
  

[jira] [Work started] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle

2017-06-30 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16659 started by Bing Li.
--
> Query plan should reflect hive.spark.use.groupby.shuffle
> 
>
> Key: HIVE-16659
> URL: https://issues.apache.org/jira/browse/HIVE-16659
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>
> It's useful to show the shuffle type used in the query plan. Currently it 
> shows "GROUP" no matter what we set for hive.spark.use.groupby.shuffle.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16936) wrong result with CTAS(create table as select)

2017-06-30 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-16936:
--

Assignee: Bing Li

> wrong result with CTAS(create table as select)
> --
>
> Key: HIVE-16936
> URL: https://issues.apache.org/jira/browse/HIVE-16936
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Xiaomeng Huang
>Assignee: Bing Li
>Priority: Critical
>
> 1. 
> {code}
> hive> desc abc_test_old;
> OK
> did   string
> activetimeint
> {code}
> 2. 
> {code}
> hive> select 'test' as did from abc_test_old
> > where did = '5FCAFD34-C124-4E13-AF65-27B675C945CC' limit 1;
> OK
> test  
> {code}
> result is 'test'
> 3. 
> {code}
> hive> create table abc_test_12345 as
> > select 'test' as did from abc_test_old
> > where did = '5FCAFD34-C124-4E13-AF65-27B675C945CC' limit 1;
> hive> select did from abc_test_12345 limit 1;
> OK
> 5FCAFD34-C124-4E13-AF65-27B675C945CC 
> {code}
> result is '5FCAFD34-C124-4E13-AF65-27B675C945CC'
> why result is not 'test'?
> 4. 
> {code}
> hive> explain
> > create table abc_test_12345 as
> > select 'test' as did from abc_test_old
> > where did = '5FCAFD34-C124-4E13-AF65-27B675C945CC' limit 1;
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, Stage-4
>   Stage-3
>   Stage-0 depends on stages: Stage-3, Stage-2, Stage-5
>   Stage-7 depends on stages: Stage-0
>   Stage-2
>   Stage-4
>   Stage-5 depends on stages: Stage-4
> STAGE PLANS:
>   Stage: Stage-1
> Map Reduce
>   Map Operator Tree:
>   TableScan
> alias: abc_test_old
> Statistics: Num rows: 32 Data size: 1152 Basic stats: COMPLETE 
> Column stats: NONE
> Filter Operator
>   predicate: (did = '5FCAFD34-C124-4E13-AF65-27B675C945CC') 
> (type: boolean)
>   Statistics: Num rows: 16 Data size: 576 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> Statistics: Num rows: 16 Data size: 576 Basic stats: COMPLETE 
> Column stats: NONE
> Limit
>   Number of rows: 1
>   Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE 
> Column stats: NONE
>   Reduce Output Operator
> sort order:
> Statistics: Num rows: 1 Data size: 36 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Operator Tree:
> Select Operator
>   expressions: '5FCAFD34-C124-4E13-AF65-27B675C945CC' (type: string)
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column 
> stats: NONE
>   Limit
> Number of rows: 1
> Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE 
> Column stats: NONE
> File Output Operator
>   compressed: true
>   Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE 
> Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>   serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
>   name: default.abc_test_12345
> ..
> {code}
> Why is the expression '5FCAFD34-C124-4E13-AF65-27B675C945CC'?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16907) "INSERT INTO" overwrites old data when destination table is encapsulated by backquotes

2017-06-30 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-16907:
--

Assignee: Bing Li

>  "INSERT INTO"  overwrites old data when destination table is encapsulated by 
> backquotes 
> 
>
> Key: HIVE-16907
> URL: https://issues.apache.org/jira/browse/HIVE-16907
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.1.0, 2.1.1
>Reporter: Nemon Lou
>Assignee: Bing Li
>
> A way to reproduce:
> {noformat}
> create database tdb;
> use tdb;
> create table t1(id int);
> create table t2(id int);
> explain insert into `tdb.t1` select * from t2;
> {noformat}
> {noformat}
> +---+
> |  
> Explain  |
> +---+
> | STAGE DEPENDENCIES: 
>   |
> |   Stage-1 is a root stage   
>   |
> |   Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, 
> Stage-4  |
> |   Stage-3   
>   |
> |   Stage-0 depends on stages: Stage-3, Stage-2, Stage-5  
>   |
> |   Stage-2   
>   |
> |   Stage-4   
>   |
> |   Stage-5 depends on stages: Stage-4
>   |
> | 
>   |
> | STAGE PLANS:
>   |
> |   Stage: Stage-1
>   |
> | Map Reduce  
>   |
> |   Map Operator Tree:
>   |
> |   TableScan 
>   |
> | alias: t2   
>   |
> | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE |
> | Select Operator 
>   |
> |   expressions: id (type: int)   
>   |
> |   outputColumnNames: _col0  
>   |
> |   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE   |
> |   File Output Operator  
>   |
> | compressed: false   
>   |
> | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
> Column stats: NONE |
> | table:  
> 

[jira] [Assigned] (HIVE-16766) Hive query with space as filter does not give proper result

2017-06-09 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-16766:
--

Assignee: Bing Li

> Hive query with space as filter does not give proper result
> ---
>
> Key: HIVE-16766
> URL: https://issues.apache.org/jira/browse/HIVE-16766
> Project: Hive
>  Issue Type: Bug
>Reporter: Subash
>Assignee: Bing Li
>Priority: Critical
>
> Hi Team,
> I have used the query as below format and it does not give proper results. 
> Since there is a split by \s+ in ExecuteStatementOperation class in line 48, 
> I feel something goes wrong there. Could help me with this, if i am wrong ? 
> I am using Hive JDBC version 1.1.0
> The sample query is as follows,
> select count(1) as cnt from table where col1=" " and col2="D";



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16615) Support Time Zone Specifiers (i.e. "at time zone X")

2017-06-08 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-16615:
--

Assignee: Bing Li

> Support Time Zone Specifiers (i.e. "at time zone X")
> 
>
> Key: HIVE-16615
> URL: https://issues.apache.org/jira/browse/HIVE-16615
> Project: Hive
>  Issue Type: Improvement
>Reporter: Carter Shanklin
>Assignee: Bing Li
>
> HIVE-14412 introduces a timezone-aware timestamp.
> SQL has a concept of "time zone specifier" which applies to any datetime 
> value expression (which covers time/timestamp with and without timezones). 
> Hive lacks a time type so we can put that aside for a while.
> Examples:
>   a. select time_stamp_with_time_zone at time zone '-8:00';
>   b. select time_stamp_without_time_zone at time zone LOCAL;
> These statements would adjust the expression from its original timezone into 
> a known target timezone.
> Using  the time zone specifier results in a data type that has a time zone. 
> If the original expression lacked a time zone, the result has a time zone. If 
> the original expression had a time zone, the result still has a time zone, 
> possibly a different one.
> LOCAL means to use the session's original default time zone displacement.
> The standard says that dates are not supported with time zone specifiers. It 
> seems common to ignore this rule and allow this, by converting the date to a 
> timestamp and then applying the usual rule.
> The standard only requires an interval or the LOCAL keyword. Some databases 
> allow time zone identifiers like PST.
> Reference: SQL:2011 section 6.31



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle

2017-06-08 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-16659:
--

Assignee: Bing Li  (was: Rui Li)

> Query plan should reflect hive.spark.use.groupby.shuffle
> 
>
> Key: HIVE-16659
> URL: https://issues.apache.org/jira/browse/HIVE-16659
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>
> It's useful to show the shuffle type used in the query plan. Currently it 
> shows "GROUP" no matter what we set for hive.spark.use.groupby.shuffle.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle

2017-06-08 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042361#comment-16042361
 ] 

Bing Li commented on HIVE-16659:


Hi, [~ruili]
Could I take it over?

> Query plan should reflect hive.spark.use.groupby.shuffle
> 
>
> Key: HIVE-16659
> URL: https://issues.apache.org/jira/browse/HIVE-16659
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
>
> It's useful to show the shuffle type used in the query plan. Currently it 
> shows "GROUP" no matter what we set for hive.spark.use.groupby.shuffle.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (HIVE-16800) Hive Metastore configuration with Mysql

2017-06-08 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li resolved HIVE-16800.

Resolution: Not A Bug

> Hive Metastore configuration with Mysql
> ---
>
> Key: HIVE-16800
> URL: https://issues.apache.org/jira/browse/HIVE-16800
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.2
>Reporter: Vigneshwaran
>Assignee: Bing Li
>
> I'm trying to configure MySQL as the metastore in Hive 1.2.2 by following the 
> link https://dzone.com/articles/how-configure-mysql-metastore, but when I 
> try to run hive after all the steps I'm getting the errors below:
> Exception in thread "main" java.lang.RuntimeException: 
> java.lang.RuntimeException: Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> Caused by: java.lang.RuntimeException: Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
> Caused by: java.lang.reflect.InvocationTargetException
> Caused by: javax.jdo.JDOFatalUserException: Exception thrown setting 
> persistence propertiesNestedThrowables:



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16800) Hive Metastore configuration with Mysql

2017-06-08 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042352#comment-16042352
 ] 

Bing Li commented on HIVE-16800:


Hi, Vigneshwaran
I think the document you referred to is out-of-date.

Please try the following steps in your cluster (using the commands for RHEL as 
an example):
1. Install MySQL
yum -y install mysql-server mysql mysql-devel

2. Start MySQL
/etc/init.d/mysqld start

3. Link or copy mysql-connector-java.jar to hive/lib

4. Set configurations in hive-site.xml
javax.jdo.option.ConnectionDriverName=com.mysql.jdbc.Driver
javax.jdo.option.ConnectionURL=jdbc:mysql://myhost.com/hive?createDatabaseIfNotExist=true
javax.jdo.option.ConnectionUserName=APP
javax.jdo.option.ConnectionPassword=mine

5. Prepare database for HiveMetastore in MySQL
mysql>create database hive;
mysql> grant all on hive.* to 'APP'@'myhost.com' identified by 'mine';

6. Verification on MySQL
mysql -u APP -h myhost.com -p
Enter "mine" as the password when prompted

7. Run Hive SchemaTool
hive/bin/schematool -dbType mysql -initSchema

8. Start HiveMetastore
hive/bin/hive --service metastore
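
For reference, the properties listed in step 4 map onto hive-site.xml entries like the 
ones below. This is only a minimal sketch: myhost.com, APP and mine are the example 
values used in the steps above and should be replaced with your own host, user and 
password.

  <!-- inside the <configuration> element of hive-site.xml -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://myhost.com/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>APP</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>mine</value>
  </property>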

> Hive Metastore configuration with Mysql
> ---
>
> Key: HIVE-16800
> URL: https://issues.apache.org/jira/browse/HIVE-16800
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.2
>Reporter: Vigneshwaran
>Assignee: Bing Li
>
> I'm trying to configure MySQL as the metastore in Hive 1.2.2 by following the 
> link https://dzone.com/articles/how-configure-mysql-metastore, but when I 
> try to run hive after all the steps I'm getting the errors below:
> Exception in thread "main" java.lang.RuntimeException: 
> java.lang.RuntimeException: Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> Caused by: java.lang.RuntimeException: Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
> Caused by: java.lang.reflect.InvocationTargetException
> Caused by: javax.jdo.JDOFatalUserException: Exception thrown setting 
> persistence propertiesNestedThrowables:



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Work started] (HIVE-16800) Hive Metastore configuration with Mysql

2017-06-08 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16800 started by Bing Li.
--
> Hive Metastore configuration with Mysql
> ---
>
> Key: HIVE-16800
> URL: https://issues.apache.org/jira/browse/HIVE-16800
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.2
>Reporter: Vigneshwaran
>Assignee: Bing Li
>
> I'm trying to configure MySQL as the metastore in Hive 1.2.2 by following the 
> link https://dzone.com/articles/how-configure-mysql-metastore, but when I 
> try to run hive after all the steps I'm getting the errors below:
> Exception in thread "main" java.lang.RuntimeException: 
> java.lang.RuntimeException: Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> Caused by: java.lang.RuntimeException: Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
> Caused by: java.lang.reflect.InvocationTargetException
> Caused by: javax.jdo.JDOFatalUserException: Exception thrown setting 
> persistence propertiesNestedThrowables:



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16614) Support "set local time zone" statement

2017-06-08 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-16614:
--

Assignee: Bing Li

> Support "set local time zone" statement
> ---
>
> Key: HIVE-16614
> URL: https://issues.apache.org/jira/browse/HIVE-16614
> Project: Hive
>  Issue Type: Improvement
>Reporter: Carter Shanklin
>Assignee: Bing Li
>
> HIVE-14412 introduces a timezone-aware timestamp.
> SQL has a concept of default time zone displacements, which are transparently 
> applied when converting between timezone-unaware types and timezone-aware 
> types and, in Hive's case, are also used to shift a timezone aware type to a 
> different time zone, depending on configuration.
> SQL also provides that the default time zone displacement be settable at a 
> session level, so that clients can access a database simultaneously from 
> different time zones and see time values in their own time zone.
> Currently the time zone displacement is fixed and is set based on the system 
> time zone where the Hive client runs (HiveServer2 or Hive CLI). It will be 
> more convenient for users if they have the ability to set their time zone of 
> choice.
> SQL defines "set time zone" with 2 ways of specifying the time zone, first 
> using an interval and second using the special keyword LOCAL.
> Examples:
>   • set time zone '-8:00';
>   • set time zone LOCAL;
> LOCAL means to set the current default time zone displacement to the 
> session's original default time zone displacement.
> Reference: SQL:2011 section 19.4



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16800) Hive Metastore configuration with Mysql

2017-06-07 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-16800:
--

Assignee: Bing Li

> Hive Metastore configuration with Mysql
> ---
>
> Key: HIVE-16800
> URL: https://issues.apache.org/jira/browse/HIVE-16800
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.2
>Reporter: Vigneshwaran
>Assignee: Bing Li
>
> I'm trying to configure MySQL as the metastore in Hive 1.2.2 by following the 
> link https://dzone.com/articles/how-configure-mysql-metastore, but when I 
> try to run hive after all the steps I'm getting the errors below:
> Exception in thread "main" java.lang.RuntimeException: 
> java.lang.RuntimeException: Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> Caused by: java.lang.RuntimeException: Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
> Caused by: java.lang.reflect.InvocationTargetException
> Caused by: javax.jdo.JDOFatalUserException: Exception thrown setting 
> persistence propertiesNestedThrowables:



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16573) In-place update for HoS can't be disabled

2017-06-05 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037924#comment-16037924
 ] 

Bing Li commented on HIVE-16573:


[~ruili] and [~anishek], thank you for your review.
I just submitted the patch.


> In-place update for HoS can't be disabled
> -
>
> Key: HIVE-16573
> URL: https://issues.apache.org/jira/browse/HIVE-16573
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>Priority: Minor
> Attachments: HIVE-16573.1.patch
>
>
> {{hive.spark.exec.inplace.progress}} has no effect



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16573) In-place update for HoS can't be disabled

2017-06-05 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16573:
---
Status: Patch Available  (was: In Progress)

I verified this patch; it works for the Spark engine on the Hive CLI.

> In-place update for HoS can't be disabled
> -
>
> Key: HIVE-16573
> URL: https://issues.apache.org/jira/browse/HIVE-16573
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>Priority: Minor
> Attachments: HIVE-16573.1.patch
>
>
> {{hive.spark.exec.inplace.progress}} has no effect



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16573) In-place update for HoS can't be disabled

2017-06-05 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16573:
---
Attachment: HIVE-16573.1.patch

Generated the patch file based on the master branch.

> In-place update for HoS can't be disabled
> -
>
> Key: HIVE-16573
> URL: https://issues.apache.org/jira/browse/HIVE-16573
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>Priority: Minor
> Attachments: HIVE-16573.1.patch
>
>
> {{hive.spark.exec.inplace.progress}} has no effect



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16573) In-place update for HoS can't be disabled

2017-06-05 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16573:
---
Attachment: (was: HIVE-16573-branch2.3.patch)

> In-place update for HoS can't be disabled
> -
>
> Key: HIVE-16573
> URL: https://issues.apache.org/jira/browse/HIVE-16573
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>Priority: Minor
>
> {{hive.spark.exec.inplace.progress}} has no effect



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16573) In-place update for HoS can't be disabled

2017-06-04 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16573:
---
Attachment: HIVE-16573-branch2.3.patch

> In-place update for HoS can't be disabled
> -
>
> Key: HIVE-16573
> URL: https://issues.apache.org/jira/browse/HIVE-16573
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>Priority: Minor
> Attachments: HIVE-16573-branch2.3.patch
>
>
> {{hive.spark.exec.inplace.progress}} has no effect



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16573) In-place update for HoS can't be disabled

2017-06-04 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036565#comment-16036565
 ] 

Bing Li edited comment on HIVE-16573 at 6/5/17 5:53 AM:


Hi, [~ruili] and [~anishek]
It seems that we can't import the SessionState class into InPlaceUpdate.java; it 
would cause a module cycle error during compilation: 
hive-common->hive-exec->hive-common.

I changed it as below:
{quote}
String engine = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_EXECUTION_ENGINE);
boolean inPlaceUpdates = false;

if (engine.equals("tez"))
  inPlaceUpdates = HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.TEZ_EXEC_INPLACE_PROGRESS);

if (engine.equals("spark"))
  inPlaceUpdates = HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.SPARK_EXEC_INPLACE_PROGRESS);
{quote}

Do you think this is OK?



was (Author: libing):
Hi, [~ruili] and [~anishek]
It seems that we can't import the SessionState class into InPlaceUpdate.java; it 
would cause a module cycle error during compilation: 
hive-common->hive-exec->hive-common.

I changed it as below:
{quote}
{{ String engine = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_EXECUTION_ENGINE);
boolean inPlaceUpdates = false;

if (engine.equals("tez"))
  inPlaceUpdates = HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.TEZ_EXEC_INPLACE_PROGRESS);

if (engine.equals("spark"))
  inPlaceUpdates = HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.SPARK_EXEC_INPLACE_PROGRESS); }}
{quote}

Do you think this is OK?


> In-place update for HoS can't be disabled
> -
>
> Key: HIVE-16573
> URL: https://issues.apache.org/jira/browse/HIVE-16573
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>Priority: Minor
>
> {{hive.spark.exec.inplace.progress}} has no effect



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16573) In-place update for HoS can't be disabled

2017-06-04 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036565#comment-16036565
 ] 

Bing Li edited comment on HIVE-16573 at 6/5/17 5:51 AM:


Hi, [~ruili] and [~anishek]
It seems that we can't import the SessionState class into InPlaceUpdate.java; it 
would cause a module cycle error during compilation: 
hive-common->hive-exec->hive-common.

I changed it as below:
{quote}
String engine = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_EXECUTION_ENGINE);
boolean inPlaceUpdates = false;

if (engine.equals("tez"))
  inPlaceUpdates = HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.TEZ_EXEC_INPLACE_PROGRESS);

if (engine.equals("spark"))
  inPlaceUpdates = HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.SPARK_EXEC_INPLACE_PROGRESS);
{quote}

Do you think this is OK?



was (Author: libing):
Hi, [~ruili] and [~anishek]
It seems that we can't import the SessionState class into InPlaceUpdate.java; it 
would cause a module cycle error during compilation: 
hive-common->hive-exec->hive-common.

I changed it as below:
String engine = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_EXECUTION_ENGINE);
boolean inPlaceUpdates = false;

if (engine.equals("tez"))
  inPlaceUpdates = HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.TEZ_EXEC_INPLACE_PROGRESS);

if (engine.equals("spark"))
  inPlaceUpdates = HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.SPARK_EXEC_INPLACE_PROGRESS);

Do you think this is OK?


> In-place update for HoS can't be disabled
> -
>
> Key: HIVE-16573
> URL: https://issues.apache.org/jira/browse/HIVE-16573
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>Priority: Minor
>
> {{hive.spark.exec.inplace.progress}} has no effect



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16573) In-place update for HoS can't be disabled

2017-06-04 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036565#comment-16036565
 ] 

Bing Li edited comment on HIVE-16573 at 6/5/17 5:52 AM:


Hi, [~ruili] and [~anishek]
It seems that we can't import the SessionState class into InPlaceUpdate.java; it 
would cause a module cycle error during compilation: 
hive-common->hive-exec->hive-common.

I changed it as below:
{quote}
{{ String engine = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_EXECUTION_ENGINE);
boolean inPlaceUpdates = false;

if (engine.equals("tez"))
  inPlaceUpdates = HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.TEZ_EXEC_INPLACE_PROGRESS);

if (engine.equals("spark"))
  inPlaceUpdates = HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.SPARK_EXEC_INPLACE_PROGRESS); }}
{quote}

Do you think this is OK?



was (Author: libing):
Hi, [~ruili] and [~anishek]
It seems that we can't import the SessionState class into InPlaceUpdate.java; it 
would cause a module cycle error during compilation: 
hive-common->hive-exec->hive-common.

I changed it as below:
{quote}
String engine = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_EXECUTION_ENGINE);
boolean inPlaceUpdates = false;

if (engine.equals("tez"))
  inPlaceUpdates = HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.TEZ_EXEC_INPLACE_PROGRESS);

if (engine.equals("spark"))
  inPlaceUpdates = HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.SPARK_EXEC_INPLACE_PROGRESS);
{quote}

Do you think this is OK?


> In-place update for HoS can't be disabled
> -
>
> Key: HIVE-16573
> URL: https://issues.apache.org/jira/browse/HIVE-16573
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>Priority: Minor
>
> {{hive.spark.exec.inplace.progress}} has no effect



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Work started] (HIVE-16573) In-place update for HoS can't be disabled

2017-06-04 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16573 started by Bing Li.
--
> In-place update for HoS can't be disabled
> -
>
> Key: HIVE-16573
> URL: https://issues.apache.org/jira/browse/HIVE-16573
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>Priority: Minor
>
> {{hive.spark.exec.inplace.progress}} has no effect



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16573) In-place update for HoS can't be disabled

2017-06-04 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036565#comment-16036565
 ] 

Bing Li commented on HIVE-16573:


Hi, [~ruili] and [~anishek]
It seems that we can't import the SessionState class into InPlaceUpdate.java; it 
would cause a module cycle error during compilation: 
hive-common->hive-exec->hive-common.

I changed it as below:
String engine = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_EXECUTION_ENGINE);
boolean inPlaceUpdates = false;

if (engine.equals("tez"))
  inPlaceUpdates = HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.TEZ_EXEC_INPLACE_PROGRESS);

if (engine.equals("spark"))
  inPlaceUpdates = HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.SPARK_EXEC_INPLACE_PROGRESS);

Do you think this is OK?
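
For reference, a self-contained sketch of the check described above (the class and 
method names are illustrative only, not the actual patch; it assumes the HiveConf 
ConfVars mentioned in this comment):

import org.apache.hadoop.hive.conf.HiveConf;

public final class InPlaceProgressCheck {
  private InPlaceProgressCheck() {}

  // Decide whether in-place progress updates are enabled for the configured
  // execution engine ("tez" or "spark"); any other engine gets no in-place updates.
  public static boolean isEnabled(HiveConf conf) {
    String engine = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_EXECUTION_ENGINE);
    if ("tez".equals(engine)) {
      return HiveConf.getBoolVar(conf, HiveConf.ConfVars.TEZ_EXEC_INPLACE_PROGRESS);
    }
    if ("spark".equals(engine)) {
      return HiveConf.getBoolVar(conf, HiveConf.ConfVars.SPARK_EXEC_INPLACE_PROGRESS);
    }
    return false;
  }
}

Relying only on HiveConf, as above, keeps the check inside hive-common and avoids the 
hive-common -> hive-exec -> hive-common cycle that the SessionState import would create.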


> In-place update for HoS can't be disabled
> -
>
> Key: HIVE-16573
> URL: https://issues.apache.org/jira/browse/HIVE-16573
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>Priority: Minor
>
> {{hive.spark.exec.inplace.progress}} has no effect



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16573) In-place update for HoS can't be disabled

2017-05-17 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-16573:
--

Assignee: Bing Li  (was: Rui Li)

> In-place update for HoS can't be disabled
> -
>
> Key: HIVE-16573
> URL: https://issues.apache.org/jira/browse/HIVE-16573
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>Priority: Minor
>
> {{hive.spark.exec.inplace.progress}} has no effect



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16573) In-place update for HoS can't be disabled

2017-05-16 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012349#comment-16012349
 ] 

Bing Li commented on HIVE-16573:


[~ruili] I'm interested in this; could I take it over? Thank you.

> In-place update for HoS can't be disabled
> -
>
> Key: HIVE-16573
> URL: https://issues.apache.org/jira/browse/HIVE-16573
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
>Priority: Minor
>
> {{hive.spark.exec.inplace.progress}} has no effect



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (HIVE-11020) support partial scan for analyze command - Avro

2017-05-05 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li resolved HIVE-11020.

  Resolution: Won't Fix
Release Note: 
We don't need this feature for the Avro format now.
Closing this Jira.

> support partial scan for analyze command - Avro
> ---
>
> Key: HIVE-11020
> URL: https://issues.apache.org/jira/browse/HIVE-11020
> Project: Hive
>  Issue Type: Improvement
>Reporter: Bing Li
>Assignee: Bing Li
>
> This is follow up on HIVE-3958.
> We already have two similar Jiras
> - support partial scan for analyze command - ORC
> https://issues.apache.org/jira/browse/HIVE-4177
> - [Parquet] Support Analyze Table with partial scan
> https://issues.apache.org/jira/browse/HIVE-9491



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-14156) Problem with Chinese characters as partition value when using MySQL

2016-07-05 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362286#comment-15362286
 ] 

Bing Li commented on HIVE-14156:


Hi, [~xiaobingo]
I noticed that you fixed HIVE-8550 on Windows, and mentioned that it should 
work on Linux.
I ran a similar query but it failed with MySQL.

In order to make it work, besides the changes in the Hive schema script, I also 
need to update MySQL's configuration file, my.cnf.

When you ran it on Windows, did you change the configurations for the database? 
Did you have a chance to run it on Linux as well?

Thank you.


> Problem with Chinese characters as partition value when using MySQL
> ---
>
> Key: HIVE-14156
> URL: https://issues.apache.org/jira/browse/HIVE-14156
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Bing Li
>Assignee: Bing Li
>
> Steps to reproduce:
> create table t1 (name string, age int) partitioned by (city string) row 
> format delimited fields terminated by ',';
> load data local inpath '/tmp/chn-partition.txt' overwrite into table t1 
> partition (city='北京');
> The content of /tmp/chn-partition.txt:
> 小明,20
> 小红,15
> 张三,36
> 李四,50
> When check the partition value in MySQL, it shows ?? instead of "北京".
> When run "drop table t1", it will hang.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (HIVE-14156) Problem with Chinese characters as partition value when using MySQL

2016-07-05 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-14156:
---
Comment: was deleted

(was: Hi, Rui
I didn't have a chance to try other databases, like Derby, Oracle and Postgres.
But one thing I found is that the scripts for other databases don't specify 
the character set.
)

> Problem with Chinese characters as partition value when using MySQL
> ---
>
> Key: HIVE-14156
> URL: https://issues.apache.org/jira/browse/HIVE-14156
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Bing Li
>Assignee: Bing Li
>
> Steps to reproduce:
> create table t1 (name string, age int) partitioned by (city string) row 
> format delimited fields terminated by ',';
> load data local inpath '/tmp/chn-partition.txt' overwrite into table t1 
> partition (city='北京');
> The content of /tmp/chn-partition.txt:
> 小明,20
> 小红,15
> 张三,36
> 李四,50
> When check the partition value in MySQL, it shows ?? instead of "北京".
> When run "drop table t1", it will hang.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14156) Problem with Chinese characters as partition value when using MySQL

2016-07-05 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362092#comment-15362092
 ] 

Bing Li commented on HIVE-14156:


Hi, Rui
I didn't have a chance to try other databases, like Derby, Oracle and Postgres.
But one thing I found is that the scripts for other databases don't specify 
the character set.


> Problem with Chinese characters as partition value when using MySQL
> ---
>
> Key: HIVE-14156
> URL: https://issues.apache.org/jira/browse/HIVE-14156
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Bing Li
>Assignee: Bing Li
>
> Steps to reproduce:
> create table t1 (name string, age int) partitioned by (city string) row 
> format delimited fields terminated by ',';
> load data local inpath '/tmp/chn-partition.txt' overwrite into table t1 
> partition (city='北京');
> The content of /tmp/chn-partition.txt:
> 小明,20
> 小红,15
> 张三,36
> 李四,50
> When check the partition value in MySQL, it shows ?? instead of "北京".
> When run "drop table t1", it will hang.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14156) Problem with Chinese characters as partition value when using MySQL

2016-07-05 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362091#comment-15362091
 ] 

Bing Li commented on HIVE-14156:


Hi, Rui
I didn't have a chance to try other databases, like Derby, Oracle and Postgres.
But one thing I found is that the scripts for other databases don't specify 
the character set.


> Problem with Chinese characters as partition value when using MySQL
> ---
>
> Key: HIVE-14156
> URL: https://issues.apache.org/jira/browse/HIVE-14156
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Bing Li
>Assignee: Bing Li
>
> Steps to reproduce:
> create table t1 (name string, age int) partitioned by (city string) row 
> format delimited fields terminated by ',';
> load data local inpath '/tmp/chn-partition.txt' overwrite into table t1 
> partition (city='北京');
> The content of /tmp/chn-partition.txt:
> 小明,20
> 小红,15
> 张三,36
> 李四,50
> When check the partition value in MySQL, it shows ?? instead of "北京".
> When run "drop table t1", it will hang.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14156) Problem with Chinese characters as partition value when using MySQL

2016-07-04 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15361394#comment-15361394
 ] 

Bing Li commented on HIVE-14156:


I noticed that in the schema files under metastore/scripts/upgrade/mysql, like 
hive-schema-2.0.0.mysql.sql, the character set is latin1 for all tables instead 
of utf8.

And it works with MySQL if I update the following columns in the schema 
script to utf8:

SDS.LOCATION
PARTITIONS.PART_NAME
PARTITION_KEY_VALS.PART_KEY_VAL
1) change the varchar(xxx) length limit to varchar(255)
2) change "latin1" to "utf8"

Hive's wiki and HIVE-8550 mention that Hive can support Unicode in 
the partition name.
Are there any special settings needed for MySQL to support it?
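
For illustration, the column changes described above could be applied to an existing 
MySQL metastore with statements along the following lines (a sketch only; the 
varchar lengths and table names should be checked against the actual schema version 
before use):

  ALTER TABLE SDS MODIFY LOCATION varchar(255) CHARACTER SET utf8;
  ALTER TABLE PARTITIONS MODIFY PART_NAME varchar(255) CHARACTER SET utf8;
  ALTER TABLE PARTITION_KEY_VALS MODIFY PART_KEY_VAL varchar(255) CHARACTER SET utf8;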

> Problem with Chinese characters as partition value when using MySQL
> ---
>
> Key: HIVE-14156
> URL: https://issues.apache.org/jira/browse/HIVE-14156
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Bing Li
>Assignee: Bing Li
>
> Steps to reproduce:
> create table t1 (name string, age int) partitioned by (city string) row 
> format delimited fields terminated by ',';
> load data local inpath '/tmp/chn-partition.txt' overwrite into table t1 
> partition (city='北京');
> The content of /tmp/chn-partition.txt:
> 小明,20
> 小红,15
> 张三,36
> 李四,50
> When check the partition value in MySQL, it shows ?? instead of "北京".
> When run "drop table t1", it will hang.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-14156) Problem with Chinese characters as partition value when using MySQL

2016-07-04 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-14156:
--

Assignee: Bing Li

> Problem with Chinese characters as partition value when using MySQL
> ---
>
> Key: HIVE-14156
> URL: https://issues.apache.org/jira/browse/HIVE-14156
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Bing Li
>Assignee: Bing Li
>
> Steps to reproduce:
> create table t1 (name string, age int) partitioned by (city string) row 
> format delimited fields terminated by ',';
> load data local inpath '/tmp/chn-partition.txt' overwrite into table t1 
> partition (city='北京');
> The content of /tmp/chn-partition.txt:
> 小明,20
> 小红,15
> 张三,36
> 李四,50
> When check the partition value in MySQL, it shows ?? instead of "北京".
> When run "drop table t1", it will hang.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-13850) File name conflict when have multiple INSERT INTO queries running in parallel

2016-06-27 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li resolved HIVE-13850.

Resolution: Not A Bug

It can be resolved by enabling Hive ACID support.

> File name conflict when have multiple INSERT INTO queries running in parallel
> -
>
> Key: HIVE-13850
> URL: https://issues.apache.org/jira/browse/HIVE-13850
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-13850-1.2.1.patch
>
>
> We have an application which connects to HiveServer2 via JDBC.
> In the application, it executes "INSERT INTO" queries against the same table.
> If there are a lot of users running the application at the same time, some of 
> the INSERTs can fail.
> The root cause is that Hive.checkPaths() uses the following loop to 
> check for the existence of the file. But if there are multiple inserts running in 
> parallel, this leads to a conflict.
> for (int counter = 1; fs.exists(itemDest) || destExists(result, itemDest); 
> counter++) {
>   itemDest = new Path(destf, name + ("_copy_" + counter) + 
> filetype);
> }
> The Error Message
> ===
> In hive log,
> org.apache.hadoop.hive.ql.metadata.HiveException: copyFiles: error  
> while moving files!!! Cannot move hdfs://node:8020/apps/hive/warehouse/met
> 
> adata.db/scalding_stats/.hive-staging_hive_2016-05-10_18-46-
> 23_642_2056172497900766879-3321/-ext-1/00_0 to 
> hdfs://node:8020/apps/hive  
> /warehouse/metadata.db/scalding_stats/00_0_copy_9014
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java: 
> 2719)   
> at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java: 
> 1645)  
> 
> In hadoop log, 
> WARN  hdfs.StateChange (FSDirRenameOp.java: 
> unprotectedRenameTo(174)) - DIR* FSDirectory.unprotectedRenameTo:   
> failed to rename /apps/hive/warehouse/metadata.db/scalding_stats/.hive- 
> staging_hive_2016-05-10_18-46-23_642_2056172497900766879-3321/-ext- 
> 1/00_0 to /apps/hive/warehouse/metadata.
> db/scalding_stats/00_0_copy_9014 because destination exists
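
The failure above comes down to a non-atomic check-then-act on HDFS: two sessions can 
both observe that a given _copy_N name is free and then both try to rename to it, and 
the second rename fails because the destination now exists. A minimal sketch of the 
racy pattern (names are illustrative and simplified, not Hive's actual code):

import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class CopySuffixPicker {
  private CopySuffixPicker() {}

  // Picks the first "_copy_N" destination that does not exist yet. The exists()
  // check and the rename performed later by the caller are separate calls, so a
  // concurrent session can claim the same name in between; only the rename is atomic.
  public static Path pickDestination(FileSystem fs, Path destDir, String name,
      String filetype) throws IOException {
    Path itemDest = new Path(destDir, name + filetype);
    for (int counter = 1; fs.exists(itemDest); counter++) {
      itemDest = new Path(destDir, name + "_copy_" + counter + filetype);
    }
    return itemDest;
  }
}

Presumably this is why enabling ACID, as noted in the comment above, avoids the 
conflict: writes no longer depend on the _copy_N naming scheme.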



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13850) File name conflict when have multiple INSERT INTO queries running in parallel

2016-06-27 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350471#comment-15350471
 ] 

Bing Li commented on HIVE-13850:


Hi, [~ashutoshc]
Thanks a lot for your comment. 
Enabling Hive ACID support worked for us.

I will close this defect as well.

> File name conflict when have multiple INSERT INTO queries running in parallel
> -
>
> Key: HIVE-13850
> URL: https://issues.apache.org/jira/browse/HIVE-13850
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-13850-1.2.1.patch
>
>
> We have an application which connects to HiveServer2 via JDBC.
> In the application, it executes "INSERT INTO" queries against the same table.
> If there are a lot of users running the application at the same time, some of 
> the INSERTs can fail.
> The root cause is that Hive.checkPaths() uses the following loop to 
> check for the existence of the file. But if there are multiple inserts running in 
> parallel, this leads to a conflict.
> for (int counter = 1; fs.exists(itemDest) || destExists(result, itemDest); 
> counter++) {
>   itemDest = new Path(destf, name + ("_copy_" + counter) + 
> filetype);
> }
> The Error Message
> ===
> In hive log,
> org.apache.hadoop.hive.ql.metadata.HiveException: copyFiles: error  
> while moving files!!! Cannot move hdfs://node:8020/apps/hive/warehouse/met
> 
> adata.db/scalding_stats/.hive-staging_hive_2016-05-10_18-46-
> 23_642_2056172497900766879-3321/-ext-1/00_0 to 
> hdfs://node:8020/apps/hive  
> /warehouse/metadata.db/scalding_stats/00_0_copy_9014
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java: 
> 2719)   
> at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java: 
> 1645)  
> 
> In hadoop log, 
> WARN  hdfs.StateChange (FSDirRenameOp.java: 
> unprotectedRenameTo(174)) - DIR* FSDirectory.unprotectedRenameTo:   
> failed to rename /apps/hive/warehouse/metadata.db/scalding_stats/.hive- 
> staging_hive_2016-05-10_18-46-23_642_2056172497900766879-3321/-ext- 
> 1/00_0 to /apps/hive/warehouse/metadata.
> db/scalding_stats/00_0_copy_9014 because destination exists



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

