[jira] [Commented] (HIVE-6865) Failed to load data into Hive from Pig using HCatStorer()

2015-05-31 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566964#comment-14566964
 ] 

Bing Li commented on HIVE-6865:
---

This issue has been resolved in Hive 1.2.0.

> Failed to load data into Hive from Pig using HCatStorer()
> -
>
> Key: HIVE-6865
> URL: https://issues.apache.org/jira/browse/HIVE-6865
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.12.0
>Reporter: Bing Li
>Assignee: Bing Li
>
> Reproduce steps:
> 1. create a hive table
> hive> create table t1 (c1 int, c2 int, c3 int);
> 2. start pig shell
> grunt> register $HIVE_HOME/lib/*.jar
> grunt> register $HIVE_HOME/hcatalog/share/hcatalog/*.jar
> grunt> A = load 'pig.txt' as (c1:int, c2:int, c3:int);
> grunt> store A into 't1' using org.apache.hive.hcatalog.HCatStorer();
> Error Message:
> ERROR [main] org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: 
> Unable to recreate exception from backend error: 
> org.apache.hcatalog.common.HCatException : 2004 : HCatOutputFormat not 
> initialized, setOutput has to be called
> at 
> org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.getJobInfo(HCatBaseOutputFormat.java:111)
> at 
> org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.getJobInfo(HCatBaseOutputFormat.java:97)
> at 
> org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.getOutputFormat(HCatBaseOutputFormat.java:85)
> at 
> org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.checkOutputSpecs(HCatBaseOutputFormat.java:75)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecsHelper(PigOutputFormat.java:207)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:187)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:1000)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:963)
> at 
> java.security.AccessController.doPrivileged(AccessController.java:310)
> at javax.security.auth.Subject.doAs(Subject.java:573)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
> at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:963)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:616)
> at 
> org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:336)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
> at java.lang.reflect.Method.invoke(Method.java:611)
> at 
> org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
> at 
> org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:191)
> at java.lang.Thread.run(Thread.java:738)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10495) Hive index creation code throws NPE if index table is null

2015-06-01 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-10495:
---
Attachment: HIVE-10495.1.patch

> Hive index creation code throws NPE if index table is null
> --
>
> Key: HIVE-10495
> URL: https://issues.apache.org/jira/browse/HIVE-10495
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.0.0, 1.2.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-10495.1.patch
>
>
> The stack trace would be:
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_index(HiveMetaStore.java:2870)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
> at java.lang.reflect.Method.invoke(Method.java:611)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102)
> at $Proxy9.add_index(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createIndex(HiveMetaStoreClient.java:962)





[jira] [Updated] (HIVE-6727) Table level stats for external tables are set incorrectly

2015-06-01 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-6727:
--
Attachment: HIVE-6727.3.patch

This patch is based on the latest Hive source code and includes the fix for the 
test case.

> Table level stats for external tables are set incorrectly
> -
>
> Key: HIVE-6727
> URL: https://issues.apache.org/jira/browse/HIVE-6727
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.13.0, 0.13.1
>Reporter: Harish Butani
>Assignee: Bing Li
> Attachments: HIVE-6727.2.patch, HIVE-6727.3.patch
>
>
> if you do the following:
> {code}
> CREATE EXTERNAL TABLE anaylyze_external (a INT) LOCATION 
> 'data/files/ext_test';
> describe formatted anaylyze_external;
> {code}
> The table level stats are:
> {noformat}
> Table Parameters:
>   COLUMN_STATS_ACCURATE   true
>   EXTERNALTRUE
>   numFiles0
>   numRows 6
>   rawDataSize 6
>   totalSize   0
> {noformat}
> numFiles and totalSize is always 0.
> Issue is:
> MetaStoreUtils:updateUnpartitionedTableStatsFast attempts to set table level 
> stats from FileStatus. But it doesn't account for External tables, it always 
> calls Warehouse.getFileStatusesForUnpartitionedTable
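The issue described above can be sketched as follows. This is a minimal Python illustration of the idea only (Hive's actual code is Java in MetaStoreUtils and Warehouse; the table dictionary and helper names here are hypothetical): table-level stats must be computed from an external table's own LOCATION rather than always from the warehouse directory.

```python
import os

def stats_dir(table):
    # External tables keep their data at their own LOCATION;
    # managed tables live under the warehouse directory.
    if table.get("EXTERNAL") == "TRUE":
        return table["location"]
    return os.path.join(table["warehouse"], table["name"])

def table_stats(table):
    # numFiles / totalSize computed from the directory chosen above.
    d = stats_dir(table)
    files = [os.path.join(d, f) for f in os.listdir(d)
             if os.path.isfile(os.path.join(d, f))]
    return {"numFiles": len(files),
            "totalSize": sum(os.path.getsize(f) for f in files)}
```

Choosing the directory per table type, instead of unconditionally scanning the warehouse path, is what makes numFiles and totalSize non-zero for external tables.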





[jira] [Commented] (HIVE-6727) Table level stats for external tables are set incorrectly

2015-06-01 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567068#comment-14567068
 ] 

Bing Li commented on HIVE-6727:
---

Hi, [~ashutoshc]
The current test cases in Hive already cover this scenario, but the expected result is wrong.
HIVE-6727.3.patch fixes the output file of that case.

Thank you for your review.

> Table level stats for external tables are set incorrectly
> -
>
> Key: HIVE-6727
> URL: https://issues.apache.org/jira/browse/HIVE-6727
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.13.0, 0.13.1, 1.2.0
>Reporter: Harish Butani
>Assignee: Bing Li
> Attachments: HIVE-6727.2.patch, HIVE-6727.3.patch
>
>
> if you do the following:
> {code}
> CREATE EXTERNAL TABLE anaylyze_external (a INT) LOCATION 
> 'data/files/ext_test';
> describe formatted anaylyze_external;
> {code}
> The table level stats are:
> {noformat}
> Table Parameters:
>   COLUMN_STATS_ACCURATE   true
>   EXTERNALTRUE
>   numFiles0
>   numRows 6
>   rawDataSize 6
>   totalSize   0
> {noformat}
> numFiles and totalSize is always 0.
> Issue is:
> MetaStoreUtils:updateUnpartitionedTableStatsFast attempts to set table level 
> stats from FileStatus. But it doesn't account for External tables, it always 
> calls Warehouse.getFileStatusesForUnpartitionedTable





[jira] [Updated] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2015-06-01 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-4577:
--
Affects Version/s: 1.2.0

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch
>
>
> As design, hive could support hadoop dfs command in hive shell, like 
> hive> dfs -mkdir /user/biadmin/mydir;
> but has different behavior with hadoop if the path contains space and quotes
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"
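The behavior shown above comes from splitting the dfs command on whitespace without interpreting quotes. As a minimal sketch of quote-aware splitting (a Python illustration; Hive's CLI is Java, and the function name here is hypothetical, not the actual patch):

```python
import shlex

def split_dfs_command(command):
    # Split a command line into arguments, honoring single and double
    # quotes the way a POSIX shell does: the quote characters are
    # stripped and quoted whitespace stays inside one argument.
    return shlex.split(command)

naive = '-mkdir "bei jing"'.split()          # keeps quotes, splits the path
fixed = split_dfs_command('-mkdir "bei jing"')

print(naive)  # ['-mkdir', '"bei', 'jing"']
print(fixed)  # ['-mkdir', 'bei jing']
```

Naive splitting reproduces the two bogus directories from the bug report; quote-aware splitting yields the single intended path.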





[jira] [Updated] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2015-06-01 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-4577:
--
Fix Version/s: 1.2.1

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
> Fix For: 1.2.1
>
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch
>
>
> As design, hive could support hadoop dfs command in hive shell, like 
> hive> dfs -mkdir /user/biadmin/mydir;
> but has different behavior with hadoop if the path contains space and quotes
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"





[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2015-06-01 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567075#comment-14567075
 ] 

Bing Li commented on HIVE-4577:
---

Hi, [~thejas]
Could you review the patch again? Thanks a lot!

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
> Fix For: 1.2.1
>
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch
>
>
> As design, hive could support hadoop dfs command in hive shell, like 
> hive> dfs -mkdir /user/biadmin/mydir;
> but has different behavior with hadoop if the path contains space and quotes
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"





[jira] [Updated] (HIVE-6990) Direct SQL fails when the explicit schema setting is different from the default one

2015-06-01 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-6990:
--
Attachment: HIVE-6990.4.patch

This patch is generated against the latest Hive code on the master branch.

> Direct SQL fails when the explicit schema setting is different from the 
> default one
> ---
>
> Key: HIVE-6990
> URL: https://issues.apache.org/jira/browse/HIVE-6990
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.12.0
> Environment: hive + derby
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-6990.1.patch, HIVE-6990.2.patch, HIVE-6990.3.patch, 
> HIVE-6990.4.patch
>
>
> I got the following ERROR in hive.log
> 2014-04-23 17:30:23,331 ERROR metastore.ObjectStore 
> (ObjectStore.java:handleDirectSqlError(1756)) - Direct SQL failed, falling 
> back to ORM
> javax.jdo.JDODataStoreException: Error executing SQL query "select 
> PARTITIONS.PART_ID from PARTITIONS  inner join TBLS on PARTITIONS.TBL_ID = 
> TBLS.TBL_ID   inner join DBS on TBLS.DB_ID = DBS.DB_ID inner join 
> PARTITION_KEY_VALS as FILTER0 on FILTER0.PART_ID = PARTITIONS.PART_ID and 
> FILTER0.INTEGER_IDX = 0 where TBLS.TBL_NAME = ? and DBS.NAME = ? and 
> ((FILTER0.PART_KEY_VAL = ?))".
> at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451)
> at 
> org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:181)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:98)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilterInternal(ObjectStore.java:1833)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1806)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
> at java.lang.reflect.Method.invoke(Method.java:619)
> at 
> org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124)
> at com.sun.proxy.$Proxy11.getPartitionsByFilter(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_filter(HiveMetaStore.java:3310)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
> at java.lang.reflect.Method.invoke(Method.java:619)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
> at com.sun.proxy.$Proxy12.get_partitions_by_filter(Unknown Source)
> Reproduce steps:
> 1. set the following properties in hive-site.xml
> <property>
>   <name>javax.jdo.mapping.Schema</name>
>   <value>HIVE</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionUserName</name>
>   <value>user1</value>
> </property>
> 2. execute hive queries
> hive> create table mytbl ( key int, value string);
> hive> load data local inpath 'examples/files/kv1.txt' overwrite into table 
> mytbl;
> hive> select * from mytbl;
> hive> create view myview partitioned on (value) as select key, value from 
> mytbl where key=98;
> hive> alter view myview add partition (value='val_98') partition 
> (value='val_xyz');
> hive> alter view myview drop partition (value='val_xyz');
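The failing query above references the metastore tables (PARTITIONS, TBLS, DBS) with bare names, which only resolve when the connection's default schema happens to match javax.jdo.mapping.Schema. A tiny sketch of the qualification idea (a Python illustration with a hypothetical helper, not Hive's actual fix):

```python
def qualified(table, schema=None):
    # Prefix a bare metastore table name with the explicit schema, if one
    # is configured; otherwise leave it for default-schema resolution.
    return f"{schema}.{table}" if schema else table

# With javax.jdo.mapping.Schema unset, bare names are fine:
print(qualified("PARTITIONS"))          # PARTITIONS
# With the schema set to HIVE, every reference must be qualified:
print(qualified("PARTITIONS", "HIVE"))  # HIVE.PARTITIONS
```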





[jira] [Resolved] (HIVE-6865) Failed to load data into Hive from Pig using HCatStorer()

2015-06-03 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li resolved HIVE-6865.
---
   Resolution: Fixed
Fix Version/s: 1.2.0

I tried the same queries in Hive 1.2.0 and they worked well.
Closing this as fixed.

> Failed to load data into Hive from Pig using HCatStorer()
> -
>
> Key: HIVE-6865
> URL: https://issues.apache.org/jira/browse/HIVE-6865
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.12.0
>Reporter: Bing Li
>Assignee: Bing Li
> Fix For: 1.2.0
>
>
> Reproduce steps:
> 1. create a hive table
> hive> create table t1 (c1 int, c2 int, c3 int);
> 2. start pig shell
> grunt> register $HIVE_HOME/lib/*.jar
> grunt> register $HIVE_HOME/hcatalog/share/hcatalog/*.jar
> grunt> A = load 'pig.txt' as (c1:int, c2:int, c3:int);
> grunt> store A into 't1' using org.apache.hive.hcatalog.HCatStorer();
> Error Message:
> ERROR [main] org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: 
> Unable to recreate exception from backend error: 
> org.apache.hcatalog.common.HCatException : 2004 : HCatOutputFormat not 
> initialized, setOutput has to be called
> at 
> org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.getJobInfo(HCatBaseOutputFormat.java:111)
> at 
> org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.getJobInfo(HCatBaseOutputFormat.java:97)
> at 
> org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.getOutputFormat(HCatBaseOutputFormat.java:85)
> at 
> org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.checkOutputSpecs(HCatBaseOutputFormat.java:75)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecsHelper(PigOutputFormat.java:207)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:187)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:1000)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:963)
> at 
> java.security.AccessController.doPrivileged(AccessController.java:310)
> at javax.security.auth.Subject.doAs(Subject.java:573)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
> at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:963)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:616)
> at 
> org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:336)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
> at java.lang.reflect.Method.invoke(Method.java:611)
> at 
> org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
> at 
> org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:191)
> at java.lang.Thread.run(Thread.java:738)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)





[jira] [Commented] (HIVE-6727) Table level stats for external tables are set incorrectly

2015-06-03 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571901#comment-14571901
 ] 

Bing Li commented on HIVE-6727:
---

Thank you, Ashutosh!

> Table level stats for external tables are set incorrectly
> -
>
> Key: HIVE-6727
> URL: https://issues.apache.org/jira/browse/HIVE-6727
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.2.0, 1.1.0
>Reporter: Harish Butani
>Assignee: Bing Li
> Fix For: 1.3.0
>
> Attachments: HIVE-6727.2.patch, HIVE-6727.3.patch
>
>
> if you do the following:
> {code}
> CREATE EXTERNAL TABLE anaylyze_external (a INT) LOCATION 
> 'data/files/ext_test';
> describe formatted anaylyze_external;
> {code}
> The table level stats are:
> {noformat}
> Table Parameters:
>   COLUMN_STATS_ACCURATE   true
>   EXTERNALTRUE
>   numFiles0
>   numRows 6
>   rawDataSize 6
>   totalSize   0
> {noformat}
> numFiles and totalSize is always 0.
> Issue is:
> MetaStoreUtils:updateUnpartitionedTableStatsFast attempts to set table level 
> stats from FileStatus. But it doesn't account for External tables, it always 
> calls Warehouse.getFileStatusesForUnpartitionedTable







[jira] [Resolved] (HIVE-4401) Support quoted schema and table names in Hive /Hive JDBC Driver

2015-06-04 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li resolved HIVE-4401.
---
Resolution: Won't Fix

https://issues.apache.org/jira/browse/HIVE-6013 added support for quoted 
column names using backticks (`) instead of double quotes.
If we want to expand on the existing feature, we should open a new JIRA with a 
different description.

> Support quoted schema and table names in Hive /Hive JDBC Driver
> ---
>
> Key: HIVE-4401
> URL: https://issues.apache.org/jira/browse/HIVE-4401
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.9.0
>Reporter: Bing Li
>Assignee: Bing Li
>
> Hive driver can not handle the quoted table names and schema names, which can 
> be processed by db2, and almost all other databases. 
> e.g.
> SELECT * FROM "gosales"."branch"





[jira] [Updated] (HIVE-10948) Slf4j warning in HiveCLI due to spark

2015-06-05 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-10948:
---
Description: 
The spark-assembly-1.3.1.jar is added to the Hive classpath 
./hive.distro:  export SPARK_HOME=$sparkHome
./hive.distro:  sparkAssemblyPath=`ls ${SPARK_HOME}/lib/spark-assembly-*.jar`
./hive.distro:  CLASSPATH="${CLASSPATH}:${sparkAssemblyPath}"

When launching HiveCLI, we see the following messages:
===
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/.../hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/.../spark/lib/spark-assembly-1.3.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/.../hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/.../spark/lib/spark-assembly-1.3.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]


This bug is similar to HIVE-9496.

  was:
The spark-assembly-1.3.1.jar is added to the Hive classpath 
./hive.distro:  export SPARK_HOME=$sparkHome
./hive.distro:  sparkAssemblyPath=`ls ${SPARK_HOME}/lib/spark-assembly-*.jar`
./hive.distro:  CLASSPATH="${CLASSPATH}:${sparkAssemblyPath}"

When launching HiveCLI, we see the following messages:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/.../hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/.../spark/lib/spark-assembly-1.3.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/.../hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/.../spark/lib/spark-assembly-1.3.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]


> Slf4j warning in HiveCLI due to spark
> -
>
> Key: HIVE-10948
> URL: https://issues.apache.org/jira/browse/HIVE-10948
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 1.2.0
>Reporter: Bing Li
>Assignee: Bing Li
>Priority: Minor
>
> The spark-assembly-1.3.1.jar is added to the Hive classpath 
> ./hive.distro:  export SPARK_HOME=$sparkHome
> ./hive.distro:  sparkAssemblyPath=`ls ${SPARK_HOME}/lib/spark-assembly-*.jar`
> ./hive.distro:  CLASSPATH="${CLASSPATH}:${sparkAssemblyPath}"
> When launching HiveCLI, we see the following messages:
> ===
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/.../hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/.../spark/lib/spark-assembly-1.3.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> WARNING: Use "yarn jar" to launch YARN applications.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/.../hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/.../spark/lib/spark-assembly-1.3.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 
> This bug is similar to HIVE-9496





[jira] [Assigned] (HIVE-11019) Can't create an Avro table with uniontype column correctly

2015-06-16 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-11019:
--

Assignee: Bing Li

> Can't create an Avro table with uniontype column correctly
> --
>
> Key: HIVE-11019
> URL: https://issues.apache.org/jira/browse/HIVE-11019
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Bing Li
>Assignee: Bing Li
>
> I tried the example in 
> https://cwiki.apache.org/confluence/display/Hive/AvroSerDe
> And found that it can't create an AVRO table correctly with uniontype
> hive> create table avro_union(union1 uniontype) STORED 
> AS AVRO;
> OK
> Time taken: 0.083 seconds
> hive> describe avro_union;
> OK
> union1  uniontype  
>   
> Time taken: 0.058 seconds, Fetched: 1 row(s)





[jira] [Commented] (HIVE-13850) File name conflict when have multiple INSERT INTO queries running in parallel

2016-06-08 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320675#comment-15320675
 ] 

Bing Li commented on HIVE-13850:


Hi, [~ashutoshc]
Thank you for your comments. 
Yes, you're right. The issue hasn't been resolved by naming the target file 
with a timestamp. We ran into it again...

We tried setting the following properties, but still got the error:
hive.support.concurrency -> true
hive.txn.manager -> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

Are there any other properties required?

Thank you.

> File name conflict when have multiple INSERT INTO queries running in parallel
> -
>
> Key: HIVE-13850
> URL: https://issues.apache.org/jira/browse/HIVE-13850
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-13850-1.2.1.patch
>
>
> We have an application which connects to HiveServer2 via JDBC.
> The application executes "INSERT INTO" queries against the same table.
> If many users run the application at the same time, some of the INSERTs can 
> fail.
> The root cause is that Hive.checkPaths() uses the following loop to check 
> whether the target file exists; with multiple inserts running in parallel, 
> the check is not atomic and leads to a conflict.
> for (int counter = 1; fs.exists(itemDest) || destExists(result, itemDest); counter++) {
>   itemDest = new Path(destf, name + ("_copy_" + counter) + filetype);
> }
> The Error Message
> ===
> In hive log,
> org.apache.hadoop.hive.ql.metadata.HiveException: copyFiles: error 
> while moving files!!! Cannot move 
> hdfs://node:8020/apps/hive/warehouse/metadata.db/scalding_stats/.hive-staging_hive_2016-05-10_18-46-23_642_2056172497900766879-3321/-ext-1/00_0 
> to 
> hdfs://node:8020/apps/hive/warehouse/metadata.db/scalding_stats/00_0_copy_9014
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2719)
> at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1645)
> 
> In hadoop log, 
> WARN hdfs.StateChange (FSDirRenameOp.java:unprotectedRenameTo(174)) - DIR* 
> FSDirectory.unprotectedRenameTo: failed to rename 
> /apps/hive/warehouse/metadata.db/scalding_stats/.hive-staging_hive_2016-05-10_18-46-23_642_2056172497900766879-3321/-ext-1/00_0 
> to /apps/hive/warehouse/metadata.db/scalding_stats/00_0_copy_9014 because 
> destination exists
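The quoted loop's exists-then-rename sequence is a classic check-then-act race: two writers can observe the same free _copy_N name before either move happens. As a minimal illustration of a race-free alternative, here is a Python sketch that claims the name by atomic exclusive creation (local filesystem only; the helper is hypothetical, not Hive's code, and HDFS semantics differ):

```python
import os

def claim_copy_name(dest_dir, name, filetype=""):
    # Claim the first free "<name>_copy_N<filetype>" target atomically:
    # O_CREAT | O_EXCL makes "does it exist?" and "take it" a single
    # step, so two parallel writers can never claim the same name.
    counter = 1
    while True:
        candidate = os.path.join(dest_dir, f"{name}_copy_{counter}{filetype}")
        try:
            fd = os.open(candidate, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return candidate
        except FileExistsError:
            counter += 1
```

Each call returns a distinct path even under concurrency, unlike an fs.exists() probe followed by a later rename.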





[jira] [Commented] (HIVE-13850) File name conflict when have multiple INSERT INTO queries running in parallel

2016-06-26 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350471#comment-15350471
 ] 

Bing Li commented on HIVE-13850:


Hi, [~ashutoshc]
Thanks a lot for your comment. 
Setting up Hive with ACID support worked for us.

I will close this defect as well.

> File name conflict when have multiple INSERT INTO queries running in parallel
> -
>
> Key: HIVE-13850
> URL: https://issues.apache.org/jira/browse/HIVE-13850
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-13850-1.2.1.patch
>
>
> We have an application which connects to HiveServer2 via JDBC.
> The application executes "INSERT INTO" queries against the same table.
> If many users run the application at the same time, some of the INSERTs can 
> fail.
> The root cause is that Hive.checkPaths() uses the following loop to check 
> whether the target file exists; with multiple inserts running in parallel, 
> the check is not atomic and leads to a conflict.
> for (int counter = 1; fs.exists(itemDest) || destExists(result, itemDest); counter++) {
>   itemDest = new Path(destf, name + ("_copy_" + counter) + filetype);
> }
> The Error Message
> ===
> In hive log,
> org.apache.hadoop.hive.ql.metadata.HiveException: copyFiles: error  
> while moving files!!! Cannot move hdfs://node:8020/apps/hive/warehouse/met
> 
> adata.db/scalding_stats/.hive-staging_hive_2016-05-10_18-46-
> 23_642_2056172497900766879-3321/-ext-1/00_0 to 
> hdfs://node:8020/apps/hive  
> /warehouse/metadata.db/scalding_stats/00_0_copy_9014
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java: 
> 2719)   
> at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java: 
> 1645)  
> 
> In hadoop log, 
> WARN  hdfs.StateChange (FSDirRenameOp.java: 
> unprotectedRenameTo(174)) - DIR* FSDirectory.unprotectedRenameTo:   
> failed to rename /apps/hive/warehouse/metadata.db/scalding_stats/.hive- 
> staging_hive_2016-05-10_18-46-23_642_2056172497900766879-3321/-ext- 
> 1/00_0 to /apps/hive/warehouse/metadata.
> db/scalding_stats/00_0_copy_9014 because destination exists





[jira] [Resolved] (HIVE-13850) File name conflict when have multiple INSERT INTO queries running in parallel

2016-06-26 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li resolved HIVE-13850.

Resolution: Not A Bug

It can be resolved by enabling Hive ACID support.

> File name conflict when have multiple INSERT INTO queries running in parallel
> -
>
> Key: HIVE-13850
> URL: https://issues.apache.org/jira/browse/HIVE-13850
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-13850-1.2.1.patch
>
>
> We have an application which connects to HiveServer2 via JDBC.
> The application executes "INSERT INTO" queries against the same table.
> If many users run the application at the same time, some of the INSERTs can 
> fail.
> The root cause is that Hive.checkPaths() uses the following loop to check for 
> the existence of the file. With multiple inserts running in parallel, this 
> leads to a conflict.
> for (int counter = 1; fs.exists(itemDest) || destExists(result, itemDest); 
> counter++) {
>   itemDest = new Path(destf, name + ("_copy_" + counter) + 
> filetype);
> }
> The Error Message
> ===
> In hive log,
> org.apache.hadoop.hive.ql.metadata.HiveException: copyFiles: error  
> while moving files!!! Cannot move hdfs://node:8020/apps/hive/warehouse/met
> 
> adata.db/scalding_stats/.hive-staging_hive_2016-05-10_18-46-
> 23_642_2056172497900766879-3321/-ext-1/00_0 to 
> hdfs://node:8020/apps/hive  
> /warehouse/metadata.db/scalding_stats/00_0_copy_9014
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java: 
> 2719)   
> at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java: 
> 1645)  
> 
> In hadoop log, 
> WARN  hdfs.StateChange (FSDirRenameOp.java: 
> unprotectedRenameTo(174)) - DIR* FSDirectory.unprotectedRenameTo:   
> failed to rename /apps/hive/warehouse/metadata.db/scalding_stats/.hive- 
> staging_hive_2016-05-10_18-46-23_642_2056172497900766879-3321/-ext- 
> 1/00_0 to /apps/hive/warehouse/metadata.
> db/scalding_stats/00_0_copy_9014 because destination exists





[jira] [Assigned] (HIVE-14156) Problem with Chinese characters as partition value when using MySQL

2016-07-04 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-14156:
--

Assignee: Bing Li

> Problem with Chinese characters as partition value when using MySQL
> ---
>
> Key: HIVE-14156
> URL: https://issues.apache.org/jira/browse/HIVE-14156
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Bing Li
>Assignee: Bing Li
>
> Steps to reproduce:
> create table t1 (name string, age int) partitioned by (city string) row 
> format delimited fields terminated by ',';
> load data local inpath '/tmp/chn-partition.txt' overwrite into table t1 
> partition (city='北京');
> The content of /tmp/chn-partition.txt:
> 小明,20
> 小红,15
> 张三,36
> 李四,50
> When checking the partition value in MySQL, it shows ?? instead of "北京".
> When running "drop table t1", it hangs.





[jira] [Commented] (HIVE-14156) Problem with Chinese characters as partition value when using MySQL

2016-07-04 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15361394#comment-15361394
 ] 

Bing Li commented on HIVE-14156:


I noticed that in the schema files under metastore/scripts/upgrade/mysql, like 
hive-schema-2.0.0.mysql.sql, the character set is latin1 for all tables instead 
of utf8.

It works with MySQL if I update the following columns in the schema script:

SDS.LOCATION
PARTITIONS.PART_NAME
PARTITION_KEY_VALS.PART_KEY_VAL

1) change the varchar(xxx) length limit to varchar(255)
2) change "latin1" to "utf8"

Hive's wiki and HIVE-8550 mention that Hive supports Unicode in the partition 
name.
Are there any special settings for MySQL to support it?
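The "??" symptom reported in this issue is consistent with latin1 transcoding: characters outside Latin-1 are replaced with '?' when encoded. A small standalone Java illustration (not Hive code) of the round trip:

```java
import java.nio.charset.StandardCharsets;

public class Latin1RoundTrip {
    // Encode a string as ISO-8859-1 (latin1) and decode it back.
    // Characters the charset cannot represent are replaced with '?'.
    static String roundTripLatin1(String s) {
        byte[] latin1Bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "北京" cannot be represented in latin1, so both characters become '?'
        System.out.println(roundTripLatin1("北京"));  // prints "??"
    }
}
```

A metastore column declared with a latin1 character set degrades the stored value the same way, which is why the partition value shows up as "??" in MySQL.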

> Problem with Chinese characters as partition value when using MySQL
> ---
>
> Key: HIVE-14156
> URL: https://issues.apache.org/jira/browse/HIVE-14156
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Bing Li
>Assignee: Bing Li
>
> Steps to reproduce:
> create table t1 (name string, age int) partitioned by (city string) row 
> format delimited fields terminated by ',';
> load data local inpath '/tmp/chn-partition.txt' overwrite into table t1 
> partition (city='北京');
> The content of /tmp/chn-partition.txt:
> 小明,20
> 小红,15
> 张三,36
> 李四,50
> When checking the partition value in MySQL, it shows ?? instead of "北京".
> When running "drop table t1", it hangs.





[jira] [Commented] (HIVE-14156) Problem with Chinese characters as partition value when using MySQL

2016-07-04 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362091#comment-15362091
 ] 

Bing Li commented on HIVE-14156:


Hi, Rui
I didn't have a chance to try other databases, such as Derby, Oracle, and 
Postgres.
But one thing I found is that the scripts for the other databases do not 
specify a character set.


> Problem with Chinese characters as partition value when using MySQL
> ---
>
> Key: HIVE-14156
> URL: https://issues.apache.org/jira/browse/HIVE-14156
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Bing Li
>Assignee: Bing Li
>
> Steps to reproduce:
> create table t1 (name string, age int) partitioned by (city string) row 
> format delimited fields terminated by ',';
> load data local inpath '/tmp/chn-partition.txt' overwrite into table t1 
> partition (city='北京');
> The content of /tmp/chn-partition.txt:
> 小明,20
> 小红,15
> 张三,36
> 李四,50
> When checking the partition value in MySQL, it shows ?? instead of "北京".
> When running "drop table t1", it hangs.





[jira] [Commented] (HIVE-14156) Problem with Chinese characters as partition value when using MySQL

2016-07-04 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362092#comment-15362092
 ] 

Bing Li commented on HIVE-14156:


Hi, Rui
I didn't have a chance to try other databases, such as Derby, Oracle, and 
Postgres.
But one thing I found is that the scripts for the other databases do not 
specify a character set.


> Problem with Chinese characters as partition value when using MySQL
> ---
>
> Key: HIVE-14156
> URL: https://issues.apache.org/jira/browse/HIVE-14156
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Bing Li
>Assignee: Bing Li
>
> Steps to reproduce:
> create table t1 (name string, age int) partitioned by (city string) row 
> format delimited fields terminated by ',';
> load data local inpath '/tmp/chn-partition.txt' overwrite into table t1 
> partition (city='北京');
> The content of /tmp/chn-partition.txt:
> 小明,20
> 小红,15
> 张三,36
> 李四,50
> When checking the partition value in MySQL, it shows ?? instead of "北京".
> When running "drop table t1", it hangs.





[jira] [Issue Comment Deleted] (HIVE-14156) Problem with Chinese characters as partition value when using MySQL

2016-07-04 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-14156:
---
Comment: was deleted

(was: Hi, Rui
I didn't have a chance to try other databases, like Derby, Oracle and Postgres.
But one thing I found is that in the scripts for other databases, it didn't 
specify the character set.
)

> Problem with Chinese characters as partition value when using MySQL
> ---
>
> Key: HIVE-14156
> URL: https://issues.apache.org/jira/browse/HIVE-14156
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Bing Li
>Assignee: Bing Li
>
> Steps to reproduce:
> create table t1 (name string, age int) partitioned by (city string) row 
> format delimited fields terminated by ',';
> load data local inpath '/tmp/chn-partition.txt' overwrite into table t1 
> partition (city='北京');
> The content of /tmp/chn-partition.txt:
> 小明,20
> 小红,15
> 张三,36
> 李四,50
> When checking the partition value in MySQL, it shows ?? instead of "北京".
> When running "drop table t1", it hangs.





[jira] [Commented] (HIVE-14156) Problem with Chinese characters as partition value when using MySQL

2016-07-05 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362286#comment-15362286
 ] 

Bing Li commented on HIVE-14156:


Hi, [~xiaobingo]
I noticed that you fixed HIVE-8550 on Windows and mentioned that it should 
work on Linux.
I ran a similar query but it failed with MySQL.

In order to make it work, besides the changes in the Hive schema script, I 
also needed to update MySQL's configuration file, my.cnf.

When you ran it on Windows, did you change the configuration for the database? 
Did you have a chance to run it on Linux as well?

Thank you.


> Problem with Chinese characters as partition value when using MySQL
> ---
>
> Key: HIVE-14156
> URL: https://issues.apache.org/jira/browse/HIVE-14156
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Bing Li
>Assignee: Bing Li
>
> Steps to reproduce:
> create table t1 (name string, age int) partitioned by (city string) row 
> format delimited fields terminated by ',';
> load data local inpath '/tmp/chn-partition.txt' overwrite into table t1 
> partition (city='北京');
> The content of /tmp/chn-partition.txt:
> 小明,20
> 小红,15
> 张三,36
> 李四,50
> When checking the partition value in MySQL, it shows ?? instead of "北京".
> When running "drop table t1", it hangs.





[jira] [Commented] (HIVE-13384) Failed to create HiveMetaStoreClient object with proxy user when Kerberos enabled

2016-05-18 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15288963#comment-15288963
 ] 

Bing Li commented on HIVE-13384:


Referring to DRILL-3413, we found a way to resolve this issue on the client 
side.
The key point is to get a delegation token for the proxy user and assign it to 
hive.metastore.token.signature.

I tried this method in two different scenarios:
1. use the proxy user to initialize a HiveMetaStoreClient object, as mentioned 
in the description
2. access a Hive table in Pig via HCatalog

Here are the sample codes for the above two scenarios:
1. Use the proxy user to create a HiveMetaStoreClient object

  // in this example, the login user is user hive
  UserGroupInformation loginUser = UserGroupInformation.getLoginUser();

  // the login user (hive) impersonates user hdfs
  UserGroupInformation ugi = UserGroupInformation.createProxyUser("hdfs", loginUser);

  // user hive is the super user, which does the login with its keytab
  // and principal; user hdfs is the proxy user
  HiveMetaStoreClient realUserClient = new HiveMetaStoreClient(new HiveConf());

  // get the delegation token for proxy user hdfs; the owner of this token is hdfs as well
  String delegationTokenStr = realUserClient.getDelegationToken("hdfs", "hdfs");
  realUserClient.close();

  String DELEGATION_TOKEN = "DelegationTokenForHiveMetaStoreServer";

  // create a delegation token object and add it to the given UGI
  Utils.setTokenStr(ugi, delegationTokenStr, DELEGATION_TOKEN);

  ugi.doAs(new PrivilegedExceptionAction<Void>() {
    public Void run() throws Exception {
      hiveConf = new HiveConf();
      hiveConf.set("hive.metastore.token.signature", DELEGATION_TOKEN);
      client = new HiveMetaStoreClient(hiveConf);
      return null;
    }
  });

2. In a Pig Java program

  HiveConf hiveConf = new HiveConf();
  HCatClient client = HCatClient.create(hiveConf);
  UserGroupInformation ugi =
      UserGroupInformation.createProxyUser(proxyUser, UserGroupInformation.getLoginUser());

  // get and set the delegation token
  String tokenStrForm = client.getDelegationToken(proxyUser, proxyUser);
  String DELEGATION_TOKEN = "DelegationTokenForHiveMetaStoreServer";
  Utils.setTokenStr(ugi, tokenStrForm, DELEGATION_TOKEN);

  Properties pigProp = new Properties();
  pigProp.setProperty("hive.metastore.token.signature", DELEGATION_TOKEN);

  client.close();

  // initialize PigServer with the Pig properties
  PigServer pigServer = new PigServer(ExecType.MAPREDUCE, pigProp);

  ugi.doAs(new PrivilegedExceptionAction<Void>() {
    public Void run() throws Exception {
      loadJars(pigServer);   // custom helper method
      runQuery(pigServer);   // custom helper method
      return null;
    }
  });

> Failed to create HiveMetaStoreClient object with proxy user when Kerberos 
> enabled
> -
>
> Key: HIVE-13384
> URL: https://issues.apache.org/jira/browse/HIVE-13384
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Bing Li
>
> I wrote a Java client to talk with HiveMetaStore. (Hive 1.2.0)
> But I found that it can't create a HiveMetaStoreClient object via a proxy 
> user in a Kerberos environment.
> ===
> 15/10/13 00:14:38 ERROR transport.TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
> at 
> org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
> at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
> ==
> When debugging Hive, I found that the error came from the open() method in 
> the HiveMetaStoreClient class.
> Around line 406,
>  transport = UserGroupInformation.getCurrentUser().doAs(new 
> PrivilegedExceptionAction() {  // FAILED, because the current user 
> doesn't have the credential
> But it works if I change the above line to
>  transport = UserGroupInformation.getCurrentUser().getRealUser().doAs(new 
> PrivilegedExceptionAction() {  // PASS
> I found that DRILL-3413 fixes this error on the Drill side as a workaround. 
> But if I submit a MapReduce job via Pig/HCatalog, it runs into the same issue 
> again when initializing the object via HCatalog.
> It would be better to fix this issue on the Hive side.

[jira] [Resolved] (HIVE-13384) Failed to create HiveMetaStoreClient object with proxy user when Kerberos enabled

2016-05-18 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li resolved HIVE-13384.

Resolution: Won't Fix

> Failed to create HiveMetaStoreClient object with proxy user when Kerberos 
> enabled
> -
>
> Key: HIVE-13384
> URL: https://issues.apache.org/jira/browse/HIVE-13384
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Bing Li
>
> I wrote a Java client to talk with HiveMetaStore. (Hive 1.2.0)
> But I found that it can't create a HiveMetaStoreClient object via a proxy 
> user in a Kerberos environment.
> ===
> 15/10/13 00:14:38 ERROR transport.TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
> at 
> org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
> at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
> ==
> When debugging Hive, I found that the error came from the open() method in 
> the HiveMetaStoreClient class.
> Around line 406,
>  transport = UserGroupInformation.getCurrentUser().doAs(new 
> PrivilegedExceptionAction() {  // FAILED, because the current user 
> doesn't have the credential
> But it works if I change the above line to
>  transport = UserGroupInformation.getCurrentUser().getRealUser().doAs(new 
> PrivilegedExceptionAction() {  // PASS
> I found that DRILL-3413 fixes this error on the Drill side as a workaround. 
> But if I submit a MapReduce job via Pig/HCatalog, it runs into the same issue 
> again when initializing the object via HCatalog.
> It would be better to fix this issue on the Hive side.





[jira] [Assigned] (HIVE-13384) Failed to create HiveMetaStoreClient object with proxy user when Kerberos enabled

2016-05-18 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-13384:
--

Assignee: Bing Li

> Failed to create HiveMetaStoreClient object with proxy user when Kerberos 
> enabled
> -
>
> Key: HIVE-13384
> URL: https://issues.apache.org/jira/browse/HIVE-13384
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Bing Li
>Assignee: Bing Li
>
> I wrote a Java client to talk with HiveMetaStore. (Hive 1.2.0)
> But I found that it can't create a HiveMetaStoreClient object via a proxy 
> user in a Kerberos environment.
> ===
> 15/10/13 00:14:38 ERROR transport.TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
> at 
> org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
> at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
> ==
> When debugging Hive, I found that the error came from the open() method in 
> the HiveMetaStoreClient class.
> Around line 406,
>  transport = UserGroupInformation.getCurrentUser().doAs(new 
> PrivilegedExceptionAction() {  // FAILED, because the current user 
> doesn't have the credential
> But it works if I change the above line to
>  transport = UserGroupInformation.getCurrentUser().getRealUser().doAs(new 
> PrivilegedExceptionAction() {  // PASS
> I found that DRILL-3413 fixes this error on the Drill side as a workaround. 
> But if I submit a MapReduce job via Pig/HCatalog, it runs into the same issue 
> again when initializing the object via HCatalog.
> It would be better to fix this issue on the Hive side.





[jira] [Updated] (HIVE-13850) File name conflict when have multiple INSERT INTO queries running in parallel

2016-05-25 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-13850:
---
Affects Version/s: 1.2.1

> File name conflict when have multiple INSERT INTO queries running in parallel
> -
>
> Key: HIVE-13850
> URL: https://issues.apache.org/jira/browse/HIVE-13850
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Bing Li
>Assignee: Bing Li
>
> We have an application which connects to HiveServer2 via JDBC.
> The application executes "INSERT INTO" queries against the same table.
> If many users run the application at the same time, some of the INSERTs can 
> fail.
> In hive log,
> org.apache.hadoop.hive.ql.metadata.HiveException: copyFiles: error  
> while moving files!!! Cannot move hdfs://node:8020/apps/hive/warehouse/met
> 
> adata.db/scalding_stats/.hive-staging_hive_2016-05-10_18-46-
> 23_642_2056172497900766879-3321/-ext-1/00_0 to 
> hdfs://node:8020/apps/hive  
> /warehouse/metadata.db/scalding_stats/00_0_copy_9014
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java: 
> 2719)   
> at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java: 
> 1645)  
> 
> In hadoop log, 
> WARN  hdfs.StateChange (FSDirRenameOp.java: 
> unprotectedRenameTo(174)) - DIR* FSDirectory.unprotectedRenameTo:   
> failed to rename /apps/hive/warehouse/metadata.db/scalding_stats/.hive- 
> staging_hive_2016-05-10_18-46-23_642_2056172497900766879-3321/-ext- 
> 1/00_0 to /apps/hive/warehouse/metadata.
> db/scalding_stats/00_0_copy_9014 because destination exists





[jira] [Updated] (HIVE-13850) File name conflict when have multiple INSERT INTO queries running in parallel

2016-05-26 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-13850:
---
Description: 
We have an application which connects to HiveServer2 via JDBC.
The application executes "INSERT INTO" queries against the same table.

If many users run the application at the same time, some of the INSERTs can 
fail.

The root cause is that Hive.checkPaths() uses the following loop to check for 
the existence of the file. With multiple inserts running in parallel, this 
leads to a conflict.

for (int counter = 1; fs.exists(itemDest) || destExists(result, itemDest); 
counter++) {
  itemDest = new Path(destf, name + ("_copy_" + counter) + 
filetype);
}
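The race is a classic check-then-act problem: two inserts that scan the same directory state both conclude that the same "_copy_N" name is free. A minimal, single-threaded Java sketch of the effect (with hypothetical file names, not Hive's actual FileSystem API):

```java
import java.util.HashSet;
import java.util.Set;

public class CopyNameRace {
    // Mirrors the probing loop in Hive.checkPaths(): try the base name, then
    // base_copy_1, base_copy_2, ... until a name absent from the snapshot is found.
    static String nextCopyName(Set<String> existing, String name, String filetype) {
        String candidate = name + filetype;
        for (int counter = 1; existing.contains(candidate); counter++) {
            candidate = name + "_copy_" + counter + filetype;
        }
        return candidate;
    }

    public static void main(String[] args) {
        Set<String> dirSnapshot = new HashSet<>();
        dirSnapshot.add("data_0");  // a file already present in the table directory

        // Two parallel inserts each probe against the same directory snapshot...
        String insertA = nextCopyName(dirSnapshot, "data_0", "");
        String insertB = nextCopyName(dirSnapshot, "data_0", "");

        // ...so both choose "data_0_copy_1", and the second rename fails
        // with "destination exists".
        System.out.println(insertA + " vs " + insertB);
    }
}
```

Nothing between the existence check and the later rename reserves the chosen name, so a second writer that probed the same state claims the identical target.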


The Error Message
===
In hive log,
org.apache.hadoop.hive.ql.metadata.HiveException: copyFiles: error while moving files!!! Cannot move hdfs://node:8020/apps/hive/warehouse/metadata.db/scalding_stats/.hive-staging_hive_2016-05-10_18-46-23_642_2056172497900766879-3321/-ext-1/00_0 to hdfs://node:8020/apps/hive/warehouse/metadata.db/scalding_stats/00_0_copy_9014
at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2719)
at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1645)

In hadoop log,
WARN  hdfs.StateChange (FSDirRenameOp.java:unprotectedRenameTo(174)) - DIR* FSDirectory.unprotectedRenameTo: failed to rename /apps/hive/warehouse/metadata.db/scalding_stats/.hive-staging_hive_2016-05-10_18-46-23_642_2056172497900766879-3321/-ext-1/00_0 to /apps/hive/warehouse/metadata.db/scalding_stats/00_0_copy_9014 because destination exists

  was:
We have an application which connects to HiveServer2 via JDBC.
The application executes "INSERT INTO" queries against the same table.

If many users run the application at the same time, some of the INSERTs can 
fail.

In hive log,
org.apache.hadoop.hive.ql.metadata.HiveException: copyFiles: error  
while moving files!!! Cannot move hdfs://node:8020/apps/hive/warehouse/met  
  
adata.db/scalding_stats/.hive-staging_hive_2016-05-10_18-46-
23_642_2056172497900766879-3321/-ext-1/00_0 to 
hdfs://node:8020/apps/hive  
/warehouse/metadata.db/scalding_stats/00_0_copy_9014
at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java: 
2719)   
at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java: 
1645)  


In hadoop log, 
WARN  hdfs.StateChange (FSDirRenameOp.java: 
unprotectedRenameTo(174)) - DIR* FSDirectory.unprotectedRenameTo:   
failed to rename /apps/hive/warehouse/metadata.db/scalding_stats/.hive- 
staging_hive_2016-05-10_18-46-23_642_2056172497900766879-3321/-ext- 
1/00_0 to /apps/hive/warehouse/metadata.
db/scalding_stats/00_0_copy_9014 because destination exists


> File name conflict when have multiple INSERT INTO queries running in parallel
> -
>
> Key: HIVE-13850
> URL: https://issues.apache.org/jira/browse/HIVE-13850
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Bing Li
>Assignee: Bing Li
>
> We have an application which connects to HiveServer2 via JDBC.
> The application executes "INSERT INTO" queries against the same table.
> If many users run the application at the same time, some of the INSERTs can 
> fail.
> The root cause is that Hive.checkPaths() uses the following loop to check for 
> the existence of the file. With multiple inserts running in parallel, this 
> leads to a conflict.
> for (int counter = 1; fs.exists(itemDest) || destExists(result, itemDest); 
> counter++) {
>   itemDest = new Path(destf, name + ("_copy_" + counter) + 
> filetype);
> }
> The Error Message
> ===
> In hive log,
> org.apache.hadoop.hive.ql.metadata.HiveException: copyFiles: error  
> while moving files!!! Cannot move hdfs://node:8020/apps/hive/warehouse/met
> 
> adata.db/scalding_stats/.hive-staging_hive_2016-05-10_18-46-
> 23_642_2056172497900766879-3321/-ext-1/00_0 to 
> hdfs://node:8020/apps/hive  
> /warehouse/

[jira] [Updated] (HIVE-13850) File name conflict when have multiple INSERT INTO queries running in parallel

2016-05-26 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-13850:
---
Attachment: HIVE-13850-1.2.1.patch

This patch file is based on Hive 1.2.1.
It uses a timestamp to name the data file under the table directory, which 
avoids the conflict.
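The idea behind the patch can be sketched as follows. This is an illustrative Java snippet (a hypothetical helper, not the patch itself) showing how a timestamp plus a per-writer nonce sidesteps probing for free "_copy_N" slots:

```java
import java.util.concurrent.ThreadLocalRandom;

public class TimestampedFileName {
    // Build a file name from the base name, the current time, and a random
    // nonce, so concurrent writers are very unlikely to collide.
    static String timestampedName(String base, String filetype, long millis, int nonce) {
        return base + "_" + millis + "_" + nonce + filetype;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        int nonce = ThreadLocalRandom.current().nextInt(1_000_000);
        // e.g. part_1464220000000_123456.orc (exact value varies per run)
        System.out.println(timestampedName("part", ".orc", now, nonce));
    }
}
```

Because the name is generated rather than discovered by scanning the directory, there is no shared check-then-act window for two inserts to race on.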

> File name conflict when have multiple INSERT INTO queries running in parallel
> -
>
> Key: HIVE-13850
> URL: https://issues.apache.org/jira/browse/HIVE-13850
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-13850-1.2.1.patch
>
>
> We have an application which connects to HiveServer2 via JDBC.
> The application executes "INSERT INTO" queries against the same table.
> If many users run the application at the same time, some of the INSERTs can 
> fail.
> The root cause is that Hive.checkPaths() uses the following loop to check for 
> the existence of the file. With multiple inserts running in parallel, this 
> leads to a conflict.
> for (int counter = 1; fs.exists(itemDest) || destExists(result, itemDest); 
> counter++) {
>   itemDest = new Path(destf, name + ("_copy_" + counter) + 
> filetype);
> }
> The Error Message
> ===
> In hive log,
> org.apache.hadoop.hive.ql.metadata.HiveException: copyFiles: error  
> while moving files!!! Cannot move hdfs://node:8020/apps/hive/warehouse/met
> 
> adata.db/scalding_stats/.hive-staging_hive_2016-05-10_18-46-
> 23_642_2056172497900766879-3321/-ext-1/00_0 to 
> hdfs://node:8020/apps/hive  
> /warehouse/metadata.db/scalding_stats/00_0_copy_9014
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java: 
> 2719)   
> at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java: 
> 1645)  
> 
> In hadoop log, 
> WARN  hdfs.StateChange (FSDirRenameOp.java: 
> unprotectedRenameTo(174)) - DIR* FSDirectory.unprotectedRenameTo:   
> failed to rename /apps/hive/warehouse/metadata.db/scalding_stats/.hive- 
> staging_hive_2016-05-10_18-46-23_642_2056172497900766879-3321/-ext- 
> 1/00_0 to /apps/hive/warehouse/metadata.
> db/scalding_stats/00_0_copy_9014 because destination exists





[jira] [Updated] (HIVE-6091) Empty pipeout files are created for connection create/close

2015-09-10 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-6091:
--
Attachment: HIVE-6091.2.patch

Re-generated the patch based on the latest code in master branch

> Empty pipeout files are created for connection create/close
> ---
>
> Key: HIVE-6091
> URL: https://issues.apache.org/jira/browse/HIVE-6091
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Minor
> Attachments: HIVE-6091.1.patch, HIVE-6091.2.patch, HIVE-6091.patch
>
>
> Pipeout files are created when a connection is established but removed only 
> when data was produced. Instead, we should either create them only when data 
> has to be fetched, or remove them whether or not data is fetched.





[jira] [Updated] (HIVE-10495) Hive index creation code throws NPE if index table is null

2015-09-10 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-10495:
---
Attachment: HIVE-10495.2.patch

Re-generated the patch based on the latest master branch.

> Hive index creation code throws NPE if index table is null
> --
>
> Key: HIVE-10495
> URL: https://issues.apache.org/jira/browse/HIVE-10495
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.0.0, 1.2.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-10495.1.patch, HIVE-10495.2.patch
>
>
> The stack trace would be:
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_index(HiveMetaStore.java:2870)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
> at java.lang.reflect.Method.invoke(Method.java:611)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102)
> at $Proxy9.add_index(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createIndex(HiveMetaStoreClient.java:962)





[jira] [Updated] (HIVE-10495) Hive index creation code throws NPE if index table is null

2015-09-10 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-10495:
---
Affects Version/s: 1.2.1

> Hive index creation code throws NPE if index table is null
> --
>
> Key: HIVE-10495
> URL: https://issues.apache.org/jira/browse/HIVE-10495
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.0.0, 1.2.0, 1.2.1
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-10495.1.patch, HIVE-10495.2.patch
>
>
> The stack trace would be:
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_index(HiveMetaStore.java:2870)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
> at java.lang.reflect.Method.invoke(Method.java:611)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102)
> at $Proxy9.add_index(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createIndex(HiveMetaStoreClient.java:962)





[jira] [Commented] (HIVE-10982) Customizable the value of java.sql.statement.setFetchSize in Hive JDBC Driver

2015-09-10 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739929#comment-14739929
 ] 

Bing Li commented on HIVE-10982:


Hi, [~pxiong]
Yes, I will start to work on this soon.
Thank you.

> Customizable the value of  java.sql.statement.setFetchSize in Hive JDBC Driver
> --
>
> Key: HIVE-10982
> URL: https://issues.apache.org/jira/browse/HIVE-10982
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Affects Versions: 1.2.0
>Reporter: Bing Li
>Assignee: Bing Li
>Priority: Critical
>
> The current JDBC driver for Hive hard-codes the value of setFetchSize to 50, 
> which can be a performance bottleneck.
> Pentaho filed this issue as http://jira.pentaho.com/browse/PDI-11511, which 
> is still open.
> It has also been discussed in 
> http://forums.pentaho.com/showthread.php?158381-Hive-JDBC-Query-too-slow-too-many-fetches-after-query-execution-Kettle-Xform
> http://mail-archives.apache.org/mod_mbox/hive-user/201307.mbox/%3ccacq46vevgrfqg5rwxnr1psgyz7dcf07mvlo8mm2qit3anm1...@mail.gmail.com%3E





[jira] [Updated] (HIVE-6990) Direct SQL fails when the explicit schema setting is different from the default one

2015-09-14 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-6990:
--
Attachment: HIVE-6990.5.patch

The patch is created based on the latest code in the master branch.

> Direct SQL fails when the explicit schema setting is different from the 
> default one
> ---
>
> Key: HIVE-6990
> URL: https://issues.apache.org/jira/browse/HIVE-6990
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.12.0
> Environment: hive + derby
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-6990.1.patch, HIVE-6990.2.patch, HIVE-6990.3.patch, 
> HIVE-6990.4.patch, HIVE-6990.5.patch
>
>
> I got the following ERROR in hive.log
> 2014-04-23 17:30:23,331 ERROR metastore.ObjectStore 
> (ObjectStore.java:handleDirectSqlError(1756)) - Direct SQL failed, falling 
> back to ORM
> javax.jdo.JDODataStoreException: Error executing SQL query "select 
> PARTITIONS.PART_ID from PARTITIONS  inner join TBLS on PARTITIONS.TBL_ID = 
> TBLS.TBL_ID   inner join DBS on TBLS.DB_ID = DBS.DB_ID inner join 
> PARTITION_KEY_VALS as FILTER0 on FILTER0.PART_ID = PARTITIONS.PART_ID and 
> FILTER0.INTEGER_IDX = 0 where TBLS.TBL_NAME = ? and DBS.NAME = ? and 
> ((FILTER0.PART_KEY_VAL = ?))".
> at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451)
> at 
> org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:181)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:98)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilterInternal(ObjectStore.java:1833)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1806)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
> at java.lang.reflect.Method.invoke(Method.java:619)
> at 
> org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124)
> at com.sun.proxy.$Proxy11.getPartitionsByFilter(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_filter(HiveMetaStore.java:3310)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
> at java.lang.reflect.Method.invoke(Method.java:619)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
> at com.sun.proxy.$Proxy12.get_partitions_by_filter(Unknown Source)
> Reproduce steps:
> 1. set the following properties in hive-site.xml
> <property>
>   <name>javax.jdo.mapping.Schema</name>
>   <value>HIVE</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionUserName</name>
>   <value>user1</value>
> </property>
> 2. execute hive queries
> hive> create table mytbl ( key int, value string);
> hive> load data local inpath 'examples/files/kv1.txt' overwrite into table 
> mytbl;
> hive> select * from mytbl;
> hive> create view myview partitioned on (value) as select key, value from 
> mytbl where key=98;
> hive> alter view myview add partition (value='val_98') partition 
> (value='val_xyz');
> hive> alter view myview drop partition (value='val_xyz');





[jira] [Updated] (HIVE-6990) Direct SQL fails when the explicit schema setting is different from the default one

2015-09-14 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-6990:
--
Affects Version/s: 0.14.0
   1.2.1

> Direct SQL fails when the explicit schema setting is different from the 
> default one
> ---
>
> Key: HIVE-6990
> URL: https://issues.apache.org/jira/browse/HIVE-6990
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.12.0, 0.14.0, 1.2.1
> Environment: hive + derby
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-6990.1.patch, HIVE-6990.2.patch, HIVE-6990.3.patch, 
> HIVE-6990.4.patch, HIVE-6990.5.patch
>
>
> I got the following ERROR in hive.log
> 2014-04-23 17:30:23,331 ERROR metastore.ObjectStore 
> (ObjectStore.java:handleDirectSqlError(1756)) - Direct SQL failed, falling 
> back to ORM
> javax.jdo.JDODataStoreException: Error executing SQL query "select 
> PARTITIONS.PART_ID from PARTITIONS  inner join TBLS on PARTITIONS.TBL_ID = 
> TBLS.TBL_ID   inner join DBS on TBLS.DB_ID = DBS.DB_ID inner join 
> PARTITION_KEY_VALS as FILTER0 on FILTER0.PART_ID = PARTITIONS.PART_ID and 
> FILTER0.INTEGER_IDX = 0 where TBLS.TBL_NAME = ? and DBS.NAME = ? and 
> ((FILTER0.PART_KEY_VAL = ?))".
> at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451)
> at 
> org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:181)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:98)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilterInternal(ObjectStore.java:1833)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1806)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
> at java.lang.reflect.Method.invoke(Method.java:619)
> at 
> org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124)
> at com.sun.proxy.$Proxy11.getPartitionsByFilter(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_filter(HiveMetaStore.java:3310)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
> at java.lang.reflect.Method.invoke(Method.java:619)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
> at com.sun.proxy.$Proxy12.get_partitions_by_filter(Unknown Source)
> Reproduce steps:
> 1. set the following properties in hive-site.xml
> <property>
>   <name>javax.jdo.mapping.Schema</name>
>   <value>HIVE</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionUserName</name>
>   <value>user1</value>
> </property>
> 2. execute hive queries
> hive> create table mytbl ( key int, value string);
> hive> load data local inpath 'examples/files/kv1.txt' overwrite into table 
> mytbl;
> hive> select * from mytbl;
> hive> create view myview partitioned on (value) as select key, value from 
> mytbl where key=98;
> hive> alter view myview add partition (value='val_98') partition 
> (value='val_xyz');
> hive> alter view myview drop partition (value='val_xyz');





[jira] [Updated] (HIVE-9169) UT: set hive.support.concurrency to true for spark UTs

2015-09-15 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-9169:
--
Assignee: (was: Bing Li)

> UT: set hive.support.concurrency to true for spark UTs
> --
>
> Key: HIVE-9169
> URL: https://issues.apache.org/jira/browse/HIVE-9169
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: spark-branch
>Reporter: Thomas Friedrich
>Priority: Minor
>
> The test cases 
> lock1
> lock2
> lock3
> lock4 
> are failing because the flag hive.support.concurrency is set to false in the 
> hive-site.xml for the spark tests.
> This value was set to true in trunk with HIVE-1293 when these test cases were 
> introduced to Hive.
> After setting the value to true and generating the output files, the test 
> cases are successful.
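The fix described above amounts to a configuration change; the relevant hive-site.xml entry for the spark test configuration would look like this (a sketch):

```xml
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
```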





[jira] [Updated] (HIVE-10982) Customizable the value of java.sql.statement.setFetchSize in Hive JDBC Driver

2015-09-23 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-10982:
---
Attachment: HIVE-10982.1.patch

> Customizable the value of  java.sql.statement.setFetchSize in Hive JDBC Driver
> --
>
> Key: HIVE-10982
> URL: https://issues.apache.org/jira/browse/HIVE-10982
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Affects Versions: 1.2.0
>Reporter: Bing Li
>Assignee: Bing Li
>Priority: Critical
> Attachments: HIVE-10982.1.patch
>
>
> The current JDBC driver for Hive hard-codes the value of setFetchSize to 50, 
> which can be a performance bottleneck.
> Pentaho filed this issue as http://jira.pentaho.com/browse/PDI-11511, which 
> is still open.
> It has also been discussed in 
> http://forums.pentaho.com/showthread.php?158381-Hive-JDBC-Query-too-slow-too-many-fetches-after-query-execution-Kettle-Xform
> http://mail-archives.apache.org/mod_mbox/hive-user/201307.mbox/%3ccacq46vevgrfqg5rwxnr1psgyz7dcf07mvlo8mm2qit3anm1...@mail.gmail.com%3E





[jira] [Commented] (HIVE-10982) Customizable the value of java.sql.statement.setFetchSize in Hive JDBC Driver

2015-09-24 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905921#comment-14905921
 ] 

Bing Li commented on HIVE-10982:


The patch is created based on the latest code in the master branch.

> Customizable the value of  java.sql.statement.setFetchSize in Hive JDBC Driver
> --
>
> Key: HIVE-10982
> URL: https://issues.apache.org/jira/browse/HIVE-10982
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Bing Li
>Assignee: Bing Li
>Priority: Critical
> Attachments: HIVE-10982.1.patch
>
>
> The current JDBC driver for Hive hard-codes the value of setFetchSize to 50, 
> which can be a performance bottleneck.
> Pentaho filed this issue as http://jira.pentaho.com/browse/PDI-11511, which 
> is still open.
> It has also been discussed in 
> http://forums.pentaho.com/showthread.php?158381-Hive-JDBC-Query-too-slow-too-many-fetches-after-query-execution-Kettle-Xform
> http://mail-archives.apache.org/mod_mbox/hive-user/201307.mbox/%3ccacq46vevgrfqg5rwxnr1psgyz7dcf07mvlo8mm2qit3anm1...@mail.gmail.com%3E





[jira] [Commented] (HIVE-10982) Customizable the value of java.sql.statement.setFetchSize in Hive JDBC Driver

2015-09-24 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14906075#comment-14906075
 ] 

Bing Li commented on HIVE-10982:


[~pxiong] and [~vgumashta], I have uploaded the patch and am waiting for a reply 
from the community. Thank you.

> Customizable the value of  java.sql.statement.setFetchSize in Hive JDBC Driver
> --
>
> Key: HIVE-10982
> URL: https://issues.apache.org/jira/browse/HIVE-10982
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Bing Li
>Assignee: Bing Li
>Priority: Critical
> Attachments: HIVE-10982.1.patch
>
>
> The current JDBC driver for Hive hard-codes the value of setFetchSize to 50, 
> which can be a performance bottleneck.
> Pentaho filed this issue as http://jira.pentaho.com/browse/PDI-11511, which 
> is still open.
> It has also been discussed in 
> http://forums.pentaho.com/showthread.php?158381-Hive-JDBC-Query-too-slow-too-many-fetches-after-query-execution-Kettle-Xform
> http://mail-archives.apache.org/mod_mbox/hive-user/201307.mbox/%3ccacq46vevgrfqg5rwxnr1psgyz7dcf07mvlo8mm2qit3anm1...@mail.gmail.com%3E





[jira] [Commented] (HIVE-10982) Customizable the value of java.sql.statement.setFetchSize in Hive JDBC Driver

2015-09-29 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14934744#comment-14934744
 ] 

Bing Li commented on HIVE-10982:


Hi, [~vgumashta]
Thank you for your comment.

Do you mean introducing a new property in hive-site.xml that would also 
control the maximum fetch size returned by HS2?

Do you know what the current control mechanism on HS2 is?




> Customizable the value of  java.sql.statement.setFetchSize in Hive JDBC Driver
> --
>
> Key: HIVE-10982
> URL: https://issues.apache.org/jira/browse/HIVE-10982
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Bing Li
>Assignee: Bing Li
>Priority: Critical
> Attachments: HIVE-10982.1.patch
>
>
> The current JDBC driver for Hive hard-codes the value of setFetchSize to 50, 
> which can be a performance bottleneck.
> Pentaho filed this issue as http://jira.pentaho.com/browse/PDI-11511, which 
> is still open.
> It has also been discussed in 
> http://forums.pentaho.com/showthread.php?158381-Hive-JDBC-Query-too-slow-too-many-fetches-after-query-execution-Kettle-Xform
> http://mail-archives.apache.org/mod_mbox/hive-user/201307.mbox/%3ccacq46vevgrfqg5rwxnr1psgyz7dcf07mvlo8mm2qit3anm1...@mail.gmail.com%3E





[jira] [Commented] (HIVE-10982) Customizable the value of java.sql.statement.setFetchSize in Hive JDBC Driver

2015-11-09 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996186#comment-14996186
 ] 

Bing Li commented on HIVE-10982:


Hi, [~alangates]
Thank you for your comment. 
Yes, I still want to be able to set this property via the connection URL.
I will rebase the patch soon.

Thank you.

> Customizable the value of  java.sql.statement.setFetchSize in Hive JDBC Driver
> --
>
> Key: HIVE-10982
> URL: https://issues.apache.org/jira/browse/HIVE-10982
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Bing Li
>Assignee: Bing Li
>Priority: Critical
> Attachments: HIVE-10982.1.patch
>
>
> The current JDBC driver for Hive hard-codes the value of setFetchSize to 50, 
> which can be a performance bottleneck.
> Pentaho filed this issue as http://jira.pentaho.com/browse/PDI-11511, which 
> is still open.
> It has also been discussed in 
> http://forums.pentaho.com/showthread.php?158381-Hive-JDBC-Query-too-slow-too-many-fetches-after-query-execution-Kettle-Xform
> http://mail-archives.apache.org/mod_mbox/hive-user/201307.mbox/%3ccacq46vevgrfqg5rwxnr1psgyz7dcf07mvlo8mm2qit3anm1...@mail.gmail.com%3E





[jira] [Commented] (HIVE-6963) Beeline logs are printing on the console

2015-03-26 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381700#comment-14381700
 ] 

Bing Li commented on HIVE-6963:
---

Hi, Chinna
Have you uploaded the latest patch?

I tried the patch attached in this Jira, and found:
1. In order to launch bin/beeline, I need to add the following jars to 
HADOOP_CLASSPATH in bin/ext/beeline.sh

hive/lib/hive-shims-0.23.jar
hive/lib/hive-shims-common-secure.jar
hive/lib/hive-shims-common.jar

2. The log file doesn't contain as much info as the one for HiveCLI does.

Its log file only has the following lines:
[biadmin@bdvs1100 biadmin]$ cat hive.log
2015-02-13 06:53:50,145 INFO  jdbc.Utils (Utils.java:parseURL(285)) - Supplied 
authorities: bdvs1100.svl.ibm.com:1
2015-02-13 06:53:50,149 INFO  jdbc.Utils (Utils.java:parseURL(372)) - Resolved 
authority: bdvs1100.svl.ibm.com:1
2015-02-13 06:53:50,184 INFO  jdbc.HiveConnection 
(HiveConnection.java:openTransport(191)) - Will try to open client transport 
with JDBC Uri: jdbc:hive2://9.123.2.21:1


Are these known issues, or is this working as designed?

Thank you.
- Bing

> Beeline logs are printing on the console
> 
>
> Key: HIVE-6963
> URL: https://issues.apache.org/jira/browse/HIVE-6963
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Chinna Rao Lalam
>Assignee: Chinna Rao Lalam
> Attachments: HIVE-6963.patch
>
>
> beeline logs are not redirected to the log file.
> If log is redirected to log file, only required information will print on the 
> console. 
> This way it is more easy to read the output.





[jira] [Updated] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2015-04-27 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-4577:
--
Attachment: HIVE-4577.4.patch

Re-created the patch file based on the latest code in trunk.

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch
>
>
> By design, Hive supports hadoop dfs commands in the Hive shell, e.g.
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from Hadoop when the path contains spaces or quotes:
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"
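The fix essentially requires shell-style tokenization of the dfs command before handing arguments to Hadoop, so that quoted segments stay together and the quotes are stripped. A minimal, illustrative sketch (not the actual Hive CLI code) of such quote-aware splitting:

```java
import java.util.ArrayList;
import java.util.List;

public class DfsCommandTokenizer {
    // Split a dfs command line into arguments, treating quoted segments
    // ("bei jing" or 'bei jing') as single tokens and stripping the quotes,
    // which matches what `hadoop fs` would receive from a POSIX shell.
    static List<String> tokenize(String cmd) {
        List<String> tokens = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        char quote = 0;            // 0 = not inside quotes
        boolean inToken = false;
        for (char c : cmd.toCharArray()) {
            if (quote != 0) {                  // inside quotes
                if (c == quote) quote = 0;     // closing quote: drop it
                else cur.append(c);
            } else if (c == '"' || c == '\'') {
                quote = c;                     // opening quote: drop it
                inToken = true;
            } else if (Character.isWhitespace(c)) {
                if (inToken) { tokens.add(cur.toString()); cur.setLength(0); inToken = false; }
            } else {
                cur.append(c);
                inToken = true;
            }
        }
        if (inToken) tokens.add(cur.toString());
        return tokens;
    }
}
```

With this, `dfs -mkdir "bei jing";` yields the single path argument `bei jing` rather than two broken directories.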





[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2015-04-27 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513892#comment-14513892
 ] 

Bing Li commented on HIVE-4577:
---

Hi, [~thejas]
I generated a new patch for this defect and also fixed a bug in my previous patch.
Could you help to review it?

Thank you!

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch
>
>
> By design, Hive supports hadoop dfs commands in the Hive shell, e.g.
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from Hadoop when the path contains spaces or quotes:
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"





[jira] [Updated] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2015-04-27 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-4577:
--
Affects Version/s: 1.1.0
   0.14.0
   0.13.1

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch
>
>
> By design, Hive supports hadoop dfs commands in the Hive shell, e.g.
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from Hadoop when the path contains spaces or quotes:
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"





[jira] [Updated] (HIVE-10495) Hive index creation code throws NPE if index table is null

2015-04-27 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-10495:
---
Attachment: HIVE-10495.1.patch

The patch is created based on the latest trunk.

> Hive index creation code throws NPE if index table is null
> --
>
> Key: HIVE-10495
> URL: https://issues.apache.org/jira/browse/HIVE-10495
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-10495.1.patch
>
>
> The stack trace would be:
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_index(HiveMetaStore.java:2870)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
> at java.lang.reflect.Method.invoke(Method.java:611)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102)
> at $Proxy9.add_index(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createIndex(HiveMetaStoreClient.java:962)





[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2015-04-29 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518930#comment-14518930
 ] 

Bing Li commented on HIVE-4577:
---

The failure should not be related to this patch.

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch
>
>
> By design, Hive supports hadoop dfs commands in the Hive shell, e.g.
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from Hadoop when the path contains spaces or quotes:
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"





[jira] [Commented] (HIVE-10495) Hive index creation code throws NPE if index table is null

2015-04-29 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518904#comment-14518904
 ] 

Bing Li commented on HIVE-10495:


The failure should not be related to this patch.

> Hive index creation code throws NPE if index table is null
> --
>
> Key: HIVE-10495
> URL: https://issues.apache.org/jira/browse/HIVE-10495
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-10495.1.patch
>
>
> The stack trace would be:
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_index(HiveMetaStore.java:2870)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
> at java.lang.reflect.Method.invoke(Method.java:611)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102)
> at $Proxy9.add_index(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createIndex(HiveMetaStoreClient.java:962)





[jira] [Updated] (HIVE-10495) Hive index creation code throws NPE if index table is null

2015-05-07 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-10495:
---
Attachment: (was: HIVE-10495.1.patch)

> Hive index creation code throws NPE if index table is null
> --
>
> Key: HIVE-10495
> URL: https://issues.apache.org/jira/browse/HIVE-10495
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Bing Li
>Assignee: Bing Li
>
> The stack trace would be:
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_index(HiveMetaStore.java:2870)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
> at java.lang.reflect.Method.invoke(Method.java:611)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102)
> at $Proxy9.add_index(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createIndex(HiveMetaStoreClient.java:962)





[jira] [Updated] (HIVE-11201) HCatalog is ignoring user specified avro schema in the table definition

2015-07-07 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-11201:
---
Attachment: HIVE-11201.1.patch

The patch is created based on the latest code in the master branch.

> HCatalog  is ignoring user specified avro schema in the table definition
> 
>
> Key: HIVE-11201
> URL: https://issues.apache.org/jira/browse/HIVE-11201
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.0
>Reporter: Bing Li
>Assignee: Bing Li
>Priority: Critical
> Attachments: HIVE-11201.1.patch
>
>
> HCatalog ignores the user-specified Avro schema in the table definition and 
> instead generates its own Avro schema from the Hive metastore.
> Generating its own schema results in mismatched names; for example, Avro 
> field names are case sensitive. It also results in an incorrect schema being 
> written to the Avro file, so subsequent selects fail on read. In addition, 
> even if the user-specified schema does not allow null, data written through 
> HCatalog gets a schema that does allow null.
> For example, a user may specify all CAPITAL letters in the schema and the 
> record name LINEITEM. That schema should be written as-is; instead, HCatalog 
> ignores it and generates its own Avro schema using the Hive table's casing.
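For illustration, a user-specified Avro schema like the following (record name LINEITEM, upper-case field names) should be written out verbatim rather than regenerated from the lower-cased Hive metadata; the field shown is hypothetical:

```json
{
  "type": "record",
  "name": "LINEITEM",
  "fields": [
    {"name": "L_ORDERKEY", "type": "long"}
  ]
}
```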





[jira] [Commented] (HIVE-6091) Empty pipeout files are created for connection create/close

2015-07-13 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14624536#comment-14624536
 ] 

Bing Li commented on HIVE-6091:
---

It seems the patch was merged into Hive as of 0.13.0 via 
https://issues.apache.org/jira/browse/HIVE-4395

> Empty pipeout files are created for connection create/close
> ---
>
> Key: HIVE-6091
> URL: https://issues.apache.org/jira/browse/HIVE-6091
> Project: Hive
>  Issue Type: Bug
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Minor
> Attachments: HIVE-6091.patch
>
>
> Pipeout files are created when a connection is established but removed only 
> when data was produced. Instead, we should either create them only when data 
> has to be fetched, or remove them whether or not data was fetched.





[jira] [Updated] (HIVE-6091) Empty pipeout files are created for connection create/close

2015-07-13 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-6091:
--
Attachment: HIVE-6091.1.patch

With this patch, the pipeout file is deleted after the session is closed.
I have tested it on Hive 1.2.1

> Empty pipeout files are created for connection create/close
> ---
>
> Key: HIVE-6091
> URL: https://issues.apache.org/jira/browse/HIVE-6091
> Project: Hive
>  Issue Type: Bug
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Minor
> Attachments: HIVE-6091.1.patch, HIVE-6091.patch
>
>
> Pipeout files are created when a connection is established and removed only 
> when data was produced. Instead we should create them only when data has to 
> be fetched or remove them whether data is fetched or not.





[jira] [Updated] (HIVE-11113) ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work.

2015-07-15 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-11113:
---
Affects Version/s: 1.2.1

> ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work. 
> ---
>
> Key: HIVE-11113
> URL: https://issues.apache.org/jira/browse/HIVE-11113
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1, 1.2.1
> Environment: 
>Reporter: Shiroy Pigarez
>
> I was trying to perform some column statistics using hive as per the 
> documentation 
> https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive 
> and was encountering the following errors:
> Seems like a bug. Can you look into this? Thanks in advance.
> -- HIVE table
> {noformat}
> hive> create table people_part(
> name string,
> address string) PARTITIONED BY (dob string, nationality varchar(2))
> row format delimited fields terminated by '\t';
> {noformat}
> --Analyze table with partition dob and nationality with FOR COLUMNS
> {noformat}
> hive> ANALYZE TABLE people_part PARTITION(dob='2015-10-2',nationality) 
> COMPUTE STATISTICS FOR COLUMNS;
> NoViableAltException(-1@[])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11627)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:40215)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.columnName(HiveParser.java:33351)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.columnNameList(HiveParser.java:33219)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.analyzeStatement(HiveParser.java:17764)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2369)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:275)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:227)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:430)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:803)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:697)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:636)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> FAILED: ParseException line 1:95 cannot recognize input near '' '' 
> '' in column name
> {noformat}
> --Analyze table with partition dob and nationality values specified with FOR 
> COLUMNS
> {noformat}
> hive> ANALYZE TABLE people_part PARTITION(dob='2015-10-2',nationality='IE') 
> COMPUTE STATISTICS FOR COLUMNS;
> NoViableAltException(-1@[])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11627)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:40215)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.columnName(HiveParser.java:33351)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.columnNameList(HiveParser.java:33219)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.analyzeStatement(HiveParser.java:17764)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2369)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404)
> 

[jira] [Updated] (HIVE-11113) ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work.

2015-07-15 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-11113:
---
Priority: Critical  (was: Major)

> ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work. 
> ---
>
> Key: HIVE-11113
> URL: https://issues.apache.org/jira/browse/HIVE-11113
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1, 1.2.1
> Environment: 
>Reporter: Shiroy Pigarez
>Priority: Critical
>
> I was trying to perform some column statistics using hive as per the 
> documentation 
> https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive 
> and was encountering the following errors:
> Seems like a bug. Can you look into this? Thanks in advance.
> -- HIVE table
> {noformat}
> hive> create table people_part(
> name string,
> address string) PARTITIONED BY (dob string, nationality varchar(2))
> row format delimited fields terminated by '\t';
> {noformat}
> --Analyze table with partition dob and nationality with FOR COLUMNS
> {noformat}
> hive> ANALYZE TABLE people_part PARTITION(dob='2015-10-2',nationality) 
> COMPUTE STATISTICS FOR COLUMNS;
> NoViableAltException(-1@[])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11627)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:40215)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.columnName(HiveParser.java:33351)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.columnNameList(HiveParser.java:33219)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.analyzeStatement(HiveParser.java:17764)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2369)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:275)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:227)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:430)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:803)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:697)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:636)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> FAILED: ParseException line 1:95 cannot recognize input near '' '' 
> '' in column name
> {noformat}
> --Analyze table with partition dob and nationality values specified with FOR 
> COLUMNS
> {noformat}
> hive> ANALYZE TABLE people_part PARTITION(dob='2015-10-2',nationality='IE') 
> COMPUTE STATISTICS FOR COLUMNS;
> NoViableAltException(-1@[])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11627)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:40215)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.columnName(HiveParser.java:33351)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.columnNameList(HiveParser.java:33219)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.analyzeStatement(HiveParser.java:17764)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2369)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
> at org.apache.hadoop.hive.ql.Dr

[jira] [Commented] (HIVE-11113) ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work.

2015-07-15 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1462#comment-1462
 ] 

Bing Li commented on HIVE-11113:


Hi, [~pxiong] and [~shiroy]
I tried this scenario on Hive 1.2.1 and found that it works for a table stored 
as TEXTFILE, but not for one stored as PARQUET.

Errors
==
Caused by: java.lang.IllegalArgumentException: Column [ds] was not found in 
schema!
at parquet.Preconditions.checkArgument(Preconditions.java:55)
at 
parquet.filter2.predicate.SchemaCompatibilityValidator.getColumnDescriptor(SchemaCompatibilityValidator.java:190)
at 
parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:178)
at 
parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:160)
at 
parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:94)
at 
parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:59)
at parquet.filter2.predicate.Operators$Eq.accept(Operators.java:180)
at 
parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:64)
at parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:59)
at parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:40)
at 
parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:126)
at 
parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:46)
at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:275)
at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderWrapper.java:99)
at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderWrapper.java:85)
at 
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72)
at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.(CombineHiveRecordReader.java:67)
... 16 more


Reproduced Queries
==
create table dummy (key string, value string) partitioned by (ds string, hr 
string);

load data local inpath 'kv1.txt' into table dummy partition (ds='2008',hr='12');
load data local inpath 'kv1.txt' into table dummy partition (ds='2008',hr='11');

select * from dummy;
analyze table dummy partition (ds='2008',hr='12') compute statistics for 
columns key;


create table dummy2 (key string, value string) partitioned by (ds string, hr 
string) stored as parquet;
insert into table dummy2 partition (ds='2008',hr='12')
select key, value from dummy where (ds='2008');

select * from dummy2;
analyze table dummy2 partition(ds='2008') compute statistics for columns key;

> ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work. 
> ---
>
> Key: HIVE-11113
> URL: https://issues.apache.org/jira/browse/HIVE-11113
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1, 1.2.1
> Environment: 
>Reporter: Shiroy Pigarez
>Priority: Critical
>
> I was trying to perform some column statistics using hive as per the 
> documentation 
> https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive 
> and was encountering the following errors:
> Seems like a bug. Can you look into this? Thanks in advance.
> -- HIVE table
> {noformat}
> hive> create table people_part(
> name string,
> address string) PARTITIONED BY (dob string, nationality varchar(2))
> row format delimited fields terminated by '\t';
> {noformat}
> --Analyze table with partition dob and nationality with FOR COLUMNS
> {noformat}
> hive> ANALYZE TABLE people_part PARTITION(dob='2015-10-2',nationality) 
> COMPUTE STATISTICS FOR COLUMNS;
> NoViableAltException(-1@[])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11627)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:40215)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.columnName(HiveParser.java:33351)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.columnNameList(HiveParser.java:33219)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.analyzeStatement(HiveParser.java:17764)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2369)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036)
> at 
> org.apache.hado

[jira] [Commented] (HIVE-11113) ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work.

2015-07-19 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633017#comment-14633017
 ] 

Bing Li commented on HIVE-11113:


Hi, @Pengcheng Xiong
What's the value of hive.optimize.ppd in your cluster?
I run into the error if I set it to true.

> ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work. 
> ---
>
> Key: HIVE-11113
> URL: https://issues.apache.org/jira/browse/HIVE-11113
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1, 1.2.1
> Environment: 
>Reporter: Shiroy Pigarez
>Assignee: Pengcheng Xiong
>Priority: Critical
>
> I was trying to perform some column statistics using hive as per the 
> documentation 
> https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive 
> and was encountering the following errors:
> Seems like a bug. Can you look into this? Thanks in advance.
> -- HIVE table
> {noformat}
> hive> create table people_part(
> name string,
> address string) PARTITIONED BY (dob string, nationality varchar(2))
> row format delimited fields terminated by '\t';
> {noformat}
> --Analyze table with partition dob and nationality with FOR COLUMNS
> {noformat}
> hive> ANALYZE TABLE people_part PARTITION(dob='2015-10-2',nationality) 
> COMPUTE STATISTICS FOR COLUMNS;
> NoViableAltException(-1@[])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11627)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:40215)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.columnName(HiveParser.java:33351)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.columnNameList(HiveParser.java:33219)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.analyzeStatement(HiveParser.java:17764)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2369)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:275)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:227)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:430)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:803)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:697)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:636)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> FAILED: ParseException line 1:95 cannot recognize input near '' '' 
> '' in column name
> {noformat}
> --Analyze table with partition dob and nationality values specified with FOR 
> COLUMNS
> {noformat}
> hive> ANALYZE TABLE people_part PARTITION(dob='2015-10-2',nationality='IE') 
> COMPUTE STATISTICS FOR COLUMNS;
> NoViableAltException(-1@[])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11627)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:40215)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.columnName(HiveParser.java:33351)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.columnNameList(HiveParser.java:33219)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.analyzeStatement(HiveParser.java:17764)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2369)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036)
> at 
> org.apache.hadoop.h

[jira] [Commented] (HIVE-11113) ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work.

2015-07-20 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633635#comment-14633635
 ] 

Bing Li commented on HIVE-11113:


Hi, [~pxiong]
Thank you for your quick response.
Yes, I tried the queries on two different clusters, and both of them ran into 
this error on:

analyze table dummy2 partition(ds='2008') compute statistics for columns key;

Then I set hive.optimize.ppd to false; the query worked, but performance was 
poor.

Do you have any idea which classes might be causing this?

Thank you!

> ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work. 
> ---
>
> Key: HIVE-11113
> URL: https://issues.apache.org/jira/browse/HIVE-11113
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1, 1.2.1
> Environment: 
>Reporter: Shiroy Pigarez
>Assignee: Pengcheng Xiong
>Priority: Critical
>
> I was trying to perform some column statistics using hive as per the 
> documentation 
> https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive 
> and was encountering the following errors:
> Seems like a bug. Can you look into this? Thanks in advance.
> -- HIVE table
> {noformat}
> hive> create table people_part(
> name string,
> address string) PARTITIONED BY (dob string, nationality varchar(2))
> row format delimited fields terminated by '\t';
> {noformat}
> --Analyze table with partition dob and nationality with FOR COLUMNS
> {noformat}
> hive> ANALYZE TABLE people_part PARTITION(dob='2015-10-2',nationality) 
> COMPUTE STATISTICS FOR COLUMNS;
> NoViableAltException(-1@[])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11627)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:40215)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.columnName(HiveParser.java:33351)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.columnNameList(HiveParser.java:33219)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.analyzeStatement(HiveParser.java:17764)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2369)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:275)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:227)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:430)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:803)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:697)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:636)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> FAILED: ParseException line 1:95 cannot recognize input near '' '' 
> '' in column name
> {noformat}
> --Analyze table with partition dob and nationality values specified with FOR 
> COLUMNS
> {noformat}
> hive> ANALYZE TABLE people_part PARTITION(dob='2015-10-2',nationality='IE') 
> COMPUTE STATISTICS FOR COLUMNS;
> NoViableAltException(-1@[])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11627)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:40215)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.columnName(HiveParser.java:33351)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.columnNameList(HiveParser.java:33219)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.analyzeStatement(HiveParser.java:17764)
> at 
> org.apache.hadoop.hive.ql.parse.Hive

[jira] [Commented] (HIVE-11113) ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work.

2015-07-20 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634338#comment-14634338
 ] 

Bing Li commented on HIVE-11113:


Hi, [~pxiong]
I didn't run the query against "people_part".

What I ran is listed in my previous comment under "Reproduced Queries".

In those queries, I tried two different types of tables: one stored as 
TEXTFILE, and the other as PARQUET.

The ANALYZE command passed on the TEXTFILE table but failed on the PARQUET 
table with the error.


analyze table dummy partition (ds='2008',hr='12') compute statistics for 
columns key;   // PASS
analyze table dummy2 partition(ds='2008') compute statistics for columns key;  
// FAILED


Then I disabled hive.optimize.ppd (set it to false); after that, the following 
query worked without any error:

analyze table dummy2 partition(ds='2008') compute statistics for columns key;  

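Put together, a hedged sketch of the session-level workaround described in this 
comment (Hive 1.2.1 assumed; disabling predicate pushdown has a performance 
cost, so it is restored afterwards):

```sql
-- Workaround sketch: disable predicate pushdown so the partition filter is
-- not pushed into the Parquet file reader, whose file schema lacks the
-- partition columns (ds, hr).
set hive.optimize.ppd=false;
analyze table dummy2 partition(ds='2008') compute statistics for columns key;
-- Restore the default, since leaving PPD off degrades scan performance.
set hive.optimize.ppd=true;
```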
> ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work. 
> ---
>
> Key: HIVE-11113
> URL: https://issues.apache.org/jira/browse/HIVE-11113
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1, 1.2.1
> Environment: 
>Reporter: Shiroy Pigarez
>Assignee: Pengcheng Xiong
>Priority: Critical
>
> I was trying to perform some column statistics using hive as per the 
> documentation 
> https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive 
> and was encountering the following errors:
> Seems like a bug. Can you look into this? Thanks in advance.
> -- HIVE table
> {noformat}
> hive> create table people_part(
> name string,
> address string) PARTITIONED BY (dob string, nationality varchar(2))
> row format delimited fields terminated by '\t';
> {noformat}
> --Analyze table with partition dob and nationality with FOR COLUMNS
> {noformat}
> hive> ANALYZE TABLE people_part PARTITION(dob='2015-10-2',nationality) 
> COMPUTE STATISTICS FOR COLUMNS;
> NoViableAltException(-1@[])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11627)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:40215)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.columnName(HiveParser.java:33351)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.columnNameList(HiveParser.java:33219)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.analyzeStatement(HiveParser.java:17764)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2369)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:275)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:227)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:430)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:803)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:697)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:636)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> FAILED: ParseException line 1:95 cannot recognize input near '' '' 
> '' in column name
> {noformat}
> --Analyze table with partition dob and nationality values specified with FOR 
> COLUMNS
> {noformat}
> hive> ANALYZE TABLE people_part PARTITION(dob='2015-10-2',nationality='IE') 
> COMPUTE STATISTICS FOR COLUMNS;
> NoViableAltException(-1@[])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11627)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:40215)
>   

[jira] [Commented] (HIVE-11113) ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work.

2015-07-21 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634801#comment-14634801
 ] 

Bing Li commented on HIVE-11113:


Thank you, [~tfriedr]
With your fix in HIVE-11326, all the queries work now.

> ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work. 
> ---
>
> Key: HIVE-11113
> URL: https://issues.apache.org/jira/browse/HIVE-11113
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1, 1.2.1
> Environment: 
>Reporter: Shiroy Pigarez
>Assignee: Pengcheng Xiong
>Priority: Critical
>
> I was trying to perform some column statistics using hive as per the 
> documentation 
> https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive 
> and was encountering the following errors:
> Seems like a bug. Can you look into this? Thanks in advance.
> -- HIVE table
> {noformat}
> hive> create table people_part(
> name string,
> address string) PARTITIONED BY (dob string, nationality varchar(2))
> row format delimited fields terminated by '\t';
> {noformat}
> --Analyze table with partition dob and nationality with FOR COLUMNS
> {noformat}
> hive> ANALYZE TABLE people_part PARTITION(dob='2015-10-2',nationality) 
> COMPUTE STATISTICS FOR COLUMNS;
> NoViableAltException(-1@[])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11627)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:40215)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.columnName(HiveParser.java:33351)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.columnNameList(HiveParser.java:33219)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.analyzeStatement(HiveParser.java:17764)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2369)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:275)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:227)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:430)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:803)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:697)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:636)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> FAILED: ParseException line 1:95 cannot recognize input near '' '' 
> '' in column name
> {noformat}
> --Analyze table with partition dob and nationality values specified with FOR 
> COLUMNS
> {noformat}
> hive> ANALYZE TABLE people_part PARTITION(dob='2015-10-2',nationality='IE') 
> COMPUTE STATISTICS FOR COLUMNS;
> NoViableAltException(-1@[])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11627)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:40215)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.columnName(HiveParser.java:33351)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.columnNameList(HiveParser.java:33219)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.analyzeStatement(HiveParser.java:17764)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2369)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDri

[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2015-09-08 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736074#comment-14736074
 ] 

Bing Li commented on HIVE-4577:
---

I submitted a review request manually.
The link is https://reviews.apache.org/r/38199/

> hive CLI can't handle hadoop dfs command with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch
>
>
> As designed, Hive supports hadoop dfs commands in the Hive shell, e.g.
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from hadoop when the path contains spaces or quotes:
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"
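For illustration, a minimal quote-aware splitter sketch in Java (the class and method names are hypothetical, not Hive's actual implementation) showing the behavior the issue asks for: "bei jing" stays a single path token and the surrounding quotes are stripped, instead of splitting naively on whitespace:

```java
import java.util.ArrayList;
import java.util.List;

public class DfsCommandSplitter {
    // Splits a dfs command line into tokens, honoring single and double
    // quotes: quoted whitespace does not break a token, and the quote
    // characters themselves are dropped from the result.
    static List<String> split(String cmd) {
        List<String> tokens = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        char quote = 0; // 0 = not currently inside quotes
        for (char c : cmd.toCharArray()) {
            if (quote != 0) {
                if (c == quote) quote = 0; else cur.append(c);
            } else if (c == '"' || c == '\'') {
                quote = c;
            } else if (Character.isWhitespace(c)) {
                if (cur.length() > 0) { tokens.add(cur.toString()); cur.setLength(0); }
            } else {
                cur.append(c);
            }
        }
        if (cur.length() > 0) tokens.add(cur.toString());
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("-mkdir \"bei jing\"")); // [-mkdir, bei jing]
    }
}
```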



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11201) HCatalog is ignoring user specified avro schema in the table definition

2015-09-08 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736076#comment-14736076
 ] 

Bing Li commented on HIVE-11201:


I submitted the review request manually.
The link is https://reviews.apache.org/r/34877/

> HCatalog is ignoring user specified avro schema in the table definition
> 
>
> Key: HIVE-11201
> URL: https://issues.apache.org/jira/browse/HIVE-11201
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Bing Li
>Assignee: Bing Li
>Priority: Critical
> Attachments: HIVE-11201.1.patch
>
>
> HCatalog ignores the user-specified Avro schema in the table definition and 
> instead generates its own schema from the Hive metastore. Generating its own 
> schema can produce mismatched field names, since Avro field names are 
> case-sensitive; the incorrect schema is then written to the Avro file, and 
> subsequent selects fail on read. In addition, even if the user-specified 
> schema does not allow null, data written through HCatalog uses a schema that 
> does allow null. For example, a user specified all-capital field names and 
> the record name LINEITEM; that schema should be written as-is, but HCatalog 
> ignores it and generates its own Avro schema from the Hive table's casing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-16573) In-place update for HoS can't be disabled

2017-06-04 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036565#comment-16036565
 ] 

Bing Li commented on HIVE-16573:


Hi, [~ruili] and [~anishek]
It seems we can't import the SessionState class into InPlaceUpdate.java; it 
causes a module-cycle error during compilation: 
hive-common->hive-exec->hive-common.

I changed it as below:
String engine = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_EXECUTION_ENGINE);
boolean inPlaceUpdates = false;

if (engine.equals("tez"))
  inPlaceUpdates = HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.TEZ_EXEC_INPLACE_PROGRESS);

if (engine.equals("spark"))
  inPlaceUpdates = HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.SPARK_EXEC_INPLACE_PROGRESS);

Do you think this is ok?
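For clarity, the check above can be sketched as a small engine-dispatch helper. The class and method names here are hypothetical, and the HiveConf lookups are replaced by plain booleans so the logic stands alone:

```java
public class InPlaceUpdateCheck {
    // Mirrors the snippet above: in-place progress updates are governed by an
    // engine-specific flag, and default to off for any other engine (e.g. "mr").
    static boolean inPlaceUpdatesEnabled(String engine, boolean tezFlag, boolean sparkFlag) {
        if ("tez".equals(engine)) {
            return tezFlag;    // stands in for TEZ_EXEC_INPLACE_PROGRESS
        }
        if ("spark".equals(engine)) {
            return sparkFlag;  // stands in for SPARK_EXEC_INPLACE_PROGRESS
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(inPlaceUpdatesEnabled("spark", true, false)); // false
        System.out.println(inPlaceUpdatesEnabled("tez", true, false));   // true
    }
}
```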


> In-place update for HoS can't be disabled
> -
>
> Key: HIVE-16573
> URL: https://issues.apache.org/jira/browse/HIVE-16573
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>Priority: Minor
>
> {{hive.spark.exec.inplace.progress}} has no effect



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Work started] (HIVE-16573) In-place update for HoS can't be disabled

2017-06-04 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16573 started by Bing Li.
--
> In-place update for HoS can't be disabled
> -
>
> Key: HIVE-16573
> URL: https://issues.apache.org/jira/browse/HIVE-16573
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>Priority: Minor
>
> {{hive.spark.exec.inplace.progress}} has no effect



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16573) In-place update for HoS can't be disabled

2017-06-04 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036565#comment-16036565
 ] 

Bing Li edited comment on HIVE-16573 at 6/5/17 5:51 AM:


Hi, [~ruili] and [~anishek]
It seems we can't import the SessionState class into InPlaceUpdate.java; it 
causes a module-cycle error during compilation: 
hive-common->hive-exec->hive-common.

I changed it as below:
{quote}
String engine = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_EXECUTION_ENGINE);
boolean inPlaceUpdates = false;

if (engine.equals("tez"))
  inPlaceUpdates = HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.TEZ_EXEC_INPLACE_PROGRESS);

if (engine.equals("spark"))
  inPlaceUpdates = HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.SPARK_EXEC_INPLACE_PROGRESS);
{quote}

Do you think this is ok?



was (Author: libing):
Hi, [~ruili] and [~anishek]
Seems that we can't import class SessionState into InPlaceUpdate.java, it will 
cause module cycles error during compiling, which is 
hive-common->hive-exec->hive-common.

I changed it as below:
String engine = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_EXECUTION_ENGINE);
boolean inPlaceUpdates = false;

if (engine.equals("tez"))
  inPlaceUpdates = HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.TEZ_EXEC_INPLACE_PROGRESS);

if (engine.equals("spark"))
  inPlaceUpdates = HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.SPARK_EXEC_INPLACE_PROGRESS);

Do you think is ok?


> In-place update for HoS can't be disabled
> -
>
> Key: HIVE-16573
> URL: https://issues.apache.org/jira/browse/HIVE-16573
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>Priority: Minor
>
> {{hive.spark.exec.inplace.progress}} has no effect



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16573) In-place update for HoS can't be disabled

2017-06-04 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036565#comment-16036565
 ] 

Bing Li edited comment on HIVE-16573 at 6/5/17 5:52 AM:


Hi, [~ruili] and [~anishek]
It seems we can't import the SessionState class into InPlaceUpdate.java; it 
causes a module-cycle error during compilation: 
hive-common->hive-exec->hive-common.

I changed it as below:
{quote}
{{ String engine = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_EXECUTION_ENGINE);
boolean inPlaceUpdates = false;

if (engine.equals("tez"))
  inPlaceUpdates = HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.TEZ_EXEC_INPLACE_PROGRESS);

if (engine.equals("spark"))
  inPlaceUpdates = HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.SPARK_EXEC_INPLACE_PROGRESS); }}
{quote}

Do you think this is ok?



was (Author: libing):
Hi, [~ruili] and [~anishek]
Seems that we can't import class SessionState into InPlaceUpdate.java, it will 
cause module cycles error during compiling, which is 
hive-common->hive-exec->hive-common.

I changed it as below:
{quote}
String engine = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_EXECUTION_ENGINE);
boolean inPlaceUpdates = false;

if (engine.equals("tez"))
  inPlaceUpdates = HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.TEZ_EXEC_INPLACE_PROGRESS);

if (engine.equals("spark"))
  inPlaceUpdates = HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.SPARK_EXEC_INPLACE_PROGRESS);
{quote}

Do you think is ok?


> In-place update for HoS can't be disabled
> -
>
> Key: HIVE-16573
> URL: https://issues.apache.org/jira/browse/HIVE-16573
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>Priority: Minor
>
> {{hive.spark.exec.inplace.progress}} has no effect



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16573) In-place update for HoS can't be disabled

2017-06-04 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036565#comment-16036565
 ] 

Bing Li edited comment on HIVE-16573 at 6/5/17 5:53 AM:


Hi, [~ruili] and [~anishek]
It seems we can't import the SessionState class into InPlaceUpdate.java; it 
causes a module-cycle error during compilation: 
hive-common->hive-exec->hive-common.

I changed it as below:
{quote}
String engine = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_EXECUTION_ENGINE);
boolean inPlaceUpdates = false;

if (engine.equals("tez"))
  inPlaceUpdates = HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.TEZ_EXEC_INPLACE_PROGRESS);

if (engine.equals("spark"))
  inPlaceUpdates = HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.SPARK_EXEC_INPLACE_PROGRESS);
{quote}

Do you think this is ok?



was (Author: libing):
Hi, [~ruili] and [~anishek]
Seems that we can't import class SessionState into InPlaceUpdate.java, it will 
cause module cycles error during compiling, which is 
hive-common->hive-exec->hive-common.

I changed it as below:
{quote}
{{ String engine = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_EXECUTION_ENGINE);
boolean inPlaceUpdates = false;

if (engine.equals("tez"))
  inPlaceUpdates = HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.TEZ_EXEC_INPLACE_PROGRESS);

if (engine.equals("spark"))
  inPlaceUpdates = HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.SPARK_EXEC_INPLACE_PROGRESS); }}
{quote}

Do you think is ok?


> In-place update for HoS can't be disabled
> -
>
> Key: HIVE-16573
> URL: https://issues.apache.org/jira/browse/HIVE-16573
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>Priority: Minor
>
> {{hive.spark.exec.inplace.progress}} has no effect



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16573) In-place update for HoS can't be disabled

2017-06-04 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16573:
---
Attachment: HIVE-16573-branch2.3.patch

> In-place update for HoS can't be disabled
> -
>
> Key: HIVE-16573
> URL: https://issues.apache.org/jira/browse/HIVE-16573
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>Priority: Minor
> Attachments: HIVE-16573-branch2.3.patch
>
>
> {{hive.spark.exec.inplace.progress}} has no effect



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16573) In-place update for HoS can't be disabled

2017-06-05 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16573:
---
Attachment: (was: HIVE-16573-branch2.3.patch)

> In-place update for HoS can't be disabled
> -
>
> Key: HIVE-16573
> URL: https://issues.apache.org/jira/browse/HIVE-16573
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>Priority: Minor
>
> {{hive.spark.exec.inplace.progress}} has no effect



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16573) In-place update for HoS can't be disabled

2017-06-05 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16573:
---
Attachment: HIVE-16573.1.patch

Generated the patch file based on the master branch.

> In-place update for HoS can't be disabled
> -
>
> Key: HIVE-16573
> URL: https://issues.apache.org/jira/browse/HIVE-16573
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>Priority: Minor
> Attachments: HIVE-16573.1.patch
>
>
> {{hive.spark.exec.inplace.progress}} has no effect



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16573) In-place update for HoS can't be disabled

2017-06-05 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16573:
---
Status: Patch Available  (was: In Progress)

I verified this patch; it works for the Spark engine in the Hive CLI.

> In-place update for HoS can't be disabled
> -
>
> Key: HIVE-16573
> URL: https://issues.apache.org/jira/browse/HIVE-16573
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>Priority: Minor
> Attachments: HIVE-16573.1.patch
>
>
> {{hive.spark.exec.inplace.progress}} has no effect



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16573) In-place update for HoS can't be disabled

2017-06-05 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037924#comment-16037924
 ] 

Bing Li commented on HIVE-16573:


[~ruili] and [~anishek], thank you for your review.
I just submitted the patch.


> In-place update for HoS can't be disabled
> -
>
> Key: HIVE-16573
> URL: https://issues.apache.org/jira/browse/HIVE-16573
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>Priority: Minor
> Attachments: HIVE-16573.1.patch
>
>
> {{hive.spark.exec.inplace.progress}} has no effect



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16800) Hive Metastore configuration with Mysql

2017-06-07 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-16800:
--

Assignee: Bing Li

> Hive Metastore configuration with Mysql
> ---
>
> Key: HIVE-16800
> URL: https://issues.apache.org/jira/browse/HIVE-16800
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.2
>Reporter: Vigneshwaran
>Assignee: Bing Li
>
> I'm trying to configure MySql as metastore in Hive 1.2.2 by following the 
> link https://dzone.com/articles/how-configure-mysql-metastore, but when I'm 
> trying to run hive after all the steps, I'm getting the below errors:
> Exception in thread "main" java.lang.RuntimeException: 
> java.lang.RuntimeException: Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> Caused by: java.lang.RuntimeException: Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
> Caused by: java.lang.reflect.InvocationTargetException
> Caused by: javax.jdo.JDOFatalUserException: Exception thrown setting 
> persistence propertiesNestedThrowables:



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16614) Support "set local time zone" statement

2017-06-07 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-16614:
--

Assignee: Bing Li

> Support "set local time zone" statement
> ---
>
> Key: HIVE-16614
> URL: https://issues.apache.org/jira/browse/HIVE-16614
> Project: Hive
>  Issue Type: Improvement
>Reporter: Carter Shanklin
>Assignee: Bing Li
>
> HIVE-14412 introduces a timezone-aware timestamp.
> SQL has a concept of default time zone displacements, which are transparently 
> applied when converting between timezone-unaware types and timezone-aware 
> types and, in Hive's case, are also used to shift a timezone aware type to a 
> different time zone, depending on configuration.
> SQL also provides that the default time zone displacement be settable at a 
> session level, so that clients can access a database simultaneously from 
> different time zones and see time values in their own time zone.
> Currently the time zone displacement is fixed and is set based on the system 
> time zone where the Hive client runs (HiveServer2 or Hive CLI). It will be 
> more convenient for users if they have the ability to set their time zone of 
> choice.
> SQL defines "set time zone" with two ways of specifying the time zone: first 
> using an interval, and second using the special keyword LOCAL.
> Examples:
>   • set time zone '-8:00';
>   • set time zone LOCAL;
> LOCAL means to set the current default time zone displacement to the 
> session's original default time zone displacement.
> Reference: SQL:2011 section 19.4



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Work started] (HIVE-16800) Hive Metastore configuration with Mysql

2017-06-08 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16800 started by Bing Li.
--
> Hive Metastore configuration with Mysql
> ---
>
> Key: HIVE-16800
> URL: https://issues.apache.org/jira/browse/HIVE-16800
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.2
>Reporter: Vigneshwaran
>Assignee: Bing Li
>
> I'm trying to configure MySql as metastore in Hive 1.2.2 by following the 
> link https://dzone.com/articles/how-configure-mysql-metastore, but when I'm 
> trying to run hive after all the steps, I'm getting the below errors:
> Exception in thread "main" java.lang.RuntimeException: 
> java.lang.RuntimeException: Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> Caused by: java.lang.RuntimeException: Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
> Caused by: java.lang.reflect.InvocationTargetException
> Caused by: javax.jdo.JDOFatalUserException: Exception thrown setting 
> persistence propertiesNestedThrowables:



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16800) Hive Metastore configuration with Mysql

2017-06-08 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16042352#comment-16042352
 ] 

Bing Li commented on HIVE-16800:


Hi, Vigneshwaran
I think the document you referred to is out-of-date.

Please try the following steps in your cluster (using the commands for RHEL as 
an example):
1. Install MySQL
yum -y install mysql-server mysql mysql-devel

2. Start MySQL
/etc/init.d/mysqld start

3. Link or copy mysql-connector-java.jar to hive/lib

4. Set configurations in hive-site.xml
javax.jdo.option.ConnectionDriverName=com.mysql.jdbc.Driver
javax.jdo.option.ConnectionURL=jdbc:mysql://myhost.com/hive?createDatabaseIfNotExist=true
javax.jdo.option.ConnectionUserName=APP
javax.jdo.option.ConnectionPassword=mine

5. Prepare database for HiveMetastore in MySQL
mysql>create database hive;
mysql> grant all on hive.* to 'APP'@'myhost.com' identified by 'mine';

6. Verification on MySQL
mysql -u APP -h myhost.com -p
Enter "mine" when prompted for the password

7. Run Hive SchemaTool
hive/bin/schematool -dbType mysql -initSchema

8. Start HiveMetastore
hive/bin/hive --service metastore
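The properties from step 4, written out in the XML form hive-site.xml actually uses (the host name and credentials are the example values from above, not defaults):

```xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://myhost.com/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>APP</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>mine</value>
  </property>
</configuration>
```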

> Hive Metastore configuration with Mysql
> ---
>
> Key: HIVE-16800
> URL: https://issues.apache.org/jira/browse/HIVE-16800
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.2
>Reporter: Vigneshwaran
>Assignee: Bing Li
>
> I'm trying to configure MySql as metastore in Hive 1.2.2 by following the 
> link https://dzone.com/articles/how-configure-mysql-metastore, but when I'm 
> trying to run hive after all the steps, I'm getting the below errors:
> Exception in thread "main" java.lang.RuntimeException: 
> java.lang.RuntimeException: Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> Caused by: java.lang.RuntimeException: Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
> Caused by: java.lang.reflect.InvocationTargetException
> Caused by: javax.jdo.JDOFatalUserException: Exception thrown setting 
> persistence propertiesNestedThrowables:



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (HIVE-16800) Hive Metastore configuration with Mysql

2017-06-08 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li resolved HIVE-16800.

Resolution: Not A Bug

> Hive Metastore configuration with Mysql
> ---
>
> Key: HIVE-16800
> URL: https://issues.apache.org/jira/browse/HIVE-16800
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.2
>Reporter: Vigneshwaran
>Assignee: Bing Li
>
> I'm trying to configure MySql as metastore in Hive 1.2.2 by following the 
> link https://dzone.com/articles/how-configure-mysql-metastore, but when I'm 
> trying to run hive after all the steps, I'm getting the below errors:
> Exception in thread "main" java.lang.RuntimeException: 
> java.lang.RuntimeException: Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> Caused by: java.lang.RuntimeException: Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
> Caused by: java.lang.reflect.InvocationTargetException
> Caused by: javax.jdo.JDOFatalUserException: Exception thrown setting 
> persistence propertiesNestedThrowables:



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle

2017-06-08 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16042361#comment-16042361
 ] 

Bing Li commented on HIVE-16659:


Hi, [~ruili]
Could I take it over?

> Query plan should reflect hive.spark.use.groupby.shuffle
> 
>
> Key: HIVE-16659
> URL: https://issues.apache.org/jira/browse/HIVE-16659
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
>
> It's useful to show the shuffle type used in the query plan. Currently it 
> shows "GROUP" no matter what we set for hive.spark.use.groupby.shuffle.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle

2017-06-08 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-16659:
--

Assignee: Bing Li  (was: Rui Li)

> Query plan should reflect hive.spark.use.groupby.shuffle
> 
>
> Key: HIVE-16659
> URL: https://issues.apache.org/jira/browse/HIVE-16659
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>
> It's useful to show the shuffle type used in the query plan. Currently it 
> shows "GROUP" no matter what we set for hive.spark.use.groupby.shuffle.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16615) Support Time Zone Specifiers (i.e. "at time zone X")

2017-06-08 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-16615:
--

Assignee: Bing Li

> Support Time Zone Specifiers (i.e. "at time zone X")
> 
>
> Key: HIVE-16615
> URL: https://issues.apache.org/jira/browse/HIVE-16615
> Project: Hive
>  Issue Type: Improvement
>Reporter: Carter Shanklin
>Assignee: Bing Li
>
> HIVE-14412 introduces a timezone-aware timestamp.
> SQL has a concept of "time zone specifier" which applies to any datetime 
> value expression (which covers time/timestamp with and without timezones). 
> Hive lacks a time type so we can put that aside for a while.
> Examples:
>   a. select time_stamp_with_time_zone at time zone '-8:00';
>   b. select time_stamp_without_time_zone at time zone LOCAL;
> These statements would adjust the expression from its original timezone into 
> a known target timezone.
> Using the time zone specifier results in a data type that has a time zone. 
> If the original expression lacked a time zone, the result has a time zone. If 
> the original expression had a time zone, the result still has a time zone, 
> possibly a different one.
> LOCAL means to use the session's original default time zone displacement.
> The standard says that dates are not supported with time zone specifiers. It 
> seems common to ignore this rule and allow this, by converting the date to a 
> timestamp and then applying the usual rule.
> The standard only requires an interval or the LOCAL keyword. Some databases 
> allow time zone identifiers like PST.
> Reference: SQL:2011 section 6.31
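As a rough illustration of the adjustment described above (not Hive's implementation; names here are illustrative), java.time expresses the same instant-preserving shift: the point in time is unchanged, only the zone and thus the displayed local time differ:

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public class AtTimeZoneDemo {
    // Analogous to "select ts at time zone '-8:00'": keep the instant,
    // re-express it in the target zone's local time.
    static ZonedDateTime atTimeZone(ZonedDateTime ts, String offset) {
        return ts.withZoneSameInstant(ZoneOffset.of(offset));
    }

    public static void main(String[] args) {
        ZonedDateTime utcNoon = ZonedDateTime.of(2017, 6, 8, 12, 0, 0, 0, ZoneOffset.UTC);
        // Same instant, but the local clock reads 04:00 in the -08:00 zone.
        System.out.println(atTimeZone(utcNoon, "-08:00"));
    }
}
```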



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16766) Hive query with space as filter does not give proper result

2017-06-08 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-16766:
--

Assignee: Bing Li

> Hive query with space as filter does not give proper result
> ---
>
> Key: HIVE-16766
> URL: https://issues.apache.org/jira/browse/HIVE-16766
> Project: Hive
>  Issue Type: Bug
>Reporter: Subash
>Assignee: Bing Li
>Priority: Critical
>
> Hi Team,
> I have used a query in the below format and it does not give proper results. 
> Since there is a split on \s+ in the ExecuteStatementOperation class at line 
> 48, I suspect something goes wrong there. Could you help me with this, if I 
> am wrong? 
> I am using Hive JDBC version 1.1.0.
> The sample query is as follows:
> select count(1) as cnt from table where col1=" " and col2="D";



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16936) wrong result with CTAS(create table as select)

2017-06-30 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-16936:
--

Assignee: Bing Li

> wrong result with CTAS(create table as select)
> --
>
> Key: HIVE-16936
> URL: https://issues.apache.org/jira/browse/HIVE-16936
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Xiaomeng Huang
>Assignee: Bing Li
>Priority: Critical
>
> 1. 
> {code}
> hive> desc abc_test_old;
> OK
> did   string
> activetimeint
> {code}
> 2. 
> {code}
> hive> select 'test' as did from abc_test_old
> > where did = '5FCAFD34-C124-4E13-AF65-27B675C945CC' limit 1;
> OK
> test  
> {code}
> result is 'test'
> 3. 
> {code}
> hive> create table abc_test_12345 as
> > select 'test' as did from abc_test_old
> > where did = '5FCAFD34-C124-4E13-AF65-27B675C945CC' limit 1;
> hive> select did from abc_test_12345 limit 1;
> OK
> 5FCAFD34-C124-4E13-AF65-27B675C945CC 
> {code}
> result is '5FCAFD34-C124-4E13-AF65-27B675C945CC'
> Why is the result not 'test'?
> 4. 
> {code}
> hive> explain
> > create table abc_test_12345 as
> > select 'test' as did from abc_test_old
> > where did = '5FCAFD34-C124-4E13-AF65-27B675C945CC' limit 1;
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, Stage-4
>   Stage-3
>   Stage-0 depends on stages: Stage-3, Stage-2, Stage-5
>   Stage-7 depends on stages: Stage-0
>   Stage-2
>   Stage-4
>   Stage-5 depends on stages: Stage-4
> STAGE PLANS:
>   Stage: Stage-1
> Map Reduce
>   Map Operator Tree:
>   TableScan
> alias: abc_test_old
> Statistics: Num rows: 32 Data size: 1152 Basic stats: COMPLETE 
> Column stats: NONE
> Filter Operator
>   predicate: (did = '5FCAFD34-C124-4E13-AF65-27B675C945CC') 
> (type: boolean)
>   Statistics: Num rows: 16 Data size: 576 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> Statistics: Num rows: 16 Data size: 576 Basic stats: COMPLETE 
> Column stats: NONE
> Limit
>   Number of rows: 1
>   Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE 
> Column stats: NONE
>   Reduce Output Operator
> sort order:
> Statistics: Num rows: 1 Data size: 36 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Operator Tree:
> Select Operator
>   expressions: '5FCAFD34-C124-4E13-AF65-27B675C945CC' (type: string)
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column 
> stats: NONE
>   Limit
> Number of rows: 1
> Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE 
> Column stats: NONE
> File Output Operator
>   compressed: true
>   Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE 
> Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>   serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
>   name: default.abc_test_12345
> ..
> {code}
> Why is the expression '5FCAFD34-C124-4E13-AF65-27B675C945CC'?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16907) "INSERT INTO" overwrite old data when destination table encapsulated by backquote

2017-06-30 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-16907:
--

Assignee: Bing Li

>  "INSERT INTO"  overwrite old data when destination table encapsulated by 
> backquote 
> 
>
> Key: HIVE-16907
> URL: https://issues.apache.org/jira/browse/HIVE-16907
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.1.0, 2.1.1
>Reporter: Nemon Lou
>Assignee: Bing Li
>
> A way to reproduce:
> {noformat}
> create database tdb;
> use tdb;
> create table t1(id int);
> create table t2(id int);
> explain insert into `tdb.t1` select * from t2;
> {noformat}
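The ambiguity behind this report can be sketched as follows (a hypothetical helper, not Hive's actual parser code): with backquotes, `tdb.t1` is a single identifier whose name happens to contain a dot, while an unquoted tdb.t1 is a database-qualified name. Collapsing the two silently redirects the INSERT destination.

```python
# Hypothetical sketch (NOT Hive's parser): resolve an INSERT destination.
def resolve_destination(identifier, quoted_whole, current_db="default"):
    """Return (database, table) for an INSERT destination name."""
    if quoted_whole:
        # `tdb.t1`: the dot is part of the table name itself.
        return current_db, identifier
    if "." in identifier:
        db, table = identifier.split(".", 1)
        return db, table
    return current_db, identifier

# tdb.t1 (unquoted) is a qualified name:
assert resolve_destination("tdb.t1", quoted_whole=False) == ("tdb", "t1")
# `tdb.t1` (fully backquoted) names a table literally called "tdb.t1";
# treating it like the qualified form is the class of bug reported here.
assert resolve_destination("tdb.t1", quoted_whole=True) == ("default", "tdb.t1")
```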
> {noformat}
> +---+
> |  
> Explain  |
> +---+
> | STAGE DEPENDENCIES: 
>   |
> |   Stage-1 is a root stage   
>   |
> |   Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, 
> Stage-4  |
> |   Stage-3   
>   |
> |   Stage-0 depends on stages: Stage-3, Stage-2, Stage-5  
>   |
> |   Stage-2   
>   |
> |   Stage-4   
>   |
> |   Stage-5 depends on stages: Stage-4
>   |
> | 
>   |
> | STAGE PLANS:
>   |
> |   Stage: Stage-1
>   |
> | Map Reduce  
>   |
> |   Map Operator Tree:
>   |
> |   TableScan 
>   |
> | alias: t2   
>   |
> | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE |
> | Select Operator 
>   |
> |   expressions: id (type: int)   
>   |
> |   outputColumnNames: _col0  
>   |
> |   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE   |
> |   File Output Operator  
>   |
> | compressed: false   
>   |
> | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
> Column stats: NONE |
> | table:  
> 

[jira] [Work started] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle

2017-06-30 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16659 started by Bing Li.
--
> Query plan should reflect hive.spark.use.groupby.shuffle
> 
>
> Key: HIVE-16659
> URL: https://issues.apache.org/jira/browse/HIVE-16659
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>
> It's useful to show the shuffle type used in the query plan. Currently it 
> shows "GROUP" no matter what we set for hive.spark.use.groupby.shuffle.





[jira] [Updated] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle

2017-06-30 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16659:
---
Attachment: HIVE-16659.1.patch

This patch is based on branch-2.3.
With these changes, I get the explain results below.

_hive> {color:#205081}set hive.spark.use.groupby.shuffle=true;{color}
hive> explain select key, count(val) from t1 group by key;_
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
Spark
  Edges:
{color:red}Reducer 2 <- Map 1 (GROUP, 2){color}
  DagName: root_20170630080539_565b5a00-822e-46e9-a146-be84723ae7f6:2
  Vertices:
Map 1
Map Operator Tree:
TableScan
  alias: t1
  Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE 
Column stats: NONE
  Select Operator
expressions: key (type: int), val (type: string)
outputColumnNames: key, val
Statistics: Num rows: 20 Data size: 140 Basic stats: 
COMPLETE Column stats: NONE
Group By Operator
  aggregations: count(val)
  keys: key (type: int)
  mode: hash
  outputColumnNames: _col0, _col1
  Statistics: Num rows: 20 Data size: 140 Basic stats: 
COMPLETE Column stats: NONE
  Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 20 Data size: 140 Basic stats: 
COMPLETE Column stats: NONE
value expressions: _col1 (type: bigint)
Reducer 2
Reduce Operator Tree:
  Group By Operator
aggregations: count(VALUE._col0)
keys: KEY._col0 (type: int)
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE 
Column stats: NONE
File Output Operator
  compressed: false
  Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE 
Column stats: NONE
  table:
  input format: 
org.apache.hadoop.mapred.SequenceFileInputFormat
  output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
ListSink

Time taken: 51.289 seconds, Fetched: 54 row(s)

_hive> {color:#205081}set hive.spark.use.groupby.shuffle=false{color};
hive> explain select key, count(val) from t1 group by key;_
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
Spark
  Edges:
{color:#205081}Reducer 2 <- Map 1 (GROUP PARTITION-LEVEL SORT, 2){color}
  DagName: root_20170630075518_b84add65-57db-466f-9521-3f1b14de6826:1
  Vertices:
Map 1
Map Operator Tree:
TableScan
  alias: t1
  Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE 
Column stats: NONE
  Select Operator
expressions: key (type: int), val (type: string)
outputColumnNames: key, val
Statistics: Num rows: 20 Data size: 140 Basic stats: 
COMPLETE Column stats: NONE
Group By Operator
  aggregations: count(val)
  keys: key (type: int)
  mode: hash
  outputColumnNames: _col0, _col1
  Statistics: Num rows: 20 Data size: 140 Basic stats: 
COMPLETE Column stats: NONE
  Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 20 Data size: 140 Basic stats: 
COMPLETE Column stats: NONE
value expressions: _col1 (type: bigint)
Reducer 2
Reduce Operator Tree:
  Group By Operator
aggregations: count(VALUE._col0)
keys: KEY._col0 (type: int)
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE 
Column stats: NONE
File Output Operator
  compressed: false
  Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE 
Column stats: NONE
  tabl

[jira] [Comment Edited] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle

2017-06-30 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070235#comment-16070235
 ] 

Bing Li edited comment on HIVE-16659 at 6/30/17 3:10 PM:
-


[jira] [Comment Edited] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle

2017-06-30 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070235#comment-16070235
 ] 

Bing Li edited comment on HIVE-16659 at 6/30/17 3:11 PM:
-


[jira] [Updated] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle

2017-06-30 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16659:
---
Status: Patch Available  (was: In Progress)

> Query plan should reflect hive.spark.use.groupby.shuffle
> 
>
> Key: HIVE-16659
> URL: https://issues.apache.org/jira/browse/HIVE-16659
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
> Attachments: HIVE-16659.1.patch
>
>
> It's useful to show the shuffle type used in the query plan. Currently it 
> shows "GROUP" no matter what we set for hive.spark.use.groupby.shuffle.





[jira] [Commented] (HIVE-16766) Hive query with space as filter does not give proper result

2017-06-30 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071054#comment-16071054
 ] 

Bing Li commented on HIVE-16766:


Hi Subash,
Which Hive version did you use? Could you also post the queries that reproduce the issue?

I tried it on a Hive package built from branch-2.3, and it worked for me.

My Testing
==
*hive> describe test;*
OK
col1    string
col2    string
Time taken: 0.057 seconds, Fetched: 2 row(s)
*hive> select * from test;*
OK
a1  a2
b1  b2
c1  c2
D
Time taken: 0.22 seconds, Fetched: 4 row(s)

*hive> select count(1) as cnt from test where col1="" and col2="D";*
Query ID = root_20170630235239_b58b7dbc-14ef-4126-b56b-fdcf187acc09
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Spark Job = f25577ce-2ed6-4c5c-a64a-6ff7419ab778
--
  STAGES   ATTEMPTSTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
FAILED
--
Stage-5  0  FINISHED  1  100
   0
Stage-6  0  FINISHED  1  100
   0
--
STAGES: 02/02[==>>] 100%  ELAPSED TIME: 1.01 s
--
Status: Finished successfully in 1.01 seconds
OK
1
Time taken: 1.436 seconds, Fetched: 1 row(s)

> Hive query with space as filter does not give proper result
> ---
>
> Key: HIVE-16766
> URL: https://issues.apache.org/jira/browse/HIVE-16766
> Project: Hive
>  Issue Type: Bug
>Reporter: Subash
>Assignee: Bing Li
>Priority: Critical
>
> Hi Team,
> I used a query in the format below and it does not give proper results. 
> Since there is a split on \s+ at line 48 of the ExecuteStatementOperation 
> class, I suspect something goes wrong there. Could you help me with this, if 
> I am wrong? I am using Hive JDBC version 1.1.0.
> The sample query is as follows:
> select count(1) as cnt from table where col1=" " and col2="D";
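The reporter's hypothesis can be illustrated with a small sketch (the split behaviour is simulated here; the class and line number come from the report and are not verified against the actual ExecuteStatementOperation code):

```python
import re

# Splitting a whole statement on \s+ destroys a quoted single-space
# literal, so a filter like col1=" " can no longer match.
stmt = 'select count(1) as cnt from t where col1=" " and col2="D"'
tokens = re.split(r"\s+", stmt)
assert '" "' not in tokens  # the quoted space is gone after the split

# A safer pattern is to split off only the leading keyword and leave the
# rest of the statement untouched:
verb, rest = stmt.split(None, 1)
assert verb == "select"
assert 'col1=" "' in rest  # the literal survives intact
```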





[jira] [Assigned] (HIVE-17004) Calculating Number Of Reducers Looks At All Files

2017-07-01 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-17004:
--

Assignee: Bing Li

> Calculating Number Of Reducers Looks At All Files
> -
>
> Key: HIVE-17004
> URL: https://issues.apache.org/jira/browse/HIVE-17004
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 2.1.1
>Reporter: BELUGA BEHR
>Assignee: Bing Li
>
> When calculating the number of Mappers and Reducers, the two algorithms are 
> looking at different data sets.  The number of Mappers are calculated based 
> on the number of splits and the number of Reducers are based on the number of 
> files within the HDFS directory.  What you see is that if I add files to a 
> sub-directory of the HDFS directory, the number of splits remains the same 
> since I did not tell Hive to search recursively, and the number of Reducers 
> increases.  Please improve this so that Reducers are looking at the same 
> files that are considered for splits and not at files within sub-directories 
> (unless configured to do so).
> {code}
> CREATE EXTERNAL TABLE Complaints (
>   a string,
>   b string,
>   c string,
>   d string,
>   e string,
>   f string,
>   g string
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> LOCATION '/user/admin/complaints';
> {code}
> {code}
> [root@host ~]# sudo -u hdfs hdfs dfs -ls -R /user/admin/complaints
> -rwxr-xr-x   2 admin admin  122607137 2017-05-02 14:12 
> /user/admin/complaints/Consumer_Complaints.1.csv
> -rwxr-xr-x   2 admin admin  122607137 2017-05-02 14:12 
> /user/admin/complaints/Consumer_Complaints.2.csv
> -rwxr-xr-x   2 admin admin  122607137 2017-05-02 14:12 
> /user/admin/complaints/Consumer_Complaints.3.csv
> -rwxr-xr-x   2 admin admin  122607137 2017-05-02 14:12 
> /user/admin/complaints/Consumer_Complaints.4.csv
> -rwxr-xr-x   2 admin admin  122607137 2017-05-02 14:12 
> /user/admin/complaints/Consumer_Complaints.5.csv
> -rwxr-xr-x   2 admin admin  122607137 2017-05-02 14:12 
> /user/admin/complaints/Consumer_Complaints.csv
> {code}
> {code}
> INFO  : Compiling 
> command(queryId=hive_20170502142020_dfcf77ef-56b7-4544-ab90-6e9726ea86ae): 
> select a, count(1) from complaints group by a limit 10
> INFO  : Semantic Analysis Completed
> INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:a, 
> type:string, comment:null), FieldSchema(name:_c1, type:bigint, 
> comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20170502142020_dfcf77ef-56b7-4544-ab90-6e9726ea86ae); 
> Time taken: 0.077 seconds
> INFO  : Executing 
> command(queryId=hive_20170502142020_dfcf77ef-56b7-4544-ab90-6e9726ea86ae): 
> select a, count(1) from complaints group by a limit 10
> INFO  : Query ID = hive_20170502142020_dfcf77ef-56b7-4544-ab90-6e9726ea86ae
> INFO  : Total jobs = 1
> INFO  : Launching Job 1 out of 1
> INFO  : Starting task [Stage-1:MAPRED] in serial mode
> INFO  : Number of reduce tasks not specified. Estimated from input data size: 
> 11
> INFO  : In order to change the average load for a reducer (in bytes):
> INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
> INFO  : In order to limit the maximum number of reducers:
> INFO  :   set hive.exec.reducers.max=<number>
> INFO  : In order to set a constant number of reducers:
> INFO  :   set mapreduce.job.reduces=<number>
> INFO  : number of splits:2
> INFO  : Submitting tokens for job: job_1493729203063_0003
> INFO  : The url to track the job: 
> http://host:8088/proxy/application_1493729203063_0003/
> INFO  : Starting Job = job_1493729203063_0003, Tracking URL = 
> http://host:8088/proxy/application_1493729203063_0003/
> INFO  : Kill Command = 
> /opt/cloudera/parcels/CDH-5.8.4-1.cdh5.8.4.p0.5/lib/hadoop/bin/hadoop job  
> -kill job_1493729203063_0003
> INFO  : Hadoop job information for Stage-1: number of mappers: 2; number of 
> reducers: 11
> INFO  : 2017-05-02 14:20:14,206 Stage-1 map = 0%,  reduce = 0%
> INFO  : 2017-05-02 14:20:22,520 Stage-1 map = 100%,  reduce = 0%, Cumulative 
> CPU 4.48 sec
> INFO  : 2017-05-02 14:20:34,029 Stage-1 map = 100%,  reduce = 27%, Cumulative 
> CPU 15.72 sec
> INFO  : 2017-05-02 14:20:35,069 Stage-1 map = 100%,  reduce = 55%, Cumulative 
> CPU 21.94 sec
> INFO  : 2017-05-02 14:20:36,110 Stage-1 map = 100%,  reduce = 64%, Cumulative 
> CPU 23.97 sec
> INFO  : 2017-05-02 14:20:39,233 Stage-1 map = 100%,  reduce = 73%, Cumulative 
> CPU 25.26 sec
> INFO  : 2017-05-02 14:20:43,392 Stage-1 map = 100%,  reduce = 100%, 
> Cumulative CPU 30.9 sec
> INFO  : MapReduce Total cumulative CPU time: 30 seconds 900 msec
> INFO  : Ended Job = job_1493729203063_0003
> INFO  : MapReduce Jobs Launched: 
> INFO  : Stage-Stage-1: Map: 2  Reduce: 11   Cumulative CPU: 30.9 sec   HDFS 
> Read: 735691149 HDFS Write: 153 SUCCESS
> INFO  : Total MapRe
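The reducer estimate in the log above can be reproduced with a hedged sketch. The 64 MB bytes-per-reducer value and the reducer cap are assumptions about this build's defaults, chosen because they reproduce the logged estimate of 11 from the six ~117 MB files found by the recursive listing, while the 2 map splits only saw the non-recursive subset.

```python
import math

# Hedged sketch of Hive's size-based reducer estimate (constants are
# assumptions, not verified defaults for this exact build).
def estimate_reducers(total_input_bytes,
                      bytes_per_reducer=64 * 1024 * 1024,
                      max_reducers=1099):
    return min(max_reducers,
               max(1, math.ceil(total_input_bytes / bytes_per_reducer)))

total = 6 * 122607137  # all six CSVs, including those under sub-directories
assert estimate_reducers(total) == 11  # matches "Estimated from input data size: 11"
```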

[jira] [Work started] (HIVE-16766) Hive query with space as filter does not give proper result

2017-07-01 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16766 started by Bing Li.
--
> Hive query with space as filter does not give proper result
> ---
>
> Key: HIVE-16766
> URL: https://issues.apache.org/jira/browse/HIVE-16766
> Project: Hive
>  Issue Type: Bug
>Reporter: Subash
>Assignee: Bing Li
>Priority: Critical
>
> Hi Team,
> I used a query in the format below and it does not give proper results. 
> Since there is a split on \s+ at line 48 of the ExecuteStatementOperation 
> class, I suspect something goes wrong there. Could you help me with this, if 
> I am wrong? I am using Hive JDBC version 1.1.0.
> The sample query is as follows:
> select count(1) as cnt from table where col1=" " and col2="D";





[jira] [Commented] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle

2017-07-01 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071514#comment-16071514
 ] 

Bing Li commented on HIVE-16659:


Hi, [~ruili]
Thank you for the review!
I checked the latest code on the master branch; the current patch applies to it 
directly, so I won't create a separate patch file for master in this JIRA.
I will keep this in mind in the future.

> Query plan should reflect hive.spark.use.groupby.shuffle
> 
>
> Key: HIVE-16659
> URL: https://issues.apache.org/jira/browse/HIVE-16659
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
> Attachments: HIVE-16659.1.patch
>
>
> It's useful to show the shuffle type used in the query plan. Currently it 
> shows "GROUP" no matter what we set for hive.spark.use.groupby.shuffle.





[jira] [Commented] (HIVE-11019) Can't create an Avro table with uniontype column correctly

2017-07-01 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071516#comment-16071516
 ] 

Bing Li commented on HIVE-11019:


This works on branch-2.3.
Closing it.

hive> create table avro_union(union1 uniontype)STORED 
AS AVRO;
OK
Time taken: 2.04 seconds
hive> describe avro_union;
OK
union1  uniontype
Time taken: 0.165 seconds, Fetched: 1 row(s)

> Can't create an Avro table with uniontype column correctly
> --
>
> Key: HIVE-11019
> URL: https://issues.apache.org/jira/browse/HIVE-11019
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Bing Li
>Assignee: Bing Li
>
> I tried the example in 
> https://cwiki.apache.org/confluence/display/Hive/AvroSerDe
> And found that it can't create an AVRO table correctly with uniontype
> hive> create table avro_union(union1 uniontype)STORED 
> AS AVRO;
> OK
> Time taken: 0.083 seconds
> hive> describe avro_union;
> OK
> union1  uniontype  
>   
> Time taken: 0.058 seconds, Fetched: 1 row(s)





[jira] [Assigned] (HIVE-16950) Dropping hive database/table which was created explicitly in default database location, deletes all databases data from default database location

2017-07-01 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-16950:
--

Assignee: Bing Li

> Dropping hive database/table which was created explicitly in default database 
> location, deletes all databases data from default database location
> -
>
> Key: HIVE-16950
> URL: https://issues.apache.org/jira/browse/HIVE-16950
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Rahul Kalgunde
>Assignee: Bing Li
>Priority: Minor
>
> When database/table is created explicitly pointing to the default location, 
> dropping the database/table deletes all the data associated with the all 
> databases/tables.
> Steps to replicate: 
> in below e.g. dropping table test_db2 also deletes data of test_db1 where as 
> metastore still contains test_db1
> hive> create database test_db1;
> OK
> Time taken: 4.858 seconds
> hive> describe database test_db1;
> OK
> test_db1
> hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/test_db1.db root  
>   USER
> Time taken: 0.599 seconds, Fetched: 1 row(s)
> hive> create database test_db2 location '/apps/hive/warehouse' ;
> OK
> Time taken: 1.457 seconds
> hive> describe database test_db2;
> OK
> test_db2
> hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse rootUSER
> Time taken: 0.582 seconds, Fetched: 1 row(s)
> hive> drop database test_db2;
> OK
> Time taken: 1.317 seconds
> hive> dfs -ls /apps/hive/warehouse;
> ls: `/apps/hive/warehouse': No such file or directory
> Command failed with exit code = 1
> Query returned non-zero code: 1, cause: null
> hive> describe database test_db1;
> OK
> test_db1
> hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/test_db1.db root  
>   USER
> Time taken: 0.629 seconds, Fetched: 1 row(s)





[jira] [Resolved] (HIVE-11019) Can't create an Avro table with uniontype column correctly

2017-07-02 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li resolved HIVE-11019.

Resolution: Resolved

> Can't create an Avro table with uniontype column correctly
> --
>
> Key: HIVE-11019
> URL: https://issues.apache.org/jira/browse/HIVE-11019
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Bing Li
>Assignee: Bing Li
>
> I tried the example in 
> https://cwiki.apache.org/confluence/display/Hive/AvroSerDe
> And found that it can't create an AVRO table correctly with uniontype
> hive> create table avro_union(union1 uniontype)STORED 
> AS AVRO;
> OK
> Time taken: 0.083 seconds
> hive> describe avro_union;
> OK
> union1  uniontype  
>   
> Time taken: 0.058 seconds, Fetched: 1 row(s)





[jira] [Assigned] (HIVE-16906) Hive ATSHook should check for yarn.timeline-service.enabled before connecting to ATS

2017-07-02 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-16906:
--

Assignee: Bing Li

> Hive ATSHook should check for yarn.timeline-service.enabled before connecting 
> to ATS
> 
>
> Key: HIVE-16906
> URL: https://issues.apache.org/jira/browse/HIVE-16906
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.2
>Reporter: Prabhu Joseph
>Assignee: Bing Li
>
> Hive ATShook has to check yarn.timeline-service.enabled (Indicate to clients 
> whether timeline service is enabled or not. If enabled, clients will put 
> entities and events to the timeline server.) before creating TimelineClient 
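The proposed guard can be sketched as follows (the property name comes from the report; the hook internals are stand-ins, not Hive's actual ATSHook code): only build a timeline client when the timeline service is enabled.

```python
# Hedged sketch: gate timeline-client creation on the YARN config flag
# instead of unconditionally trying (and failing) to connect to ATS.
def maybe_create_timeline_client(conf):
    if not conf.get("yarn.timeline-service.enabled", False):
        return None  # skip ATS entirely when the service is disabled
    return object()  # stand-in for TimelineClient.createTimelineClient()

assert maybe_create_timeline_client({"yarn.timeline-service.enabled": False}) is None
assert maybe_create_timeline_client({"yarn.timeline-service.enabled": True}) is not None
```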





[jira] [Commented] (HIVE-16950) Dropping hive database/table which was created explicitly in default database location, deletes all databases data from default database location

2017-07-03 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071999#comment-16071999
 ] 

Bing Li commented on HIVE-16950:


From the description, the requirement is more like an EXTERNAL database, which 
is NOT supported by Hive yet.

But I think we could add a check at database create/drop time to avoid this 
issue.
There are two ways to do this:
1. Throw an error when the target location on HDFS already exists.
An existing empty directory is invalid as well, because Hive currently allows 
creating two databases with the same location.
2. ONLY drop the tables that belong to the target database.
For this, we would need to enumerate all tables under the database when DROP 
DATABASE is invoked, which would hurt the performance of the DROP statement.

I prefer #1. [~ashutoshc], any comments on this? Thank you.
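Option #1 can be sketched in a few lines (a hypothetical helper, not Hive code): refuse to create a database whose HDFS location already exists or is already claimed by another database, since shared locations are what make DROP DATABASE destructive here.

```python
import posixpath

# Hedged sketch of the location check proposed above.
def check_new_db_location(location, existing_locations, path_exists):
    loc = posixpath.normpath(location)
    taken = {posixpath.normpath(p) for p in existing_locations}
    if loc in taken or path_exists(loc):
        raise ValueError("location already in use: " + loc)

# Reusing the shared warehouse root (as test_db2 does above) is rejected:
existing = {"/apps/hive/warehouse/test_db1.db"}
try:
    check_new_db_location("/apps/hive/warehouse", existing,
                          path_exists=lambda p: True)
    raised = False
except ValueError:
    raised = True
assert raised
```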


> Dropping hive database/table which was created explicitly in default database 
> location, deletes all databases data from default database location
> -
>
> Key: HIVE-16950
> URL: https://issues.apache.org/jira/browse/HIVE-16950
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Rahul Kalgunde
>Assignee: Bing Li
>Priority: Minor
>
> When database/table is created explicitly pointing to the default location, 
> dropping the database/table deletes all the data associated with the all 
> databases/tables.
> Steps to replicate: 
> in below e.g. dropping table test_db2 also deletes data of test_db1 where as 
> metastore still contains test_db1
> hive> create database test_db1;
> OK
> Time taken: 4.858 seconds
> hive> describe database test_db1;
> OK
> test_db1
> hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/test_db1.db root  
>   USER
> Time taken: 0.599 seconds, Fetched: 1 row(s)
> hive> create database test_db2 location '/apps/hive/warehouse' ;
> OK
> Time taken: 1.457 seconds
> hive> describe database test_db2;
> OK
> test_db2
> hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse rootUSER
> Time taken: 0.582 seconds, Fetched: 1 row(s)
> hive> drop database test_db2;
> OK
> Time taken: 1.317 seconds
> hive> dfs -ls /apps/hive/warehouse;
> ls: `/apps/hive/warehouse': No such file or directory
> Command failed with exit code = 1
> Query returned non-zero code: 1, cause: null
> hive> describe database test_db1;
> OK
> test_db1
> hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/test_db1.db root  
>   USER
> Time taken: 0.629 seconds, Fetched: 1 row(s)





[jira] [Work started] (HIVE-16950) Dropping hive database/table which was created explicitly in default database location, deletes all databases data from default database location

2017-07-03 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16950 started by Bing Li.
--
> Dropping hive database/table which was created explicitly in default database 
> location, deletes all databases data from default database location
> -
>
> Key: HIVE-16950
> URL: https://issues.apache.org/jira/browse/HIVE-16950
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Rahul Kalgunde
>Assignee: Bing Li
>Priority: Minor
>
> When a database/table is created explicitly pointing to the default warehouse 
> location, dropping that database/table deletes the data associated with all 
> databases/tables stored there.
> Steps to replicate: 
> In the example below, dropping database test_db2 also deletes the data of 
> test_db1, whereas the metastore still contains test_db1.
> hive> create database test_db1;
> OK
> Time taken: 4.858 seconds
> hive> describe database test_db1;
> OK
> test_db1  hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/test_db1.db  root  USER
> Time taken: 0.599 seconds, Fetched: 1 row(s)
> hive> create database test_db2 location '/apps/hive/warehouse' ;
> OK
> Time taken: 1.457 seconds
> hive> describe database test_db2;
> OK
> test_db2  hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse  root  USER
> Time taken: 0.582 seconds, Fetched: 1 row(s)
> hive> drop database test_db2;
> OK
> Time taken: 1.317 seconds
> hive> dfs -ls /apps/hive/warehouse;
> ls: `/apps/hive/warehouse': No such file or directory
> Command failed with exit code = 1
> Query returned non-zero code: 1, cause: null
> hive> describe database test_db1;
> OK
> test_db1  hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/test_db1.db  root  USER
> Time taken: 0.629 seconds, Fetched: 1 row(s)
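The failure mode above is a containment problem: test_db2's location is the warehouse root itself, so recursively deleting it also removes test_db1.db. The guard such a drop needs can be sketched in Python (hypothetical helper name, not Hive's actual drop-database code):

```python
from pathlib import PurePosixPath

def is_safe_to_drop(db_location: str, other_locations: list) -> bool:
    """Return False if recursively deleting db_location would also remove
    another database's directory, i.e. db_location is equal to or an
    ancestor of some other database's location."""
    db = PurePosixPath(db_location)
    for other in other_locations:
        p = PurePosixPath(other)
        if p == db or db in p.parents:
            return False
    return True

# test_db2 sits at the warehouse root, so dropping it would wipe
# test_db1.db along with everything else under that root.
print(is_safe_to_drop("/apps/hive/warehouse",
                      ["/apps/hive/warehouse/test_db1.db"]))          # False
print(is_safe_to_drop("/apps/hive/warehouse/test_db2.db",
                      ["/apps/hive/warehouse/test_db1.db"]))          # True
```

A fix along these lines would refuse (or warn on) the drop when the database location overlaps another database's directory.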





[jira] [Updated] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle

2017-07-04 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16659:
---
Attachment: HIVE-16659.2.patch

Refined the patch with a test case.

> Query plan should reflect hive.spark.use.groupby.shuffle
> 
>
> Key: HIVE-16659
> URL: https://issues.apache.org/jira/browse/HIVE-16659
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
> Attachments: HIVE-16659.1.patch, HIVE-16659.2.patch
>
>
> It's useful to show the shuffle type used in the query plan. Currently it 
> shows "GROUP" no matter what we set for hive.spark.use.groupby.shuffle.
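The gist of the requested change is that the explain output should derive its shuffle label from the configuration instead of hard-coding "GROUP". A minimal Python sketch of that idea (the label strings and function name here are illustrative assumptions, not the actual patch):

```python
def shuffle_label(conf: dict) -> str:
    """Pick the shuffle edge label to print in the Spark query plan.

    Hypothetical sketch: when hive.spark.use.groupby.shuffle is true,
    the group-by shuffle is used and the plan should say so; otherwise
    a sort-based shuffle is used, and printing "GROUP" unconditionally
    (the bug described above) would be misleading.
    """
    use_groupby = conf.get("hive.spark.use.groupby.shuffle", "true") == "true"
    return "GROUP" if use_groupby else "SORT"

print(shuffle_label({"hive.spark.use.groupby.shuffle": "true"}))   # GROUP
print(shuffle_label({"hive.spark.use.groupby.shuffle": "false"}))  # SORT
```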





[jira] [Commented] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle

2017-07-04 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16073343#comment-16073343
 ] 

Bing Li commented on HIVE-16659:


[~ruili], I updated the patch based on your comment and added the link to the 
review request.
Thank you!

> Query plan should reflect hive.spark.use.groupby.shuffle
> 
>
> Key: HIVE-16659
> URL: https://issues.apache.org/jira/browse/HIVE-16659
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
> Attachments: HIVE-16659.1.patch, HIVE-16659.2.patch
>
>
> It's useful to show the shuffle type used in the query plan. Currently it 
> shows "GROUP" no matter what we set for hive.spark.use.groupby.shuffle.





[jira] [Comment Edited] (HIVE-16766) Hive query with space as filter does not give proper result

2017-07-04 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071054#comment-16071054
 ] 

Bing Li edited comment on HIVE-16766 at 7/4/17 8:52 AM:


Hi, [~subashprabanantham]
Which Hive version did you use? Could you also post the queries to reproduce it? 

I tried it on a Hive package built from branch-2.3, and it worked for me.

My Testing
==
*hive> describe test;*
OK
col1    string
col2    string
Time taken: 0.057 seconds, Fetched: 2 row(s)
*hive> select * from test;*
OK
a1  a2
b1  b2
c1  c2
D
Time taken: 0.22 seconds, Fetched: 4 row(s)

*hive> select count(1) as cnt from test where col1="" and col2="D";*
Query ID = root_20170630235239_b58b7dbc-14ef-4126-b56b-fdcf187acc09
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Spark Job = f25577ce-2ed6-4c5c-a64a-6ff7419ab778
----------------------------------------------------------------------------
  STAGES   ATTEMPT   STATUS   TOTAL  COMPLETED  RUNNING  PENDING  FAILED
----------------------------------------------------------------------------
  Stage-5        0  FINISHED      1          1        0        0       0
  Stage-6        0  FINISHED      1          1        0        0       0
----------------------------------------------------------------------------
STAGES: 02/02    [==========================>>] 100%  ELAPSED TIME: 1.01 s
----------------------------------------------------------------------------
Status: Finished successfully in 1.01 seconds
OK
1
Time taken: 1.436 seconds, Fetched: 1 row(s)


was (Author: libing):
Hi, Subash
Which Hive version did you use? Could you also post the queries to reproduce it? 

I tried it on a Hive package built from branch-2.3, and it worked for me.

My Testing
==
*hive> describe test;*
OK
col1    string
col2    string
Time taken: 0.057 seconds, Fetched: 2 row(s)
*hive> select * from test;*
OK
a1  a2
b1  b2
c1  c2
D
Time taken: 0.22 seconds, Fetched: 4 row(s)

*hive> select count(1) as cnt from test where col1="" and col2="D";*
Query ID = root_20170630235239_b58b7dbc-14ef-4126-b56b-fdcf187acc09
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Spark Job = f25577ce-2ed6-4c5c-a64a-6ff7419ab778
----------------------------------------------------------------------------
  STAGES   ATTEMPT   STATUS   TOTAL  COMPLETED  RUNNING  PENDING  FAILED
----------------------------------------------------------------------------
  Stage-5        0  FINISHED      1          1        0        0       0
  Stage-6        0  FINISHED      1          1        0        0       0
----------------------------------------------------------------------------
STAGES: 02/02    [==========================>>] 100%  ELAPSED TIME: 1.01 s
----------------------------------------------------------------------------
Status: Finished successfully in 1.01 seconds
OK
1
Time taken: 1.436 seconds, Fetched: 1 row(s)

> Hive query with space as filter does not give proper result
> ---
>
> Key: HIVE-16766
> URL: https://issues.apache.org/jira/browse/HIVE-16766
> Project: Hive
>  Issue Type: Bug
>Reporter: Subash
>Assignee: Bing Li
>Priority: Critical
>
> Hi Team,
> I have used a query in the format below and it does not give proper results. 
> Since there is a split by \s+ in the ExecuteStatementOperation class at line 
> 48, I feel something goes wrong there. Could you help me with this, if I am 
> wrong? I am using Hive JDBC version 1.1.0.
> The sample query is as follows:
> select count(1) as cnt from table where col1=" " and col2="D";
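The suspicion quoted above is that a naive split on \s+ tears apart string literals that contain spaces. The difference can be shown with a quote-aware tokenizer; this is a minimal Python sketch of the idea, not Hive's actual ExecuteStatementOperation code:

```python
import re

def naive_split(query: str) -> list:
    # Splitting on any whitespace breaks the literal " " into fragments.
    return re.split(r"\s+", query)

def quote_aware_split(query: str) -> list:
    # A token is a run of non-space, non-quote characters and/or complete
    # single- or double-quoted segments, so whitespace inside quotes
    # stays within its token.
    return re.findall(r"""(?:[^\s"']|"[^"]*"|'[^']*')+""", query)

q = 'select count(1) as cnt from t where col1=" " and col2="D"'
print(naive_split(q))        # ... 'col1="', '"', 'and', 'col2="D"'
print(quote_aware_split(q))  # ... 'col1=" "', 'and', 'col2="D"'
```

With the naive split, the predicate `col1=" "` loses its space; the quote-aware version keeps the literal intact.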




