[jira] [Commented] (HIVE-6865) Failed to load data into Hive from Pig using HCatStorer()
[ https://issues.apache.org/jira/browse/HIVE-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566964#comment-14566964 ]

Bing Li commented on HIVE-6865:
-------------------------------

This issue has been resolved in Hive 1.2.0.

> Failed to load data into Hive from Pig using HCatStorer()
> ---------------------------------------------------------
>
> Key: HIVE-6865
> URL: https://issues.apache.org/jira/browse/HIVE-6865
> Project: Hive
> Issue Type: Bug
> Components: HCatalog
> Affects Versions: 0.12.0
> Reporter: Bing Li
> Assignee: Bing Li
>
> Reproduce steps:
> 1. create a hive table
> hive> create table t1 (c1 int, c2 int, c3 int);
> 2. start pig shell
> grunt> register $HIVE_HOME/lib/*.jar
> grunt> register $HIVE_HOME/hcatalog/share/hcatalog/*.jar
> grunt> A = load 'pig.txt' as (c1:int, c2:int, c3:int);
> grunt> store A into 't1' using org.apache.hive.hcatalog.pig.HCatStorer();
>
> Error Message:
> ERROR [main] org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backend error:
> org.apache.hcatalog.common.HCatException : 2004 : HCatOutputFormat not initialized, setOutput has to be called
>         at org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.getJobInfo(HCatBaseOutputFormat.java:111)
>         at org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.getJobInfo(HCatBaseOutputFormat.java:97)
>         at org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.getOutputFormat(HCatBaseOutputFormat.java:85)
>         at org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.checkOutputSpecs(HCatBaseOutputFormat.java:75)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecsHelper(PigOutputFormat.java:207)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:187)
>         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:1000)
>         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:963)
>         at java.security.AccessController.doPrivileged(AccessController.java:310)
>         at javax.security.auth.Subject.doAs(Subject.java:573)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
>         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:963)
>         at org.apache.hadoop.mapreduce.Job.submit(Job.java:616)
>         at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:336)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>         at java.lang.reflect.Method.invoke(Method.java:611)
>         at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
>         at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:191)
>         at java.lang.Thread.run(Thread.java:738)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-10495) Hive index creation code throws NPE if index table is null
[ https://issues.apache.org/jira/browse/HIVE-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bing Li updated HIVE-10495:
---------------------------
    Attachment: HIVE-10495.1.patch

> Hive index creation code throws NPE if index table is null
> ----------------------------------------------------------
>
> Key: HIVE-10495
> URL: https://issues.apache.org/jira/browse/HIVE-10495
> Project: Hive
> Issue Type: Bug
> Affects Versions: 1.0.0, 1.2.0
> Reporter: Bing Li
> Assignee: Bing Li
> Attachments: HIVE-10495.1.patch
>
> The stack trace is:
> Caused by: java.lang.NullPointerException
>         at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_index(HiveMetaStore.java:2870)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>         at java.lang.reflect.Method.invoke(Method.java:611)
>         at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102)
>         at $Proxy9.add_index(Unknown Source)
>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createIndex(HiveMetaStoreClient.java:962)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
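[Editor's note] The NPE above comes from add_index dereferencing a null index table inside the metastore handler. As a minimal sketch of the check-then-fail pattern such a patch typically adds (Python pseudocode; the function and exception names are illustrative, not Hive's actual API):

```python
class MetaException(Exception):
    """Raised with a descriptive message instead of an opaque NullPointerException."""

def add_index(index_name, index_table):
    # Validate the input up front: fail fast with a clear error when the
    # index table is missing, rather than dereferencing null deeper in
    # the call chain (which is what produces the stack trace above).
    if index_table is None:
        raise MetaException("index table is null for index: %s" % index_name)
    # ... proceed with the normal creation path (elided in this sketch) ...
    return {"index": index_name, "table": index_table}
```

The point of the sketch is only the guard clause: the caller gets an actionable error message naming the bad argument instead of an NPE from line 2870 of HiveMetaStore.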
[jira] [Updated] (HIVE-6727) Table level stats for external tables are set incorrectly
[ https://issues.apache.org/jira/browse/HIVE-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bing Li updated HIVE-6727:
--------------------------
    Attachment: HIVE-6727.3.patch

This patch is based on the latest Hive source code and includes the fix for the test case output.

> Table level stats for external tables are set incorrectly
> ---------------------------------------------------------
>
> Key: HIVE-6727
> URL: https://issues.apache.org/jira/browse/HIVE-6727
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Affects Versions: 0.13.0, 0.13.1
> Reporter: Harish Butani
> Assignee: Bing Li
> Attachments: HIVE-6727.2.patch, HIVE-6727.3.patch
>
> If you do the following:
> {code}
> CREATE EXTERNAL TABLE anaylyze_external (a INT) LOCATION 'data/files/ext_test';
> describe formatted anaylyze_external;
> {code}
> the table-level stats are:
> {noformat}
> Table Parameters:
> COLUMN_STATS_ACCURATE true
> EXTERNAL TRUE
> numFiles 0
> numRows 6
> rawDataSize 6
> totalSize 0
> {noformat}
> numFiles and totalSize are always 0.
> The issue: MetaStoreUtils.updateUnpartitionedTableStatsFast attempts to set table-level stats from FileStatus, but it doesn't account for external tables; it always calls Warehouse.getFileStatusesForUnpartitionedTable.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-6727) Table level stats for external tables are set incorrectly
[ https://issues.apache.org/jira/browse/HIVE-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567068#comment-14567068 ]

Bing Li commented on HIVE-6727:
-------------------------------

Hi [~ashutoshc], the existing Hive test cases already cover this scenario, but their expected output is wrong. HIVE-6727.3.patch fixes the output file of the case. Thank you for your review.

> Table level stats for external tables are set incorrectly
> ---------------------------------------------------------
>
> Key: HIVE-6727
> URL: https://issues.apache.org/jira/browse/HIVE-6727
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Affects Versions: 0.13.0, 0.13.1, 1.2.0
> Reporter: Harish Butani
> Assignee: Bing Li
> Attachments: HIVE-6727.2.patch, HIVE-6727.3.patch
>
> If you do the following:
> {code}
> CREATE EXTERNAL TABLE anaylyze_external (a INT) LOCATION 'data/files/ext_test';
> describe formatted anaylyze_external;
> {code}
> the table-level stats are:
> {noformat}
> Table Parameters:
> COLUMN_STATS_ACCURATE true
> EXTERNAL TRUE
> numFiles 0
> numRows 6
> rawDataSize 6
> totalSize 0
> {noformat}
> numFiles and totalSize are always 0.
> The issue: MetaStoreUtils.updateUnpartitionedTableStatsFast attempts to set table-level stats from FileStatus, but it doesn't account for external tables; it always calls Warehouse.getFileStatusesForUnpartitionedTable.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
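[Editor's note] The bug above is that fast stats are computed against the warehouse path even for external tables, whose data lives elsewhere, so numFiles/totalSize come out as 0. A hedged Python sketch of the intended behavior (computing stats from the table's own location; this is an illustration, not Hive's actual MetaStoreUtils code):

```python
import os

def fast_stats(table_location):
    """Derive numFiles/totalSize from the files under the table's OWN
    location. For an external table this is its declared LOCATION, not
    a path derived from the warehouse directory."""
    files = [os.path.join(table_location, f)
             for f in os.listdir(table_location)
             if os.path.isfile(os.path.join(table_location, f))]
    return {
        "numFiles": len(files),
        "totalSize": sum(os.path.getsize(f) for f in files),
    }
```

Pointing the stats computation at the wrong directory yields the 0/0 values shown in the issue; pointing it at the table's real location yields the correct counts.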
[jira] [Updated] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.
[ https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bing Li updated HIVE-4577:
--------------------------
    Affects Version/s: 1.2.0

> hive CLI can't handle hadoop dfs command with space and quotes.
> ---------------------------------------------------------------
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
> Issue Type: Bug
> Components: CLI
> Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
> Reporter: Bing Li
> Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, HIVE-4577.3.patch.txt, HIVE-4577.4.patch
>
> By design, the Hive shell supports hadoop dfs commands, e.g.:
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from Hadoop when the path contains spaces and quotes:
> hive> dfs -mkdir "hello";
> drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:40 /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:43 /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 /user/biadmin/"bei
> drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 /user/biadmin/jing"

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
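[Editor's note] The symptom above is classic quote-unaware tokenization: the CLI splits the dfs command on whitespace, so `"bei jing"` becomes two tokens with literal quote characters attached, and two bogus directories get created. A short Python illustration of the difference between naive splitting and POSIX-style, quote-aware splitting (using the standard-library shlex module; this sketches the behavior, it is not Hive's code):

```python
import shlex

cmd = 'dfs -mkdir "bei jing"'

# Quote-unaware split: the quotes leak into the tokens, producing two
# bogus path arguments, exactly as in the bug report.
naive = cmd.split()
# naive -> ['dfs', '-mkdir', '"bei', 'jing"']

# Quote-aware (POSIX sh-like) split: quotes group the words and are
# stripped, yielding a single path argument containing a space.
quote_aware = shlex.split(cmd)
# quote_aware -> ['dfs', '-mkdir', 'bei jing']
```

A fix along these lines makes the Hive shell agree with how the hadoop command-line tool itself interprets quoted arguments.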
[jira] [Updated] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.
[ https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bing Li updated HIVE-4577:
--------------------------
    Fix Version/s: 1.2.1

> hive CLI can't handle hadoop dfs command with space and quotes.
> ---------------------------------------------------------------
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
> Issue Type: Bug
> Components: CLI
> Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
> Reporter: Bing Li
> Assignee: Bing Li
> Fix For: 1.2.1
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, HIVE-4577.3.patch.txt, HIVE-4577.4.patch
>
> By design, the Hive shell supports hadoop dfs commands, e.g.:
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from Hadoop when the path contains spaces and quotes:
> hive> dfs -mkdir "hello";
> drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:40 /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:43 /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 /user/biadmin/"bei
> drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 /user/biadmin/jing"

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.
[ https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567075#comment-14567075 ]

Bing Li commented on HIVE-4577:
-------------------------------

Hi [~thejas], could you review the patch again? Thanks a lot!

> hive CLI can't handle hadoop dfs command with space and quotes.
> ---------------------------------------------------------------
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
> Issue Type: Bug
> Components: CLI
> Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
> Reporter: Bing Li
> Assignee: Bing Li
> Fix For: 1.2.1
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, HIVE-4577.3.patch.txt, HIVE-4577.4.patch
>
> By design, the Hive shell supports hadoop dfs commands, e.g.:
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from Hadoop when the path contains spaces and quotes:
> hive> dfs -mkdir "hello";
> drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:40 /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:43 /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 /user/biadmin/"bei
> drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 /user/biadmin/jing"

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-6990) Direct SQL fails when the explicit schema setting is different from the default one
[ https://issues.apache.org/jira/browse/HIVE-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bing Li updated HIVE-6990:
--------------------------
    Attachment: HIVE-6990.4.patch

This patch is generated against the latest Hive code on the master branch.

> Direct SQL fails when the explicit schema setting is different from the default one
> -----------------------------------------------------------------------------------
>
> Key: HIVE-6990
> URL: https://issues.apache.org/jira/browse/HIVE-6990
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.12.0
> Environment: hive + derby
> Reporter: Bing Li
> Assignee: Bing Li
> Attachments: HIVE-6990.1.patch, HIVE-6990.2.patch, HIVE-6990.3.patch, HIVE-6990.4.patch
>
> I got the following ERROR in hive.log:
> 2014-04-23 17:30:23,331 ERROR metastore.ObjectStore (ObjectStore.java:handleDirectSqlError(1756)) - Direct SQL failed, falling back to ORM
> javax.jdo.JDODataStoreException: Error executing SQL query "select PARTITIONS.PART_ID from PARTITIONS inner join TBLS on PARTITIONS.TBL_ID = TBLS.TBL_ID inner join DBS on TBLS.DB_ID = DBS.DB_ID inner join PARTITION_KEY_VALS as FILTER0 on FILTER0.PART_ID = PARTITIONS.PART_ID and FILTER0.INTEGER_IDX = 0 where TBLS.TBL_NAME = ? and DBS.NAME = ? and ((FILTER0.PART_KEY_VAL = ?))".
>         at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451)
>         at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
>         at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:181)
>         at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:98)
>         at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilterInternal(ObjectStore.java:1833)
>         at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1806)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
>         at java.lang.reflect.Method.invoke(Method.java:619)
>         at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124)
>         at com.sun.proxy.$Proxy11.getPartitionsByFilter(Unknown Source)
>         at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_filter(HiveMetaStore.java:3310)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
>         at java.lang.reflect.Method.invoke(Method.java:619)
>         at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
>         at com.sun.proxy.$Proxy12.get_partitions_by_filter(Unknown Source)
>
> Reproduce steps:
> 1. set the following properties in hive-site.xml
> <property>
>   <name>javax.jdo.mapping.Schema</name>
>   <value>HIVE</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionUserName</name>
>   <value>user1</value>
> </property>
> 2. execute hive queries
> hive> create table mytbl ( key int, value string);
> hive> load data local inpath 'examples/files/kv1.txt' overwrite into table mytbl;
> hive> select * from mytbl;
> hive> create view myview partitioned on (value) as select key, value from mytbl where key=98;
> hive> alter view myview add partition (value='val_98') partition (value='val_xyz');
> hive> alter view myview drop partition (value='val_xyz');

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
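[Editor's note] The direct-SQL path above hard-codes unqualified table names (PARTITIONS, TBLS, DBS, ...), which resolve against the connection's default schema; when javax.jdo.mapping.Schema points elsewhere, the query fails and the metastore falls back to ORM. A hedged sketch of the general repair idea, qualifying metastore table names with the configured schema (illustrative string manipulation only, not Hive's actual MetaStoreDirectSql logic):

```python
METASTORE_TABLES = ("PARTITIONS", "TBLS", "DBS", "PARTITION_KEY_VALS")

def qualify(sql, schema=None):
    """Prefix bare metastore table references with the configured schema,
    e.g. ' PARTITIONS ' -> ' "HIVE".PARTITIONS ', so the direct SQL works
    when an explicit (non-default) schema is configured. Column references
    like PARTITIONS.PART_ID need no schema prefix and are left alone."""
    if not schema:
        return sql
    for table in METASTORE_TABLES:
        sql = sql.replace(" %s " % table, ' "%s".%s ' % (schema, table))
    return sql
```

This naive replace is only meant to convey the idea; a real implementation would build the qualified names when constructing the query rather than rewriting it afterwards.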
[jira] [Resolved] (HIVE-6865) Failed to load data into Hive from Pig using HCatStorer()
[ https://issues.apache.org/jira/browse/HIVE-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bing Li resolved HIVE-6865.
---------------------------
    Resolution: Fixed
    Fix Version/s: 1.2.0

I tried the same queries in Hive 1.2.0 and they work well. Closing it as fixed.

> Failed to load data into Hive from Pig using HCatStorer()
> ---------------------------------------------------------
>
> Key: HIVE-6865
> URL: https://issues.apache.org/jira/browse/HIVE-6865
> Project: Hive
> Issue Type: Bug
> Components: HCatalog
> Affects Versions: 0.12.0
> Reporter: Bing Li
> Assignee: Bing Li
> Fix For: 1.2.0
>
> Reproduce steps:
> 1. create a hive table
> hive> create table t1 (c1 int, c2 int, c3 int);
> 2. start pig shell
> grunt> register $HIVE_HOME/lib/*.jar
> grunt> register $HIVE_HOME/hcatalog/share/hcatalog/*.jar
> grunt> A = load 'pig.txt' as (c1:int, c2:int, c3:int);
> grunt> store A into 't1' using org.apache.hive.hcatalog.pig.HCatStorer();
>
> Error Message:
> ERROR [main] org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backend error:
> org.apache.hcatalog.common.HCatException : 2004 : HCatOutputFormat not initialized, setOutput has to be called
>         at org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.getJobInfo(HCatBaseOutputFormat.java:111)
>         at org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.getJobInfo(HCatBaseOutputFormat.java:97)
>         at org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.getOutputFormat(HCatBaseOutputFormat.java:85)
>         at org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.checkOutputSpecs(HCatBaseOutputFormat.java:75)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecsHelper(PigOutputFormat.java:207)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:187)
>         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:1000)
>         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:963)
>         at java.security.AccessController.doPrivileged(AccessController.java:310)
>         at javax.security.auth.Subject.doAs(Subject.java:573)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
>         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:963)
>         at org.apache.hadoop.mapreduce.Job.submit(Job.java:616)
>         at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:336)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>         at java.lang.reflect.Method.invoke(Method.java:611)
>         at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
>         at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:191)
>         at java.lang.Thread.run(Thread.java:738)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-6727) Table level stats for external tables are set incorrectly
[ https://issues.apache.org/jira/browse/HIVE-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571901#comment-14571901 ]

Bing Li commented on HIVE-6727:
-------------------------------

Thank you, Ashutosh!

> Table level stats for external tables are set incorrectly
> ---------------------------------------------------------
>
> Key: HIVE-6727
> URL: https://issues.apache.org/jira/browse/HIVE-6727
> Project: Hive
> Issue Type: Bug
> Components: Statistics
> Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.2.0, 1.1.0
> Reporter: Harish Butani
> Assignee: Bing Li
> Fix For: 1.3.0
> Attachments: HIVE-6727.2.patch, HIVE-6727.3.patch
>
> If you do the following:
> {code}
> CREATE EXTERNAL TABLE anaylyze_external (a INT) LOCATION 'data/files/ext_test';
> describe formatted anaylyze_external;
> {code}
> the table-level stats are:
> {noformat}
> Table Parameters:
> COLUMN_STATS_ACCURATE true
> EXTERNAL TRUE
> numFiles 0
> numRows 6
> rawDataSize 6
> totalSize 0
> {noformat}
> numFiles and totalSize are always 0.
> The issue: MetaStoreUtils.updateUnpartitionedTableStatsFast attempts to set table-level stats from FileStatus, but it doesn't account for external tables; it always calls Warehouse.getFileStatusesForUnpartitionedTable.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Resolved] (HIVE-4401) Support quoted schema and table names in Hive /Hive JDBC Driver
[ https://issues.apache.org/jira/browse/HIVE-4401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bing Li resolved HIVE-4401.
---------------------------
    Resolution: Won't Fix

https://issues.apache.org/jira/browse/HIVE-6013 added support for quoted identifiers using backticks (`) instead of double quotes. If we want to expand on the existing feature, we should open a new JIRA with a different description.

> Support quoted schema and table names in Hive /Hive JDBC Driver
> ---------------------------------------------------------------
>
> Key: HIVE-4401
> URL: https://issues.apache.org/jira/browse/HIVE-4401
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 0.9.0
> Reporter: Bing Li
> Assignee: Bing Li
>
> The Hive driver cannot handle quoted table and schema names, which DB2 and almost all other databases accept, e.g.:
> SELECT * FROM "gosales"."branch"

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-10948) Slf4j warning in HiveCLI due to spark
[ https://issues.apache.org/jira/browse/HIVE-10948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bing Li updated HIVE-10948:
---------------------------
    Description:
The spark-assembly-1.3.1.jar is added to the Hive classpath:

./hive.distro: export SPARK_HOME=$sparkHome
./hive.distro: sparkAssemblyPath=`ls ${SPARK_HOME}/lib/spark-assembly-*.jar`
./hive.distro: CLASSPATH="${CLASSPATH}:${sparkAssemblyPath}"

When launching HiveCLI, we see the following messages:
===
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/.../hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/.../spark/lib/spark-assembly-1.3.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/.../hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/.../spark/lib/spark-assembly-1.3.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

The bug is similar to HIVE-9496.

    was:
The spark-assembly-1.3.1.jar is added to the Hive classpath:

./hive.distro: export SPARK_HOME=$sparkHome
./hive.distro: sparkAssemblyPath=`ls ${SPARK_HOME}/lib/spark-assembly-*.jar`
./hive.distro: CLASSPATH="${CLASSPATH}:${sparkAssemblyPath}"

When launching HiveCLI, we see the following messages:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/.../hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/.../spark/lib/spark-assembly-1.3.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/.../hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/.../spark/lib/spark-assembly-1.3.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

> Slf4j warning in HiveCLI due to spark
> -------------------------------------
>
> Key: HIVE-10948
> URL: https://issues.apache.org/jira/browse/HIVE-10948
> Project: Hive
> Issue Type: Bug
> Components: CLI
> Affects Versions: 1.2.0
> Reporter: Bing Li
> Assignee: Bing Li
> Priority: Minor
>
> The spark-assembly-1.3.1.jar is added to the Hive classpath:
> ./hive.distro: export SPARK_HOME=$sparkHome
> ./hive.distro: sparkAssemblyPath=`ls ${SPARK_HOME}/lib/spark-assembly-*.jar`
> ./hive.distro: CLASSPATH="${CLASSPATH}:${sparkAssemblyPath}"
> When launching HiveCLI, we see the following messages:
> ===
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/.../hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/.../spark/lib/spark-assembly-1.3.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> WARNING: Use "yarn jar" to launch YARN applications.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/.../hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/.../spark/lib/spark-assembly-1.3.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>
> The bug is similar to HIVE-9496.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
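[Editor's note] The warning appears because two jars on the classpath each contain an SLF4J StaticLoggerBinder: hadoop's slf4j-log4j12 and the copy bundled inside spark-assembly. One way to reason about a fix is to filter classpath entries before launch. A hedged Python sketch of that filtering (illustrative only; the actual fix belongs in the hive launch script, and simply dropping spark-assembly would break Hive-on-Spark, so a real fix reorders or shades jars instead):

```python
def strip_jars(classpath, pattern):
    """Return `classpath` (colon-separated) without entries whose jar
    basename contains `pattern`. Used here to show how the duplicate
    SLF4J binding inside spark-assembly could be excluded from the
    CLI's classpath."""
    return ":".join(
        entry for entry in classpath.split(":")
        if pattern not in entry.rsplit("/", 1)[-1]
    )

cp = "/opt/hadoop/lib/slf4j-log4j12-1.7.10.jar:/opt/spark/lib/spark-assembly-1.3.1.jar"
# strip_jars(cp, "spark-assembly") -> "/opt/hadoop/lib/slf4j-log4j12-1.7.10.jar"
```

With only one binding left on the classpath, SLF4J no longer prints the "multiple SLF4J bindings" stanza at startup.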
[jira] [Assigned] (HIVE-11019) Can't create an Avro table with uniontype column correctly
[ https://issues.apache.org/jira/browse/HIVE-11019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bing Li reassigned HIVE-11019:
------------------------------
    Assignee: Bing Li

> Can't create an Avro table with uniontype column correctly
> ----------------------------------------------------------
>
> Key: HIVE-11019
> URL: https://issues.apache.org/jira/browse/HIVE-11019
> Project: Hive
> Issue Type: Bug
> Affects Versions: 1.2.0
> Reporter: Bing Li
> Assignee: Bing Li
>
> I tried the example in https://cwiki.apache.org/confluence/display/Hive/AvroSerDe
> and found that it can't create an Avro table correctly with a uniontype column:
> hive> create table avro_union(union1 uniontype) STORED AS AVRO;
> OK
> Time taken: 0.083 seconds
> hive> describe avro_union;
> OK
> union1 uniontype
> Time taken: 0.058 seconds, Fetched: 1 row(s)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-13850) File name conflict when have multiple INSERT INTO queries running in parallel
[ https://issues.apache.org/jira/browse/HIVE-13850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320675#comment-15320675 ]

Bing Li commented on HIVE-13850:
--------------------------------

Hi [~ashutoshc], thank you for your comments. Yes, you're right: the issue isn't resolved by naming the target file with a timestamp. We ran into it again.

We tried setting the following properties, but still got the error:
hive.support.concurrency -> true
hive.txn.manager -> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

Are there any other properties required? Thank you.

> File name conflict when have multiple INSERT INTO queries running in parallel
> -----------------------------------------------------------------------------
>
> Key: HIVE-13850
> URL: https://issues.apache.org/jira/browse/HIVE-13850
> Project: Hive
> Issue Type: Bug
> Affects Versions: 1.2.1
> Reporter: Bing Li
> Assignee: Bing Li
> Attachments: HIVE-13850-1.2.1.patch
>
> We have an application which connects to HiveServer2 via JDBC.
> It executes INSERT INTO queries against the same table.
> If many users run the application at the same time, some of the INSERTs can fail.
> The root cause is that Hive.checkPaths() uses the following loop to check for the existence of the file; with multiple inserts running in parallel, this leads to a conflict:
> for (int counter = 1; fs.exists(itemDest) || destExists(result, itemDest); counter++) {
>   itemDest = new Path(destf, name + ("_copy_" + counter) + filetype);
> }
>
> The Error Message
> ===
> In the hive log:
> org.apache.hadoop.hive.ql.metadata.HiveException: copyFiles: error while moving files!!! Cannot move hdfs://node:8020/apps/hive/warehouse/metadata.db/scalding_stats/.hive-staging_hive_2016-05-10_18-46-23_642_2056172497900766879-3321/-ext-1/00_0 to hdfs://node:8020/apps/hive/warehouse/metadata.db/scalding_stats/00_0_copy_9014
>         at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2719)
>         at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1645)
>
> In the hadoop log:
> WARN hdfs.StateChange (FSDirRenameOp.java:unprotectedRenameTo(174)) - DIR* FSDirectory.unprotectedRenameTo: failed to rename /apps/hive/warehouse/metadata.db/scalding_stats/.hive-staging_hive_2016-05-10_18-46-23_642_2056172497900766879-3321/-ext-1/00_0 to /apps/hive/warehouse/metadata.db/scalding_stats/00_0_copy_9014 because destination exists

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
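[Editor's note] The quoted checkPaths() loop is a classic check-then-act race: two writers can both observe that a `_copy_N` name is free and then both try to rename onto it. The general fix for such races is to make the claim atomic. A hedged Python sketch on a local filesystem, using O_CREAT|O_EXCL so create-if-absent is a single operation (illustrative only; HDFS offers different atomicity primitives, and the name "000000_0" is just an example file name, not copied from the issue):

```python
import os

def reserve_name(dest_dir, name, filetype=""):
    """Atomically claim a destination name, appending _copy_N on conflict.
    O_CREAT|O_EXCL makes "create only if absent" one filesystem call, so
    two concurrent writers can never claim the same name; the quoted
    fs.exists()-then-rename loop checks and acts in two racy steps."""
    counter = 0
    while True:
        candidate = name if counter == 0 else "%s_copy_%d%s" % (name, counter, filetype)
        path = os.path.join(dest_dir, candidate)
        try:
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)  # the empty file now reserves the name
            return path
        except FileExistsError:
            counter += 1  # someone else holds this name; try the next one
```

In Hive itself the equivalent reasoning led to transactional (ACID) tables, where the lock manager serializes the writers instead of racing on file names.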
[jira] [Commented] (HIVE-13850) File name conflict when have multiple INSERT INTO queries running in parallel
[ https://issues.apache.org/jira/browse/HIVE-13850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350471#comment-15350471 ]

Bing Li commented on HIVE-13850:
--------------------------------

Hi [~ashutoshc], thanks a lot for your comment. It worked for us once Hive was set up with ACID support. I will close this defect as well.

> File name conflict when have multiple INSERT INTO queries running in parallel
> -----------------------------------------------------------------------------
>
> Key: HIVE-13850
> URL: https://issues.apache.org/jira/browse/HIVE-13850
> Project: Hive
> Issue Type: Bug
> Affects Versions: 1.2.1
> Reporter: Bing Li
> Assignee: Bing Li
> Attachments: HIVE-13850-1.2.1.patch
>
> We have an application which connects to HiveServer2 via JDBC.
> It executes INSERT INTO queries against the same table.
> If many users run the application at the same time, some of the INSERTs can fail.
> The root cause is that Hive.checkPaths() uses the following loop to check for the existence of the file; with multiple inserts running in parallel, this leads to a conflict:
> for (int counter = 1; fs.exists(itemDest) || destExists(result, itemDest); counter++) {
>   itemDest = new Path(destf, name + ("_copy_" + counter) + filetype);
> }
>
> The Error Message
> ===
> In the hive log:
> org.apache.hadoop.hive.ql.metadata.HiveException: copyFiles: error while moving files!!! Cannot move hdfs://node:8020/apps/hive/warehouse/metadata.db/scalding_stats/.hive-staging_hive_2016-05-10_18-46-23_642_2056172497900766879-3321/-ext-1/00_0 to hdfs://node:8020/apps/hive/warehouse/metadata.db/scalding_stats/00_0_copy_9014
>         at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2719)
>         at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1645)
>
> In the hadoop log:
> WARN hdfs.StateChange (FSDirRenameOp.java:unprotectedRenameTo(174)) - DIR* FSDirectory.unprotectedRenameTo: failed to rename /apps/hive/warehouse/metadata.db/scalding_stats/.hive-staging_hive_2016-05-10_18-46-23_642_2056172497900766879-3321/-ext-1/00_0 to /apps/hive/warehouse/metadata.db/scalding_stats/00_0_copy_9014 because destination exists

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Resolved] (HIVE-13850) File name conflict when have multiple INSERT INTO queries running in parallel
[ https://issues.apache.org/jira/browse/HIVE-13850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li resolved HIVE-13850. Resolution: Not A Bug. This can be resolved by enabling Hive ACID support. > File name conflict when have multiple INSERT INTO queries running in parallel > - > > Key: HIVE-13850 > URL: https://issues.apache.org/jira/browse/HIVE-13850 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.1 >Reporter: Bing Li >Assignee: Bing Li > Attachments: HIVE-13850-1.2.1.patch > >
> We have an application which connects to HiveServer2 via JDBC.
> In the application, it executes "INSERT INTO" queries against the same table.
> If many users run the application at the same time, some of the INSERTs can fail.
> The root cause is that Hive.checkPaths() uses the following loop to check whether the destination file already exists. When multiple inserts run in parallel, this check-then-rename pattern leads to a conflict.
> for (int counter = 1; fs.exists(itemDest) || destExists(result, itemDest); counter++) {
>     itemDest = new Path(destf, name + ("_copy_" + counter) + filetype);
> }
> The Error Message
> ===
> In the Hive log:
> org.apache.hadoop.hive.ql.metadata.HiveException: copyFiles: error while moving files!!! Cannot move hdfs://node:8020/apps/hive/warehouse/metadata.db/scalding_stats/.hive-staging_hive_2016-05-10_18-46-23_642_2056172497900766879-3321/-ext-1/00_0 to hdfs://node:8020/apps/hive/warehouse/metadata.db/scalding_stats/00_0_copy_9014
>     at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2719)
>     at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1645)
>
> In the Hadoop log:
> WARN hdfs.StateChange (FSDirRenameOp.java:unprotectedRenameTo(174)) - DIR* FSDirectory.unprotectedRenameTo: failed to rename /apps/hive/warehouse/metadata.db/scalding_stats/.hive-staging_hive_2016-05-10_18-46-23_642_2056172497900766879-3321/-ext-1/00_0 to /apps/hive/warehouse/metadata.db/scalding_stats/00_0_copy_9014 because destination exists
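The check-then-rename race described above can be sketched outside of Hive. This is a hypothetical, simplified model (the class and method names are mine, and a Set stands in for the HDFS namespace that the real code probes with fs.exists() before renaming): two sessions that both probe before either commits pick the same _copy_N destination, which is exactly the "destination exists" failure in the logs.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified model of the naming loop in Hive.checkPaths() (Hive 1.2.1).
// "existing" stands in for the files already present in the table directory.
public class CopyNameRace {

    // Mirrors: for (counter = 1; fs.exists(itemDest); counter++) -> name_copy_counter
    static String nextCopyName(Set<String> existing, String name, String filetype) {
        String itemDest = name + filetype;
        for (int counter = 1; existing.contains(itemDest); counter++) {
            itemDest = name + "_copy_" + counter + filetype;
        }
        return itemDest;
    }

    public static void main(String[] args) {
        Set<String> warehouse = new HashSet<>();
        warehouse.add("000000_0"); // file left by an earlier INSERT

        // Two sessions run the check before either commits its rename:
        String sessionA = nextCopyName(warehouse, "000000_0", "");
        String sessionB = nextCopyName(warehouse, "000000_0", "");

        // Both picked the same destination, so the second rename fails,
        // matching the "because destination exists" warning in the HDFS log.
        System.out.println(sessionA + " vs " + sessionB);
    }
}
```

Running two inserts through this model yields the same name twice, which is why serializing the writers (for example via ACID transactions, as in the resolution above) removes the conflict.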
[jira] [Assigned] (HIVE-14156) Problem with Chinese characters as partition value when using MySQL
[ https://issues.apache.org/jira/browse/HIVE-14156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li reassigned HIVE-14156: -- Assignee: Bing Li > Problem with Chinese characters as partition value when using MySQL > --- > > Key: HIVE-14156 > URL: https://issues.apache.org/jira/browse/HIVE-14156 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 1.2.1, 2.0.0 >Reporter: Bing Li >Assignee: Bing Li > >
> Steps to reproduce:
> create table t1 (name string, age int) partitioned by (city string) row format delimited fields terminated by ',';
> load data local inpath '/tmp/chn-partition.txt' overwrite into table t1 partition (city='北京');
> The content of /tmp/chn-partition.txt:
> 小明,20
> 小红,15
> 张三,36
> 李四,50
> When checking the partition value in MySQL, it shows ?? instead of "北京".
> When running "drop table t1", the statement hangs.
[jira] [Commented] (HIVE-14156) Problem with Chinese characters as partition value when using MySQL
[ https://issues.apache.org/jira/browse/HIVE-14156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15361394#comment-15361394 ] Bing Li commented on HIVE-14156: I noticed that in the schema files under metastore/scripts/upgrade/mysql, such as hive-schema-2.0.0.mysql.sql, the character set is latin1 for all tables instead of utf8. It works with MySQL if I update the following columns in the schema script to utf8: SDS.LOCATION, PARTITIONS.PART_NAME, PARTITION_KEY_VALS.PART_KEY_VAL. That is: 1) change the varchar(xxx) limits to varchar(255); 2) change "latin1" to "utf8". Hive's wiki and HIVE-8550 mention that Hive can support Unicode in partition names. Are there any special settings for MySQL to support it? > Problem with Chinese characters as partition value when using MySQL > --- > > Key: HIVE-14156 > URL: https://issues.apache.org/jira/browse/HIVE-14156 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 1.2.1, 2.0.0 >Reporter: Bing Li >Assignee: Bing Li > >
> Steps to reproduce:
> create table t1 (name string, age int) partitioned by (city string) row format delimited fields terminated by ',';
> load data local inpath '/tmp/chn-partition.txt' overwrite into table t1 partition (city='北京');
> The content of /tmp/chn-partition.txt:
> 小明,20
> 小红,15
> 张三,36
> 李四,50
> When checking the partition value in MySQL, it shows ?? instead of "北京".
> When running "drop table t1", the statement hangs.
[jira] [Commented] (HIVE-14156) Problem with Chinese characters as partition value when using MySQL
[ https://issues.apache.org/jira/browse/HIVE-14156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362091#comment-15362091 ] Bing Li commented on HIVE-14156: Hi, Rui, I didn't have a chance to try other databases, such as Derby, Oracle, and Postgres. But one thing I found is that the scripts for the other databases don't specify a character set. > Problem with Chinese characters as partition value when using MySQL > --- > > Key: HIVE-14156 > URL: https://issues.apache.org/jira/browse/HIVE-14156 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 1.2.1, 2.0.0 >Reporter: Bing Li >Assignee: Bing Li > >
> Steps to reproduce:
> create table t1 (name string, age int) partitioned by (city string) row format delimited fields terminated by ',';
> load data local inpath '/tmp/chn-partition.txt' overwrite into table t1 partition (city='北京');
> The content of /tmp/chn-partition.txt:
> 小明,20
> 小红,15
> 张三,36
> 李四,50
> When checking the partition value in MySQL, it shows ?? instead of "北京".
> When running "drop table t1", the statement hangs.
[jira] [Commented] (HIVE-14156) Problem with Chinese characters as partition value when using MySQL
[ https://issues.apache.org/jira/browse/HIVE-14156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362092#comment-15362092 ] Bing Li commented on HIVE-14156: Hi, Rui I didn't have a chance to try other databases, like Derby, Oracle and Postgres. But one thing I found is that in the scripts for other databases, it didn't specify the character set. > Problem with Chinese characters as partition value when using MySQL > --- > > Key: HIVE-14156 > URL: https://issues.apache.org/jira/browse/HIVE-14156 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 1.2.1, 2.0.0 >Reporter: Bing Li >Assignee: Bing Li > > Steps to reproduce: > create table t1 (name string, age int) partitioned by (city string) row > format delimited fields terminated by ','; > load data local inpath '/tmp/chn-partition.txt' overwrite into table t1 > partition (city='北京'); > The content of /tmp/chn-partition.txt: > 小明,20 > 小红,15 > 张三,36 > 李四,50 > When check the partition value in MySQL, it shows ?? instead of "北京". > When run "drop table t1", it will hang. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (HIVE-14156) Problem with Chinese characters as partition value when using MySQL
[ https://issues.apache.org/jira/browse/HIVE-14156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-14156: --- Comment: was deleted (was: Hi, Rui I didn't have a chance to try other databases, like Derby, Oracle and Postgres. But one thing I found is that in the scripts for other databases, it didn't specify the character set. ) > Problem with Chinese characters as partition value when using MySQL > --- > > Key: HIVE-14156 > URL: https://issues.apache.org/jira/browse/HIVE-14156 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 1.2.1, 2.0.0 >Reporter: Bing Li >Assignee: Bing Li > > Steps to reproduce: > create table t1 (name string, age int) partitioned by (city string) row > format delimited fields terminated by ','; > load data local inpath '/tmp/chn-partition.txt' overwrite into table t1 > partition (city='北京'); > The content of /tmp/chn-partition.txt: > 小明,20 > 小红,15 > 张三,36 > 李四,50 > When check the partition value in MySQL, it shows ?? instead of "北京". > When run "drop table t1", it will hang. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14156) Problem with Chinese characters as partition value when using MySQL
[ https://issues.apache.org/jira/browse/HIVE-14156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362286#comment-15362286 ] Bing Li commented on HIVE-14156: Hi, [~xiaobingo], I noticed that you fixed HIVE-8550 on Windows, and mentioned that it should work on Linux. I ran a similar query but it failed with MySQL. In order to make it work, besides the changes in the Hive schema script, I also needed to update MySQL's configuration file, my.cnf. When you ran it on Windows, did you change the configuration of the database? Did you have a chance to run it on Linux as well? Thank you. > Problem with Chinese characters as partition value when using MySQL > --- > > Key: HIVE-14156 > URL: https://issues.apache.org/jira/browse/HIVE-14156 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 1.2.1, 2.0.0 >Reporter: Bing Li >Assignee: Bing Li > >
> Steps to reproduce:
> create table t1 (name string, age int) partitioned by (city string) row format delimited fields terminated by ',';
> load data local inpath '/tmp/chn-partition.txt' overwrite into table t1 partition (city='北京');
> The content of /tmp/chn-partition.txt:
> 小明,20
> 小红,15
> 张三,36
> 李四,50
> When checking the partition value in MySQL, it shows ?? instead of "北京".
> When running "drop table t1", the statement hangs.
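The "??" symptom in this report can be reproduced in plain Java, independent of MySQL: latin1 (ISO-8859-1) has no mapping for CJK characters, so encoding 北京 replaces each character with '?'. A minimal sketch (the class and method names are mine, for illustration only):

```java
import java.nio.charset.StandardCharsets;

// Shows why a latin1 metastore column stores "北京" as "??":
// latin1 cannot represent CJK characters, so each one collapses to '?'.
public class Latin1Demo {

    // Simulate storing a string in a latin1 column and reading it back.
    // String.getBytes() substitutes '?' for characters the charset cannot map.
    static String roundTripLatin1(String s) {
        byte[] stored = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(stored, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(roundTripLatin1("北京")); // prints ??

        // A UTF-8 round trip is lossless, which is why switching the
        // PART_NAME / PART_KEY_VAL columns (and the server connection) to
        // utf8 preserves the partition value.
        byte[] utf8Bytes = "北京".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(utf8Bytes, StandardCharsets.UTF_8));
    }
}
```

This also explains why the schema change alone may not be enough: the value must stay in a Unicode-capable encoding along the whole path (JDBC connection, server settings in my.cnf, and column charset), since any latin1 hop loses the characters irreversibly.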
[jira] [Commented] (HIVE-13384) Failed to create HiveMetaStoreClient object with proxy user when Kerberos enabled
[ https://issues.apache.org/jira/browse/HIVE-13384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15288963#comment-15288963 ] Bing Li commented on HIVE-13384: Referring to DRILL-3413, we found a way to resolve this issue on the client side. The key point is to get a delegation token for the proxy user and assign it to hive.metastore.token.signature. I tried this method in two different scenarios: 1. use the proxy user to initialize a HiveMetaStoreClient object, as mentioned in the description; 2. access a Hive table from Pig via HCatalog. Here is the sample code for the two scenarios:

1. Use the proxy user to create a HiveMetaStoreClient object

UserGroupInformation loginUser = UserGroupInformation.getLoginUser(); // in this example, the login user is user hive
// the login user impersonates user hdfs
UserGroupInformation ugi = UserGroupInformation.createProxyUser("hdfs", loginUser);
// in this example, user hive is the superuser, which logs in with its
// keytab and principal; user hdfs is the proxy user
HiveMetaStoreClient realUserClient = new HiveMetaStoreClient(new HiveConf());
// get the delegation token for proxy user hdfs; the owner of this token is hdfs as well
String delegationTokenStr = realUserClient.getDelegationToken("hdfs", "hdfs");
realUserClient.close();
String DELEGATION_TOKEN = "DelegationTokenForHiveMetaStoreServer";
// create a delegation token object and add it to the given UGI
Utils.setTokenStr(ugi, delegationTokenStr, DELEGATION_TOKEN);
ugi.doAs(new PrivilegedExceptionAction<Void>() {
    public Void run() throws Exception {
        hiveConf = new HiveConf();
        hiveConf.set("hive.metastore.token.signature", DELEGATION_TOKEN);
        client = new HiveMetaStoreClient(hiveConf);
        return null;
    }
});

2. In a Pig Java program

HiveConf hiveConf = new HiveConf();
HCatClient client = HCatClient.create(hiveConf);
UserGroupInformation ugi = UserGroupInformation.createProxyUser(proxyUser, UserGroupInformation.getLoginUser());
// get and set the delegation token
String tokenStrForm = client.getDelegationToken(proxyUser, proxyUser);
String DELEGATION_TOKEN = "DelegationTokenForHiveMetaStoreServer";
Utils.setTokenStr(ugi, tokenStrForm, DELEGATION_TOKEN);
Properties pigProp = new Properties();
pigProp.setProperty("hive.metastore.token.signature", DELEGATION_TOKEN);
client.close();
// initialize pigServer with the Pig properties
PigServer pigServer = new PigServer(ExecType.MAPREDUCE, pigProp);
ugi.doAs(new PrivilegedExceptionAction<Void>() {
    public Void run() throws Exception {
        loadJars(pigServer); // customized method
        runQuery(pigServer); // customized method
        return null;
    }
});

> Failed to create HiveMetaStoreClient object with proxy user when Kerberos > enabled > - > > Key: HIVE-13384 > URL: https://issues.apache.org/jira/browse/HIVE-13384 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 1.2.0, 1.2.1 >Reporter: Bing Li > >
> I wrote a Java client to talk to the HiveMetaStore (Hive 1.2.0),
> but found that it can't create a HiveMetaStoreClient object successfully via a proxy user in a Kerberos environment.
> ===
> 15/10/13 00:14:38 ERROR transport.TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
>     at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
>     at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
>     at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
> ==
> When I was debugging Hive, I found that the error came from the open() method in the HiveMetaStoreClient class.
> Around line 406:
> transport = UserGroupInformation.getCurrentUser().doAs(new PrivilegedExceptionAction() { // FAILS, because the current user doesn't have the credential
> But it works if I change the above line to
> transport = UserGroupInformation.getCurrentUser().getRealUser().doAs(new PrivilegedExceptionAction() { // PASSES
> I found that DRILL-3413 fixes this error on the Drill side as a workaround. But if I submit a MapReduce job via Pig/HCatalog, it runs into the same issue again when initializing the object via HCatalog.
> It would be better to fix this issue on the Hive side.
[jira] [Resolved] (HIVE-13384) Failed to create HiveMetaStoreClient object with proxy user when Kerberos enabled
[ https://issues.apache.org/jira/browse/HIVE-13384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li resolved HIVE-13384. Resolution: Won't Fix > Failed to create HiveMetaStoreClient object with proxy user when Kerberos > enabled > - > > Key: HIVE-13384 > URL: https://issues.apache.org/jira/browse/HIVE-13384 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 1.2.0, 1.2.1 >Reporter: Bing Li > >
> I wrote a Java client to talk to the HiveMetaStore (Hive 1.2.0),
> but found that it can't create a HiveMetaStoreClient object successfully via a proxy user in a Kerberos environment.
> ===
> 15/10/13 00:14:38 ERROR transport.TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
>     at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
>     at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
>     at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
> ==
> When I was debugging Hive, I found that the error came from the open() method in the HiveMetaStoreClient class.
> Around line 406:
> transport = UserGroupInformation.getCurrentUser().doAs(new PrivilegedExceptionAction() { // FAILS, because the current user doesn't have the credential
> But it works if I change the above line to
> transport = UserGroupInformation.getCurrentUser().getRealUser().doAs(new PrivilegedExceptionAction() { // PASSES
> I found that DRILL-3413 fixes this error on the Drill side as a workaround. But if I submit a MapReduce job via Pig/HCatalog, it runs into the same issue again when initializing the object via HCatalog.
> It would be better to fix this issue on the Hive side.
[jira] [Assigned] (HIVE-13384) Failed to create HiveMetaStoreClient object with proxy user when Kerberos enabled
[ https://issues.apache.org/jira/browse/HIVE-13384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li reassigned HIVE-13384: -- Assignee: Bing Li > Failed to create HiveMetaStoreClient object with proxy user when Kerberos > enabled > - > > Key: HIVE-13384 > URL: https://issues.apache.org/jira/browse/HIVE-13384 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 1.2.0, 1.2.1 >Reporter: Bing Li >Assignee: Bing Li > >
> I wrote a Java client to talk to the HiveMetaStore (Hive 1.2.0),
> but found that it can't create a HiveMetaStoreClient object successfully via a proxy user in a Kerberos environment.
> ===
> 15/10/13 00:14:38 ERROR transport.TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
>     at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
>     at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
>     at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
> ==
> When I was debugging Hive, I found that the error came from the open() method in the HiveMetaStoreClient class.
> Around line 406:
> transport = UserGroupInformation.getCurrentUser().doAs(new PrivilegedExceptionAction() { // FAILS, because the current user doesn't have the credential
> But it works if I change the above line to
> transport = UserGroupInformation.getCurrentUser().getRealUser().doAs(new PrivilegedExceptionAction() { // PASSES
> I found that DRILL-3413 fixes this error on the Drill side as a workaround. But if I submit a MapReduce job via Pig/HCatalog, it runs into the same issue again when initializing the object via HCatalog.
> It would be better to fix this issue on the Hive side.
[jira] [Updated] (HIVE-13850) File name conflict when have multiple INSERT INTO queries running in parallel
[ https://issues.apache.org/jira/browse/HIVE-13850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-13850: --- Affects Version/s: 1.2.1 > File name conflict when have multiple INSERT INTO queries running in parallel > - > > Key: HIVE-13850 > URL: https://issues.apache.org/jira/browse/HIVE-13850 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.1 >Reporter: Bing Li >Assignee: Bing Li > > We have an application which connect to HiveServer2 via JDBC. > In the application, it executes "INSERT INTO" query to the same table. > If there are a lot of users running the application at the same time. Some of > the INSERT could fail. > In hive log, > org.apache.hadoop.hive.ql.metadata.HiveException: copyFiles: error > while moving files!!! Cannot move hdfs://node:8020/apps/hive/warehouse/met > > adata.db/scalding_stats/.hive-staging_hive_2016-05-10_18-46- > 23_642_2056172497900766879-3321/-ext-1/00_0 to > hdfs://node:8020/apps/hive > /warehouse/metadata.db/scalding_stats/00_0_copy_9014 > at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java: > 2719) > at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java: > 1645) > > In hadoop log, > WARN hdfs.StateChange (FSDirRenameOp.java: > unprotectedRenameTo(174)) - DIR* FSDirectory.unprotectedRenameTo: > failed to rename /apps/hive/warehouse/metadata.db/scalding_stats/.hive- > staging_hive_2016-05-10_18-46-23_642_2056172497900766879-3321/-ext- > 1/00_0 to /apps/hive/warehouse/metadata. > db/scalding_stats/00_0_copy_9014 because destination exists -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13850) File name conflict when have multiple INSERT INTO queries running in parallel
[ https://issues.apache.org/jira/browse/HIVE-13850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-13850: --- Description: We have an application which connect to HiveServer2 via JDBC. In the application, it executes "INSERT INTO" query to the same table. If there are a lot of users running the application at the same time. Some of the INSERT could fail. The root cause is that in Hive.checkPaths(), it uses the following method to check the existing of the file. But if there are multiple inserts running in parallel, it will led to the conflict. for (int counter = 1; fs.exists(itemDest) || destExists(result, itemDest); counter++) { itemDest = new Path(destf, name + ("_copy_" + counter) + filetype); } The Error Message === In hive log, org.apache.hadoop.hive.ql.metadata.HiveException: copyFiles: error while moving files!!! Cannot move hdfs://node:8020/apps/hive/warehouse/met adata.db/scalding_stats/.hive-staging_hive_2016-05-10_18-46- 23_642_2056172497900766879-3321/-ext-1/00_0 to hdfs://node:8020/apps/hive /warehouse/metadata.db/scalding_stats/00_0_copy_9014 at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java: 2719) at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java: 1645) In hadoop log, WARN hdfs.StateChange (FSDirRenameOp.java: unprotectedRenameTo(174)) - DIR* FSDirectory.unprotectedRenameTo: failed to rename /apps/hive/warehouse/metadata.db/scalding_stats/.hive- staging_hive_2016-05-10_18-46-23_642_2056172497900766879-3321/-ext- 1/00_0 to /apps/hive/warehouse/metadata. db/scalding_stats/00_0_copy_9014 because destination exists was: We have an application which connect to HiveServer2 via JDBC. In the application, it executes "INSERT INTO" query to the same table. If there are a lot of users running the application at the same time. Some of the INSERT could fail. In hive log, org.apache.hadoop.hive.ql.metadata.HiveException: copyFiles: error while moving files!!! 
Cannot move hdfs://node:8020/apps/hive/warehouse/met adata.db/scalding_stats/.hive-staging_hive_2016-05-10_18-46- 23_642_2056172497900766879-3321/-ext-1/00_0 to hdfs://node:8020/apps/hive /warehouse/metadata.db/scalding_stats/00_0_copy_9014 at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java: 2719) at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java: 1645) In hadoop log, WARN hdfs.StateChange (FSDirRenameOp.java: unprotectedRenameTo(174)) - DIR* FSDirectory.unprotectedRenameTo: failed to rename /apps/hive/warehouse/metadata.db/scalding_stats/.hive- staging_hive_2016-05-10_18-46-23_642_2056172497900766879-3321/-ext- 1/00_0 to /apps/hive/warehouse/metadata. db/scalding_stats/00_0_copy_9014 because destination exists > File name conflict when have multiple INSERT INTO queries running in parallel > - > > Key: HIVE-13850 > URL: https://issues.apache.org/jira/browse/HIVE-13850 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.1 >Reporter: Bing Li >Assignee: Bing Li > > We have an application which connect to HiveServer2 via JDBC. > In the application, it executes "INSERT INTO" query to the same table. > If there are a lot of users running the application at the same time. Some of > the INSERT could fail. > The root cause is that in Hive.checkPaths(), it uses the following method to > check the existing of the file. But if there are multiple inserts running in > parallel, it will led to the conflict. > for (int counter = 1; fs.exists(itemDest) || destExists(result, itemDest); > counter++) { > itemDest = new Path(destf, name + ("_copy_" + counter) + > filetype); > } > The Error Message > === > In hive log, > org.apache.hadoop.hive.ql.metadata.HiveException: copyFiles: error > while moving files!!! Cannot move hdfs://node:8020/apps/hive/warehouse/met > > adata.db/scalding_stats/.hive-staging_hive_2016-05-10_18-46- > 23_642_2056172497900766879-3321/-ext-1/00_0 to > hdfs://node:8020/apps/hive > /warehouse/
[jira] [Updated] (HIVE-13850) File name conflict when have multiple INSERT INTO queries running in parallel
[ https://issues.apache.org/jira/browse/HIVE-13850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-13850: --- Attachment: HIVE-13850-1.2.1.patch This patch is based on Hive 1.2.1. It uses a timestamp to name the data file under the table directory, which avoids the conflict. > File name conflict when have multiple INSERT INTO queries running in parallel > - > > Key: HIVE-13850 > URL: https://issues.apache.org/jira/browse/HIVE-13850 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.1 >Reporter: Bing Li >Assignee: Bing Li > Attachments: HIVE-13850-1.2.1.patch > >
> We have an application which connects to HiveServer2 via JDBC.
> In the application, it executes "INSERT INTO" queries against the same table.
> If many users run the application at the same time, some of the INSERTs can fail.
> The root cause is that Hive.checkPaths() uses the following loop to check whether the destination file already exists. When multiple inserts run in parallel, this check-then-rename pattern leads to a conflict.
> for (int counter = 1; fs.exists(itemDest) || destExists(result, itemDest); counter++) {
>     itemDest = new Path(destf, name + ("_copy_" + counter) + filetype);
> }
> The Error Message
> ===
> In the Hive log:
> org.apache.hadoop.hive.ql.metadata.HiveException: copyFiles: error while moving files!!! Cannot move hdfs://node:8020/apps/hive/warehouse/metadata.db/scalding_stats/.hive-staging_hive_2016-05-10_18-46-23_642_2056172497900766879-3321/-ext-1/00_0 to hdfs://node:8020/apps/hive/warehouse/metadata.db/scalding_stats/00_0_copy_9014
>     at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2719)
>     at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1645)
>
> In the Hadoop log:
> WARN hdfs.StateChange (FSDirRenameOp.java:unprotectedRenameTo(174)) - DIR* FSDirectory.unprotectedRenameTo: failed to rename /apps/hive/warehouse/metadata.db/scalding_stats/.hive-staging_hive_2016-05-10_18-46-23_642_2056172497900766879-3321/-ext-1/00_0 to /apps/hive/warehouse/metadata.db/scalding_stats/00_0_copy_9014 because destination exists
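The attached patch is not reproduced in this thread; based only on the description above (name the data file uniquely per writer instead of probing for `_copy_N`), the idea can be sketched as follows. The names here are invented for illustration, and a UUID stands in for the patch's timestamp, since a bare timestamp could still collide when two writers land in the same millisecond:

```java
import java.util.UUID;

// Sketch of the fix direction described in the attachment comment:
// generate a destination name that is unique per writer, so no
// exists() probe is needed and the check-then-rename race disappears.
public class UniqueCopyName {

    static String uniqueName(String base, String filetype) {
        // A per-writer unique suffix cannot collide across sessions.
        return base + "_copy_" + UUID.randomUUID() + filetype;
    }

    public static void main(String[] args) {
        String a = uniqueName("000000_0", "");
        String b = uniqueName("000000_0", "");
        System.out.println(a.equals(b)); // false: the two writers never clash
    }
}
```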
[jira] [Updated] (HIVE-6091) Empty pipeout files are created for connection create/close
[ https://issues.apache.org/jira/browse/HIVE-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-6091: -- Attachment: HIVE-6091.2.patch Re-generated the patch based on the latest code in master branch > Empty pipeout files are created for connection create/close > --- > > Key: HIVE-6091 > URL: https://issues.apache.org/jira/browse/HIVE-6091 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.0, 1.2.1 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Minor > Attachments: HIVE-6091.1.patch, HIVE-6091.2.patch, HIVE-6091.patch > > > Pipeout files are created when a connection is established and removed only > when data was produced. Instead we should create them only when data has to > be fetched or remove them whether data is fetched or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10495) Hive index creation code throws NPE if index table is null
[ https://issues.apache.org/jira/browse/HIVE-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-10495: --- Attachment: HIVE-10495.2.patch Re-generated the patch based on the latest master branch. > Hive index creation code throws NPE if index table is null > -- > > Key: HIVE-10495 > URL: https://issues.apache.org/jira/browse/HIVE-10495 > Project: Hive > Issue Type: Bug >Affects Versions: 1.0.0, 1.2.0 >Reporter: Bing Li >Assignee: Bing Li > Attachments: HIVE-10495.1.patch, HIVE-10495.2.patch > > > The stack trace would be: > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_index(HiveMetaStore.java:2870) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37) > at java.lang.reflect.Method.invoke(Method.java:611) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102) > at $Proxy9.add_index(Unknown Source) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createIndex(HiveMetaStoreClient.java:962) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10495) Hive index creation code throws NPE if index table is null
[ https://issues.apache.org/jira/browse/HIVE-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-10495: --- Affects Version/s: 1.2.1 > Hive index creation code throws NPE if index table is null > -- > > Key: HIVE-10495 > URL: https://issues.apache.org/jira/browse/HIVE-10495 > Project: Hive > Issue Type: Bug >Affects Versions: 1.0.0, 1.2.0, 1.2.1 >Reporter: Bing Li >Assignee: Bing Li > Attachments: HIVE-10495.1.patch, HIVE-10495.2.patch > > > The stack trace would be: > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_index(HiveMetaStore.java:2870) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37) > at java.lang.reflect.Method.invoke(Method.java:611) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102) > at $Proxy9.add_index(Unknown Source) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createIndex(HiveMetaStoreClient.java:962) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
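The stack trace above points at add_index() dereferencing a null underlying index table. The attached patches are not shown in this thread; a generic sketch of the guard the title calls for (all names here are hypothetical, not the patch's actual code) looks like this:

```java
// Hypothetical guard illustrating the fix direction for HIVE-10495:
// validate the index-table reference before using it, so callers get a
// descriptive error instead of a bare NullPointerException deep in add_index().
public class IndexTableGuard {

    static boolean isUsable(Object indexTable) {
        return indexTable != null;
    }

    static String describeOrFail(Object indexTable, String indexName) {
        if (!isUsable(indexTable)) {
            throw new IllegalArgumentException(
                "Index " + indexName + " has no underlying index table");
        }
        return indexTable.toString();
    }

    public static void main(String[] args) {
        System.out.println(isUsable(null)); // false: this case used to NPE
        System.out.println(describeOrFail("t1_idx_table", "t1_idx"));
    }
}
```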
[jira] [Commented] (HIVE-10982) Customizable the value of java.sql.statement.setFetchSize in Hive JDBC Driver
[ https://issues.apache.org/jira/browse/HIVE-10982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739929#comment-14739929 ] Bing Li commented on HIVE-10982: Hi, [~pxiong] Yes, I will start to work on this soon. Thank you. > Customizable the value of java.sql.statement.setFetchSize in Hive JDBC Driver > -- > > Key: HIVE-10982 > URL: https://issues.apache.org/jira/browse/HIVE-10982 > Project: Hive > Issue Type: Improvement > Components: JDBC >Affects Versions: 1.2.0 >Reporter: Bing Li >Assignee: Bing Li >Priority: Critical > >
> The current JDBC driver for Hive hard-codes the value of setFetchSize to 50, which is a performance bottleneck.
> Pentaho filed this issue as http://jira.pentaho.com/browse/PDI-11511, whose status is open.
> There is also discussion at
> http://forums.pentaho.com/showthread.php?158381-Hive-JDBC-Query-too-slow-too-many-fetches-after-query-execution-Kettle-Xform
> http://mail-archives.apache.org/mod_mbox/hive-user/201307.mbox/%3ccacq46vevgrfqg5rwxnr1psgyz7dcf07mvlo8mm2qit3anm1...@mail.gmail.com%3E
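To see why a hard-coded fetch size of 50 matters, consider how many client/server round trips are needed to drain a result set: roughly ceil(rows / fetchSize). This back-of-envelope helper is mine, not part of the driver:

```java
// Rough cost model for draining a JDBC result set: each fetch pulls at
// most fetchSize rows, so the number of round trips is ceil(rows / fetchSize).
public class FetchRoundTrips {

    static long roundTrips(long rows, int fetchSize) {
        return (rows + fetchSize - 1) / fetchSize; // integer ceiling division
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(1_000_000, 50));     // hard-coded default
        System.out.println(roundTrips(1_000_000, 10_000)); // a tunable value
    }
}
```

Pulling a million rows at 50 rows per fetch costs 20,000 round trips versus 100 at a fetch size of 10,000, which is why making setFetchSize configurable (or honoring the value the application passes) is worthwhile.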
[jira] [Updated] (HIVE-6990) Direct SQL fails when the explicit schema setting is different from the default one
[ https://issues.apache.org/jira/browse/HIVE-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-6990: -- Attachment: HIVE-6990.5.patch The patch is created based on the latest code in master branch > Direct SQL fails when the explicit schema setting is different from the > default one > --- > > Key: HIVE-6990 > URL: https://issues.apache.org/jira/browse/HIVE-6990 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.12.0 > Environment: hive + derby >Reporter: Bing Li >Assignee: Bing Li > Attachments: HIVE-6990.1.patch, HIVE-6990.2.patch, HIVE-6990.3.patch, > HIVE-6990.4.patch, HIVE-6990.5.patch > > > I got the following ERROR in hive.log > 2014-04-23 17:30:23,331 ERROR metastore.ObjectStore > (ObjectStore.java:handleDirectSqlError(1756)) - Direct SQL failed, falling > back to ORM > javax.jdo.JDODataStoreException: Error executing SQL query "select > PARTITIONS.PART_ID from PARTITIONS inner join TBLS on PARTITIONS.TBL_ID = > TBLS.TBL_ID inner join DBS on TBLS.DB_ID = DBS.DB_ID inner join > PARTITION_KEY_VALS as FILTER0 on FILTER0.PART_ID = PARTITIONS.PART_ID and > FILTER0.INTEGER_IDX = 0 where TBLS.TBL_NAME = ? and DBS.NAME = ? and > ((FILTER0.PART_KEY_VAL = ?))". 
> at > org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451) > at > org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321) > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:181) > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:98) > at > org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilterInternal(ObjectStore.java:1833) > at > org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1806) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55) > at java.lang.reflect.Method.invoke(Method.java:619) > at > org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124) > at com.sun.proxy.$Proxy11.getPartitionsByFilter(Unknown Source) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_filter(HiveMetaStore.java:3310) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55) > at java.lang.reflect.Method.invoke(Method.java:619) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103) > at com.sun.proxy.$Proxy12.get_partitions_by_filter(Unknown Source) > Reproduce steps: > 1. set the following properties in hive-site.xml > > javax.jdo.mapping.Schema > HIVE > > > javax.jdo.option.ConnectionUserName > user1 > > 2. 
execute hive queries > hive> create table mytbl ( key int, value string); > hive> load data local inpath 'examples/files/kv1.txt' overwrite into table > mytbl; > hive> select * from mytbl; > hive> create view myview partitioned on (value) as select key, value from > mytbl where key=98; > hive> alter view myview add partition (value='val_98') partition > (value='val_xyz'); > hive> alter view myview drop partition (value='val_xyz'); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6990) Direct SQL fails when the explicit schema setting is different from the default one
[ https://issues.apache.org/jira/browse/HIVE-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-6990: -- Affects Version/s: 0.14.0 1.2.1 > Direct SQL fails when the explicit schema setting is different from the > default one > --- > > Key: HIVE-6990 > URL: https://issues.apache.org/jira/browse/HIVE-6990 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.12.0, 0.14.0, 1.2.1 > Environment: hive + derby >Reporter: Bing Li >Assignee: Bing Li > Attachments: HIVE-6990.1.patch, HIVE-6990.2.patch, HIVE-6990.3.patch, > HIVE-6990.4.patch, HIVE-6990.5.patch > > > I got the following ERROR in hive.log > 2014-04-23 17:30:23,331 ERROR metastore.ObjectStore > (ObjectStore.java:handleDirectSqlError(1756)) - Direct SQL failed, falling > back to ORM > javax.jdo.JDODataStoreException: Error executing SQL query "select > PARTITIONS.PART_ID from PARTITIONS inner join TBLS on PARTITIONS.TBL_ID = > TBLS.TBL_ID inner join DBS on TBLS.DB_ID = DBS.DB_ID inner join > PARTITION_KEY_VALS as FILTER0 on FILTER0.PART_ID = PARTITIONS.PART_ID and > FILTER0.INTEGER_IDX = 0 where TBLS.TBL_NAME = ? and DBS.NAME = ? and > ((FILTER0.PART_KEY_VAL = ?))". 
> at > org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451) > at > org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321) > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:181) > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:98) > at > org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilterInternal(ObjectStore.java:1833) > at > org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1806) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55) > at java.lang.reflect.Method.invoke(Method.java:619) > at > org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124) > at com.sun.proxy.$Proxy11.getPartitionsByFilter(Unknown Source) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_filter(HiveMetaStore.java:3310) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55) > at java.lang.reflect.Method.invoke(Method.java:619) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103) > at com.sun.proxy.$Proxy12.get_partitions_by_filter(Unknown Source) > Reproduce steps: > 1. set the following properties in hive-site.xml > > javax.jdo.mapping.Schema > HIVE > > > javax.jdo.option.ConnectionUserName > user1 > > 2. 
execute hive queries > hive> create table mytbl ( key int, value string); > hive> load data local inpath 'examples/files/kv1.txt' overwrite into table > mytbl; > hive> select * from mytbl; > hive> create view myview partitioned on (value) as select key, value from > mytbl where key=98; > hive> alter view myview add partition (value='val_98') partition > (value='val_xyz'); > hive> alter view myview drop partition (value='val_xyz'); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
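One plausible direction for the HIVE-6990 fix (a sketch only, not the attached patch): qualify the metastore table names with the explicitly configured schema before running the direct SQL query, so the statement still resolves when `javax.jdo.mapping.Schema` differs from the connection user's default schema. The helper below is purely illustrative; the real MetaStoreDirectSql code builds its SQL differently.

```java
import java.util.Arrays;
import java.util.List;

public class SchemaQualifier {
    // Metastore tables referenced by the failing direct SQL query.
    private static final List<String> TABLES =
        Arrays.asList("PARTITIONS", "TBLS", "DBS", "PARTITION_KEY_VALS");

    /**
     * Prefix each known metastore table name in the SQL text with the explicit
     * schema, e.g. PARTITIONS -> HIVE.PARTITIONS. The negative lookbehind keeps
     * already-qualified names from being qualified twice.
     */
    public static String qualify(String sql, String schema) {
        for (String t : TABLES) {
            sql = sql.replaceAll("(?<![.\\w])" + t + "\\b", schema + "." + t);
        }
        return sql;
    }
}
```

With `schema = "HIVE"` (as in the repro's hive-site.xml), the failing query would resolve against the explicit schema rather than the Derby default for `user1`.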
[jira] [Updated] (HIVE-9169) UT: set hive.support.concurrency to true for spark UTs
[ https://issues.apache.org/jira/browse/HIVE-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-9169: -- Assignee: (was: Bing Li) > UT: set hive.support.concurrency to true for spark UTs > -- > > Key: HIVE-9169 > URL: https://issues.apache.org/jira/browse/HIVE-9169 > Project: Hive > Issue Type: Sub-task > Components: Tests >Affects Versions: spark-branch >Reporter: Thomas Friedrich >Priority: Minor > > The test cases > lock1 > lock2 > lock3 > lock4 > are failing because the flag hive.support.concurrency is set to false in the > hive-site.xml for the spark tests. > This value was set to true in trunk with HIVE-1293 when these test cases were > introduced to Hive. > After setting the value to true and generating the output files, the test > cases are successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
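The flag described above is toggled in the hive-site.xml used by the spark test suite; a minimal illustrative snippet of the change (the exact path of the test config file may differ):

```xml
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
  <!-- Enables Hive's lock manager so the lock1-lock4 .q tests can acquire locks. -->
</property>
```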
[jira] [Updated] (HIVE-10982) Customizable the value of java.sql.statement.setFetchSize in Hive JDBC Driver
[ https://issues.apache.org/jira/browse/HIVE-10982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-10982: --- Attachment: HIVE-10982.1.patch > Customizable the value of java.sql.statement.setFetchSize in Hive JDBC Driver > -- > > Key: HIVE-10982 > URL: https://issues.apache.org/jira/browse/HIVE-10982 > Project: Hive > Issue Type: Improvement > Components: JDBC >Affects Versions: 1.2.0 >Reporter: Bing Li >Assignee: Bing Li >Priority: Critical > Attachments: HIVE-10982.1.patch > > > The current JDBC driver for Hive hard-code the value of setFetchSize to 50, > which will be a bottleneck for performance. > Pentaho filed this issue as http://jira.pentaho.com/browse/PDI-11511, whose > status is open. > Also it has discussion in > http://forums.pentaho.com/showthread.php?158381-Hive-JDBC-Query-too-slow-too-many-fetches-after-query-execution-Kettle-Xform > http://mail-archives.apache.org/mod_mbox/hive-user/201307.mbox/%3ccacq46vevgrfqg5rwxnr1psgyz7dcf07mvlo8mm2qit3anm1...@mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10982) Customizable the value of java.sql.statement.setFetchSize in Hive JDBC Driver
[ https://issues.apache.org/jira/browse/HIVE-10982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905921#comment-14905921 ] Bing Li commented on HIVE-10982: The patch is created based on the latest code in master branch > Customizable the value of java.sql.statement.setFetchSize in Hive JDBC Driver > -- > > Key: HIVE-10982 > URL: https://issues.apache.org/jira/browse/HIVE-10982 > Project: Hive > Issue Type: Improvement > Components: JDBC >Affects Versions: 1.2.0, 1.2.1 >Reporter: Bing Li >Assignee: Bing Li >Priority: Critical > Attachments: HIVE-10982.1.patch > > > The current JDBC driver for Hive hard-code the value of setFetchSize to 50, > which will be a bottleneck for performance. > Pentaho filed this issue as http://jira.pentaho.com/browse/PDI-11511, whose > status is open. > Also it has discussion in > http://forums.pentaho.com/showthread.php?158381-Hive-JDBC-Query-too-slow-too-many-fetches-after-query-execution-Kettle-Xform > http://mail-archives.apache.org/mod_mbox/hive-user/201307.mbox/%3ccacq46vevgrfqg5rwxnr1psgyz7dcf07mvlo8mm2qit3anm1...@mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10982) Customizable the value of java.sql.statement.setFetchSize in Hive JDBC Driver
[ https://issues.apache.org/jira/browse/HIVE-10982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14906075#comment-14906075 ] Bing Li commented on HIVE-10982: [~pxiong] and [~vgumashta], I have uploaded the patch and am waiting for a reply from the community. Thank you. > Customizable the value of java.sql.statement.setFetchSize in Hive JDBC Driver > -- > > Key: HIVE-10982 > URL: https://issues.apache.org/jira/browse/HIVE-10982 > Project: Hive > Issue Type: Improvement > Components: JDBC >Affects Versions: 1.2.0, 1.2.1 >Reporter: Bing Li >Assignee: Bing Li >Priority: Critical > Attachments: HIVE-10982.1.patch > > > The current JDBC driver for Hive hard-code the value of setFetchSize to 50, > which will be a bottleneck for performance. > Pentaho filed this issue as http://jira.pentaho.com/browse/PDI-11511, whose > status is open. > Also it has discussion in > http://forums.pentaho.com/showthread.php?158381-Hive-JDBC-Query-too-slow-too-many-fetches-after-query-execution-Kettle-Xform > http://mail-archives.apache.org/mod_mbox/hive-user/201307.mbox/%3ccacq46vevgrfqg5rwxnr1psgyz7dcf07mvlo8mm2qit3anm1...@mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10982) Customizable the value of java.sql.statement.setFetchSize in Hive JDBC Driver
[ https://issues.apache.org/jira/browse/HIVE-10982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14934744#comment-14934744 ] Bing Li commented on HIVE-10982: Hi, [~vgumashta], thank you for your comment. Do you mean to introduce a new property in hive-site.xml that would also control the maximum fetch size returned by HS2? Do you know the current control mechanism on the HS2 side? > Customizable the value of java.sql.statement.setFetchSize in Hive JDBC Driver > -- > > Key: HIVE-10982 > URL: https://issues.apache.org/jira/browse/HIVE-10982 > Project: Hive > Issue Type: Improvement > Components: JDBC >Affects Versions: 1.2.0, 1.2.1 >Reporter: Bing Li >Assignee: Bing Li >Priority: Critical > Attachments: HIVE-10982.1.patch > > > The current JDBC driver for Hive hard-code the value of setFetchSize to 50, > which will be a bottleneck for performance. > Pentaho filed this issue as http://jira.pentaho.com/browse/PDI-11511, whose > status is open. > Also it has discussion in > http://forums.pentaho.com/showthread.php?158381-Hive-JDBC-Query-too-slow-too-many-fetches-after-query-execution-Kettle-Xform > http://mail-archives.apache.org/mod_mbox/hive-user/201307.mbox/%3ccacq46vevgrfqg5rwxnr1psgyz7dcf07mvlo8mm2qit3anm1...@mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10982) Customizable the value of java.sql.statement.setFetchSize in Hive JDBC Driver
[ https://issues.apache.org/jira/browse/HIVE-10982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996186#comment-14996186 ] Bing Li commented on HIVE-10982: Hi, [~alangates] Thank you for your comment. Yes, I still want to be able to set this property via the connection URL. I will rebase the patch soon. Thank you. > Customizable the value of java.sql.statement.setFetchSize in Hive JDBC Driver > -- > > Key: HIVE-10982 > URL: https://issues.apache.org/jira/browse/HIVE-10982 > Project: Hive > Issue Type: Improvement > Components: JDBC >Affects Versions: 1.2.0, 1.2.1 >Reporter: Bing Li >Assignee: Bing Li >Priority: Critical > Attachments: HIVE-10982.1.patch > > > The current JDBC driver for Hive hard-code the value of setFetchSize to 50, > which will be a bottleneck for performance. > Pentaho filed this issue as http://jira.pentaho.com/browse/PDI-11511, whose > status is open. > Also it has discussion in > http://forums.pentaho.com/showthread.php?158381-Hive-JDBC-Query-too-slow-too-many-fetches-after-query-execution-Kettle-Xform > http://mail-archives.apache.org/mod_mbox/hive-user/201307.mbox/%3ccacq46vevgrfqg5rwxnr1psgyz7dcf07mvlo8mm2qit3anm1...@mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
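The idea under discussion in HIVE-10982 — making the fetch size configurable through the connection URL instead of the hard-coded 50 — can be sketched as a small client-side helper. This is illustrative only: the parameter name `fetchSize` and the parsing approach are assumptions, not the names used in the eventual patch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FetchSizeOption {
    // Today's hard-coded value in the Hive JDBC driver, kept as the fallback.
    static final int DEFAULT_FETCH_SIZE = 50;

    /**
     * Extract an optional fetchSize=<n> session variable from a JDBC URL such as
     * jdbc:hive2://host:10000/default;fetchSize=1000 (parameter name assumed),
     * falling back to the current default when it is absent.
     */
    public static int fetchSizeFromUrl(String jdbcUrl) {
        Matcher m = Pattern.compile("[;?&]fetchSize=(\\d+)").matcher(jdbcUrl);
        return m.find() ? Integer.parseInt(m.group(1)) : DEFAULT_FETCH_SIZE;
    }
}
```

The resolved value would then be passed to the standard `java.sql.Statement.setFetchSize(int)` call that the driver already makes, rather than the constant 50.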
[jira] [Commented] (HIVE-6963) Beeline logs are printing on the console
[ https://issues.apache.org/jira/browse/HIVE-6963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381700#comment-14381700 ] Bing Li commented on HIVE-6963: --- Hi, Chinna, have you uploaded the latest patch? I tried the patch attached to this Jira and found: 1. In order to launch bin/beeline, I need to add the following jars to HADOOP_CLASSPATH in bin/ext/beeline.sh hive/lib/hive-shims-0.23.jar hive/lib/hive-shims-common-secure.jar hive/lib/hive-shims-common.jar 2. The log file doesn't contain as much info as the HiveCLI log file; it only has the following lines: [biadmin@bdvs1100 biadmin]$ cat hive.log 2015-02-13 06:53:50,145 INFO jdbc.Utils (Utils.java:parseURL(285)) - Supplied authorities: bdvs1100.svl.ibm.com:1 2015-02-13 06:53:50,149 INFO jdbc.Utils (Utils.java:parseURL(372)) - Resolved authority: bdvs1100.svl.ibm.com:1 2015-02-13 06:53:50,184 INFO jdbc.HiveConnection (HiveConnection.java:openTransport(191)) - Will try to open client transport with JDBC Uri: jdbc:hive2://9.123.2.21:1 Are these known issues, or is this working as designed? Thank you. - Bing > Beeline logs are printing on the console > > > Key: HIVE-6963 > URL: https://issues.apache.org/jira/browse/HIVE-6963 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Chinna Rao Lalam >Assignee: Chinna Rao Lalam > Attachments: HIVE-6963.patch > > > beeline logs are not redirected to the log file. > If log is redirected to log file, only required information will print on the > console. > This way it is easier to read the output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.
[ https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-4577: -- Attachment: HIVE-4577.4.patch Re-create the patch file based on the latest code in trunk > hive CLI can't handle hadoop dfs command with space and quotes. > > > Key: HIVE-4577 > URL: https://issues.apache.org/jira/browse/HIVE-4577 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 0.9.0, 0.10.0 >Reporter: Bing Li >Assignee: Bing Li > Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, > HIVE-4577.3.patch.txt, HIVE-4577.4.patch > > > As design, hive could support hadoop dfs command in hive shell, like > hive> dfs -mkdir /user/biadmin/mydir; > but has different behavior with hadoop if the path contains space and quotes > hive> dfs -mkdir "hello"; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:40 > /user/biadmin/"hello" > hive> dfs -mkdir 'world'; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:43 > /user/biadmin/'world' > hive> dfs -mkdir "bei jing"; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 > /user/biadmin/"bei > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 > /user/biadmin/jing" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.
[ https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513892#comment-14513892 ] Bing Li commented on HIVE-4577: --- Hi, [~thejas] I generated a new patch for this defect, also fixed a bug in my previous patch. Could you help to review it? Thank you! > hive CLI can't handle hadoop dfs command with space and quotes. > > > Key: HIVE-4577 > URL: https://issues.apache.org/jira/browse/HIVE-4577 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 0.9.0, 0.10.0 >Reporter: Bing Li >Assignee: Bing Li > Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, > HIVE-4577.3.patch.txt, HIVE-4577.4.patch > > > As design, hive could support hadoop dfs command in hive shell, like > hive> dfs -mkdir /user/biadmin/mydir; > but has different behavior with hadoop if the path contains space and quotes > hive> dfs -mkdir "hello"; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:40 > /user/biadmin/"hello" > hive> dfs -mkdir 'world'; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:43 > /user/biadmin/'world' > hive> dfs -mkdir "bei jing"; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 > /user/biadmin/"bei > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 > /user/biadmin/jing" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.
[ https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-4577: -- Affects Version/s: 1.1.0 0.14.0 0.13.1 > hive CLI can't handle hadoop dfs command with space and quotes. > > > Key: HIVE-4577 > URL: https://issues.apache.org/jira/browse/HIVE-4577 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.1.0 >Reporter: Bing Li >Assignee: Bing Li > Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, > HIVE-4577.3.patch.txt, HIVE-4577.4.patch > > > As design, hive could support hadoop dfs command in hive shell, like > hive> dfs -mkdir /user/biadmin/mydir; > but has different behavior with hadoop if the path contains space and quotes > hive> dfs -mkdir "hello"; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:40 > /user/biadmin/"hello" > hive> dfs -mkdir 'world'; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:43 > /user/biadmin/'world' > hive> dfs -mkdir "bei jing"; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 > /user/biadmin/"bei > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 > /user/biadmin/jing" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
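The heart of the HIVE-4577 fix is tokenizing the dfs command the way a shell would, so quotes group words into one argument and are stripped rather than passed to HDFS literally. The splitter below is an illustrative re-implementation of that idea, not the code from the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

public class DfsCommandSplitter {
    /** Split a dfs command line shell-style: quotes group words and are removed. */
    public static List<String> splitArgs(String cmd) {
        List<String> args = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        char quote = 0;            // active quote char, 0 if none
        boolean inToken = false;
        for (char c : cmd.toCharArray()) {
            if (quote != 0) {                    // inside quotes: only the matching
                if (c == quote) quote = 0;       // quote character ends the region
                else cur.append(c);
            } else if (c == '"' || c == '\'') {  // opening quote is dropped
                quote = c;
                inToken = true;
            } else if (Character.isWhitespace(c)) {
                if (inToken) { args.add(cur.toString()); cur.setLength(0); inToken = false; }
            } else {
                cur.append(c);
                inToken = true;
            }
        }
        if (inToken) args.add(cur.toString());
        return args;
    }
}
```

With this behavior, `dfs -mkdir "bei jing";` creates a single directory named `bei jing` instead of two directories `"bei` and `jing"`.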
[jira] [Updated] (HIVE-10495) Hive index creation code throws NPE if index table is null
[ https://issues.apache.org/jira/browse/HIVE-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-10495: --- Attachment: HIVE-10495.1.patch The patch is created based on the latest trunk. > Hive index creation code throws NPE if index table is null > -- > > Key: HIVE-10495 > URL: https://issues.apache.org/jira/browse/HIVE-10495 > Project: Hive > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Bing Li >Assignee: Bing Li > Attachments: HIVE-10495.1.patch > > > The stack trace would be: > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_index(HiveMetaStore.java:2870) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37) > at java.lang.reflect.Method.invoke(Method.java:611) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102) > at $Proxy9.add_index(Unknown Source) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createIndex(HiveMetaStoreClient.java:962) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
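The NPE occurs because add_index dereferences the index-table object without first checking that the caller supplied one. Conceptually the fix is just a precondition check that fails fast with a clear message; the method and message below are illustrative, not the patched HiveMetaStore code.

```java
public class IndexTableGuard {
    /**
     * Validate the index table before use, failing fast with a descriptive
     * exception instead of letting a later dereference throw a bare NPE.
     */
    public static String requireIndexTableName(Object indexTable, String indexName) {
        if (indexTable == null) {
            throw new IllegalArgumentException(
                "Underlying index table is not present for index " + indexName);
        }
        return indexTable.toString();
    }
}
```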
[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.
[ https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518930#comment-14518930 ] Bing Li commented on HIVE-4577: --- The test failure should not be related to this patch. > hive CLI can't handle hadoop dfs command with space and quotes. > > > Key: HIVE-4577 > URL: https://issues.apache.org/jira/browse/HIVE-4577 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.1.0 >Reporter: Bing Li >Assignee: Bing Li > Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, > HIVE-4577.3.patch.txt, HIVE-4577.4.patch > > > As design, hive could support hadoop dfs command in hive shell, like > hive> dfs -mkdir /user/biadmin/mydir; > but has different behavior with hadoop if the path contains space and quotes > hive> dfs -mkdir "hello"; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:40 > /user/biadmin/"hello" > hive> dfs -mkdir 'world'; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:43 > /user/biadmin/'world' > hive> dfs -mkdir "bei jing"; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 > /user/biadmin/"bei > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 > /user/biadmin/jing" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10495) Hive index creation code throws NPE if index table is null
[ https://issues.apache.org/jira/browse/HIVE-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518904#comment-14518904 ] Bing Li commented on HIVE-10495: The test failure should not be related to this patch. > Hive index creation code throws NPE if index table is null > -- > > Key: HIVE-10495 > URL: https://issues.apache.org/jira/browse/HIVE-10495 > Project: Hive > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Bing Li >Assignee: Bing Li > Attachments: HIVE-10495.1.patch > > > The stack trace would be: > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_index(HiveMetaStore.java:2870) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37) > at java.lang.reflect.Method.invoke(Method.java:611) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102) > at $Proxy9.add_index(Unknown Source) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createIndex(HiveMetaStoreClient.java:962) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10495) Hive index creation code throws NPE if index table is null
[ https://issues.apache.org/jira/browse/HIVE-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-10495: --- Attachment: (was: HIVE-10495.1.patch) > Hive index creation code throws NPE if index table is null > -- > > Key: HIVE-10495 > URL: https://issues.apache.org/jira/browse/HIVE-10495 > Project: Hive > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Bing Li >Assignee: Bing Li > > The stack trace would be: > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_index(HiveMetaStore.java:2870) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37) > at java.lang.reflect.Method.invoke(Method.java:611) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102) > at $Proxy9.add_index(Unknown Source) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createIndex(HiveMetaStoreClient.java:962) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11201) HCatalog is ignoring user specified avro schema in the table definition
[ https://issues.apache.org/jira/browse/HIVE-11201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-11201: --- Attachment: HIVE-11201.1.patch The patch is created based on the latest code in the master branch. > HCatalog is ignoring user specified avro schema in the table definition > > > Key: HIVE-11201 > URL: https://issues.apache.org/jira/browse/HIVE-11201 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.0 >Reporter: Bing Li >Assignee: Bing Li >Priority: Critical > Attachments: HIVE-11201.1.patch > > > HCatalog ignores the user-specified Avro schema in the table definition and > instead generates its own Avro schema from the Hive metastore. Generating its > own schema results in mismatched names; for example, Avro field names are case > sensitive. It also results in an incorrect schema being written to the Avro > file, so a subsequent select fails on read. In addition, even if the > user-specified schema does not allow null, data written through HCatalog gets > a schema that allows null. For example, the user declared the schema with all > CAPITAL letters and the record name LINEITEM; the schema should be written > as-is, but HCatalog ignores it and generates its own Avro schema using the > Hive table's casing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
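The intended precedence is simple: if the table definition carries an explicit Avro schema (via the `avro.schema.literal` serde property, or `avro.schema.url` pointing to a schema file), the writer should use it verbatim instead of deriving one from the metastore column list. A sketch of that selection logic, using a plain map to stand in for the table parameters (illustrative, not the patched HCatalog code):

```java
import java.util.Map;

public class AvroSchemaSelector {
    /**
     * Prefer the user-supplied schema string; fall back to the schema generated
     * from the Hive metastore only when none was specified.
     */
    public static String chooseWriterSchema(Map<String, String> tableParams,
                                            String generatedFromMetastore) {
        String literal = tableParams.get("avro.schema.literal");
        if (literal != null && !literal.isEmpty()) {
            // Keeps case-sensitive field names and nullability exactly as declared.
            return literal;
        }
        // avro.schema.url handling (fetching the schema file) is omitted in this sketch.
        return generatedFromMetastore;
    }
}
```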
[jira] [Commented] (HIVE-6091) Empty pipeout files are created for connection create/close
[ https://issues.apache.org/jira/browse/HIVE-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14624536#comment-14624536 ] Bing Li commented on HIVE-6091: --- Seems that the patch has been merged into Hive from 0.13.0 via https://issues.apache.org/jira/browse/HIVE-4395 > Empty pipeout files are created for connection create/close > --- > > Key: HIVE-6091 > URL: https://issues.apache.org/jira/browse/HIVE-6091 > Project: Hive > Issue Type: Bug >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Minor > Attachments: HIVE-6091.patch > > > Pipeout files are created when a connection is established and removed only > when data was produced. Instead we should create them only when data has to > be fetched or remove them whether data is fetched or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6091) Empty pipeout files are created for connection create/close
[ https://issues.apache.org/jira/browse/HIVE-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-6091: -- Attachment: HIVE-6091.1.patch With this patch, the pipeout file is deleted after the session closes. I have tested it against Hive 1.2.1 > Empty pipeout files are created for connection create/close > --- > > Key: HIVE-6091 > URL: https://issues.apache.org/jira/browse/HIVE-6091 > Project: Hive > Issue Type: Bug >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Minor > Attachments: HIVE-6091.1.patch, HIVE-6091.patch > > > Pipeout files are created when a connection is established and removed only > when data was produced. Instead we should create them only when data has to > be fetched or remove them whether data is fetched or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
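The gist of the fix, as described, is to remove the session's .pipeout file when the session closes whether or not it was ever written to. A self-contained sketch of that cleanup (the file-naming convention here is illustrative):

```java
import java.io.File;
import java.io.IOException;

public class PipeoutCleanup {
    /** Create the per-session pipeout file, as happens on connection create. */
    public static File createPipeout(File dir, String sessionId) {
        File f = new File(dir, sessionId + ".pipeout");
        try {
            f.createNewFile();   // often stays empty if no data is ever fetched
        } catch (IOException e) {
            throw new RuntimeException("could not create pipeout file", e);
        }
        return f;
    }

    /** On session close, delete the file even if it is still empty. */
    public static boolean closeSession(File pipeout) {
        return !pipeout.exists() || pipeout.delete();
    }
}
```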
[jira] [Updated] (HIVE-11113) ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work.
[ https://issues.apache.org/jira/browse/HIVE-11113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-11113: --- Affects Version/s: 1.2.1 > ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work. > --- > > Key: HIVE-11113 > URL: https://issues.apache.org/jira/browse/HIVE-11113 > Project: Hive > Issue Type: Bug >Affects Versions: 0.13.1, 1.2.1 > Environment: >Reporter: Shiroy Pigarez > > I was trying to perform some column statistics using hive as per the > documentation > https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive > and was encountering the following errors: > Seems like a bug. Can you look into this? Thanks in advance. > -- HIVE table > {noformat} > hive> create table people_part( > name string, > address string) PARTITIONED BY (dob string, nationality varchar(2)) > row format delimited fields terminated by '\t'; > {noformat} > --Analyze table with partition dob and nationality with FOR COLUMNS > {noformat} > hive> ANALYZE TABLE people_part PARTITION(dob='2015-10-2',nationality) > COMPUTE STATISTICS FOR COLUMNS; > NoViableAltException(-1@[]) > at > org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11627) > at > org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:40215) > at > org.apache.hadoop.hive.ql.parse.HiveParser.columnName(HiveParser.java:33351) > at > org.apache.hadoop.hive.ql.parse.HiveParser.columnNameList(HiveParser.java:33219) > at > org.apache.hadoop.hive.ql.parse.HiveParser.analyzeStatement(HiveParser.java:17764) > at > org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2369) > at > org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398) > at > org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036) > at > org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199) > at > org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166) > 
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:275) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:227) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:430) > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:803) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:697) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:636) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > FAILED: ParseException line 1:95 cannot recognize input near '' '' > '' in column name > {noformat} > --Analyze table with partition dob and nationality values specified with FOR > COLUMNS > {noformat} > hive> ANALYZE TABLE people_part PARTITION(dob='2015-10-2',nationality='IE') > COMPUTE STATISTICS FOR COLUMNS; > NoViableAltException(-1@[]) > at > org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11627) > at > org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:40215) > at > org.apache.hadoop.hive.ql.parse.HiveParser.columnName(HiveParser.java:33351) > at > org.apache.hadoop.hive.ql.parse.HiveParser.columnNameList(HiveParser.java:33219) > at > org.apache.hadoop.hive.ql.parse.HiveParser.analyzeStatement(HiveParser.java:17764) > at > 
org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2369) > at > org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398) > at > org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036) > at > org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199) > at > org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404) >
[jira] [Updated] (HIVE-11113) ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work.
[ https://issues.apache.org/jira/browse/HIVE-3?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bing Li updated HIVE-3:
---
    Priority: Critical  (was: Major)
[jira] [Commented] (HIVE-11113) ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work.
[ https://issues.apache.org/jira/browse/HIVE-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1462#comment-1462 ]

Bing Li commented on HIVE-3:

Hi, [~pxiong] and [~shiroy]
I tried this scenario on Hive 1.2.1 and found that it works for a table stored as TEXTFILE, but does NOT work for one stored as PARQUET.

Errors
==
{noformat}
Caused by: java.lang.IllegalArgumentException: Column [ds] was not found in schema!
        at parquet.Preconditions.checkArgument(Preconditions.java:55)
        at parquet.filter2.predicate.SchemaCompatibilityValidator.getColumnDescriptor(SchemaCompatibilityValidator.java:190)
        at parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:178)
        at parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:160)
        at parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:94)
        at parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:59)
        at parquet.filter2.predicate.Operators$Eq.accept(Operators.java:180)
        at parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:64)
        at parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:59)
        at parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:40)
        at parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:126)
        at parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:46)
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:275)
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:99)
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:85)
        at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72)
        at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:67)
        ... 16 more
{noformat}

Reproduced Queries
==
{noformat}
create table dummy (key string, value string) partitioned by (ds string, hr string);
load data local inpath 'kv1.txt' into table dummy partition (ds='2008',hr='12');
load data local inpath 'kv1.txt' into table dummy partition (ds='2008',hr='11');
select * from dummy;
analyze table dummy partition (ds='2008',hr='12') compute statistics for columns key;

create table dummy2 (key string, value string) partitioned by (ds string, hr string) stored as parquet;
insert into table dummy2 partition (ds='2008',hr='12') select key, value from dummy where (ds='2008');
select * from dummy2;
analyze table dummy2 partition(ds='2008') compute statistics for columns key;
{noformat}
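The Parquet failure above boils down to a schema-compatibility check on pushed-down filters: partition columns such as ds live in the table's directory layout, not inside the Parquet data files, so a predicate on ds references a column the file schema does not contain. An illustrative sketch of that check (not Hive's or Parquet's actual code; the class and method names are invented for illustration):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch: a filter column pushed down to the Parquet reader
// must exist in the data file's schema, mirroring the
// "Column [ds] was not found in schema!" check in the stack trace above.
public class SchemaCheckSketch {
    static void validateFilterColumn(Set<String> fileSchema, String column) {
        if (!fileSchema.contains(column)) {
            throw new IllegalArgumentException(
                    "Column [" + column + "] was not found in schema!");
        }
    }

    public static void main(String[] args) {
        // A Parquet data file for dummy2 stores only the data columns;
        // the partition columns ds/hr exist only in the directory layout.
        Set<String> fileSchema = new HashSet<>(Arrays.asList("key", "value"));
        validateFilterColumn(fileSchema, "key");    // data column: passes
        try {
            validateFilterColumn(fileSchema, "ds"); // partition column: fails
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This also explains why disabling hive.optimize.ppd works around the error: with predicate pushdown off, the partition predicate is never handed to the Parquet reader's schema check.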
[jira] [Commented] (HIVE-11113) ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work.
[ https://issues.apache.org/jira/browse/HIVE-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633017#comment-14633017 ]

Bing Li commented on HIVE-3:

Hi, @Pengcheng Xiong
What's the value of hive.optimize.ppd in your cluster? I can run into the error if I set it to true.
[jira] [Commented] (HIVE-11113) ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work.
[ https://issues.apache.org/jira/browse/HIVE-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633635#comment-14633635 ]

Bing Li commented on HIVE-3:

Hi, [~pxiong]
Thank you for your quick response. Yes, I tried the queries on two different clusters, and both of them ran into this error on:
analyze table dummy2 partition(ds='2008') compute statistics for columns key;
When I set hive.optimize.ppd to false it works, but performance is poor. Do you have any idea which classes may be involved? Thank you!
[jira] [Commented] (HIVE-11113) ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work.
[ https://issues.apache.org/jira/browse/HIVE-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634338#comment-14634338 ]

Bing Li commented on HIVE-3:

Hi, [~pxiong]
I didn't run the query for "people_part"; what I ran is listed in my previous comment under "Reproduced Queries". There I tried two different table types, one TEXTFILE and the other PARQUET. The ANALYZE command passed on the TEXTFILE table but failed on the PARQUET table with the error:
analyze table dummy partition (ds='2008',hr='12') compute statistics for columns key;  // PASS
analyze table dummy2 partition(ds='2008') compute statistics for columns key;  // FAILED
After disabling hive.optimize.ppd (setting its value to false), the following query works without any error:
analyze table dummy2 partition(ds='2008') compute statistics for columns key;
[jira] [Commented] (HIVE-11113) ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work.
[ https://issues.apache.org/jira/browse/HIVE-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634801#comment-14634801 ]

Bing Li commented on HIVE-3:

Thank you, [~tfriedr]
With your fix in HIVE-11326, all the queries work now.
[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.
[ https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736074#comment-14736074 ]

Bing Li commented on HIVE-4577:
---
I submitted a review request manually. The link is https://reviews.apache.org/r/38199/

> hive CLI can't handle hadoop dfs command with space and quotes.
>
>                 Key: HIVE-4577
>                 URL: https://issues.apache.org/jira/browse/HIVE-4577
>             Project: Hive
>          Issue Type: Bug
>          Components: CLI
>    Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>            Reporter: Bing Li
>            Assignee: Bing Li
>         Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, HIVE-4577.3.patch.txt, HIVE-4577.4.patch
>
> As designed, hive supports hadoop dfs commands in the hive shell, like
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from hadoop if the path contains spaces or quotes:
> hive> dfs -mkdir "hello";
> drwxr-xr-x   - biadmin supergroup          0 2013-04-23 09:40 /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup          0 2013-04-23 09:43 /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup          0 2013-04-23 09:44 /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup          0 2013-04-23 09:44 /user/biadmin/jing"

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
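The misbehavior above comes down to how the command line is split into arguments: a naive whitespace split leaves the quote characters inside the tokens and breaks "bei jing" into two paths. A hypothetical sketch of the quote-aware splitting the patch aims for (not the actual HIVE-4577 code; class and method names here are invented for illustration):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split a dfs command into arguments, treating
// single- or double-quoted runs as one argument and dropping the quotes,
// so `dfs -mkdir "bei jing"` creates one directory named `bei jing`.
public class DfsArgSplitter {
    static List<String> split(String cmd) {
        List<String> args = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        char quote = 0;          // 0 = not currently inside quotes
        boolean inToken = false; // true once the current token has started
        for (char c : cmd.toCharArray()) {
            if (quote != 0) {                  // inside a quoted run
                if (c == quote) quote = 0;     // closing quote: drop it
                else cur.append(c);            // keep everything else, incl. spaces
            } else if (c == '"' || c == '\'') {
                quote = c;                     // opening quote: drop it
                inToken = true;
            } else if (Character.isWhitespace(c)) {
                if (inToken) {                 // unquoted whitespace ends the token
                    args.add(cur.toString());
                    cur.setLength(0);
                    inToken = false;
                }
            } else {
                cur.append(c);
                inToken = true;
            }
        }
        if (inToken) args.add(cur.toString()); // flush the final token
        return args;
    }

    public static void main(String[] argv) {
        System.out.println(split("dfs -mkdir \"bei jing\""));
    }
}
```

With this splitting, the quoted path reaches the filesystem shell as the single argument bei jing, matching what the hadoop CLI does.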
[jira] [Commented] (HIVE-11201) HCatalog is ignoring user specified avro schema in the table definition
[ https://issues.apache.org/jira/browse/HIVE-11201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736076#comment-14736076 ]

Bing Li commented on HIVE-11201:
---
I submitted the review request manually. The link is https://reviews.apache.org/r/34877/

> HCatalog is ignoring user specified avro schema in the table definition
>
>                 Key: HIVE-11201
>                 URL: https://issues.apache.org/jira/browse/HIVE-11201
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 1.2.0, 1.2.1
>            Reporter: Bing Li
>            Assignee: Bing Li
>            Priority: Critical
>         Attachments: HIVE-11201.1.patch
>
> HCatalog ignores the user-specified avro schema in the table definition and instead generates its own avro schema from the hive metastore. Generating its own schema results in mismatched names: avro field names are case sensitive, so an incorrect schema is written to the avro file and subsequent selects fail on read. Also, even if the user-specified schema does not allow null, data written through HCatalog gets a schema that does allow null. For example, the user specified the schema in all CAPITAL letters with the record name LINEITEM; the schema should be written as-is, but HCatalog ignores it and generates its own avro schema using the hive table's case.
[jira] [Commented] (HIVE-16573) In-place update for HoS can't be disabled
[ https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036565#comment-16036565 ]

Bing Li commented on HIVE-16573:

Hi, [~ruili] and [~anishek]
It seems we can't import the SessionState class into InPlaceUpdate.java; it causes a module-cycle error during compilation (hive-common -> hive-exec -> hive-common). I changed it as below:
{quote}
String engine = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_EXECUTION_ENGINE);
boolean inPlaceUpdates = false;
if (engine.equals("tez"))
  inPlaceUpdates = HiveConf.getBoolVar(conf, HiveConf.ConfVars.TEZ_EXEC_INPLACE_PROGRESS);
if (engine.equals("spark"))
  inPlaceUpdates = HiveConf.getBoolVar(conf, HiveConf.ConfVars.SPARK_EXEC_INPLACE_PROGRESS);
{quote}
Do you think this is OK?

> In-place update for HoS can't be disabled
> -
>
>                 Key: HIVE-16573
>                 URL: https://issues.apache.org/jira/browse/HIVE-16573
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Bing Li
>            Priority: Minor
>
> {{hive.spark.exec.inplace.progress}} has no effect

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
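The selection logic in the comment above can be sketched stand-alone. This is a simplified illustration, not the actual patch: a plain Map stands in for HiveConf (which isn't available here), and the property names hive.tez.exec.inplace.progress / hive.spark.exec.inplace.progress are assumed to be the keys behind the ConfVars constants in the comment.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of picking the in-place-progress flag per execution engine, so
// hive.spark.exec.inplace.progress is actually honored for Hive on Spark.
public class InPlaceUpdateSketch {
    static boolean inPlaceUpdates(Map<String, String> conf) {
        String engine = conf.getOrDefault("hive.execution.engine", "mr");
        if (engine.equals("tez")) {
            return Boolean.parseBoolean(
                    conf.getOrDefault("hive.tez.exec.inplace.progress", "true"));
        }
        if (engine.equals("spark")) {
            return Boolean.parseBoolean(
                    conf.getOrDefault("hive.spark.exec.inplace.progress", "true"));
        }
        return false; // other engines: no in-place progress updates
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("hive.execution.engine", "spark");
        conf.put("hive.spark.exec.inplace.progress", "false");
        // Disabling the Spark flag now takes effect.
        System.out.println(inPlaceUpdates(conf));
    }
}
```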
[jira] [Work started] (HIVE-16573) In-place update for HoS can't be disabled
[ https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-16573 started by Bing Li. -- > In-place update for HoS can't be disabled > - > > Key: HIVE-16573 > URL: https://issues.apache.org/jira/browse/HIVE-16573 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Rui Li >Assignee: Bing Li >Priority: Minor > > {{hive.spark.exec.inplace.progress}} has no effect -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16573) In-place update for HoS can't be disabled
[ https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-16573: --- Attachment: HIVE-16573-branch2.3.patch > In-place update for HoS can't be disabled > - > > Key: HIVE-16573 > URL: https://issues.apache.org/jira/browse/HIVE-16573 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Rui Li >Assignee: Bing Li >Priority: Minor > Attachments: HIVE-16573-branch2.3.patch > > > {{hive.spark.exec.inplace.progress}} has no effect -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16573) In-place update for HoS can't be disabled
[ https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-16573: --- Attachment: (was: HIVE-16573-branch2.3.patch) > In-place update for HoS can't be disabled > - > > Key: HIVE-16573 > URL: https://issues.apache.org/jira/browse/HIVE-16573 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Rui Li >Assignee: Bing Li >Priority: Minor > > {{hive.spark.exec.inplace.progress}} has no effect -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16573) In-place update for HoS can't be disabled
[ https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-16573: --- Attachment: HIVE-16573.1.patch Generated the patch file against the master branch. > In-place update for HoS can't be disabled > - > > Key: HIVE-16573 > URL: https://issues.apache.org/jira/browse/HIVE-16573 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Rui Li >Assignee: Bing Li >Priority: Minor > Attachments: HIVE-16573.1.patch > > > {{hive.spark.exec.inplace.progress}} has no effect -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16573) In-place update for HoS can't be disabled
[ https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-16573: --- Status: Patch Available (was: In Progress) I verified this patch; it works for the Spark engine in the Hive CLI. > In-place update for HoS can't be disabled > - > > Key: HIVE-16573 > URL: https://issues.apache.org/jira/browse/HIVE-16573 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Rui Li >Assignee: Bing Li >Priority: Minor > Attachments: HIVE-16573.1.patch > > > {{hive.spark.exec.inplace.progress}} has no effect -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16573) In-place update for HoS can't be disabled
[ https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037924#comment-16037924 ] Bing Li commented on HIVE-16573: [~ruili] and [~anishek], thank you for your review. I just submitted the patch. > In-place update for HoS can't be disabled > - > > Key: HIVE-16573 > URL: https://issues.apache.org/jira/browse/HIVE-16573 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Rui Li >Assignee: Bing Li >Priority: Minor > Attachments: HIVE-16573.1.patch > > > {{hive.spark.exec.inplace.progress}} has no effect -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (HIVE-16800) Hive Metastore configuration with Mysql
[ https://issues.apache.org/jira/browse/HIVE-16800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li reassigned HIVE-16800: -- Assignee: Bing Li > Hive Metastore configuration with Mysql > --- > > Key: HIVE-16800 > URL: https://issues.apache.org/jira/browse/HIVE-16800 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.2 >Reporter: Vigneshwaran >Assignee: Bing Li > > I'm trying to configure MySql as metastore in Hive 1.2.2 by following the > link https://dzone.com/articles/how-configure-mysql-metastore, but when I'm > trying to run hive after all the step I'm getting the below errors: > Exception in thread "main" java.lang.RuntimeException: > java.lang.RuntimeException: Unable to instantiate > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient > Caused by: java.lang.RuntimeException: Unable to instantiate > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523) > Caused by: java.lang.reflect.InvocationTargetException > Caused by: javax.jdo.JDOFatalUserException: Exception thrown setting > persistence propertiesNestedThrowables: -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (HIVE-16614) Support "set local time zone" statement
[ https://issues.apache.org/jira/browse/HIVE-16614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li reassigned HIVE-16614: -- Assignee: Bing Li > Support "set local time zone" statement > --- > > Key: HIVE-16614 > URL: https://issues.apache.org/jira/browse/HIVE-16614 > Project: Hive > Issue Type: Improvement >Reporter: Carter Shanklin >Assignee: Bing Li > > HIVE-14412 introduces a timezone-aware timestamp. > SQL has a concept of default time zone displacements, which are transparently > applied when converting between timezone-unaware types and timezone-aware > types and, in Hive's case, are also used to shift a timezone aware type to a > different time zone, depending on configuration. > SQL also provides that the default time zone displacement be settable at a > session level, so that clients can access a database simultaneously from > different time zones and see time values in their own time zone. > Currently the time zone displacement is fixed and is set based on the system > time zone where the Hive client runs (HiveServer2 or Hive CLI). It will be > more convenient for users if they have the ability to set their time zone of > choice. > SQL defines "set time zone" with 2 ways of specifying the time zone, first > using an interval and second using the special keyword LOCAL. > Examples: > • set time zone '-8:00'; > • set time zone LOCAL; > LOCAL means to set the current default time zone displacement to the > session's original default time zone displacement. > Reference: SQL:2011 section 19.4 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Work started] (HIVE-16800) Hive Metastore configuration with Mysql
[ https://issues.apache.org/jira/browse/HIVE-16800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-16800 started by Bing Li. -- > Hive Metastore configuration with Mysql > --- > > Key: HIVE-16800 > URL: https://issues.apache.org/jira/browse/HIVE-16800 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.2 >Reporter: Vigneshwaran >Assignee: Bing Li > > I'm trying to configure MySql as metastore in Hive 1.2.2 by following the > link https://dzone.com/articles/how-configure-mysql-metastore, but when I'm > trying to run hive after all the step I'm getting the below errors: > Exception in thread "main" java.lang.RuntimeException: > java.lang.RuntimeException: Unable to instantiate > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient > Caused by: java.lang.RuntimeException: Unable to instantiate > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523) > Caused by: java.lang.reflect.InvocationTargetException > Caused by: javax.jdo.JDOFatalUserException: Exception thrown setting > persistence propertiesNestedThrowables: -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16800) Hive Metastore configuration with Mysql
[ https://issues.apache.org/jira/browse/HIVE-16800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16042352#comment-16042352 ] Bing Li commented on HIVE-16800: Hi, Vigneshwaran. I think the document you referred to is out of date. Please try the following steps in your cluster (using the commands for RHEL as an example):
1. Install MySQL: yum -y install mysql-server mysql mysql-devel
2. Start MySQL: /etc/init.d/mysqld start
3. Link or copy mysql-connector-java.jar into hive/lib
4. Set the following properties in hive-site.xml:
   javax.jdo.option.ConnectionDriverName=com.mysql.jdbc.Driver
   javax.jdo.option.ConnectionURL=jdbc:mysql://myhost.com/hive?createDatabaseIfNotExist=true
   javax.jdo.option.ConnectionUserName=APP
   javax.jdo.option.ConnectionPassword=mine
5. Create the metastore database for Hive in MySQL:
   mysql> create database hive;
   mysql> grant all on hive.* to 'APP'@'myhost.com' identified by 'mine';
6. Verify the connection to MySQL (enter "mine" when prompted for the password):
   mysql -u APP -h myhost.com -p
7. Run the Hive schema tool: hive/bin/schematool -dbType mysql -initSchema
8. Start the Hive metastore: hive/bin/hive --service metastore
> Hive Metastore configuration with Mysql > --- > > Key: HIVE-16800 > URL: https://issues.apache.org/jira/browse/HIVE-16800 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.2 >Reporter: Vigneshwaran >Assignee: Bing Li > > I'm trying to configure MySql as metastore in Hive 1.2.2 by following the > link https://dzone.com/articles/how-configure-mysql-metastore, but when I'm > trying to run hive after all the step I'm getting the below errors: > Exception in thread "main" java.lang.RuntimeException: > java.lang.RuntimeException: Unable to instantiate > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient > Caused by: java.lang.RuntimeException: Unable to instantiate > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523) > Caused by: java.lang.reflect.InvocationTargetException > Caused by: javax.jdo.JDOFatalUserException: Exception thrown setting > persistence propertiesNestedThrowables: -- This message was sent by Atlassian JIRA (v6.3.15#6346)
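As an illustration of the hive-site.xml step above, the sketch below assembles those four javax.jdo properties into the &lt;property&gt; form Hive expects and prints the fragment. The host, user, and password are the example values from the comment, not real credentials:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HiveSiteSketch {
    // Renders key/value pairs as Hadoop/Hive-style <property> entries.
    static String buildHiveSite(Map<String, String> props) {
        StringBuilder sb = new StringBuilder("<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("  <property>\n")
              .append("    <name>").append(e.getKey()).append("</name>\n")
              .append("    <value>").append(e.getValue()).append("</value>\n")
              .append("  </property>\n");
        }
        return sb.append("</configuration>\n").toString();
    }

    public static void main(String[] args) {
        // Example values from the comment above (myhost.com / APP / mine).
        Map<String, String> props = new LinkedHashMap<>();
        props.put("javax.jdo.option.ConnectionDriverName", "com.mysql.jdbc.Driver");
        props.put("javax.jdo.option.ConnectionURL",
                  "jdbc:mysql://myhost.com/hive?createDatabaseIfNotExist=true");
        props.put("javax.jdo.option.ConnectionUserName", "APP");
        props.put("javax.jdo.option.ConnectionPassword", "mine");
        System.out.print(buildHiveSite(props));
    }
}
```

Misspelling any of these property names is a common cause of the JDOFatalUserException reported in this issue, since DataNucleus then falls back to its defaults.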
[jira] [Resolved] (HIVE-16800) Hive Metastore configuration with Mysql
[ https://issues.apache.org/jira/browse/HIVE-16800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li resolved HIVE-16800. Resolution: Not A Bug > Hive Metastore configuration with Mysql > --- > > Key: HIVE-16800 > URL: https://issues.apache.org/jira/browse/HIVE-16800 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.2 >Reporter: Vigneshwaran >Assignee: Bing Li > > I'm trying to configure MySql as metastore in Hive 1.2.2 by following the > link https://dzone.com/articles/how-configure-mysql-metastore, but when I'm > trying to run hive after all the step I'm getting the below errors: > Exception in thread "main" java.lang.RuntimeException: > java.lang.RuntimeException: Unable to instantiate > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient > Caused by: java.lang.RuntimeException: Unable to instantiate > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523) > Caused by: java.lang.reflect.InvocationTargetException > Caused by: javax.jdo.JDOFatalUserException: Exception thrown setting > persistence propertiesNestedThrowables: -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle
[ https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16042361#comment-16042361 ] Bing Li commented on HIVE-16659: Hi, [~ruili] Could I take it over? > Query plan should reflect hive.spark.use.groupby.shuffle > > > Key: HIVE-16659 > URL: https://issues.apache.org/jira/browse/HIVE-16659 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Rui Li >Assignee: Rui Li > > It's useful to show the shuffle type used in the query plan. Currently it > shows "GROUP" no matter what we set for hive.spark.use.groupby.shuffle. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle
[ https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li reassigned HIVE-16659: -- Assignee: Bing Li (was: Rui Li) > Query plan should reflect hive.spark.use.groupby.shuffle > > > Key: HIVE-16659 > URL: https://issues.apache.org/jira/browse/HIVE-16659 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Rui Li >Assignee: Bing Li > > It's useful to show the shuffle type used in the query plan. Currently it > shows "GROUP" no matter what we set for hive.spark.use.groupby.shuffle. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (HIVE-16615) Support Time Zone Specifiers (i.e. "at time zone X")
[ https://issues.apache.org/jira/browse/HIVE-16615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li reassigned HIVE-16615: -- Assignee: Bing Li > Support Time Zone Specifiers (i.e. "at time zone X") > > > Key: HIVE-16615 > URL: https://issues.apache.org/jira/browse/HIVE-16615 > Project: Hive > Issue Type: Improvement >Reporter: Carter Shanklin >Assignee: Bing Li > > HIVE-14412 introduces a timezone-aware timestamp. > SQL has a concept of "time zone specifier" which applies to any datetime > value expression (which covers time/timestamp with and without timezones). > Hive lacks a time type so we can put that aside for a while. > Examples: > a. select time_stamp_with_time_zone at time zone '-8:00'; > b. select time_stamp_without_time_zone at time zone LOCAL; > These statements would adjust the expression from its original timezone into > a known target timezone. > Using the time zone specifier results in a data type that has a time zone. > If the original expression lacked a time zone, the result has a time zone. If > the original expression had a time zone, the result still has a time zone, > possibly a different one. > LOCAL means to use the session's original default time zone displacement. > The standard says that dates are not supported with time zone specifiers. It > seems common to ignore this rule and allow this, by converting the date to a > timestamp and then applying the usual rule. > The standard only requires an interval or the LOCAL keyword. Some databases > allow time zone identifiers like PST. > Reference: SQL:2011 section 6.31 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (HIVE-16766) Hive query with space as filter does not give proper result
[ https://issues.apache.org/jira/browse/HIVE-16766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li reassigned HIVE-16766: -- Assignee: Bing Li > Hive query with space as filter does not give proper result > --- > > Key: HIVE-16766 > URL: https://issues.apache.org/jira/browse/HIVE-16766 > Project: Hive > Issue Type: Bug >Reporter: Subash >Assignee: Bing Li >Priority: Critical > > Hi Team, > I have used a query in the format below and it does not give proper results. > Since there is a split on \s+ in the ExecuteStatementOperation class at line 48, > I suspect something goes wrong there. Could you help me with this if I am wrong? > I am using Hive JDBC version 1.1.0. > The sample query is as follows: > select count(1) as cnt from table where col1=" " and col2="D"; -- This message was sent by Atlassian JIRA (v6.3.15#6346)
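The suspected failure mode can be reproduced without Hive: splitting a statement on \s+ and rejoining with single spaces collapses whitespace inside quoted literals too. This standalone sketch is not the actual ExecuteStatementOperation code; it uses a two-space literal so the collapse is visible:

```java
public class WhitespaceSplitDemo {
    public static void main(String[] args) {
        // Filter value is a two-space string literal.
        String query = "select count(1) as cnt from t where col1=\"  \" and col2=\"D\"";

        // A naive tokenizer that splits on runs of whitespace and rejoins
        // with single spaces also collapses whitespace *inside* the quoted
        // literal, silently changing the filter value being compared.
        String rejoined = String.join(" ", query.split("\\s+"));

        System.out.println(rejoined.equals(query)); // prints false: the literal lost a space
    }
}
```

Any whitespace normalization of a SQL string therefore has to be quote-aware, or filters on space-only values stop matching.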
[jira] [Assigned] (HIVE-16936) wrong result with CTAS(create table as select)
[ https://issues.apache.org/jira/browse/HIVE-16936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li reassigned HIVE-16936: -- Assignee: Bing Li > wrong result with CTAS(create table as select) > -- > > Key: HIVE-16936 > URL: https://issues.apache.org/jira/browse/HIVE-16936 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.1 >Reporter: Xiaomeng Huang >Assignee: Bing Li >Priority: Critical > > 1. > {code} > hive> desc abc_test_old; > OK > did string > activetimeint > {code} > 2. > {code} > hive> select 'test' as did from abc_test_old > > where did = '5FCAFD34-C124-4E13-AF65-27B675C945CC' limit 1; > OK > test > {code} > result is 'test' > 3. > {code} > hive> create table abc_test_12345 as > > select 'test' as did from abc_test_old > > where did = '5FCAFD34-C124-4E13-AF65-27B675C945CC' limit 1; > hive> select did from abc_test_12345 limit 1; > OK > 5FCAFD34-C124-4E13-AF65-27B675C945CC > {code} > result is '5FCAFD34-C124-4E13-AF65-27B675C945CC' > why result is not 'test'? > 4. 
> {code} > hive> explain > > create table abc_test_12345 as > > select 'test' as did from abc_test_old > > where did = '5FCAFD34-C124-4E13-AF65-27B675C945CC' limit 1; > OK > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, Stage-4 > Stage-3 > Stage-0 depends on stages: Stage-3, Stage-2, Stage-5 > Stage-7 depends on stages: Stage-0 > Stage-2 > Stage-4 > Stage-5 depends on stages: Stage-4 > STAGE PLANS: > Stage: Stage-1 > Map Reduce > Map Operator Tree: > TableScan > alias: abc_test_old > Statistics: Num rows: 32 Data size: 1152 Basic stats: COMPLETE > Column stats: NONE > Filter Operator > predicate: (did = '5FCAFD34-C124-4E13-AF65-27B675C945CC') > (type: boolean) > Statistics: Num rows: 16 Data size: 576 Basic stats: COMPLETE > Column stats: NONE > Select Operator > Statistics: Num rows: 16 Data size: 576 Basic stats: COMPLETE > Column stats: NONE > Limit > Number of rows: 1 > Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE > Column stats: NONE > Reduce Output Operator > sort order: > Statistics: Num rows: 1 Data size: 36 Basic stats: > COMPLETE Column stats: NONE > Reduce Operator Tree: > Select Operator > expressions: '5FCAFD34-C124-4E13-AF65-27B675C945CC' (type: string) > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column > stats: NONE > Limit > Number of rows: 1 > Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE > Column stats: NONE > File Output Operator > compressed: true > Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE > Column stats: NONE > table: > input format: > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat > output format: > org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat > serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde > name: default.abc_test_12345 > .. > {code} > why expressions is '5FCAFD34-C124-4E13-AF65-27B675C945CC' -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-16907) "INSERT INTO" overwrite old data when destination table encapsulated by backquote
[ https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li reassigned HIVE-16907: -- Assignee: Bing Li > "INSERT INTO" overwrite old data when destination table encapsulated by > backquote > > > Key: HIVE-16907 > URL: https://issues.apache.org/jira/browse/HIVE-16907 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 1.1.0, 2.1.1 >Reporter: Nemon Lou >Assignee: Bing Li > > A way to reproduce: > {noformat} > create database tdb; > use tdb; > create table t1(id int); > create table t2(id int); > explain insert into `tdb.t1` select * from t2; > {noformat} > {noformat} > +---+ > | > Explain | > +---+ > | STAGE DEPENDENCIES: > | > | Stage-1 is a root stage > | > | Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, > Stage-4 | > | Stage-3 > | > | Stage-0 depends on stages: Stage-3, Stage-2, Stage-5 > | > | Stage-2 > | > | Stage-4 > | > | Stage-5 depends on stages: Stage-4 > | > | > | > | STAGE PLANS: > | > | Stage: Stage-1 > | > | Map Reduce > | > | Map Operator Tree: > | > | TableScan > | > | alias: t2 > | > | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE | > | Select Operator > | > | expressions: id (type: int) > | > | outputColumnNames: _col0 > | > | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE | > | File Output Operator > | > | compressed: false > | > | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE > Column stats: NONE | > | table: >
[jira] [Work started] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle
[ https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-16659 started by Bing Li. -- > Query plan should reflect hive.spark.use.groupby.shuffle > > > Key: HIVE-16659 > URL: https://issues.apache.org/jira/browse/HIVE-16659 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Rui Li >Assignee: Bing Li > > It's useful to show the shuffle type used in the query plan. Currently it > shows "GROUP" no matter what we set for hive.spark.use.groupby.shuffle. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle
[ https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-16659: --- Attachment: HIVE-16659.1.patch This patch is based on branch-2.3. With the above changes, I could get the explain result as below. _hive> {color:#205081}set hive.spark.use.groupby.shuffle=true;{color} hive> explain select key, count(val) from t1 group by key;_ OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Spark Edges: {color:red}Reducer 2 <- Map 1 (GROUP, 2){color} DagName: root_20170630080539_565b5a00-822e-46e9-a146-be84723ae7f6:2 Vertices: Map 1 Map Operator Tree: TableScan alias: t1 Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: key (type: int), val (type: string) outputColumnNames: key, val Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: count(val) keys: key (type: int) mode: hash outputColumnNames: _col0, _col1 Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE value expressions: _col1 (type: bigint) Reducer 2 Reduce Operator Tree: Group By Operator aggregations: count(VALUE._col0) keys: KEY._col0 (type: int) mode: mergepartial outputColumnNames: _col0, _col1 Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0 Fetch Operator limit: -1 
Processor Tree: ListSink Time taken: 51.289 seconds, Fetched: 54 row(s) _hive> {color:#205081}set hive.spark.use.groupby.shuffle=false{color}; hive> explain select key, count(val) from t1 group by key;_ OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Spark Edges: {color:#205081}Reducer 2 <- Map 1 (GROUP PARTITION-LEVEL SORT, 2){color} DagName: root_20170630075518_b84add65-57db-466f-9521-3f1b14de6826:1 Vertices: Map 1 Map Operator Tree: TableScan alias: t1 Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: key (type: int), val (type: string) outputColumnNames: key, val Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: count(val) keys: key (type: int) mode: hash outputColumnNames: _col0, _col1 Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE value expressions: _col1 (type: bigint) Reducer 2 Reduce Operator Tree: Group By Operator aggregations: count(VALUE._col0) keys: KEY._col0 (type: int) mode: mergepartial outputColumnNames: _col0, _col1 Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE tabl
[jira] [Comment Edited] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle
[ https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070235#comment-16070235 ]

Bing Li edited comment on HIVE-16659 at 6/30/17 3:11 PM:
---------------------------------------------------------

This patch is based on branch-2.3. With the above changes, I could get the explain result as below.

hive> set hive.spark.use.groupby.shuffle=true;
hive> explain select key, count(val) from t1 group by key;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Spark
      Edges:
        Reducer 2 <- Map 1 (GROUP, 2)
      DagName: root_20170630080539_565b5a00-822e-46e9-a146-be84723ae7f6:2
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: t1
                  Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: key (type: int), val (type: string)
                    outputColumnNames: key, val
                    Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      aggregations: count(val)
                      keys: key (type: int)
                      mode: hash
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
                        value expressions: _col1 (type: bigint)
        Reducer 2
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                keys: KEY._col0 (type: int)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 51.289 seconds, Fetched: 54 row(s)

hive> set hive.spark.use.groupby.shuffle=false;
hive> explain select key, count(val) from t1 group by key;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Spark
      Edges:
        Reducer 2 <- Map 1 (GROUP PARTITION-LEVEL SORT, 2)
      DagName: root_20170630075518_b84add65-57db-466f-9521-3f1b14de6826:1
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: t1
                  Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: key (type: int), val (type: string)
                    outputColumnNames: key, val
                    Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      aggregations: count(val)
                      keys: key (type: int)
                      mode: hash
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
                        value expressions: _col1 (type: bigint)
        Reducer 2
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                keys: KEY._col0 (type: int)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 10 Data size: 70 Basic sta
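The behavior the patch makes visible in the explain output above can be sketched as a toy mapping. This is an illustration only, not Hive's SparkCompiler code; the function name is hypothetical.

```python
# Toy illustration of the two Spark edge labels shown in the explain output
# above: the shuffle type in the group-by edge depends on
# hive.spark.use.groupby.shuffle. Not Hive's actual implementation.
def groupby_edge_label(use_groupby_shuffle, parallelism):
    shuffle_type = "GROUP" if use_groupby_shuffle else "GROUP PARTITION-LEVEL SORT"
    return "Reducer 2 <- Map 1 (%s, %d)" % (shuffle_type, parallelism)

print(groupby_edge_label(True, 2))   # Reducer 2 <- Map 1 (GROUP, 2)
print(groupby_edge_label(False, 2))  # Reducer 2 <- Map 1 (GROUP PARTITION-LEVEL SORT, 2)
```

Before the patch, both settings produced the first label, which is why the plan did not reflect the configuration.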
[jira] [Updated] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle
[ https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-16659: --- Status: Patch Available (was: In Progress) > Query plan should reflect hive.spark.use.groupby.shuffle > > > Key: HIVE-16659 > URL: https://issues.apache.org/jira/browse/HIVE-16659 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Rui Li >Assignee: Bing Li > Attachments: HIVE-16659.1.patch > > > It's useful to show the shuffle type used in the query plan. Currently it > shows "GROUP" no matter what we set for hive.spark.use.groupby.shuffle. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16766) Hive query with space as filter does not give proper result
[ https://issues.apache.org/jira/browse/HIVE-16766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071054#comment-16071054 ]

Bing Li commented on HIVE-16766:
--------------------------------

Hi, Subash
Which Hive version did you use? Could you post the queries to reproduce it as well?
I tried it on a Hive package built from branch-2.3, and it worked for me.

My Testing
==========
*hive> describe test;*
OK
col1    string
col2    string
Time taken: 0.057 seconds, Fetched: 2 row(s)

*hive> select * from test;*
OK
a1      a2
b1      b2
c1      c2
        D
Time taken: 0.22 seconds, Fetched: 4 row(s)

*hive> select count(1) as cnt from test where col1="" and col2="D";*
Query ID = root_20170630235239_b58b7dbc-14ef-4126-b56b-fdcf187acc09
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Starting Spark Job = f25577ce-2ed6-4c5c-a64a-6ff7419ab778

STAGES   ATTEMPT    STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED
Stage-5        0  FINISHED      1          1        0        0       0
Stage-6        0  FINISHED      1          1        0        0       0
STAGES: 02/02  [==>>] 100%  ELAPSED TIME: 1.01 s

Status: Finished successfully in 1.01 seconds
OK
1
Time taken: 1.436 seconds, Fetched: 1 row(s)

> Hive query with space as filter does not give proper result
> ---
>
> Key: HIVE-16766
> URL: https://issues.apache.org/jira/browse/HIVE-16766
> Project: Hive
> Issue Type: Bug
>Reporter: Subash
>Assignee: Bing Li
>Priority: Critical
>
> Hi Team,
> I have used the query as below format and it does not give proper results.
> Since there is a split by \s+ in ExecuteStatementOperation class in line 48,
> I feel something goes wrong there. Could help me with this, if i am wrong ?
> I am using Hive JDBC version 1.1.0
> The sample query is as follows,
> select count(1) as cnt from table where col1=" " and col2="D";

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
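The reporter's hypothesis quoted above (a split on \s+) can be illustrated in isolation: splitting a query string on whitespace cannot preserve a space that sits inside a quoted literal. This standalone sketch is unrelated to Hive's actual code path, where the reported behavior could not be reproduced.

```python
import re

# Standalone illustration of the hypothesis quoted above: a \s+ split treats
# the space inside the quoted literal " " like any other whitespace, so the
# literal's space is lost and its quotes land in separate tokens.
query = 'select count(1) as cnt from t where col1=" " and col2="D"'
tokens = re.split(r"\s+", query)

assert all(" " not in tok for tok in tokens)   # no token retains the space
assert 'col1="' in tokens and '"' in tokens    # the literal is torn apart
print(tokens)
```

This is only a concern if the split result were used to rebuild the query; splitting merely to detect the statement type would be harmless, which matches the failure to reproduce.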
[jira] [Assigned] (HIVE-17004) Calculating Number Of Reducers Looks At All Files
[ https://issues.apache.org/jira/browse/HIVE-17004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li reassigned HIVE-17004: -- Assignee: Bing Li > Calculating Number Of Reducers Looks At All Files > - > > Key: HIVE-17004 > URL: https://issues.apache.org/jira/browse/HIVE-17004 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 2.1.1 >Reporter: BELUGA BEHR >Assignee: Bing Li > > When calculating the number of Mappers and Reducers, the two algorithms are > looking at different data sets. The number of Mappers are calculated based > on the number of splits and the number of Reducers are based on the number of > files within the HDFS directory. What you see is that if I add files to a > sub-directory of the HDFS directory, the number of splits remains the same > since I did not tell Hive to search recursively, and the number of Reducers > increases. Please improve this so that Reducers are looking at the same > files that are considered for splits and not at files within sub-directories > (unless configured to do so). 
> {code}
> CREATE EXTERNAL TABLE Complaints (
>   a string,
>   b string,
>   c string,
>   d string,
>   e string,
>   f string,
>   g string
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> LOCATION '/user/admin/complaints';
> {code}
> {code}
> [root@host ~]# sudo -u hdfs hdfs dfs -ls -R /user/admin/complaints
> -rwxr-xr-x   2 admin admin  122607137 2017-05-02 14:12 /user/admin/complaints/Consumer_Complaints.1.csv
> -rwxr-xr-x   2 admin admin  122607137 2017-05-02 14:12 /user/admin/complaints/Consumer_Complaints.2.csv
> -rwxr-xr-x   2 admin admin  122607137 2017-05-02 14:12 /user/admin/complaints/Consumer_Complaints.3.csv
> -rwxr-xr-x   2 admin admin  122607137 2017-05-02 14:12 /user/admin/complaints/Consumer_Complaints.4.csv
> -rwxr-xr-x   2 admin admin  122607137 2017-05-02 14:12 /user/admin/complaints/Consumer_Complaints.5.csv
> -rwxr-xr-x   2 admin admin  122607137 2017-05-02 14:12 /user/admin/complaints/Consumer_Complaints.csv
> {code}
> {code}
> INFO : Compiling command(queryId=hive_20170502142020_dfcf77ef-56b7-4544-ab90-6e9726ea86ae): select a, count(1) from complaints group by a limit 10
> INFO : Semantic Analysis Completed
> INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:a, type:string, comment:null), FieldSchema(name:_c1, type:bigint, comment:null)], properties:null)
> INFO : Completed compiling command(queryId=hive_20170502142020_dfcf77ef-56b7-4544-ab90-6e9726ea86ae); Time taken: 0.077 seconds
> INFO : Executing command(queryId=hive_20170502142020_dfcf77ef-56b7-4544-ab90-6e9726ea86ae): select a, count(1) from complaints group by a limit 10
> INFO : Query ID = hive_20170502142020_dfcf77ef-56b7-4544-ab90-6e9726ea86ae
> INFO : Total jobs = 1
> INFO : Launching Job 1 out of 1
> INFO : Starting task [Stage-1:MAPRED] in serial mode
> INFO : Number of reduce tasks not specified. Estimated from input data size: 11
> INFO : In order to change the average load for a reducer (in bytes):
> INFO :   set hive.exec.reducers.bytes.per.reducer=
> INFO : In order to limit the maximum number of reducers:
> INFO :   set hive.exec.reducers.max=
> INFO : In order to set a constant number of reducers:
> INFO :   set mapreduce.job.reduces=
> INFO : number of splits:2
> INFO : Submitting tokens for job: job_1493729203063_0003
> INFO : The url to track the job: http://host:8088/proxy/application_1493729203063_0003/
> INFO : Starting Job = job_1493729203063_0003, Tracking URL = http://host:8088/proxy/application_1493729203063_0003/
> INFO : Kill Command = /opt/cloudera/parcels/CDH-5.8.4-1.cdh5.8.4.p0.5/lib/hadoop/bin/hadoop job -kill job_1493729203063_0003
> INFO : Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 11
> INFO : 2017-05-02 14:20:14,206 Stage-1 map = 0%, reduce = 0%
> INFO : 2017-05-02 14:20:22,520 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.48 sec
> INFO : 2017-05-02 14:20:34,029 Stage-1 map = 100%, reduce = 27%, Cumulative CPU 15.72 sec
> INFO : 2017-05-02 14:20:35,069 Stage-1 map = 100%, reduce = 55%, Cumulative CPU 21.94 sec
> INFO : 2017-05-02 14:20:36,110 Stage-1 map = 100%, reduce = 64%, Cumulative CPU 23.97 sec
> INFO : 2017-05-02 14:20:39,233 Stage-1 map = 100%, reduce = 73%, Cumulative CPU 25.26 sec
> INFO : 2017-05-02 14:20:43,392 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 30.9 sec
> INFO : MapReduce Total cumulative CPU time: 30 seconds 900 msec
> INFO : Ended Job = job_1493729203063_0003
> INFO : MapReduce Jobs Launched:
> INFO : Stage-Stage-1: Map: 2 Reduce: 11 Cumulative CPU: 30.9 sec HDFS Read: 735691149 HDFS Write: 153 SUCCESS
> INFO : Total MapRe
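For reference, the "number of reducers: 11" in the log above is consistent with Hive's size-based estimate: the ceiling of total input bytes over hive.exec.reducers.bytes.per.reducer, capped by hive.exec.reducers.max. A minimal sketch, assuming a 64 MB bytes-per-reducer setting (the default varies by Hive version and distribution, so this value is an assumption):

```python
import math

# Sketch of Hive's size-based reducer estimate, not Hive's actual code.
# Assumption: hive.exec.reducers.bytes.per.reducer = 64 MB (common CDH 5 value).
BYTES_PER_REDUCER = 64 * 1024 * 1024
MAX_REDUCERS = 1099  # hive.exec.reducers.max (assumed default)

# All six CSVs listed under /user/admin/complaints, 122607137 bytes each.
total_input_bytes = 6 * 122607137

reducers = min(MAX_REDUCERS, math.ceil(total_input_bytes / BYTES_PER_REDUCER))
print(reducers)  # 11 -- matches the log above
```

The key point of the issue remains visible here: the estimate sums bytes over every file found under the directory, while the split computation (number of splits: 2) only considers the files eligible for input splits.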
[jira] [Work started] (HIVE-16766) Hive query with space as filter does not give proper result
[ https://issues.apache.org/jira/browse/HIVE-16766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-16766 started by Bing Li. -- > Hive query with space as filter does not give proper result > --- > > Key: HIVE-16766 > URL: https://issues.apache.org/jira/browse/HIVE-16766 > Project: Hive > Issue Type: Bug >Reporter: Subash >Assignee: Bing Li >Priority: Critical > > Hi Team, > I have used the query as below format and it does not give proper results. > Since there is a split by \s+ in ExecuteStatementOperation class in line 48, > I feel something goes wrong there. Could help me with this, if i am wrong ? > I am using Hive JDBC version 1.1.0 > The sample query is as follows, > select count(1) as cnt from table where col1=" " and col2="D"; -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle
[ https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071514#comment-16071514 ] Bing Li commented on HIVE-16659: Hi, [~ruili] Thank you for the review! I checked the latest code on the master branch; the current patch can be applied to it directly, so I won't create a new patch file for the master branch for this Jira. I will pay attention to it in the future. > Query plan should reflect hive.spark.use.groupby.shuffle > > > Key: HIVE-16659 > URL: https://issues.apache.org/jira/browse/HIVE-16659 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Rui Li >Assignee: Bing Li > Attachments: HIVE-16659.1.patch > > > It's useful to show the shuffle type used in the query plan. Currently it > shows "GROUP" no matter what we set for hive.spark.use.groupby.shuffle. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-11019) Can't create an Avro table with uniontype column correctly
[ https://issues.apache.org/jira/browse/HIVE-11019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071516#comment-16071516 ]

Bing Li commented on HIVE-11019:
--------------------------------

It works on branch-2.3. Closing it.

hive> create table avro_union(union1 uniontype)STORED AS AVRO;
OK
Time taken: 2.04 seconds
hive> describe avro_union;
OK
union1    uniontype
Time taken: 0.165 seconds, Fetched: 1 row(s)

> Can't create an Avro table with uniontype column correctly
> --
>
> Key: HIVE-11019
> URL: https://issues.apache.org/jira/browse/HIVE-11019
> Project: Hive
> Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Bing Li
>Assignee: Bing Li
>
> I tried the example in
> https://cwiki.apache.org/confluence/display/Hive/AvroSerDe
> And found that it can't create an AVRO table correctly with uniontype
> hive> create table avro_union(union1 uniontype)STORED
> AS AVRO;
> OK
> Time taken: 0.083 seconds
> hive> describe avro_union;
> OK
> union1 uniontype
>
> Time taken: 0.058 seconds, Fetched: 1 row(s)

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-16950) Dropping hive database/table which was created explicitly in default database location, deletes all databases data from default database location
[ https://issues.apache.org/jira/browse/HIVE-16950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li reassigned HIVE-16950: -- Assignee: Bing Li > Dropping hive database/table which was created explicitly in default database > location, deletes all databases data from default database location > - > > Key: HIVE-16950 > URL: https://issues.apache.org/jira/browse/HIVE-16950 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Rahul Kalgunde >Assignee: Bing Li >Priority: Minor > > When database/table is created explicitly pointing to the default location, > dropping the database/table deletes all the data associated with the all > databases/tables. > Steps to replicate: > in below e.g. dropping table test_db2 also deletes data of test_db1 where as > metastore still contains test_db1 > hive> create database test_db1; > OK > Time taken: 4.858 seconds > hive> describe database test_db1; > OK > test_db1 > hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/test_db1.db root > USER > Time taken: 0.599 seconds, Fetched: 1 row(s) > hive> create database test_db2 location '/apps/hive/warehouse' ; > OK > Time taken: 1.457 seconds > hive> describe database test_db2; > OK > test_db2 > hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse rootUSER > Time taken: 0.582 seconds, Fetched: 1 row(s) > hive> drop database test_db2; > OK > Time taken: 1.317 seconds > hive> dfs -ls /apps/hive/warehouse; > ls: `/apps/hive/warehouse': No such file or directory > Command failed with exit code = 1 > Query returned non-zero code: 1, cause: null > hive> describe database test_db1; > OK > test_db1 > hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/test_db1.db root > USER > Time taken: 0.629 seconds, Fetched: 1 row(s) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (HIVE-11019) Can't create an Avro table with uniontype column correctly
[ https://issues.apache.org/jira/browse/HIVE-11019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li resolved HIVE-11019. Resolution: Resolved > Can't create an Avro table with uniontype column correctly > -- > > Key: HIVE-11019 > URL: https://issues.apache.org/jira/browse/HIVE-11019 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.0 >Reporter: Bing Li >Assignee: Bing Li > > I tried the example in > https://cwiki.apache.org/confluence/display/Hive/AvroSerDe > And found that it can't create an AVRO table correctly with uniontype > hive> create table avro_union(union1 uniontype)STORED > AS AVRO; > OK > Time taken: 0.083 seconds > hive> describe avro_union; > OK > union1 uniontype > > Time taken: 0.058 seconds, Fetched: 1 row(s) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-16906) Hive ATSHook should check for yarn.timeline-service.enabled before connecting to ATS
[ https://issues.apache.org/jira/browse/HIVE-16906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li reassigned HIVE-16906: -- Assignee: Bing Li > Hive ATSHook should check for yarn.timeline-service.enabled before connecting > to ATS > > > Key: HIVE-16906 > URL: https://issues.apache.org/jira/browse/HIVE-16906 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.2 >Reporter: Prabhu Joseph >Assignee: Bing Li > > Hive ATShook has to check yarn.timeline-service.enabled (Indicate to clients > whether timeline service is enabled or not. If enabled, clients will put > entities and events to the timeline server.) before creating TimelineClient -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16950) Dropping hive database/table which was created explicitly in default database location, deletes all databases data from default database location
[ https://issues.apache.org/jira/browse/HIVE-16950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071999#comment-16071999 ]

Bing Li commented on HIVE-16950:
--------------------------------

From the description, the requirement is more like an EXTERNAL database, which is not supported by Hive yet. But I think we could add some checks when creating/dropping a database to avoid this issue. There are two ways to do this:

1. Throw an error when the target location on HDFS already exists. An existing empty directory is invalid as well, because Hive currently allows creating two databases with the same location.
2. ONLY drop the tables belonging to the target database. For this, we would have to get all the tables under the database when DROP DATABASE is invoked, but that would affect the performance of the DROP statement.

I prefer #1. [~ashutoshc], any comments on this? Thank you.

> Dropping hive database/table which was created explicitly in default database
> location, deletes all databases data from default database location
> -
>
> Key: HIVE-16950
> URL: https://issues.apache.org/jira/browse/HIVE-16950
> Project: Hive
> Issue Type: Bug
> Components: Hive
>Affects Versions: 1.2.1
>Reporter: Rahul Kalgunde
>Assignee: Bing Li
>Priority: Minor
>
> When database/table is created explicitly pointing to the default location,
> dropping the database/table deletes all the data associated with the all
> databases/tables.
> Steps to replicate:
> in below e.g. 
dropping table test_db2 also deletes data of test_db1 where as > metastore still contains test_db1 > hive> create database test_db1; > OK > Time taken: 4.858 seconds > hive> describe database test_db1; > OK > test_db1 > hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/test_db1.db root > USER > Time taken: 0.599 seconds, Fetched: 1 row(s) > hive> create database test_db2 location '/apps/hive/warehouse' ; > OK > Time taken: 1.457 seconds > hive> describe database test_db2; > OK > test_db2 > hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse rootUSER > Time taken: 0.582 seconds, Fetched: 1 row(s) > hive> drop database test_db2; > OK > Time taken: 1.317 seconds > hive> dfs -ls /apps/hive/warehouse; > ls: `/apps/hive/warehouse': No such file or directory > Command failed with exit code = 1 > Query returned non-zero code: 1, cause: null > hive> describe database test_db1; > OK > test_db1 > hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/test_db1.db root > USER > Time taken: 0.629 seconds, Fetched: 1 row(s) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
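Option #1 from the comment above can be sketched as follows. All names here (fs_exists, create_database, catalog) are hypothetical stand-ins, not Hive's metastore API; the point is only the ordering of the existence check.

```python
# Hypothetical sketch of option #1: reject CREATE DATABASE when the target
# location already exists on the filesystem (even as an empty directory).
# `fs_exists` stands in for an HDFS existence check; `catalog` for the metastore.
def create_database(name, location, fs_exists, catalog):
    if fs_exists(location):
        raise ValueError(
            "Location %r already exists; refusing to create database %r"
            % (location, name))
    catalog[name] = location

catalog = {}
create_database("test_db1", "/apps/hive/warehouse/test_db1.db",
                lambda path: False, catalog)
try:
    # test_db2 points at the shared warehouse root, which already exists
    create_database("test_db2", "/apps/hive/warehouse",
                    lambda path: True, catalog)
except ValueError:
    pass  # rejected, so a later DROP DATABASE test_db2 cannot wipe test_db1's data

print(sorted(catalog))  # ['test_db1']
```

With this check in place, the scenario in the description fails at CREATE DATABASE time instead of silently sharing a location that DROP DATABASE later deletes wholesale.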
[jira] [Work started] (HIVE-16950) Dropping hive database/table which was created explicitly in default database location, deletes all databases data from default database location
[ https://issues.apache.org/jira/browse/HIVE-16950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-16950 started by Bing Li. -- > Dropping hive database/table which was created explicitly in default database > location, deletes all databases data from default database location > - > > Key: HIVE-16950 > URL: https://issues.apache.org/jira/browse/HIVE-16950 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Rahul Kalgunde >Assignee: Bing Li >Priority: Minor > > When database/table is created explicitly pointing to the default location, > dropping the database/table deletes all the data associated with the all > databases/tables. > Steps to replicate: > in below e.g. dropping table test_db2 also deletes data of test_db1 where as > metastore still contains test_db1 > hive> create database test_db1; > OK > Time taken: 4.858 seconds > hive> describe database test_db1; > OK > test_db1 > hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/test_db1.db root > USER > Time taken: 0.599 seconds, Fetched: 1 row(s) > hive> create database test_db2 location '/apps/hive/warehouse' ; > OK > Time taken: 1.457 seconds > hive> describe database test_db2; > OK > test_db2 > hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse rootUSER > Time taken: 0.582 seconds, Fetched: 1 row(s) > hive> drop database test_db2; > OK > Time taken: 1.317 seconds > hive> dfs -ls /apps/hive/warehouse; > ls: `/apps/hive/warehouse': No such file or directory > Command failed with exit code = 1 > Query returned non-zero code: 1, cause: null > hive> describe database test_db1; > OK > test_db1 > hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/test_db1.db root > USER > Time taken: 0.629 seconds, Fetched: 1 row(s) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle
[ https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-16659: --- Attachment: HIVE-16659.2.patch Refined the patch with a test case. > Query plan should reflect hive.spark.use.groupby.shuffle > > > Key: HIVE-16659 > URL: https://issues.apache.org/jira/browse/HIVE-16659 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Rui Li >Assignee: Bing Li > Attachments: HIVE-16659.1.patch, HIVE-16659.2.patch > > > It's useful to show the shuffle type used in the query plan. Currently it > shows "GROUP" no matter what we set for hive.spark.use.groupby.shuffle. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle
[ https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16073343#comment-16073343 ] Bing Li commented on HIVE-16659: [~ruili], I updated the patch based on your comment and added the link to the review request. Thank you! > Query plan should reflect hive.spark.use.groupby.shuffle > > > Key: HIVE-16659 > URL: https://issues.apache.org/jira/browse/HIVE-16659 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Rui Li >Assignee: Bing Li > Attachments: HIVE-16659.1.patch, HIVE-16659.2.patch > > > It's useful to show the shuffle type used in the query plan. Currently it > shows "GROUP" no matter what we set for hive.spark.use.groupby.shuffle. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (HIVE-16766) Hive query with space as filter does not give proper result
[ https://issues.apache.org/jira/browse/HIVE-16766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071054#comment-16071054 ]

Bing Li edited comment on HIVE-16766 at 7/4/17 8:52 AM:
--------------------------------------------------------

Hi, [~subashprabanantham]
Which Hive version did you use? Could you post the queries to reproduce it as well?
I tried it on a Hive package built from branch-2.3, and it worked for me.

My Testing
==========
*hive> describe test;*
OK
col1    string
col2    string
Time taken: 0.057 seconds, Fetched: 2 row(s)

*hive> select * from test;*
OK
a1      a2
b1      b2
c1      c2
        D
Time taken: 0.22 seconds, Fetched: 4 row(s)

*hive> select count(1) as cnt from test where col1="" and col2="D";*
Query ID = root_20170630235239_b58b7dbc-14ef-4126-b56b-fdcf187acc09
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Starting Spark Job = f25577ce-2ed6-4c5c-a64a-6ff7419ab778

STAGES   ATTEMPT    STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED
Stage-5        0  FINISHED      1          1        0        0       0
Stage-6        0  FINISHED      1          1        0        0       0
STAGES: 02/02  [==>>] 100%  ELAPSED TIME: 1.01 s

Status: Finished successfully in 1.01 seconds
OK
1
Time taken: 1.436 seconds, Fetched: 1 row(s)

was (Author: libing):
Hi, Subash
Which Hive version did you use? Could you post the queries to reproduce it as well?
I tried it on a Hive package built from branch-2.3, and it worked for me.

My Testing
==========
*hive> describe test;*
OK
col1    string
col2    string
Time taken: 0.057 seconds, Fetched: 2 row(s)

*hive> select * from test;*
OK
a1      a2
b1      b2
c1      c2
        D
Time taken: 0.22 seconds, Fetched: 4 row(s)

*hive> select count(1) as cnt from test where col1="" and col2="D";*
Query ID = root_20170630235239_b58b7dbc-14ef-4126-b56b-fdcf187acc09
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Starting Spark Job = f25577ce-2ed6-4c5c-a64a-6ff7419ab778

STAGES   ATTEMPT    STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED
Stage-5        0  FINISHED      1          1        0        0       0
Stage-6        0  FINISHED      1          1        0        0       0
STAGES: 02/02  [==>>] 100%  ELAPSED TIME: 1.01 s

Status: Finished successfully in 1.01 seconds
OK
1
Time taken: 1.436 seconds, Fetched: 1 row(s)

> Hive query with space as filter does not give proper result
> ---
>
> Key: HIVE-16766
> URL: https://issues.apache.org/jira/browse/HIVE-16766
> Project: Hive
> Issue Type: Bug
>Reporter: Subash
>Assignee: Bing Li
>Priority: Critical
>
> Hi Team,
> I have used the query as below format and it does not give proper results.
> Since there is a split by \s+ in ExecuteStatementOperation class in line 48,
> I feel something goes wrong there. Could help me with this, if i am wrong ?
> I am using Hive JDBC version 1.1.0
> The sample query is as follows,
> select count(1) as cnt from table where col1=" " and col2="D";

-- This message was sent by Atlassian JIRA (v6.4.14#64029)