[jira] [Work logged] (HIVE-25550) Increase the RM_PROGRESS column max length to fit metrics stat
[ https://issues.apache.org/jira/browse/HIVE-25550?focusedWorklogId=657013=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-657013 ] ASF GitHub Bot logged work on HIVE-25550: - Author: ASF GitHub Bot Created on: 29/Sep/21 05:52 Start Date: 29/Sep/21 05:52 Worklog Time Spent: 10m Work Description: aasha commented on a change in pull request #2668: URL: https://github.com/apache/hive/pull/2668#discussion_r718173826 ## File path: standalone-metastore/metastore-server/src/main/resources/package.jdo ## @@ -1556,7 +1556,7 @@ - + Review comment: will this work for oracle? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 657013) Time Spent: 0.5h (was: 20m) > Increase the RM_PROGRESS column max length to fit metrics stat > -- > > Key: HIVE-25550 > URL: https://issues.apache.org/jira/browse/HIVE-25550 > Project: Hive > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Presently it fails with the following trace: > {noformat} > [[Event Name: EVENT_ALLOC_WRITE_ID; Total Number: 213; Total Time: 85347.0; > Mean: 400.6901408450704; Median: 392.0; Standard Deviation: > 33.99178239314741; Variance: 1155.4412702630862; Kurtosis: 83.69411620601193; > Skewness: 83.69411620601193; 25th Percentile: 384.0; 50th Percentile: 392.0; > 75th Percentile: 408.0; 90th Percentile: 417.0; Top 5 EventIds(EventId=Time) > {1498476=791, 1498872=533, 1497805=508, 1498808=500, 1499027=492};]]}"}]}" in > column ""RM_PROGRESS"" that has maximum length of 4000. Please correct your > data! 
> at > org.datanucleus.store.rdbms.mapping.datastore.CharRDBMSMapping.setString(CharRDBMSMapping.java:254) > ~[datanucleus-rdbms-4.1.19.jar:?] > at > org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.setString(SingleFieldMapping.java:180) > ~{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
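The trace above is a plain length overflow: DataNucleus rejects the serialized replication-metrics string because it is longer than the 4000-character RM_PROGRESS column. A minimal Python sketch (illustrative names only, not Hive code) of why one long stats entry per event quickly passes that limit:

```python
# Hypothetical sketch of the RM_PROGRESS overflow: the column limit is the
# 4000 characters named in the stack trace; the event string mimics the one
# in the error message. Neither function exists in Hive -- illustration only.
RM_PROGRESS_MAX_LEN = 4000

def fits_rm_progress(serialized_metrics: str, limit: int = RM_PROGRESS_MAX_LEN) -> bool:
    """Return True if the serialized metrics stat fits inside the column."""
    return len(serialized_metrics) <= limit

# One event entry similar to the failing payload; a dump covering many
# events concatenates entries like this until the total passes 4000 chars.
event = ("Event Name: EVENT_ALLOC_WRITE_ID; Total Number: 213; "
         "Total Time: 85347.0; Mean: 400.69; Median: 392.0;")
many_events = "[[" + " ".join([event] * 80) + "]]"

assert fits_rm_progress(event)
assert not fits_rm_progress(many_events)
```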
[jira] [Updated] (HIVE-25571) Fix Metastore script for Oracle Database
[ https://issues.apache.org/jira/browse/HIVE-25571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HIVE-25571: Labels: pull-request-available (was: ) > Fix Metastore script for Oracle Database > > > Key: HIVE-25571 > URL: https://issues.apache.org/jira/browse/HIVE-25571 > Project: Hive > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > > Error:1 > {noformat} > 354/359 CREATE UNIQUE INDEX DBPRIVILEGEINDEX ON DC_PRIVS > (AUTHORIZER,NAME,PRINCIPAL_NAME,PRINCIPAL_TYPE,DC_PRIV,GRANTOR,GRANTOR_TYPE); > Error: ORA-00955: name is already used by an existing object > (state=42000,code=955) > Aborting command set because "force" is false and command failed: "CREATE > UNIQUE INDEX DBPRIVILEGEINDEX ON DC_PRIVS > (AUTHORIZER,NAME,PRINCIPAL_NAME,PRINCIPAL_TYPE,DC_PRIV,GRANTOR,GRANTOR_TYPE);" > [ERROR] 2021-09-29 09:18:59.075 [main] MetastoreSchemaTool - Schema > initialization FAILED! Metastore state would be inconsistent! > Schema initialization FAILED! Metastore state would be inconsistent!{noformat} > Error:2 > {noformat} > Error: ORA-00900: invalid SQL statement (state=42000,code=900) > Aborting command set because "force" is false and command failed: "=== > -- HIVE-24396 > -- Create DataCo{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25571) Fix Metastore script for Oracle Database
[ https://issues.apache.org/jira/browse/HIVE-25571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena reassigned HIVE-25571: --- > Fix Metastore script for Oracle Database > > > Key: HIVE-25571 > URL: https://issues.apache.org/jira/browse/HIVE-25571 > Project: Hive > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > > Error:1 > {noformat} > 354/359 CREATE UNIQUE INDEX DBPRIVILEGEINDEX ON DC_PRIVS > (AUTHORIZER,NAME,PRINCIPAL_NAME,PRINCIPAL_TYPE,DC_PRIV,GRANTOR,GRANTOR_TYPE); > Error: ORA-00955: name is already used by an existing object > (state=42000,code=955) > Aborting command set because "force" is false and command failed: "CREATE > UNIQUE INDEX DBPRIVILEGEINDEX ON DC_PRIVS > (AUTHORIZER,NAME,PRINCIPAL_NAME,PRINCIPAL_TYPE,DC_PRIV,GRANTOR,GRANTOR_TYPE);" > [ERROR] 2021-09-29 09:18:59.075 [main] MetastoreSchemaTool - Schema > initialization FAILED! Metastore state would be inconsistent! > Schema initialization FAILED! Metastore state would be inconsistent!{noformat} > Error:2 > {noformat} > Error: ORA-00900: invalid SQL statement (state=42000,code=900) > Aborting command set because "force" is false and command failed: "=== > -- HIVE-24396 > -- Create DataCo{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
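The "Aborting command set because "force" is false" lines describe the schema tool's run-until-first-failure behaviour. The sketch below is a hypothetical stand-in (not MetastoreSchemaTool itself) that reproduces that control flow: a duplicate CREATE INDEX aborts the script before later statements run, which is why a single non-idempotent statement leaves the schema half-initialized.

```python
# Illustrative model of the "force" flag seen in the log. run_script and
# execute are invented for this sketch; only the abort-on-first-error
# behaviour mirrors the MetastoreSchemaTool output above.
def run_script(statements, execute, force=False):
    """Run statements in order; record errors, abort unless force=True."""
    errors = []
    for stmt in statements:
        try:
            execute(stmt)
        except Exception as exc:  # e.g. ORA-00955: name already used
            errors.append((stmt, str(exc)))
            if not force:
                break  # "Aborting command set because force is false"
    return errors

seen = []
def execute(stmt):
    # Simulate Oracle rejecting a duplicate object name.
    if stmt in seen and "INDEX" in stmt:
        raise RuntimeError("ORA-00955: name is already used by an existing object")
    seen.append(stmt)

idx = "CREATE UNIQUE INDEX DBPRIVILEGEINDEX ON DC_PRIVS (AUTHORIZER, NAME)"
errors = run_script([idx, idx, "CREATE TABLE T (A INT)"], execute)
assert len(errors) == 1 and "ORA-00955" in errors[0][1]
assert "CREATE TABLE T (A INT)" not in seen  # never reached: script aborted
```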
[jira] [Assigned] (HIVE-25570) Hive should send full URL path for authorization for the command insert overwrite location
[ https://issues.apache.org/jira/browse/HIVE-25570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sai Hemanth Gantasala reassigned HIVE-25570: > Hive should send full URL path for authorization for the command insert > overwrite location > -- > > Key: HIVE-25570 > URL: https://issues.apache.org/jira/browse/HIVE-25570 > Project: Hive > Issue Type: Bug > Components: Authorization, HiveServer2 >Affects Versions: 4.0.0 >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > > For authorization, Hive currently sends the path given as input by the > user for the command, e.g. > {code:java} > insert overwrite directory > '/user/warehouse/tablespace/external/something/new/test_new_tb1' select * > from test_tb1; > {code} > Here Hive sends the path as > '/user/warehouse/tablespace/external/something/new/test_new_tb1' > Instead, Hive should send a fully qualified path for authorization, e.g.: > 'hdfs://hostname:port_name/user/warehouse/tablespace/external/something/new/test_new_tb1' -- This message was sent by Atlassian Jira (v8.3.4#803005)
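The requested fix amounts to qualifying the user-supplied path against the default filesystem before it is handed to the authorizer. A rough Python sketch (the helper name and default URI are assumptions, loosely mirroring the spirit of Hadoop's Path.makeQualified):

```python
# Hypothetical sketch of path qualification for authorization. The default
# filesystem URI here is invented; in Hadoop it would come from fs.defaultFS.
from urllib.parse import urlparse

def make_qualified(path: str, default_fs: str = "hdfs://hostname:8020") -> str:
    """Prefix scheme-less paths with the default filesystem authority."""
    if urlparse(path).scheme:      # already qualified (hdfs://, s3a://, ...)
        return path
    return default_fs.rstrip("/") + path

p = "/user/warehouse/tablespace/external/something/new/test_new_tb1"
assert make_qualified(p) == "hdfs://hostname:8020" + p
assert make_qualified("hdfs://nn:9000/tmp/x") == "hdfs://nn:9000/tmp/x"
```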
[jira] [Updated] (HIVE-25546) Enable incremental rebuild of Materialized views with insert only source tables
[ https://issues.apache.org/jira/browse/HIVE-25546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa updated HIVE-25546: -- Status: Patch Available (was: Open) > Enable incremental rebuild of Materialized views with insert only source > tables > --- > > Key: HIVE-25546 > URL: https://issues.apache.org/jira/browse/HIVE-25546 > Project: Hive > Issue Type: Improvement > Components: CBO, Materialized views >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > {code} > create table t1(a int, b int, c int) stored as parquet TBLPROPERTIES > ('transactional'='true', 'transactional_properties'='insert_only'); > create materialized view mat1 stored as orc TBLPROPERTIES > ('transactional'='true') as > select a, b, c from t1 where a > 10; > {code} > Currently, materialized view *mat1* cannot be rebuilt incrementally because > it has an insert-only source table (t1). Such tables do not have > ROW_ID.write_id, which is required to identify newly inserted records since > the last rebuild. > HIVE-25406 adds the ability to query write_id. -- This message was sent by Atlassian Jira (v8.3.4#803005)
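Conceptually, once write_id becomes queryable (HIVE-25406), an incremental rebuild only has to pick up rows written after the snapshot of the last rebuild. A toy sketch of that selection (the row layout is invented for illustration; Hive carries this value in ROW_ID.write_id):

```python
# Illustrative model of write_id-based incremental rebuild. Rows are plain
# dicts here; in Hive the write_id lives in the hidden ROW_ID struct.
def incremental_rows(rows, last_rebuild_write_id):
    """Rows whose write_id is newer than the last rebuild's snapshot."""
    return [r for r in rows if r["write_id"] > last_rebuild_write_id]

rows = [
    {"a": 12, "write_id": 5},   # already reflected in the last rebuild
    {"a": 20, "write_id": 9},   # inserted afterwards
    {"a": 31, "write_id": 10},  # inserted afterwards
]
new_rows = incremental_rows(rows, last_rebuild_write_id=5)
assert [r["a"] for r in new_rows] == [20, 31]
```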
[jira] [Commented] (HIVE-25557) Hive 3.1.2 with Tez is slow to count data in parquet format
[ https://issues.apache.org/jira/browse/HIVE-25557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421905#comment-17421905 ] katty he commented on HIVE-25557: - count(*) on MR is faster than on Tez. Normally a count operation can read only the parquet metadata, but in this case it reads all the data and computes the count, so I am confused. Here is the plan: !image-2021-09-29-11-07-04-118.png! > Hive 3.1.2 with Tez is slow to count data in parquet format > > > Key: HIVE-25557 > URL: https://issues.apache.org/jira/browse/HIVE-25557 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.1.2 > Environment: Tez *0.10.1* >Reporter: katty he >Priority: Major > Attachments: image-2021-09-29-11-07-04-118.png > > > Recently I tested a SQL query like select count(*) from a table in Hive 3.1.2 with > Tez, where the table is in parquet format. Normally, when counting, the query > engine can read the metadata instead of reading the full data, but in my case > Tez cannot get the count from metadata only; it reads the data, so it is slow. > When counting 2 billion rows, Tez takes 500s and spends 60s initializing. > Is that a problem? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25557) Hive 3.1.2 with Tez is slow to count data in parquet format
[ https://issues.apache.org/jira/browse/HIVE-25557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] katty he updated HIVE-25557: Attachment: image-2021-09-29-11-07-04-118.png > Hive 3.1.2 with Tez is slow to count data in parquet format > > > Key: HIVE-25557 > URL: https://issues.apache.org/jira/browse/HIVE-25557 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.1.2 > Environment: Tez *0.10.1* >Reporter: katty he >Priority: Major > Attachments: image-2021-09-29-11-07-04-118.png > > > Recently I tested a SQL query like select count(*) from a table in Hive 3.1.2 with > Tez, where the table is in parquet format. Normally, when counting, the query > engine can read the metadata instead of reading the full data, but in my case > Tez cannot get the count from metadata only; it reads the data, so it is slow. > When counting 2 billion rows, Tez takes 500s and spends 60s initializing. > Is that a problem? -- This message was sent by Atlassian Jira (v8.3.4#803005)
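The optimization the reporter expects works because Parquet file footers carry per-row-group row counts, so count(*) can in principle be answered without touching any data pages. A simplified sketch of that metadata-only count (the structures here are invented stand-ins for Parquet footer metadata):

```python
# Simplified model of Parquet footer metadata: each row group records its
# row count, so a count(*) is just the sum over row groups -- O(#row groups)
# rather than a full data scan.
class RowGroup:
    def __init__(self, num_rows):
        self.num_rows = num_rows

def count_from_metadata(row_groups):
    """Answer count(*) from footer metadata alone, without reading data."""
    return sum(rg.num_rows for rg in row_groups)

footer = [RowGroup(1_000_000), RowGroup(1_000_000), RowGroup(500_000)]
assert count_from_metadata(footer) == 2_500_000
```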
[jira] [Work logged] (HIVE-25541) JsonSerDe: TBLPROPERTY treating nested json as String
[ https://issues.apache.org/jira/browse/HIVE-25541?focusedWorklogId=656970&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656970 ] ASF GitHub Bot logged work on HIVE-25541: - Author: ASF GitHub Bot Created on: 29/Sep/21 01:38 Start Date: 29/Sep/21 01:38 Worklog Time Spent: 10m Work Description: dengzhhu653 commented on a change in pull request #2664: URL: https://github.com/apache/hive/pull/2664#discussion_r718082739 ## File path: serde/src/java/org/apache/hadoop/hive/serde2/json/HiveJsonReader.java ## @@ -393,7 +402,16 @@ private Object visitLeafNode(final JsonNode leafNode, case DOUBLE: return Double.valueOf(leafNode.asDouble()); case STRING: - return leafNode.asText(); + if (leafNode.isValueNode()) { +return leafNode.asText(); + } else { +if (isEnabled(Feature.STRINGIFY_COMPLEX_FIELDS)) { + return leafNode.toString(); +} else { + throw new SerDeException( + "Complex field found in JSON does not match table definition: " + typeInfo.getTypeName()); Review comment: Sorry for this. I wonder: if the column is defined as varchar or char in the Hive schema but corresponds to a complex field in the JSON, should we do something for such cases? Thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 656970) Time Spent: 1h 50m (was: 1h 40m) > JsonSerDe: TBLPROPERTY treating nested json as String > - > > Key: HIVE-25541 > URL: https://issues.apache.org/jira/browse/HIVE-25541 > Project: Hive > Issue Type: Bug >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > The native JsonSerDe 'org.apache.hive.hcatalog.data.JsonSerDe' currently does not > support loading nested JSON into a string type directly. It requires > declaring the column as a complex type (struct, map, array) to unpack nested > JSON data. > Even though the data field is not a valid JSON String type, there is value in > treating it as a plain String instead of throwing an exception as we currently > do. > {code:java} > create table json_table(data string, messageid string, publish_time bigint, > attributes string); > {"data":{"H":{"event":"track_active","platform":"Android"},"B":{"device_type":"Phone","uuid":"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,11883bee-a7aa-4010-8a66-6c3c63a73f16]"}},"messageId":"2475185636801962","publish_time":1622514629783,"attributes":{"region":"IN"}}"}} > {code} > This JIRA introduces an extra Table Property that allows stringifying Complex > JSON values instead of forcing the user to define the complete nested > structure. -- This message was sent by Atlassian Jira (v8.3.4#803005)
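The behaviour being added can be sketched as follows. The feature name comes from the PR diff above; the rest is an illustrative Python model, not the Hive implementation: when a string column receives a nested JSON value, either stringify it or fail, depending on the table property.

```python
import json

# Illustrative model of the STRINGIFY_COMPLEX_FIELDS behaviour from the
# diff above. visit_string_leaf and the mode values are invented names.
STRINGIFY = "stringify"

def visit_string_leaf(node, mode=STRINGIFY):
    if not isinstance(node, (dict, list)):   # a plain JSON value node
        return str(node)
    if mode == STRINGIFY:
        return json.dumps(node)              # keep nested JSON as a string
    raise ValueError("Complex field found in JSON does not match table definition")

record = json.loads('{"data": {"H": {"event": "track_active"}}, "messageId": "247"}')
assert visit_string_leaf(record["messageId"]) == "247"
# The nested object round-trips through the stringified column value.
assert json.loads(visit_string_leaf(record["data"]))["H"]["event"] == "track_active"
```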
[jira] [Work logged] (HIVE-25324) Add option to disable PartitionManagementTask
[ https://issues.apache.org/jira/browse/HIVE-25324?focusedWorklogId=656955=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656955 ] ASF GitHub Bot logged work on HIVE-25324: - Author: ASF GitHub Bot Created on: 29/Sep/21 00:10 Start Date: 29/Sep/21 00:10 Worklog Time Spent: 10m Work Description: github-actions[bot] closed pull request #2470: URL: https://github.com/apache/hive/pull/2470 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 656955) Time Spent: 50m (was: 40m) > Add option to disable PartitionManagementTask > - > > Key: HIVE-25324 > URL: https://issues.apache.org/jira/browse/HIVE-25324 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Rajesh Balamohan >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > When large number of tables (e.g 2000) and databases are present, > PartitionManagementTask scans all tables and partitions causing pressure on > HMS. > Currently there is no way to disable PartitionManagementTask as well. Round > about option is to provide pattern via > "metastore.partition.management.database.pattern / > metastore.partition.management.table.pattern". > It will be good to provide an option to disable it completely.{color:#807d6e} > {color} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25343) Create or replace view should clean the old table properties
[ https://issues.apache.org/jira/browse/HIVE-25343?focusedWorklogId=656953&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656953 ] ASF GitHub Bot logged work on HIVE-25343: - Author: ASF GitHub Bot Created on: 29/Sep/21 00:10 Start Date: 29/Sep/21 00:10 Worklog Time Spent: 10m Work Description: github-actions[bot] closed pull request #2492: URL: https://github.com/apache/hive/pull/2492 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 656953) Time Spent: 40m (was: 0.5h) > Create or replace view should clean the old table properties > > > Key: HIVE-25343 > URL: https://issues.apache.org/jira/browse/HIVE-25343 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.3, 3.2.0 >Reporter: Lantao Jin >Assignee: Lantao Jin >Priority: Major > Labels: pull-request-available > Attachments: Screen Shot 2021-07-19 at 15.36.29.png > > Time Spent: 40m > Remaining Estimate: 0h > > In many cases, users use Spark and Hive together. When a user creates a view > via Spark, the table output columns are stored in table properties, such as > !Screen Shot 2021-07-19 at 15.36.29.png|width=80%! > After that, if the user runs the command "create or replace view" via Hive to > change the schema, the old table properties added by Spark are not cleaned up > by Hive. Then, when users read the table via Spark, the schema has not changed, > which is very confusing for users. 
> How to reproduce: > {code} > spark-sql>create table lajin_table (a int, b int) stored as parquet; > spark-sql>create view lajin_view as select * from lajin_table; > spark-sql> desc lajin_view; > a int NULLNULL > b int NULLNULL > hive>desc lajin_view; > a int > b int > hive>create or replace view lajin_view as select a, b, 3 as c from > lajin_table; > hive>desc lajin_view; > a int > b int > c int > spark-sql> desc lajin_view; -- not changed > a int NULLNULL > b int NULLNULL > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
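The proposed fix boils down to rebuilding the property map on CREATE OR REPLACE VIEW rather than merging onto the old one. A sketch under stated assumptions: the property keys mimic the Spark-written ones described in the report, and the preserve list is an invented placeholder for whatever Hive would legitimately carry over.

```python
# Hypothetical sketch of "clean the old table properties" on CREATE OR
# REPLACE VIEW: start fresh, keep only an explicit allow-list, then apply
# the new view's own properties. Key names are illustrative.
def replace_view_properties(old_props, new_props, preserve=("transient_lastDdlTime",)):
    """Drop stale engine-specific properties; keep only whitelisted keys."""
    kept = {k: v for k, v in old_props.items() if k in preserve}
    kept.update(new_props)
    return kept

old = {"spark.sql.sources.schema.numParts": "1",
       "spark.sql.sources.schema.part.0": '{"fields": ["a", "b"]}',
       "transient_lastDdlTime": "1626678989"}
new = replace_view_properties(old, {})
assert "spark.sql.sources.schema.part.0" not in new   # stale schema is gone
assert "transient_lastDdlTime" in new                 # preserved key remains
```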
[jira] [Work logged] (HIVE-23756) Added more constraints to the package.jdo file
[ https://issues.apache.org/jira/browse/HIVE-23756?focusedWorklogId=656954=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656954 ] ASF GitHub Bot logged work on HIVE-23756: - Author: ASF GitHub Bot Created on: 29/Sep/21 00:10 Start Date: 29/Sep/21 00:10 Worklog Time Spent: 10m Work Description: github-actions[bot] closed pull request #2254: URL: https://github.com/apache/hive/pull/2254 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 656954) Time Spent: 1h 40m (was: 1.5h) > Added more constraints to the package.jdo file > -- > > Key: HIVE-23756 > URL: https://issues.apache.org/jira/browse/HIVE-23756 > Project: Hive > Issue Type: Bug >Reporter: Ganesha Shreedhara >Assignee: Steve Carlin >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-23756.1.patch > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Drop table command fails intermittently with the following exception. 
> {code:java} > Caused by: java.sql.BatchUpdateException: Cannot delete or update a parent > row: a foreign key constraint fails ("metastore"."COLUMNS_V2", CONSTRAINT > "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "CDS" ("CD_ID")) App > at > com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1815)at > com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1277) > Appat > org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeBatch(ParamLoggingPreparedStatement.java:372) > at > org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:628) > at > org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:207) > at > org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:179) > at > org.datanucleus.store.rdbms.scostore.JoinMapStore.clearInternal(JoinMapStore.java:901) > ... 36 more > Caused by: > com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: > Cannot delete or update a parent row: a foreign key constraint fails > ("metastore"."COLUMNS_V2", CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") > REFERENCES "CDS" ("CD_ID")) > at sun.reflect.GeneratedConstructorAccessor121.newInstance(Unknown Source) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at com.mysql.jdbc.Util.handleNewInstance(Util.java:377) > at com.mysql.jdbc.Util.getInstance(Util.java:360) > at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:971) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823){code} > Although HIVE-19994 resolves this issue, the FK constraint name of COLUMNS_V2 > table specified in package.jdo file is not same as the FK constraint name > used while creating COLUMNS_V2 table ([Ref|#L60]]). 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
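The root cause in the trace above is ordinary foreign-key ordering: child rows in COLUMNS_V2 must be deleted before their parent CDS row. A self-contained SQLite demonstration (standing in for the MySQL behaviour in the trace; the schema is reduced to the two columns that matter):

```python
import sqlite3

# Reproduce the FK failure mode from the stack trace with SQLite: deleting
# a parent row while child rows still reference it violates the constraint,
# whereas deleting children first succeeds.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE CDS (CD_ID INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE COLUMNS_V2 (
    CD_ID INTEGER, COLUMN_NAME TEXT,
    FOREIGN KEY (CD_ID) REFERENCES CDS (CD_ID))""")
conn.execute("INSERT INTO CDS VALUES (1)")
conn.execute("INSERT INTO COLUMNS_V2 VALUES (1, 'a')")

try:
    conn.execute("DELETE FROM CDS WHERE CD_ID = 1")   # parent first: fails
    failed = False
except sqlite3.IntegrityError:
    failed = True
assert failed

conn.execute("DELETE FROM COLUMNS_V2 WHERE CD_ID = 1")  # children first
conn.execute("DELETE FROM CDS WHERE CD_ID = 1")          # now succeeds
assert conn.execute("SELECT COUNT(*) FROM CDS").fetchone()[0] == 0
```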
[jira] [Work logged] (HIVE-25517) Follow up on HIVE-24951: External Table created with Uppercase name using CTAS does not produce result for select queries
[ https://issues.apache.org/jira/browse/HIVE-25517?focusedWorklogId=656928=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656928 ] ASF GitHub Bot logged work on HIVE-25517: - Author: ASF GitHub Bot Created on: 28/Sep/21 21:36 Start Date: 28/Sep/21 21:36 Worklog Time Spent: 10m Work Description: sourabh912 commented on pull request #2638: URL: https://github.com/apache/hive/pull/2638#issuecomment-929643289 Thank you @nrg4878 for the review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 656928) Time Spent: 2h (was: 1h 50m) > Follow up on HIVE-24951: External Table created with Uppercase name using > CTAS does not produce result for select queries > - > > Key: HIVE-25517 > URL: https://issues.apache.org/jira/browse/HIVE-25517 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 4.0.0 >Reporter: Sourabh Goyal >Assignee: Sourabh Goyal >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h > Remaining Estimate: 0h > > In [PR|https://github.com/apache/hive/pull/2125] for HIVE-24951, the > recommendation was to use getDefaultTablePath() to set the location for an > external table. This Jira addresses that and makes getDefaultTablePath() more > generic. > > cc - [~ngangam] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25517) Follow up on HIVE-24951: External Table created with Uppercase name using CTAS does not produce result for select queries
[ https://issues.apache.org/jira/browse/HIVE-25517?focusedWorklogId=656927=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656927 ] ASF GitHub Bot logged work on HIVE-25517: - Author: ASF GitHub Bot Created on: 28/Sep/21 21:36 Start Date: 28/Sep/21 21:36 Worklog Time Spent: 10m Work Description: sourabh912 closed pull request #2638: URL: https://github.com/apache/hive/pull/2638 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 656927) Time Spent: 1h 50m (was: 1h 40m) > Follow up on HIVE-24951: External Table created with Uppercase name using > CTAS does not produce result for select queries > - > > Key: HIVE-25517 > URL: https://issues.apache.org/jira/browse/HIVE-25517 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 4.0.0 >Reporter: Sourabh Goyal >Assignee: Sourabh Goyal >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > In [PR|https://github.com/apache/hive/pull/2125] for HIVE-24951, the > recommendation was to use getDefaultTablePath() to set the location for an > external table. This Jira addresses that and makes getDefaultTablePath() more > generic. > > cc - [~ngangam] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25349) Skip password authentication when a trusted header is present in the Http request
[ https://issues.apache.org/jira/browse/HIVE-25349?focusedWorklogId=656889=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656889 ] ASF GitHub Bot logged work on HIVE-25349: - Author: ASF GitHub Bot Created on: 28/Sep/21 20:28 Start Date: 28/Sep/21 20:28 Worklog Time Spent: 10m Work Description: nrg4878 commented on pull request #2496: URL: https://github.com/apache/hive/pull/2496#issuecomment-929587566 recheck -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 656889) Time Spent: 40m (was: 0.5h) > Skip password authentication when a trusted header is present in the Http > request > - > > Key: HIVE-25349 > URL: https://issues.apache.org/jira/browse/HIVE-25349 > Project: Hive > Issue Type: Improvement > Components: Hive, HiveServer2 >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > Labels: pull-request-available, security-review-needed > Time Spent: 40m > Remaining Estimate: 0h > > Whenever a trusted header is present in the HTTP servlet request, skip the > password based authentication, since the user is pre-authorized and extract > the user name from Authorization header. -- This message was sent by Atlassian Jira (v8.3.4#803005)
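A rough sketch of the flow described in HIVE-25349 (the trusted-header name and request shape here are assumptions, not Hive's actual configuration): when the trusted header is present, the password in the Basic Authorization header is never verified and only the user name is extracted.

```python
import base64

# Illustrative model of trusted-header pre-authorization. TRUSTED_HEADER is
# an invented name; the Basic Authorization header is assumed present.
TRUSTED_HEADER = "X-Trusted-Proxy-Auth"

def authenticate(headers, verify_password):
    auth = headers["Authorization"]                       # "Basic <base64>"
    user, _, password = base64.b64decode(auth.split(" ", 1)[1]).decode().partition(":")
    if TRUSTED_HEADER in headers:        # pre-authorized upstream: skip check
        return user
    if not verify_password(user, password):
        raise PermissionError("authentication failed")
    return user

creds = "Basic " + base64.b64encode(b"alice:wrong").decode()
# With the trusted header present, the bad password is never checked.
assert authenticate({"Authorization": creds, TRUSTED_HEADER: "1"},
                    lambda u, p: False) == "alice"
```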
[jira] [Work logged] (HIVE-25538) CommitTxn replay failing during incremental run
[ https://issues.apache.org/jira/browse/HIVE-25538?focusedWorklogId=656863&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656863 ] ASF GitHub Bot logged work on HIVE-25538: - Author: ASF GitHub Bot Created on: 28/Sep/21 20:25 Start Date: 28/Sep/21 20:25 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #2655: URL: https://github.com/apache/hive/pull/2655#discussion_r716418809 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosIncrementalLoadAcidTables.java ## @@ -243,6 +248,53 @@ public void testReplCM() throws Throwable { Lists.newArrayList(result, result)); } + @Test + public void testReplCommitTransactionOnSourceDelete() throws Throwable { +String tableName = "testReplCommitTransactionOnSourceDelete"; +String[] result = new String[] { "5" }; + +// Do a bootstrap dump. +WarehouseInstance.Tuple bootStrapDump = primary.dump(primaryDbName); +replica.load(replicatedDbName, primaryDbName).run("REPL STATUS " + replicatedDbName) +.verifyResult(bootStrapDump.lastReplicationId); + +// Add some data to the table & do a incremental dump. +ReplicationTestUtils.insertRecords(primary, primaryDbName, primaryDbNameExtra, tableName, null, false, +ReplicationTestUtils.OperationType.REPL_TEST_ACID_INSERT); +WarehouseInstance.Tuple incrementalDump = primary.dump(primaryDbName); Review comment: Can you please add tables with the following properties: - ORC format (I think covered) - bucketed - text input format All of these tables should have a drop-table use case like the one you are targeting now. ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosIncrementalLoadAcidTables.java ## @@ -243,6 +248,64 @@ public void testReplCM() throws Throwable { Lists.newArrayList(result, result)); } + @Test + public void testReplCommitTransactionOnSourceDeleteORC() throws Throwable { +// Run test with ORC format & with transactional true. 
+testReplCommitTransactionOnSourceDelete("STORED AS ORC", "'transactional'='true'"); + } + + @Test + public void testReplCommitTransactionOnSourceDeleteText() throws Throwable { +// Run test with TEXT format & with transactional true. Review comment: false? ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java ## @@ -401,6 +404,18 @@ private boolean isSourceFileMismatch(FileSystem sourceFs, ReplChangeManager.File return false; } + @VisibleForTesting + private void runTestOnlyExecutions() throws IOException { Review comment: Wondering if this logic can be moved to the test itself. ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java ## @@ -65,6 +65,8 @@ private final String copyAsUser; private FileSystem destinationFs; private final int maxParallelCopyTask; + @VisibleForTesting Review comment: If the method is public, does the annotation VisibleForTesting have any impact? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 656863) Time Spent: 1h 20m (was: 1h 10m) > CommitTxn replay failing during incremental run > --- > > Key: HIVE-25538 > URL: https://issues.apache.org/jira/browse/HIVE-25538 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Critical > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > CommitTxn fails during an incremental run if the source file is deleted > after the copy and before checksum validation. 
> {noformat} > 2021-09-21T07:53:40,898 ERROR [TThreadPoolServer WorkerProcess-%d] > thrift.ProcessFunction: Internal error processing commit_txn > org.apache.thrift.TException: > /warehouse1/replicated_testreplcommittransactiononsourcedelete_1632235978675.db/testreplcommittransactiononsourcedelete/load_date=2016-03-01/delta_002_002_ > (is not a directory) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.resolvePath(FSDirectory.java:677) > at > org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:151) > at >
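The race behind this failure can be sketched as: the copy succeeds, the source is deleted, then checksum validation runs against a file that no longer exists. Below is a tolerant validator that falls back to the destination checksum when the source is gone. Hive's actual handling (change-manager paths, retries) is more involved; this only illustrates the ordering problem.

```python
import hashlib
import os
import tempfile

# Hypothetical sketch of checksum validation surviving a post-copy source
# delete. checksum/validate_copy/simulate_race are invented names.
def checksum(path):
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

def validate_copy(src, dst, expected):
    if not os.path.exists(src):       # deleted after copy: trust the dest
        return checksum(dst) == expected
    return checksum(src) == checksum(dst) == expected

def simulate_race():
    """Copy, delete the source, then validate; return the validation result."""
    with tempfile.TemporaryDirectory() as d:
        src, dst = os.path.join(d, "src"), os.path.join(d, "dst")
        for p in (src, dst):          # "copy": identical source and dest
            with open(p, "wb") as f:
                f.write(b"delta_0000002_0000002")
        expected = checksum(src)
        os.remove(src)                # the concurrent delete from HIVE-25538
        return validate_copy(src, dst, expected)

assert simulate_race()
```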
[jira] [Work logged] (HIVE-25545) Add/Drop constraints events on table should create authorizable events in HS2
[ https://issues.apache.org/jira/browse/HIVE-25545?focusedWorklogId=656797=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656797 ] ASF GitHub Bot logged work on HIVE-25545: - Author: ASF GitHub Bot Created on: 28/Sep/21 20:20 Start Date: 28/Sep/21 20:20 Worklog Time Spent: 10m Work Description: nrg4878 commented on a change in pull request #2665: URL: https://github.com/apache/hive/pull/2665#discussion_r717176887 ## File path: ql/src/test/queries/clientnegative/groupby_join_pushdown.q ## @@ -22,45 +22,45 @@ FROM src f JOIN src g ON(f.key = g.key) GROUP BY f.key, g.key; EXPLAIN -SELECT f.ctinyint, g.ctinyint, SUM(f.cbigint) +SELECT f.ctinyint, g.ctinyint, SUM(f.cbigint) Review comment: did you remove the spaces on purpose or a consequence of IDE? ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/table/constraint/drop/AlterTableDropConstraintAnalyzer.java ## @@ -47,11 +51,18 @@ protected void analyzeCommand(TableName tableName, Map partition String constraintName = unescapeIdentifier(command.getChild(0).getText()); AlterTableDropConstraintDesc desc = new AlterTableDropConstraintDesc(tableName, null, constraintName); -rootTasks.add(TaskFactory.get(new DDLWork(getInputs(), getOutputs(), desc))); Table table = getTable(tableName); +WriteEntity.WriteType writeType = null; if (AcidUtils.isTransactionalTable(table)) { setAcidDdlDesc(desc); + writeType = WriteType.DDL_EXCLUSIVE; +} else { + writeType = WriteEntity.determineAlterTableWriteType(AlterTableType.DROP_CONSTRAINT); } +inputs.add(new ReadEntity(table)); Review comment: can we not call addInputsOutputsAlterTable() like we did for ADD CONSTRAINT? It seems like all alter can use this method. 
## File path: ql/src/test/results/clientnegative/groupby_join_pushdown.q.out ## @@ -1358,249 +1358,15 @@ STAGE PLANS: PREHOOK: query: ALTER TABLE alltypesorc ADD CONSTRAINT pk_alltypesorc_1 PRIMARY KEY (ctinyint) DISABLE RELY PREHOOK: type: ALTERTABLE_ADDCONSTRAINT -POSTHOOK: query: ALTER TABLE alltypesorc ADD CONSTRAINT pk_alltypesorc_1 PRIMARY KEY (ctinyint) DISABLE RELY -POSTHOOK: type: ALTERTABLE_ADDCONSTRAINT -PREHOOK: query: explain -SELECT sum(f.cint), f.ctinyint -FROM alltypesorc f JOIN alltypesorc g ON(f.ctinyint = g.ctinyint) -GROUP BY f.ctinyint, g.ctinyint -PREHOOK: type: QUERY -PREHOOK: Input: default@alltypesorc - A masked pattern was here -POSTHOOK: query: explain -SELECT sum(f.cint), f.ctinyint -FROM alltypesorc f JOIN alltypesorc g ON(f.ctinyint = g.ctinyint) -GROUP BY f.ctinyint, g.ctinyint -POSTHOOK: type: QUERY -POSTHOOK: Input: default@alltypesorc - A masked pattern was here -STAGE DEPENDENCIES: - Stage-1 is a root stage - Stage-0 depends on stages: Stage-1 - -STAGE PLANS: - Stage: Stage-1 -Tez - A masked pattern was here - Edges: -Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE) -Reducer 3 <- Reducer 2 (SIMPLE_EDGE) - A masked pattern was here - Vertices: -Map 1 -Map Operator Tree: -TableScan - alias: f - Statistics: Num rows: 12288 Data size: 73392 Basic stats: COMPLETE Column stats: COMPLETE - Select Operator -expressions: ctinyint (type: tinyint), cint (type: int) -outputColumnNames: _col0, _col1 -Statistics: Num rows: 12288 Data size: 73392 Basic stats: COMPLETE Column stats: COMPLETE -Reduce Output Operator - key expressions: _col0 (type: tinyint) - null sort order: z - sort order: + - Map-reduce partition columns: _col0 (type: tinyint) - Statistics: Num rows: 12288 Data size: 73392 Basic stats: COMPLETE Column stats: COMPLETE - value expressions: _col1 (type: int) -Execution mode: vectorized, llap -LLAP IO: all inputs -Map 4 -Map Operator Tree: -TableScan - alias: g - Statistics: Num rows: 12288 Data size: 36696 Basic stats: 
COMPLETE Column stats: COMPLETE - Select Operator -expressions: ctinyint (type: tinyint) -outputColumnNames: _col0 -Statistics: Num rows: 12288 Data size: 36696 Basic stats: COMPLETE Column stats: COMPLETE -Reduce Output Operator - key expressions: _col0 (type: tinyint) - null sort order: z - sort order: + - Map-reduce partition columns: _col0 (type: tinyint) -
[jira] [Work logged] (HIVE-25566) Show column constraints for "DESC FORMATTED TABLE"
[ https://issues.apache.org/jira/browse/HIVE-25566?focusedWorklogId=656767=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656767 ] ASF GitHub Bot logged work on HIVE-25566: - Author: ASF GitHub Bot Created on: 28/Sep/21 20:17 Start Date: 28/Sep/21 20:17 Worklog Time Spent: 10m Work Description: soumyakanti3578 opened a new pull request #2678: URL: https://github.com/apache/hive/pull/2678 ### What changes were proposed in this pull request? Column constraints are added with the data type to increase readability. ### Why are the changes needed? Improves readability. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? mvn test -Dtest=TestMiniLlapLocalCliDriver -Dtest.output.overwrite=true -Dqfile=show_create_table.q -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 656767) Time Spent: 20m (was: 10m) > Show column constraints for "DESC FORMATTED TABLE" > -- > > Key: HIVE-25566 > URL: https://issues.apache.org/jira/browse/HIVE-25566 > Project: Hive > Issue Type: New Feature >Reporter: Soumyakanti Das >Assignee: Soumyakanti Das >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Currently, column constraints are not shown with the data type of columns. > They are shown all together at the end, but showing them with the data type > will make the description more readable. 
> > Example: > Create table > > {code:java} > CREATE TABLE TEST( > col1 varchar(100) NOT NULL COMMENT "comment for column 1", > col2 timestamp DEFAULT CURRENT_TIMESTAMP() COMMENT "comment for column 2", > col3 decimal, > col4 varchar(512) NOT NULL, > col5 varchar(100), > primary key(col1, col2) disable novalidate) > ROW FORMAT SERDE > 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' > STORED AS INPUTFORMAT > 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'; > {code} > > Currently, {{DESC FORMATTED TABLE }} returns, > {code:java} > # col_namedata_type comment > col1 varchar(100)comment for column 1 > col2 timestamp comment for column 2 > col3 decimal(10,0) > col4 varchar(512) > col5 varchar(100) > # Detailed Table Information > Database: default > A masked pattern was here > Retention:0 > A masked pattern was here > Table Type: MANAGED_TABLE > Table Parameters: > COLUMN_STATS_ACCURATE > {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col2\":\"true\",\"col3\":\"true\",\"col4\":\"true\",\"col5\":\"true\"}} > bucketing_version 2 > numFiles0 > numRows 0 > rawDataSize 0 > totalSize 0 > A masked pattern was here > # Storage Information > SerDe Library:org.apache.hadoop.hive.ql.io.orc.OrcSerde > InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat > OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat > > Compressed: No > Num Buckets: -1 > Bucket Columns: [] > Sort Columns: [] > Storage Desc Params: > serialization.format1 > # Constraints > # Primary Key > Table:default.test > Constraint Name: A masked pattern was here > Column Name: col1 > Column Name: col2 > # Not Null Constraints > Table:default.test > Constraint Name: A masked pattern was here > Column Name: col1 > Constraint Name: A masked pattern was here > Column Name: col4
[jira] [Work logged] (HIVE-25561) Killed task should not commit file.
[ https://issues.apache.org/jira/browse/HIVE-25561?focusedWorklogId=656719&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656719 ] ASF GitHub Bot logged work on HIVE-25561: - Author: ASF GitHub Bot Created on: 28/Sep/21 20:13 Start Date: 28/Sep/21 20:13 Worklog Time Spent: 10m Work Description: zhengchenyu commented on pull request #2674: URL: https://github.com/apache/hive/pull/2674#issuecomment-928998346 @abstractdog Can you help me review it, or give me some suggestions? Issue Time Tracking --- Worklog Id: (was: 656719) Time Spent: 0.5h (was: 20m) > Killed task should not commit file. > --- > > Key: HIVE-25561 > URL: https://issues.apache.org/jira/browse/HIVE-25561 > Project: Hive > Issue Type: Bug > Components: Tez > Affects Versions: 1.2.1, 2.3.8, 2.4.0 > Reporter: zhengchenyu > Assignee: zhengchenyu > Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > For the tez engine in our cluster, I found duplicate rows, especially when tez speculation is enabled. In the partition dir, both 02_0 and 02_1 exist. > It is a very low probability event. HIVE-10429 fixed some bugs around interrupts, but some exceptions are still not caught. > In our cluster, the Task receives SIGTERM, then ClientFinalizer (a Hadoop class) is called and the hdfs client closes. An exception is then raised, but abort may not be set to true. > removeTempOrDuplicateFiles can then fail because of this inconsistency, and the duplicate file is retained. > (Note: the Driver first lists the dir, then the Task commits its file, then the Driver removes duplicate files; this ordering is what makes the state inconsistent.) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25550) Increase the RM_PROGRESS column max length to fit metrics stat
[ https://issues.apache.org/jira/browse/HIVE-25550?focusedWorklogId=656677=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656677 ] ASF GitHub Bot logged work on HIVE-25550: - Author: ASF GitHub Bot Created on: 28/Sep/21 20:10 Start Date: 28/Sep/21 20:10 Worklog Time Spent: 10m Work Description: aasha commented on a change in pull request #2668: URL: https://github.com/apache/hive/pull/2668#discussion_r717396745 ## File path: metastore/scripts/upgrade/derby/058-HIVE-23516.derby.sql ## @@ -4,7 +4,7 @@ CREATE TABLE "APP"."REPLICATION_METRICS" ( "RM_POLICY" varchar(256) NOT NULL, "RM_DUMP_EXECUTION_ID" bigint NOT NULL, "RM_METADATA" varchar(4000), - "RM_PROGRESS" varchar(4000), + "RM_PROGRESS" varchar(24000), Review comment: these files are older ones. You can skip updating them. Only the scripts inside standalone-metastore should be updated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 656677) Time Spent: 20m (was: 10m) > Increase the RM_PROGRESS column max length to fit metrics stat > -- > > Key: HIVE-25550 > URL: https://issues.apache.org/jira/browse/HIVE-25550 > Project: Hive > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Presently it fails with the following trace: > {noformat} > [[Event Name: EVENT_ALLOC_WRITE_ID; Total Number: 213; Total Time: 85347.0; > Mean: 400.6901408450704; Median: 392.0; Standard Deviation: > 33.99178239314741; Variance: 1155.4412702630862; Kurtosis: 83.69411620601193; > Skewness: 83.69411620601193; 25th Percentile: 384.0; 50th Percentile: 392.0; > 75th Percentile: 408.0; 90th Percentile: 417.0; Top 5 EventIds(EventId=Time) > {1498476=791, 1498872=533, 1497805=508, 1498808=500, 1499027=492};]]}"}]}" in > column ""RM_PROGRESS"" that has maximum length of 4000. Please correct your > data! > at > org.datanucleus.store.rdbms.mapping.datastore.CharRDBMSMapping.setString(CharRDBMSMapping.java:254) > ~[datanucleus-rdbms-4.1.19.jar:?] > at > org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.setString(SingleFieldMapping.java:180) > ~{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
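The failure above and the review comments converge on a single schema change: widening RM_PROGRESS in the metastore upgrade scripts. A minimal sketch of the Derby form of that change, using the table and column names from the diff; other dialects need their own ALTER syntax (Oracle in particular caps plain VARCHAR2 at 4000 bytes unless extended string sizes are enabled, so the Oracle script cannot simply copy this statement):

{code:sql}
-- Sketch only: widen RM_PROGRESS so the serialized metrics stats fit.
-- Derby syntax; the Oracle/MySQL/Postgres upgrade scripts need equivalent
-- statements in their own dialects.
ALTER TABLE "APP"."REPLICATION_METRICS"
  ALTER COLUMN "RM_PROGRESS" SET DATA TYPE VARCHAR(24000);
{code}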
[jira] [Work logged] (HIVE-24579) Incorrect Result For Groupby With Limit
[ https://issues.apache.org/jira/browse/HIVE-24579?focusedWorklogId=656624&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656624 ] ASF GitHub Bot logged work on HIVE-24579: - Author: ASF GitHub Bot Created on: 28/Sep/21 20:05 Start Date: 28/Sep/21 20:05 Worklog Time Spent: 10m Work Description: kasakrisz merged pull request #2656: URL: https://github.com/apache/hive/pull/2656 Issue Time Tracking --- Worklog Id: (was: 656624) Time Spent: 1h 40m (was: 1.5h) > Incorrect Result For Groupby With Limit > --- > > Key: HIVE-24579 > URL: https://issues.apache.org/jira/browse/HIVE-24579 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer > Affects Versions: 2.3.7, 3.1.2, 4.0.0 > Reporter: Nemon Lou > Assignee: Krisztian Kasa > Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > {code:sql} > create table test(id int); > explain extended select id,count(*) from test group by id limit 10; > {code} > There is an unexpected TopN in the map phase, which causes incorrect results. 
> {code:sql} > STAGE PLANS: > Stage: Stage-1 > Tez > DagId: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5 > Edges: > Reducer 2 <- Map 1 (SIMPLE_EDGE) > DagName: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5 > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: test > Statistics: Num rows: 1 Data size: 13500 Basic stats: > COMPLETE Column stats: NONE > GatherStats: false > Select Operator > expressions: id (type: int) > outputColumnNames: id > Statistics: Num rows: 1 Data size: 13500 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > aggregations: count() > keys: id (type: int) > mode: hash > outputColumnNames: _col0, _col1 > Statistics: Num rows: 1 Data size: 13500 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > key expressions: _col0 (type: int) > null sort order: a > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 1 Data size: 13500 Basic stats: > COMPLETE Column stats: NONE > tag: -1 > TopN: 10 > TopN Hash Memory Usage: 0.1 > value expressions: _col1 (type: bigint) > auto parallelism: true > Execution mode: vectorized > Path -> Alias: > file:/user/hive/warehouse/test [test] > Path -> Partition: > file:/user/hive/warehouse/test > Partition > base file name: test > input format: org.apache.hadoop.mapred.TextInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > properties: > COLUMN_STATS_ACCURATE > {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}} > bucket_count -1 > bucketing_version 2 > column.name.delimiter , > columns id > columns.comments > columns.types int > file.inputformat org.apache.hadoop.mapred.TextInputFormat > file.outputformat > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > location file:/user/hive/warehouse/test > name default.test > numFiles 0 > numRows 0 > rawDataSize 0 > serialization.ddl struct test { i32 id} > serialization.format 1 > serialization.lib > 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > totalSize 0 > transient_lastDdlTime 1609730190 > serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
[jira] [Work logged] (HIVE-25541) JsonSerDe: TBLPROPERTY treating nested json as String
[ https://issues.apache.org/jira/browse/HIVE-25541?focusedWorklogId=656623=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656623 ] ASF GitHub Bot logged work on HIVE-25541: - Author: ASF GitHub Bot Created on: 28/Sep/21 20:05 Start Date: 28/Sep/21 20:05 Worklog Time Spent: 10m Work Description: dengzhhu653 commented on a change in pull request #2664: URL: https://github.com/apache/hive/pull/2664#discussion_r717160062 ## File path: serde/src/java/org/apache/hadoop/hive/serde2/json/HiveJsonReader.java ## @@ -393,7 +402,16 @@ private Object visitLeafNode(final JsonNode leafNode, case DOUBLE: return Double.valueOf(leafNode.asDouble()); case STRING: - return leafNode.asText(); + if (leafNode.isValueNode()) { +return leafNode.asText(); + } else { +if (isEnabled(Feature.STRINGIFY_COMPLEX_FIELDS)) { + return leafNode.toString(); +} else { + throw new SerDeException( + "Complex field found in JSON does not match table definition: " + typeInfo.getTypeName()); Review comment: could we do the same for the input of varchars or chars? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 656623) Time Spent: 1h 40m (was: 1.5h) > JsonSerDe: TBLPROPERTY treating nested json as String > - > > Key: HIVE-25541 > URL: https://issues.apache.org/jira/browse/HIVE-25541 > Project: Hive > Issue Type: Bug >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > Native Jsonserde 'org.apache.hive.hcatalog.data.JsonSerDe' currently does not > support loading nested json into a string type directly. 
It requires declaring the column as a complex type (struct, map, array) to unpack nested JSON data. > Even though the data field is not a valid JSON string type, there is value in treating it as a plain string instead of throwing an exception as we currently do. > {code:java} > create table json_table(data string, messageid string, publish_time bigint, attributes string); > {"data":{"H":{"event":"track_active","platform":"Android"},"B":{"device_type":"Phone","uuid":"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,11883bee-a7aa-4010-8a66-6c3c63a73f16]"}},"messageId":"2475185636801962","publish_time":1622514629783,"attributes":{"region":"IN"}} > {code} > This JIRA introduces an extra table property that allows complex JSON values to be stringified instead of forcing the user to define the complete nested structure.
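With the proposed feature enabled, the example table from the description can keep {{data}} declared as a plain string. A hedged sketch of the intended usage follows; the exact TBLPROPERTIES key is not spelled out in this thread, so the property name below is illustrative only:

{code:sql}
-- Illustrative sketch: the real property key backing Feature.STRINGIFY_COMPLEX_FIELDS
-- is not named in this thread; 'json.stringify.complex.fields' is a placeholder.
CREATE TABLE json_table (data string, messageid string, publish_time bigint, attributes string)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
TBLPROPERTIES ('json.stringify.complex.fields' = 'true');

-- With the feature on, the nested object under "data" comes back as its raw
-- JSON text instead of raising a SerDeException at read time.
SELECT data FROM json_table;
{code}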
[jira] [Work logged] (HIVE-25569) Enable table definition over a single file
[ https://issues.apache.org/jira/browse/HIVE-25569?focusedWorklogId=656497=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656497 ] ASF GitHub Bot logged work on HIVE-25569: - Author: ASF GitHub Bot Created on: 28/Sep/21 19:54 Start Date: 28/Sep/21 19:54 Worklog Time Spent: 10m Work Description: kgyrtkirk opened a new pull request #2680: URL: https://github.com/apache/hive/pull/2680 Change-Id: I6e8afa3463951c5b4e032df390df06a0d634fde7 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 656497) Time Spent: 20m (was: 10m) > Enable table definition over a single file > -- > > Key: HIVE-25569 > URL: https://issues.apache.org/jira/browse/HIVE-25569 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Suppose there is a directory where multiple files are present - and by a 3rd > party database system this is perfectly normal - because its treating a > single file as the contents of the table. > Tables defined in the metastore follow a different principle - tables are > considered to be under a directory - and all files under that directory are > the contents of that directory. > To enable seamless migration/evaluation of Hive and other databases using HMS > as a metadatabackend the ability to define a table over a single file would > be usefull. -- This message was sent by Atlassian Jira (v8.3.4#803005)
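If single-file locations become legal, pointing a table definition at one data file could look like the following sketch (paths, columns, and formats are hypothetical; today Hive expects LOCATION to name a directory and reads every file under it):

{code:sql}
-- Hypothetical usage: LOCATION names a single file rather than a directory,
-- so only that file's rows form the table contents. This is the migration
-- case described for 3rd party systems that treat one file as one table.
CREATE EXTERNAL TABLE exported_orders (order_id INT, amount DECIMAL(10,2))
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/migration/legacy_db/orders.csv';
{code}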
[jira] [Work logged] (HIVE-20303) INSERT OVERWRITE TABLE db.table PARTITION (...) IF NOT EXISTS throws InvalidTableException
[ https://issues.apache.org/jira/browse/HIVE-20303?focusedWorklogId=656500=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656500 ] ASF GitHub Bot logged work on HIVE-20303: - Author: ASF GitHub Bot Created on: 28/Sep/21 19:54 Start Date: 28/Sep/21 19:54 Worklog Time Spent: 10m Work Description: zabetak opened a new pull request #2679: URL: https://github.com/apache/hive/pull/2679 ### What changes were proposed in this pull request? Extract the full table reference (DB name + table name) from the AST. ### Why are the changes needed? Without these changes queries fail with `InvalidTableException`. ### Does this PR introduce _any_ user-facing change? Queries will not fail. ### How was this patch tested? `mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=insert2_overwrite_partitions.q -Dtest.output.overwrite` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 656500) Time Spent: 20m (was: 10m) > INSERT OVERWRITE TABLE db.table PARTITION (...) 
IF NOT EXISTS throws > InvalidTableException > -- > > Key: HIVE-20303 > URL: https://issues.apache.org/jira/browse/HIVE-20303 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 2.2.0 >Reporter: xhmz >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The following scenario reproduces the problem: > {code:sql} > CREATE DATABASE db2; > CREATE TABLE db2.destinTable (one STRING, two STRING) PARTITIONED BY (ds > STRING); > INSERT OVERWRITE TABLE db2.destinTable PARTITION (ds='2011-11-11') IF NOT > EXISTS SELECT 100, 200; > {code} > The last query ({{INSERT OVERWRITE ...}}) fails with the following stack > trace: > {noformat} > 2021-09-28T04:25:47,330 ERROR [e3399094-860f-4381-bfd3-d2acfa8a885d main] > ql.Driver: FAILED: SemanticException > org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found db2 > org.apache.hadoop.hive.ql.parse.SemanticException: > org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found db2 > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1918) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1959) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:12393) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12506) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:454) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317) > at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:500) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:453) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:417) > at 
org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:411) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256) > at > org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:804) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:774) > at > org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:175) > at > org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) > at > org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) > {noformat} > The problem does not reproduce when the
[jira] [Work logged] (HIVE-24579) Incorrect Result For Groupby With Limit
[ https://issues.apache.org/jira/browse/HIVE-24579?focusedWorklogId=656481=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656481 ] ASF GitHub Bot logged work on HIVE-24579: - Author: ASF GitHub Bot Created on: 28/Sep/21 19:52 Start Date: 28/Sep/21 19:52 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on a change in pull request #2656: URL: https://github.com/apache/hive/pull/2656#discussion_r717361733 ## File path: ql/src/test/results/clientpositive/llap/groupby1_limit.q.out ## @@ -71,33 +71,34 @@ STAGE PLANS: mode: mergepartial outputColumnNames: _col0, _col1 Statistics: Num rows: 316 Data size: 30020 Basic stats: COMPLETE Column stats: COMPLETE -Limit - Number of rows: 5 - Statistics: Num rows: 5 Data size: 475 Basic stats: COMPLETE Column stats: COMPLETE - Reduce Output Operator -null sort order: -sort order: -Statistics: Num rows: 5 Data size: 475 Basic stats: COMPLETE Column stats: COMPLETE -TopN Hash Memory Usage: 0.1 -value expressions: _col0 (type: string), _col1 (type: double) +Reduce Output Operator Review comment: we lost the `Limit` operator from here - as a result we will be shuffling all input rows. I think this could become more costly for larger tables than the old plan. I don't see TopN hash enabled on the reduce operator - which could possibly save the day in this case; why did we loose that as well? ## File path: ql/src/test/results/clientpositive/llap/limit_pushdown.q.out ## @@ -1075,6 +1072,13 @@ STAGE PLANS: Map-reduce partition columns: _col0 (type: string) Statistics: Num rows: 316 Data size: 30020 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col1 (type: bigint) +Execution mode: vectorized, llap +LLAP IO: all inputs +Map 3 +Map Operator Tree: +TableScan + alias: src + Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE Top N Key Operator Review comment: in the old plan: did we have 2 TopN key operators in this plan which are equal? 
this is unrelated to this patch; but we may have an issue with its comparision - and because of that SWO is not able to simplify them ## File path: ql/src/test/results/clientpositive/llap/groupby1_limit.q.out ## @@ -71,33 +71,34 @@ STAGE PLANS: mode: mergepartial outputColumnNames: _col0, _col1 Statistics: Num rows: 316 Data size: 30020 Basic stats: COMPLETE Column stats: COMPLETE -Limit - Number of rows: 5 - Statistics: Num rows: 5 Data size: 475 Basic stats: COMPLETE Column stats: COMPLETE - Reduce Output Operator -null sort order: -sort order: -Statistics: Num rows: 5 Data size: 475 Basic stats: COMPLETE Column stats: COMPLETE -TopN Hash Memory Usage: 0.1 -value expressions: _col0 (type: string), _col1 (type: double) +Reduce Output Operator Review comment: I missed that - most likely because the the row estimate was >100. In that case this doesn't seem to be a problem; however we should fix the stat estimate for the TNKO - could you open a ticket? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 656481) Time Spent: 1.5h (was: 1h 20m) > Incorrect Result For Groupby With Limit > --- > > Key: HIVE-24579 > URL: https://issues.apache.org/jira/browse/HIVE-24579 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 2.3.7, 3.1.2, 4.0.0 >Reporter: Nemon Lou >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > {code:sql} > create table test(id int); > explain extended select id,count(*) from test group by id limit 10; > {code} > There is an TopN unexpectly for map phase, which
[jira] [Work logged] (HIVE-24579) Incorrect Result For Groupby With Limit
[ https://issues.apache.org/jira/browse/HIVE-24579?focusedWorklogId=656446=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656446 ] ASF GitHub Bot logged work on HIVE-24579: - Author: ASF GitHub Bot Created on: 28/Sep/21 19:50 Start Date: 28/Sep/21 19:50 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2656: URL: https://github.com/apache/hive/pull/2656#discussion_r717404007 ## File path: ql/src/test/results/clientpositive/llap/groupby1_limit.q.out ## @@ -71,33 +71,34 @@ STAGE PLANS: mode: mergepartial outputColumnNames: _col0, _col1 Statistics: Num rows: 316 Data size: 30020 Basic stats: COMPLETE Column stats: COMPLETE -Limit - Number of rows: 5 - Statistics: Num rows: 5 Data size: 475 Basic stats: COMPLETE Column stats: COMPLETE - Reduce Output Operator -null sort order: -sort order: -Statistics: Num rows: 5 Data size: 475 Basic stats: COMPLETE Column stats: COMPLETE -TopN Hash Memory Usage: 0.1 -value expressions: _col0 (type: string), _col1 (type: double) +Reduce Output Operator Review comment: But we still have TopNKey operator in the Mapper (both old and new plan) it filters out the majority of the rows. This query has the same issue like the example in the jira: it has gby with limit + aggregate function in the project: ``` SELECT src.key, sum(substr(src.value,5)) GROUP BY src.key LIMIT 5 ``` If no ordering is specified we may end up with incorrect aggregations. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 656446) Time Spent: 1h 20m (was: 1h 10m) > Incorrect Result For Groupby With Limit > --- > > Key: HIVE-24579 > URL: https://issues.apache.org/jira/browse/HIVE-24579 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 2.3.7, 3.1.2, 4.0.0 >Reporter: Nemon Lou >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > {code:sql} > create table test(id int); > explain extended select id,count(*) from test group by id limit 10; > {code} > There is an TopN unexpectly for map phase, which casues incorrect result. > {code:sql} > STAGE PLANS: > Stage: Stage-1 > Tez > DagId: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5 > Edges: > Reducer 2 <- Map 1 (SIMPLE_EDGE) > DagName: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5 > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: test > Statistics: Num rows: 1 Data size: 13500 Basic stats: > COMPLETE Column stats: NONE > GatherStats: false > Select Operator > expressions: id (type: int) > outputColumnNames: id > Statistics: Num rows: 1 Data size: 13500 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > aggregations: count() > keys: id (type: int) > mode: hash > outputColumnNames: _col0, _col1 > Statistics: Num rows: 1 Data size: 13500 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > key expressions: _col0 (type: int) > null sort order: a > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 1 Data size: 13500 Basic stats: > COMPLETE Column stats: NONE > tag: -1 > TopN: 10 > TopN Hash Memory Usage: 0.1 > value expressions: _col1 (type: bigint) > auto parallelism: true > Execution mode: vectorized > Path -> Alias: > 
file:/user/hive/warehouse/test [test] > Path -> Partition: > file:/user/hive/warehouse/test >
[jira] [Work logged] (HIVE-25517) Follow up on HIVE-24951: External Table created with Uppercase name using CTAS does not produce result for select queries
[ https://issues.apache.org/jira/browse/HIVE-25517?focusedWorklogId=656453=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656453 ] ASF GitHub Bot logged work on HIVE-25517: - Author: ASF GitHub Bot Created on: 28/Sep/21 19:50 Start Date: 28/Sep/21 19:50 Worklog Time Spent: 10m Work Description: sourabh912 commented on pull request #2638: URL: https://github.com/apache/hive/pull/2638#issuecomment-928142540 The test failure does not seem related to this patch. ``` [2021-09-22T22:28:25.641Z] [INFO] T E S T S [2021-09-22T22:28:25.641Z] [INFO] --- [2021-09-22T22:28:26.707Z] [INFO] Running org.apache.hadoop.hive.metastore.dbinstall.ITestPostgres [2021-09-22T22:29:05.786Z] [ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 32.595 s <<< FAILURE! - in org.apache.hadoop.hive.metastore.dbinstall.ITestPostgres [2021-09-22T22:29:05.786Z] [ERROR] install(org.apache.hadoop.hive.metastore.dbinstall.ITestPostgres) Time elapsed: 6.768 s <<< FAILURE! [2021-09-22T22:29:05.786Z] java.lang.AssertionError: expected:<0> but was:<1> [2021-09-22T22:29:05.786Z] [2021-09-22T22:29:05.786Z] [INFO] [2021-09-22T22:29:05.786Z] [INFO] Results: [2021-09-22T22:29:05.786Z] [INFO] [2021-09-22T22:29:05.786Z] [ERROR] Failures: [2021-09-22T22:29:05.786Z] [ERROR] ITestPostgres>DbInstallBase.install:30 expected:<0> but was:<1> [2021-09-22T22:29:05.786Z] [INFO] [2021-09-22T22:29:05.786Z] [ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0 ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 656453) Time Spent: 1h 40m (was: 1.5h) > Follow up on HIVE-24951: External Table created with Uppercase name using > CTAS does not produce result for select queries > - > > Key: HIVE-25517 > URL: https://issues.apache.org/jira/browse/HIVE-25517 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 4.0.0 >Reporter: Sourabh Goyal >Assignee: Sourabh Goyal >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > In [PR|https://github.com/apache/hive/pull/2125] for HIVE-24951, the > recommendation was to use getDefaultTablePath() to set the location for an > external table. This Jira addresses that and makes getDefaultTablePath() more > generic. > > cc - [~ngangam] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25517) Follow up on HIVE-24951: External Table created with Uppercase name using CTAS does not produce result for select queries
[ https://issues.apache.org/jira/browse/HIVE-25517?focusedWorklogId=656447=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656447 ] ASF GitHub Bot logged work on HIVE-25517: - Author: ASF GitHub Bot Created on: 28/Sep/21 19:50 Start Date: 28/Sep/21 19:50 Worklog Time Spent: 10m Work Description: nrg4878 commented on pull request #2638: URL: https://github.com/apache/hive/pull/2638#issuecomment-929434572 Fix has been merged to master. Please close the PR. Thank you for the work on this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 656447) Time Spent: 1.5h (was: 1h 20m) > Follow up on HIVE-24951: External Table created with Uppercase name using > CTAS does not produce result for select queries > - > > Key: HIVE-25517 > URL: https://issues.apache.org/jira/browse/HIVE-25517 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 4.0.0 >Reporter: Sourabh Goyal >Assignee: Sourabh Goyal >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > In [PR|https://github.com/apache/hive/pull/2125] for HIVE-24951, the > recommendation was to use getDefaultTablePath() to set the location for an > external table. This Jira addresses that and makes getDefaultTablePath() more > generic. > > cc - [~ngangam] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25541) JsonSerDe: TBLPROPERTY treating nested json as String
[ https://issues.apache.org/jira/browse/HIVE-25541?focusedWorklogId=656422=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656422 ] ASF GitHub Bot logged work on HIVE-25541: - Author: ASF GitHub Bot Created on: 28/Sep/21 19:47 Start Date: 28/Sep/21 19:47 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #2664: URL: https://github.com/apache/hive/pull/2664#discussion_r717554831 ## File path: serde/src/java/org/apache/hadoop/hive/serde2/json/HiveJsonReader.java ## @@ -393,7 +402,16 @@ private Object visitLeafNode(final JsonNode leafNode, case DOUBLE: return Double.valueOf(leafNode.asDouble()); case STRING: - return leafNode.asText(); + if (leafNode.isValueNode()) { +return leafNode.asText(); + } else { +if (isEnabled(Feature.STRINGIFY_COMPLEX_FIELDS)) { + return leafNode.toString(); +} else { + throw new SerDeException( + "Complex field found in JSON does not match table definition: " + typeInfo.getTypeName()); Review comment: Hey @dengzhhu653 not sure what you are referring to here -- this PR is targeting complex fields with non defined Hive schema (like a map of maps which is defined as a simple map) Enabling this feature will cause the JSON reader to treat the above complex field as a String (the input type is not important here) -- does it make sense? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 656422) Time Spent: 1.5h (was: 1h 20m) > JsonSerDe: TBLPROPERTY treating nested json as String > - > > Key: HIVE-25541 > URL: https://issues.apache.org/jira/browse/HIVE-25541 > Project: Hive > Issue Type: Bug >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > Native Jsonserde 'org.apache.hive.hcatalog.data.JsonSerDe' currently does not > support loading nested json into a string type directly. It requires > declaring the column as a complex type (struct, map, array) to unpack nested > json data. > Even though the data field is not a valid JSON String type, there is value in > treating it as a plain String instead of throwing an exception as we currently > do. > {code:java} > create table json_table(data string, messageid string, publish_time bigint, > attributes string); > {"data":{"H":{"event":"track_active","platform":"Android"},"B":{"device_type":"Phone","uuid":"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,11883bee-a7aa-4010-8a66-6c3c63a73f16]"}},"messageId":"2475185636801962","publish_time":1622514629783,"attributes":{"region":"IN"}}"}} > {code} > This JIRA introduces an extra Table Property allowing Complex JSON values to > be stringified instead of forcing the User to define the complete nested > structure. -- This message was sent by Atlassian Jira (v8.3.4#803005)
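The behavior discussed in the review comment above — a column declared as `string` receiving a nested JSON value — can be sketched as follows. This is a hedged Python sketch of the idea, not Hive's Java reader: `visit_leaf` and the `stringify_complex` flag are illustrative stand-ins for `visitLeafNode` and `Feature.STRINGIFY_COMPLEX_FIELDS`.

```python
import json

def visit_leaf(value, declared_type, stringify_complex=False):
    # Sketch of the STRING branch: scalar JSON values pass through as text;
    # complex values are stringified only when the stringify feature is on,
    # otherwise a schema-mismatch error is raised (as Hive currently does).
    if declared_type != "string":
        raise NotImplementedError("this sketch only covers the STRING branch")
    if not isinstance(value, (dict, list)):   # a scalar "value node"
        return str(value)
    if stringify_complex:
        return json.dumps(value)              # treat nested JSON as plain text
    raise ValueError(
        "Complex field found in JSON does not match table definition: string")

row = json.loads('{"data": {"H": {"event": "track_active"}}, "messageId": "42"}')
print(visit_leaf(row["messageId"], "string"))                      # scalar case
print(visit_leaf(row["data"], "string", stringify_complex=True))   # nested case
```

With the flag off, the nested `data` field raises the mismatch error instead of being silently coerced, which matches the exception text quoted in the diff.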
[jira] [Work logged] (HIVE-25538) CommitTxn replay failing during incremental run
[ https://issues.apache.org/jira/browse/HIVE-25538?focusedWorklogId=656416=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656416 ] ASF GitHub Bot logged work on HIVE-25538: - Author: ASF GitHub Bot Created on: 28/Sep/21 19:47 Start Date: 28/Sep/21 19:47 Worklog Time Spent: 10m Work Description: ayushtkn commented on a change in pull request #2655: URL: https://github.com/apache/hive/pull/2655#discussion_r717329217 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java ## @@ -65,6 +65,8 @@ private final String copyAsUser; private FileSystem destinationFs; private final int maxParallelCopyTask; + @VisibleForTesting Review comment: Functionality wise, I think NO. It is for the devs most probably. #Copied -> The point of an annotation is that its convention and could be used in static code analysis, whereas a simple comment could not. It serves the same purpose as the normal annotations like LimitedPrivate,The InterfaceStability ones ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosIncrementalLoadAcidTables.java ## @@ -243,6 +248,64 @@ public void testReplCM() throws Throwable { Lists.newArrayList(result, result)); } + @Test + public void testReplCommitTransactionOnSourceDeleteORC() throws Throwable { +// Run test with ORC format & with transactional true. +testReplCommitTransactionOnSourceDelete("STORED AS ORC", "'transactional'='true'"); + } + + @Test + public void testReplCommitTransactionOnSourceDeleteText() throws Throwable { +// Run test with TEXT format & with transactional true. 
Review comment: Yeps, Thanx Corrected ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java ## @@ -401,6 +404,18 @@ private boolean isSourceFileMismatch(FileSystem sourceFs, ReplChangeManager.File return false; } + @VisibleForTesting + private void runTestOnlyExecutions() throws IOException { Review comment: Yahh, My first try was to do so, I thought of using PowerMock, But MiniDfs has issue with it. Which is part of Hadoop, We can't bother that. The most I could pull out is a Callable into the test. So to avoid the delete or FS operations here, and in case in future that can be used later as well. Let me know if there is any other way out. :-) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 656416) Time Spent: 1h 10m (was: 1h) > CommitTxn replay failing during incremental run > --- > > Key: HIVE-25538 > URL: https://issues.apache.org/jira/browse/HIVE-25538 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Critical > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > CommitTxn Fails during incremental run, in case the source file is deleted > post copy & before checksum validation. 
> {noformat} > 2021-09-21T07:53:40,898 ERROR [TThreadPoolServer WorkerProcess-%d] > thrift.ProcessFunction: Internal error processing commit_txn > org.apache.thrift.TException: > /warehouse1/replicated_testreplcommittransactiononsourcedelete_1632235978675.db/testreplcommittransactiononsourcedelete/load_date=2016-03-01/delta_002_002_ > (is not a directory) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.resolvePath(FSDirectory.java:677) > at > org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:151) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1927) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:738) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:424) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
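The failure mode described above — the source file vanishing between the copy and the checksum validation — suggests a verification step that tolerates a deleted source once the destination bytes are in place. Below is a minimal Python sketch under that assumption; the dict-backed `fs`, `checksum`, and `verify_copy` are hypothetical stand-ins, not actual CopyUtils methods.

```python
import hashlib

def checksum(fs, path):
    # fs is a plain dict standing in for a FileSystem; None means "file gone"
    data = fs.get(path)
    return None if data is None else hashlib.sha256(data).hexdigest()

def verify_copy(fs, src, dst):
    # After copying src -> dst, compare checksums. If the source file was
    # deleted in the meantime (the failure mode in this issue), trust the
    # already-copied destination instead of failing the whole replay.
    src_sum, dst_sum = checksum(fs, src), checksum(fs, dst)
    if dst_sum is None:
        raise IOError("destination missing after copy")
    if src_sum is None:
        return True   # source deleted post-copy: accept the copied bytes
    return src_sum == dst_sum

fs = {"/src/delta_1/bucket_0": b"rows", "/repl/delta_1/bucket_0": b"rows"}
assert verify_copy(fs, "/src/delta_1/bucket_0", "/repl/delta_1/bucket_0")
del fs["/src/delta_1/bucket_0"]   # simulate deletion before validation
assert verify_copy(fs, "/src/delta_1/bucket_0", "/repl/delta_1/bucket_0")
```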
[jira] [Commented] (HIVE-25566) Show column constraints for "DESC FORMATTED TABLE"
[ https://issues.apache.org/jira/browse/HIVE-25566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421574#comment-17421574 ] Soumyakanti Das commented on HIVE-25566: Done! > Show column constraints for "DESC FORMATTED TABLE" > -- > > Key: HIVE-25566 > URL: https://issues.apache.org/jira/browse/HIVE-25566 > Project: Hive > Issue Type: New Feature >Reporter: Soumyakanti Das >Assignee: Soumyakanti Das >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently, column constraints are not shown with the data type of columns. > They are shown all together at the end, but showing them with the data type > will make the description more readable. > > Example: > Create table > > {code:java} > CREATE TABLE TEST( > col1 varchar(100) NOT NULL COMMENT "comment for column 1", > col2 timestamp DEFAULT CURRENT_TIMESTAMP() COMMENT "comment for column 2", > col3 decimal, > col4 varchar(512) NOT NULL, > col5 varchar(100), > primary key(col1, col2) disable novalidate) > ROW FORMAT SERDE > 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' > STORED AS INPUTFORMAT > 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'; > {code} > > Currently, {{DESC FORMATTED TABLE }} returns, > {code:java} > # col_namedata_type comment > col1 varchar(100)comment for column 1 > col2 timestamp comment for column 2 > col3 decimal(10,0) > col4 varchar(512) > col5 varchar(100) > # Detailed Table Information > Database: default > A masked pattern was here > Retention:0 > A masked pattern was here > Table Type: MANAGED_TABLE > Table Parameters: > COLUMN_STATS_ACCURATE > {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col2\":\"true\",\"col3\":\"true\",\"col4\":\"true\",\"col5\":\"true\"}} > bucketing_version 2 > numFiles0 > numRows 0 > rawDataSize 0 > totalSize 0 > A masked pattern was here > # Storage Information > SerDe 
Library:org.apache.hadoop.hive.ql.io.orc.OrcSerde > InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat > OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat > > Compressed: No > Num Buckets: -1 > Bucket Columns: [] > Sort Columns: [] > Storage Desc Params: > serialization.format1 > # Constraints > # Primary Key > Table:default.test > Constraint Name: A masked pattern was here > Column Name: col1 > Column Name: col2 > # Not Null Constraints > Table:default.test > Constraint Name: A masked pattern was here > Column Name: col1 > Constraint Name: A masked pattern was here > Column Name: col4 > # Default Constraints > Table:default.test > Constraint Name: A masked pattern was here > Column Name:col2 Default Value:CURRENT_TIMESTAMP() > {code} > > Adding the column constraints will look something like, > {code:java} > # col_namedata_type > comment > col1 varchar(100) PRIMARY KEY NOT NULL > comment for column 1 > col2 timestamp PRIMARY KEY DEFAULT CURRENT_TIMESTAMP() > comment for column 2 > col3 decimal(10,0) > col4 varchar(512) NOT NULL > col5 varchar(100) > # Detailed Table Information > Database: default > A masked pattern was here > Retention:0 > A masked pattern was here > Table Type: MANAGED_TABLE > Table Parameters:
[jira] [Updated] (HIVE-25566) Show column constraints for "DESC FORMATTED TABLE"
[ https://issues.apache.org/jira/browse/HIVE-25566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Soumyakanti Das updated HIVE-25566: --- Description: Currently, column constraints are not shown with the data type of columns. They are shown all together at the end, but showing them with the data type will make the description more readable. Example: Create table {code:java} CREATE TABLE TEST( col1 varchar(100) NOT NULL COMMENT "comment for column 1", col2 timestamp DEFAULT CURRENT_TIMESTAMP() COMMENT "comment for column 2", col3 decimal, col4 varchar(512) NOT NULL, col5 varchar(100), primary key(col1, col2) disable novalidate) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'; {code} Currently, {{DESC FORMATTED TABLE }} returns, {code:java} # col_name data_type comment col1varchar(100)comment for column 1 col2timestamp comment for column 2 col3decimal(10,0) col4varchar(512) col5varchar(100) # Detailed Table Information Database: default A masked pattern was here Retention: 0 A masked pattern was here Table Type: MANAGED_TABLE Table Parameters: COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col2\":\"true\",\"col3\":\"true\",\"col4\":\"true\",\"col5\":\"true\"}} bucketing_version 2 numFiles0 numRows 0 rawDataSize 0 totalSize 0 A masked pattern was here # Storage Information SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde InputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat Compressed: No Num Buckets:-1 Bucket Columns: [] Sort Columns: [] Storage Desc Params: serialization.format1 # Constraints # Primary Key Table: default.test Constraint Name: A masked pattern was here Column Name:col1 Column Name:col2 # Not Null Constraints Table: default.test Constraint Name: A masked pattern was here 
Column Name:col1 Constraint Name: A masked pattern was here Column Name:col4 # Default Constraints Table: default.test Constraint Name: A masked pattern was here Column Name:col2Default Value:CURRENT_TIMESTAMP() {code} Adding the column constraints will look something like, {code:java} # col_name data_type comment col1varchar(100) PRIMARY KEY NOT NULL comment for column 1 col2timestamp PRIMARY KEY DEFAULT CURRENT_TIMESTAMP() comment for column 2 col3decimal(10,0) col4varchar(512) NOT NULL col5varchar(100) # Detailed Table Information Database: default A masked pattern was here Retention: 0 A masked pattern was here Table Type: MANAGED_TABLE Table Parameters: COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col2\":\"true\",\"col3\":\"true\",\"col4\":\"true\",\"col5\":\"true\"}} bucketing_version 2 numFiles0 numRows 0 rawDataSize 0 totalSize 0 A masked pattern was here # Storage Information SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde InputFormat:
[jira] [Updated] (HIVE-25566) Show column constraints for "DESC FORMATTED TABLE"
[ https://issues.apache.org/jira/browse/HIVE-25566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Soumyakanti Das updated HIVE-25566: --- Issue Type: New Feature (was: Improvement) > Show column constraints for "DESC FORMATTED TABLE" > -- > > Key: HIVE-25566 > URL: https://issues.apache.org/jira/browse/HIVE-25566 > Project: Hive > Issue Type: New Feature >Reporter: Soumyakanti Das >Assignee: Soumyakanti Das >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently, column constraints are not shown with the data type of columns. > They are shown all together at the end, but showing them with the data type > will make the description more readable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25517) Follow up on HIVE-24951: External Table created with Uppercase name using CTAS does not produce result for select queries
[ https://issues.apache.org/jira/browse/HIVE-25517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam resolved HIVE-25517. -- Fix Version/s: 4.0.0 Resolution: Fixed Fix has been committed to master. Closing the jira. Thank you for the fix [~sourabh912] > Follow up on HIVE-24951: External Table created with Uppercase name using > CTAS does not produce result for select queries > - > > Key: HIVE-25517 > URL: https://issues.apache.org/jira/browse/HIVE-25517 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 4.0.0 >Reporter: Sourabh Goyal >Assignee: Sourabh Goyal >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > In [PR|https://github.com/apache/hive/pull/2125] for HIVE-24951, the > recommendation was to use getDefaultTablePath() to set the location for an > external table. This Jira addresses that and makes getDefaultTablePath() more > generic. > > cc - [~ngangam] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25517) Follow up on HIVE-24951: External Table created with Uppercase name using CTAS does not produce result for select queries
[ https://issues.apache.org/jira/browse/HIVE-25517?focusedWorklogId=656260=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656260 ] ASF GitHub Bot logged work on HIVE-25517: - Author: ASF GitHub Bot Created on: 28/Sep/21 17:18 Start Date: 28/Sep/21 17:18 Worklog Time Spent: 10m Work Description: nrg4878 commented on pull request #2638: URL: https://github.com/apache/hive/pull/2638#issuecomment-929434572 Fix has been merged to master. Please close the PR. Thank you for the work on this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 656260) Time Spent: 1h 20m (was: 1h 10m) > Follow up on HIVE-24951: External Table created with Uppercase name using > CTAS does not produce result for select queries > - > > Key: HIVE-25517 > URL: https://issues.apache.org/jira/browse/HIVE-25517 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 4.0.0 >Reporter: Sourabh Goyal >Assignee: Sourabh Goyal >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > In [PR|https://github.com/apache/hive/pull/2125] for HIVE-24951, the > recommendation was to use getDefaultTablePath() to set the location for an > external table. This Jira addresses that and makes getDefaultTablePath() more > generic. > > cc - [~ngangam] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25568) Estimate TopNKey operator statistics.
[ https://issues.apache.org/jira/browse/HIVE-25568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa updated HIVE-25568: -- Description: Currently TopNKey operator has the same statistics as its parent operator: {code} TableScan alias: src Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE Top N Key Operator sort order: + keys: key (type: string) null sort order: z Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE top n: 5 {code} This operator filters out rows and this should be indicated in statistics. was: Currently TopNKey operator has the same statistics as its parent operator: {code} TableScan alias: src Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE Top N Key Operator sort order: + keys: key (type: string) null sort order: z Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE top n: 5 {code} This operator filters out rows and this should be indicated in statistics. > Estimate TopNKey operator statistics. > - > > Key: HIVE-25568 > URL: https://issues.apache.org/jira/browse/HIVE-25568 > Project: Hive > Issue Type: Improvement >Reporter: Krisztian Kasa >Priority: Major > > Currently TopNKey operator has the same statistics as its parent operator: > {code} > TableScan > alias: src > Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column > stats: COMPLETE > Top N Key Operator > sort order: + > keys: key (type: string) > null sort order: z > Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column > stats: COMPLETE > top n: 5 > {code} > This operator filters out rows and this should be indicated in statistics. -- This message was sent by Atlassian Jira (v8.3.4#803005)
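One plausible way to estimate the operator's output described above can be sketched in Python. This is a hedged heuristic, not necessarily the formula the eventual patch implements: each partition-key group emits at most `top n` rows, capped by the parent's row count, with data size scaled by the same ratio.

```python
def estimate_topnkey_rows(parent_rows, top_n, partition_groups=1):
    # Heuristic: every partition-key group can emit at most top_n rows,
    # and the operator never emits more rows than it receives.
    return min(parent_rows, top_n * partition_groups)

def estimate_topnkey_data_size(parent_rows, parent_data_size, out_rows):
    # Scale data size by the average row width (89000 / 500 = 178 bytes/row
    # in the plan quoted above).
    avg_row = parent_data_size / parent_rows
    return int(out_rows * avg_row)

rows = estimate_topnkey_rows(500, 5)                   # -> 5
size = estimate_topnkey_data_size(500, 89000, rows)    # -> 890
print(rows, size)
```

For the quoted plan this would shrink the TopNKey statistics from 500 rows / 89000 bytes to 5 rows / 890 bytes, making the filtering visible to downstream cost estimates.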
[jira] [Updated] (HIVE-25569) Enable table definition over a single file
[ https://issues.apache.org/jira/browse/HIVE-25569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25569: -- Labels: pull-request-available (was: ) > Enable table definition over a single file > -- > > Key: HIVE-25569 > URL: https://issues.apache.org/jira/browse/HIVE-25569 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Suppose there is a directory where multiple files are present - and by a 3rd > party database system this is perfectly normal - because it's treating a > single file as the contents of the table. > Tables defined in the metastore follow a different principle - tables are > considered to be under a directory - and all files under that directory are > the contents of that table. > To enable seamless migration/evaluation of Hive and other databases using HMS > as a metadata backend, the ability to define a table over a single file would > be useful. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25569) Enable table definition over a single file
[ https://issues.apache.org/jira/browse/HIVE-25569?focusedWorklogId=656191=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656191 ] ASF GitHub Bot logged work on HIVE-25569: - Author: ASF GitHub Bot Created on: 28/Sep/21 14:42 Start Date: 28/Sep/21 14:42 Worklog Time Spent: 10m Work Description: kgyrtkirk opened a new pull request #2680: URL: https://github.com/apache/hive/pull/2680 Change-Id: I6e8afa3463951c5b4e032df390df06a0d634fde7 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 656191) Remaining Estimate: 0h Time Spent: 10m > Enable table definition over a single file > -- > > Key: HIVE-25569 > URL: https://issues.apache.org/jira/browse/HIVE-25569 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Suppose there is a directory where multiple files are present - and by a 3rd > party database system this is perfectly normal - because its treating a > single file as the contents of the table. > Tables defined in the metastore follow a different principle - tables are > considered to be under a directory - and all files under that directory are > the contents of that directory. > To enable seamless migration/evaluation of Hive and other databases using HMS > as a metadatabackend the ability to define a table over a single file would > be usefull. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25569) Enable table definition over a single file
[ https://issues.apache.org/jira/browse/HIVE-25569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421431#comment-17421431 ] Zoltan Haindrich commented on HIVE-25569: - Proposed solution: SingleFileSystem Suppose we have a file in a regular filesystem (hdfs://tmp/f1.txt) over which we want to define a table. To avoid the problems we could get into by setting its parent directory as the table's dir, an sfs-wrapped URI could be used: sfs+hdfs://tmp/f1.txt/SINGLEFILE. Specifying the SINGLEFILE path element instructs this filesystem to show only the f1.txt under that directory. {code} $ hdfs dfs -find 'hdfs://localhost:20500/tmp/d1/' hdfs://localhost:20500/tmp/d1 hdfs://localhost:20500/tmp/d1/f1 hdfs://localhost:20500/tmp/d1/f2 $ hdfs dfs -find 'sfs+hdfs://localhost:20500/tmp/d1/' sfs+hdfs://localhost:20500/tmp/d1 sfs+hdfs://localhost:20500/tmp/d1/f1 sfs+hdfs://localhost:20500/tmp/d1/f1/SINGLEFILE sfs+hdfs://localhost:20500/tmp/d1/f1/SINGLEFILE/f1 sfs+hdfs://localhost:20500/tmp/d1/f2 sfs+hdfs://localhost:20500/tmp/d1/f2/SINGLEFILE sfs+hdfs://localhost:20500/tmp/d1/f2/SINGLEFILE/f2 {code} > Enable table definition over a single file > -- > > Key: HIVE-25569 > URL: https://issues.apache.org/jira/browse/HIVE-25569 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > > Suppose there is a directory where multiple files are present - and by a 3rd > party database system this is perfectly normal - because it's treating a > single file as the contents of the table. > Tables defined in the metastore follow a different principle - tables are > considered to be under a directory - and all files under that directory are > the contents of that table. > To enable seamless migration/evaluation of Hive and other databases using HMS > as a metadata backend, the ability to define a table over a single file would > be useful. -- This message was sent by Atlassian Jira (v8.3.4#803005)
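The sfs URI convention shown in the `hdfs dfs -find` listing above can be illustrated with a small Python sketch that maps a wrapped URI back to the single underlying file it exposes. `resolve_sfs` is a hypothetical helper written for this digest, not part of the actual patch.

```python
def resolve_sfs(uri):
    # Map an sfs-wrapped URI back to the underlying scheme and file:
    # everything up to the SINGLEFILE path element names the real file,
    # and the "sfs+" prefix names the wrapped filesystem scheme.
    scheme, rest = uri.split("://", 1)
    if not scheme.startswith("sfs+"):
        raise ValueError("not an sfs URI: " + uri)
    inner_scheme = scheme[len("sfs+"):]
    if "/SINGLEFILE" in rest:
        rest = rest.split("/SINGLEFILE", 1)[0]
    return inner_scheme + "://" + rest

print(resolve_sfs("sfs+hdfs://localhost:20500/tmp/d1/f1/SINGLEFILE/f1"))
# -> hdfs://localhost:20500/tmp/d1/f1
```

This mirrors why the listing shows `f1/SINGLEFILE/f1`: the wrapper presents each real file as a directory containing exactly one entry, so the metastore's directory-per-table convention still holds.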
[jira] [Assigned] (HIVE-25569) Enable table definition over a single file
[ https://issues.apache.org/jira/browse/HIVE-25569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reassigned HIVE-25569: --- > Enable table definition over a single file > -- > > Key: HIVE-25569 > URL: https://issues.apache.org/jira/browse/HIVE-25569 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > > Suppose there is a directory where multiple files are present - and by a 3rd > party database system this is perfectly normal - because it's treating a > single file as the contents of the table. > Tables defined in the metastore follow a different principle - tables are > considered to be under a directory - and all files under that directory are > the contents of that table. > To enable seamless migration/evaluation of Hive and other databases using HMS > as a metadata backend, the ability to define a table over a single file would > be useful. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-20303) INSERT OVERWRITE TABLE db.table PARTITION (...) IF NOT EXISTS throws InvalidTableException
[ https://issues.apache.org/jira/browse/HIVE-20303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-20303: -- Labels: pull-request-available (was: ) > INSERT OVERWRITE TABLE db.table PARTITION (...) IF NOT EXISTS throws > InvalidTableException > -- > > Key: HIVE-20303 > URL: https://issues.apache.org/jira/browse/HIVE-20303 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 2.2.0 >Reporter: xhmz >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The following scenario reproduces the problem: > {code:sql} > CREATE DATABASE db2; > CREATE TABLE db2.destinTable (one STRING, two STRING) PARTITIONED BY (ds > STRING); > INSERT OVERWRITE TABLE db2.destinTable PARTITION (ds='2011-11-11') IF NOT > EXISTS SELECT 100, 200; > {code} > The last query ({{INSERT OVERWRITE ...}}) fails with the following stack > trace: > {noformat} > 2021-09-28T04:25:47,330 ERROR [e3399094-860f-4381-bfd3-d2acfa8a885d main] > ql.Driver: FAILED: SemanticException > org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found db2 > org.apache.hadoop.hive.ql.parse.SemanticException: > org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found db2 > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1918) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1959) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:12393) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12506) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:454) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317) > at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105) > at 
org.apache.hadoop.hive.ql.Driver.compile(Driver.java:500) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:453) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:417) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:411) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256) > at > org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:804) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:774) > at > org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:175) > at > org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) > at > org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) > {noformat} > The problem does not reproduce when the {{IF NOT EXISTS}} clause is not > present in the query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-20303) INSERT OVERWRITE TABLE db.table PARTITION (...) IF NOT EXISTS throws InvalidTableException
[ https://issues.apache.org/jira/browse/HIVE-20303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis reassigned HIVE-20303: -- Assignee: Stamatis Zampetakis > INSERT OVERWRITE TABLE db.table PARTITION (...) IF NOT EXISTS throws > InvalidTableException > -- > > Key: HIVE-20303 > URL: https://issues.apache.org/jira/browse/HIVE-20303 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 2.2.0 >Reporter: xhmz >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The following scenario reproduces the problem: > {code:sql} > CREATE DATABASE db2; > CREATE TABLE db2.destinTable (one STRING, two STRING) PARTITIONED BY (ds > STRING); > INSERT OVERWRITE TABLE db2.destinTable PARTITION (ds='2011-11-11') IF NOT > EXISTS SELECT 100, 200; > {code} > The last query ({{INSERT OVERWRITE ...}}) fails with the following stack > trace: > {noformat} > 2021-09-28T04:25:47,330 ERROR [e3399094-860f-4381-bfd3-d2acfa8a885d main] > ql.Driver: FAILED: SemanticException > org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found db2 > org.apache.hadoop.hive.ql.parse.SemanticException: > org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found db2 > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1918) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1959) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:12393) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12506) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:454) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317) > at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) > at 
org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:500) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:453) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:417) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:411) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256) > at > org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:804) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:774) > at > org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:175) > at > org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) > at > org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) > {noformat} > The problem does not reproduce when the {{IF NOT EXISTS}} clause is not > present in the query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-20303) INSERT OVERWRITE TABLE db.table PARTITION (...) IF NOT EXISTS throws InvalidTableException
[ https://issues.apache.org/jira/browse/HIVE-20303?focusedWorklogId=656167&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656167 ] ASF GitHub Bot logged work on HIVE-20303: - Author: ASF GitHub Bot Created on: 28/Sep/21 14:08 Start Date: 28/Sep/21 14:08 Worklog Time Spent: 10m Work Description: zabetak opened a new pull request #2679: URL: https://github.com/apache/hive/pull/2679 ### What changes were proposed in this pull request? Extract the full table reference (DB name + table name) from the AST. ### Why are the changes needed? Without these changes queries fail with `InvalidTableException`. ### Does this PR introduce _any_ user-facing change? Queries will not fail. ### How was this patch tested? `mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=insert2_overwrite_partitions.q -Dtest.output.overwrite` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 656167) Remaining Estimate: 0h Time Spent: 10m > INSERT OVERWRITE TABLE db.table PARTITION (...) 
IF NOT EXISTS throws > InvalidTableException > -- > > Key: HIVE-20303 > URL: https://issues.apache.org/jira/browse/HIVE-20303 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 2.2.0 >Reporter: xhmz >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > The following scenario reproduces the problem: > {code:sql} > CREATE DATABASE db2; > CREATE TABLE db2.destinTable (one STRING, two STRING) PARTITIONED BY (ds > STRING); > INSERT OVERWRITE TABLE db2.destinTable PARTITION (ds='2011-11-11') IF NOT > EXISTS SELECT 100, 200; > {code} > The last query ({{INSERT OVERWRITE ...}}) fails with the following stack > trace: > {noformat} > 2021-09-28T04:25:47,330 ERROR [e3399094-860f-4381-bfd3-d2acfa8a885d main] > ql.Driver: FAILED: SemanticException > org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found db2 > org.apache.hadoop.hive.ql.parse.SemanticException: > org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found db2 > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1918) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1959) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:12393) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12506) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:454) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317) > at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:500) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:453) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:417) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:411) > at > 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256) > at > org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:804) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:774) > at > org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:175) > at > org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) > at > org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) > {noformat} > The problem does not reproduce when the {{IF NOT EXISTS}} clause is not > present in the query.
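The PR description above says the fix extracts the full table reference (DB name + table name) from the AST. As a rough illustration of the failure mode (plain Python, not Hive's actual parser code; function and variable names here are invented for the sketch), resolving only the first identifier of a qualified name makes `db2.destinTable` look like a table named `db2`, which matches the "Table not found db2" message in the stack trace:

```python
def resolve_table(ast_tokens, current_db="default"):
    """Toy name resolution: ast_tokens is the identifier list of the
    target table node, e.g. ["db2", "destinTable"]."""
    if len(ast_tokens) == 2:            # fully qualified: db.table
        return ast_tokens[0], ast_tokens[1]
    return current_db, ast_tokens[0]    # unqualified: use the current database

def buggy_resolve_table(ast_tokens, current_db="default"):
    # Mimics the reported bug: only the first identifier is consulted,
    # so the database name is mistaken for the table name.
    return current_db, ast_tokens[0]

tokens = ["db2", "destinTable"]
assert resolve_table(tokens) == ("db2", "destinTable")
assert buggy_resolve_table(tokens) == ("default", "db2")  # -> "Table not found db2"
```

This also explains why the query works without `IF NOT EXISTS`: per the report, only the code path handling that clause walked the AST this way.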
[jira] [Work logged] (HIVE-25541) JsonSerDe: TBLPROPERTY treating nested json as String
[ https://issues.apache.org/jira/browse/HIVE-25541?focusedWorklogId=656135=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656135 ] ASF GitHub Bot logged work on HIVE-25541: - Author: ASF GitHub Bot Created on: 28/Sep/21 13:05 Start Date: 28/Sep/21 13:05 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #2664: URL: https://github.com/apache/hive/pull/2664#discussion_r717554831 ## File path: serde/src/java/org/apache/hadoop/hive/serde2/json/HiveJsonReader.java ## @@ -393,7 +402,16 @@ private Object visitLeafNode(final JsonNode leafNode, case DOUBLE: return Double.valueOf(leafNode.asDouble()); case STRING: - return leafNode.asText(); + if (leafNode.isValueNode()) { +return leafNode.asText(); + } else { +if (isEnabled(Feature.STRINGIFY_COMPLEX_FIELDS)) { + return leafNode.toString(); +} else { + throw new SerDeException( + "Complex field found in JSON does not match table definition: " + typeInfo.getTypeName()); Review comment: Hey @dengzhhu653 not sure what you are referring to here -- this PR is targeting complex fields with non defined Hive schema (like a map of maps which is defined as a simple map) Enabling this feature will cause the JSON reader to treat the above complex field as a String (the input type is not important here) -- does it make sense? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 656135) Time Spent: 1h 20m (was: 1h 10m) > JsonSerDe: TBLPROPERTY treating nested json as String > - > > Key: HIVE-25541 > URL: https://issues.apache.org/jira/browse/HIVE-25541 > Project: Hive > Issue Type: Bug >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Native Jsonserde 'org.apache.hive.hcatalog.data.JsonSerDe' currently does not > support loading nested json into a string type directly. It requires > declaring the column as a complex type (struct, map, array) to unpack nested > json data. > Even though the data field is not a valid JSON String type, there is value in > treating it as a plain String instead of throwing an exception as we currently > do. > {code:java} > create table json_table(data string, messageid string, publish_time bigint, > attributes string); > {"data":{"H":{"event":"track_active","platform":"Android"},"B":{"device_type":"Phone","uuid":"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,11883bee-a7aa-4010-8a66-6c3c63a73f16]"}},"messageId":"2475185636801962","publish_time":1622514629783,"attributes":{"region":"IN"}}"}} > {code} > This JIRA introduces an extra Table Property that allows stringifying complex > JSON values instead of forcing the User to define the complete nested > structure -- This message was sent by Atlassian Jira (v8.3.4#803005)
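The behaviour the new table property enables can be mimicked outside Hive with a few lines of Python (`stringify` here is an illustrative flag standing in for the reader feature in the diff, not the real implementation): when the declared column type is string but the JSON node is an object or array, either serialize it back to JSON text or fail, depending on the flag.

```python
import json

def read_string_field(node, stringify=False):
    """Return a value for a column declared as STRING (sketch only)."""
    if not isinstance(node, (dict, list)):   # scalar leaf: return as text
        return str(node)
    if stringify:                            # complex node -> JSON text
        return json.dumps(node, separators=(",", ":"))
    raise ValueError(
        "Complex field found in JSON does not match table definition: string")

record = json.loads('{"data": {"H": {"event": "track_active"}}, '
                    '"messageId": "2475185636801962"}')
assert read_string_field(record["messageId"]) == "2475185636801962"
assert read_string_field(record["data"], stringify=True) == '{"H":{"event":"track_active"}}'
try:
    read_string_field(record["data"])        # default: reject the mismatch
except ValueError:
    pass
```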
[jira] [Updated] (HIVE-20303) INSERT OVERWRITE TABLE db.table PARTITION (...) IF NOT EXISTS throws InvalidTableException
[ https://issues.apache.org/jira/browse/HIVE-20303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-20303: --- Summary: INSERT OVERWRITE TABLE db.table PARTITION (...) IF NOT EXISTS throws InvalidTableException (was: INSERT OVERWRITE TABLE db.table PARTITION () if not exists . will error as Table not found db (state=42000,code=4) ) > INSERT OVERWRITE TABLE db.table PARTITION (...) IF NOT EXISTS throws > InvalidTableException > -- > > Key: HIVE-20303 > URL: https://issues.apache.org/jira/browse/HIVE-20303 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 2.2.0 >Reporter: xhmz >Priority: Major > > The following scenario reproduces the problem: > {code:sql} > CREATE DATABASE db2; > CREATE TABLE db2.destinTable (one STRING, two STRING) PARTITIONED BY (ds > STRING); > INSERT OVERWRITE TABLE db2.destinTable PARTITION (ds='2011-11-11') IF NOT > EXISTS SELECT 100, 200; > {code} > The last query ({{INSERT OVERWRITE ...}}) fails with the following stack > trace: > {noformat} > 2021-09-28T04:25:47,330 ERROR [e3399094-860f-4381-bfd3-d2acfa8a885d main] > ql.Driver: FAILED: SemanticException > org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found db2 > org.apache.hadoop.hive.ql.parse.SemanticException: > org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found db2 > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1918) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1959) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:12393) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12506) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:454) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317) > at 
org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:500) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:453) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:417) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:411) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256) > at > org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:804) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:774) > at > org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:175) > at > org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) > at > org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) > {noformat} > The problem does not reproduce when the {{IF NOT EXISTS}} clause is not > present in the query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-20303) INSERT OVERWRITE TABLE db.table PARTITION () if not exists . will error as Table not found db (state=42000,code=40000)
[ https://issues.apache.org/jira/browse/HIVE-20303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-20303: --- Description: The following scenario reproduces the problem: {code:sql} CREATE DATABASE db2; CREATE TABLE db2.destinTable (one STRING, two STRING) PARTITIONED BY (ds STRING); INSERT OVERWRITE TABLE db2.destinTable PARTITION (ds='2011-11-11') IF NOT EXISTS SELECT 100, 200; {code} The last query ({{INSERT OVERWRITE ...}}) fails with the following stack trace: {noformat} 2021-09-28T04:25:47,330 ERROR [e3399094-860f-4381-bfd3-d2acfa8a885d main] ql.Driver: FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found db2 org.apache.hadoop.hive.ql.parse.SemanticException: org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found db2 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1918) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1959) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:12393) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12506) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:454) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317) at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:500) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:453) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:417) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:411) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256) at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353) at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:804) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:774) at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:175) at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) at org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) {noformat} The problem does not reproduce when the {{IF NOT EXISTS}} clause is not present in the query. was: if I use INSERT OVERWRITE TABLE db.table PARTITION () if not exists select xx, it will error as Error: Error while compiling statement: FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found db (state=42000,code=40000), but INSERT OVERWRITE TABLE db.table PARTITION () select, without [if not exists], is ok > INSERT OVERWRITE TABLE db.table PARTITION () if not exists . 
will error as > Table not found db (state=42000,code=40000) > -- > > Key: HIVE-20303 > URL: https://issues.apache.org/jira/browse/HIVE-20303 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 2.2.0 >Reporter: xhmz >Priority: Major > > The following scenario reproduces the problem: > {code:sql} > CREATE DATABASE db2; > CREATE TABLE db2.destinTable (one STRING, two STRING) PARTITIONED BY (ds > STRING); > INSERT OVERWRITE TABLE db2.destinTable PARTITION (ds='2011-11-11') IF NOT > EXISTS SELECT 100, 200; > {code} > The last query ({{INSERT OVERWRITE ...}}) fails with the following stack > trace: > {noformat} > 2021-09-28T04:25:47,330 ERROR [e3399094-860f-4381-bfd3-d2acfa8a885d main] > ql.Driver: FAILED: SemanticException > org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found db2 > org.apache.hadoop.hive.ql.parse.SemanticException: > org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found db2 > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1918) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1959) > at >
[jira] [Resolved] (HIVE-25378) Enable removal of old builds on hive ci
[ https://issues.apache.org/jira/browse/HIVE-25378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-25378. - Fix Version/s: 4.0.0 Resolution: Fixed merged into master. Thank you Krisztian for reviewing the changes! > Enable removal of old builds on hive ci > --- > > Key: HIVE-25378 > URL: https://issues.apache.org/jira/browse/HIVE-25378 > Project: Hive > Issue Type: Sub-task >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > We are using the github plugin to run builds on PRs. > However, to remove old builds that plugin needs to have periodic branch > scanning enabled; since we also use the plugin's merge mechanism, > this causes all open PRs to be rediscovered after there is a new commit on the > target branch. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24579) Incorrect Result For Groupby With Limit
[ https://issues.apache.org/jira/browse/HIVE-24579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa updated HIVE-24579: -- Resolution: Fixed Status: Resolved (was: Patch Available) Pushed to master. Thanks [~kgyrtkirk] and [~nemon] for review. > Incorrect Result For Groupby With Limit > --- > > Key: HIVE-24579 > URL: https://issues.apache.org/jira/browse/HIVE-24579 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 2.3.7, 3.1.2, 4.0.0 >Reporter: Nemon Lou >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > {code:sql} > create table test(id int); > explain extended select id,count(*) from test group by id limit 10; > {code} > There is an unexpected TopN in the map phase, which causes incorrect results. > {code:sql} > STAGE PLANS: > Stage: Stage-1 > Tez > DagId: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5 > Edges: > Reducer 2 <- Map 1 (SIMPLE_EDGE) > DagName: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5 > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: test > Statistics: Num rows: 1 Data size: 13500 Basic stats: > COMPLETE Column stats: NONE > GatherStats: false > Select Operator > expressions: id (type: int) > outputColumnNames: id > Statistics: Num rows: 1 Data size: 13500 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > aggregations: count() > keys: id (type: int) > mode: hash > outputColumnNames: _col0, _col1 > Statistics: Num rows: 1 Data size: 13500 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > key expressions: _col0 (type: int) > null sort order: a > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 1 Data size: 13500 Basic stats: > COMPLETE Column stats: NONE > tag: -1 > TopN: 10 > TopN Hash Memory Usage: 0.1 > value expressions: _col1 (type: bigint) > auto parallelism: true > Execution 
mode: vectorized > Path -> Alias: > file:/user/hive/warehouse/test [test] > Path -> Partition: > file:/user/hive/warehouse/test > Partition > base file name: test > input format: org.apache.hadoop.mapred.TextInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > properties: > COLUMN_STATS_ACCURATE > {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}} > bucket_count -1 > bucketing_version 2 > column.name.delimiter , > columns id > columns.comments > columns.types int > file.inputformat org.apache.hadoop.mapred.TextInputFormat > file.outputformat > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > location file:/user/hive/warehouse/test > name default.test > numFiles 0 > numRows 0 > rawDataSize 0 > serialization.ddl struct test { i32 id} > serialization.format 1 > serialization.lib > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > totalSize 0 > transient_lastDdlTime 1609730190 > serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > > input format: org.apache.hadoop.mapred.TextInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > properties: > COLUMN_STATS_ACCURATE > {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}} > bucket_count -1 > bucketing_version 2 > column.name.delimiter , > columns id > columns.comments > columns.types int >
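Why a map-side TopN breaks GROUP BY ... LIMIT can be shown with a small simulation (plain Python, not Hive code; the split data is made up): if each mapper keeps only the first N group keys in sort order before the reducer merges partial counts, a key can be dropped on one mapper but kept on another, so its final count comes out too low.

```python
from collections import Counter

def run_query(splits, map_side_topn=None):
    """Simulate: SELECT id, count(*) FROM test GROUP BY id LIMIT 10."""
    partials = []
    for split in splits:                         # one map-side hash GBY per split
        items = sorted(Counter(split).items())   # sort order: + on the key
        if map_side_topn is not None:
            items = items[:map_side_topn]        # TopN drops the remaining keys
        partials.append(items)
    final = Counter()                            # reducer merges partial counts
    for items in partials:
        for key, cnt in items:
            final[key] += cnt
    return dict(sorted(final.items())[:10])      # LIMIT 10

splits = [[1, 2, 2, 3], [3, 3, 4]]
correct = run_query(splits)                      # {1: 1, 2: 2, 3: 3, 4: 1}
wrong = run_query(splits, map_side_topn=2)       # key 3 dropped on the first
assert correct[3] == 3 and wrong[3] == 2         # mapper only -> undercounted
```

Key 3 still appears in the LIMIT output, but with a partial count, which is exactly the "incorrect result" the issue describes.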
[jira] [Work logged] (HIVE-24579) Incorrect Result For Groupby With Limit
[ https://issues.apache.org/jira/browse/HIVE-24579?focusedWorklogId=656081=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656081 ] ASF GitHub Bot logged work on HIVE-24579: - Author: ASF GitHub Bot Created on: 28/Sep/21 10:40 Start Date: 28/Sep/21 10:40 Worklog Time Spent: 10m Work Description: kasakrisz merged pull request #2656: URL: https://github.com/apache/hive/pull/2656 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 656081) Time Spent: 1h 10m (was: 1h) > Incorrect Result For Groupby With Limit > --- > > Key: HIVE-24579 > URL: https://issues.apache.org/jira/browse/HIVE-24579 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 2.3.7, 3.1.2, 4.0.0 >Reporter: Nemon Lou >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > {code:sql} > create table test(id int); > explain extended select id,count(*) from test group by id limit 10; > {code} > There is an TopN unexpectly for map phase, which casues incorrect result. 
> {code:sql} > STAGE PLANS: > Stage: Stage-1 > Tez > DagId: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5 > Edges: > Reducer 2 <- Map 1 (SIMPLE_EDGE) > DagName: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5 > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: test > Statistics: Num rows: 1 Data size: 13500 Basic stats: > COMPLETE Column stats: NONE > GatherStats: false > Select Operator > expressions: id (type: int) > outputColumnNames: id > Statistics: Num rows: 1 Data size: 13500 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > aggregations: count() > keys: id (type: int) > mode: hash > outputColumnNames: _col0, _col1 > Statistics: Num rows: 1 Data size: 13500 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > key expressions: _col0 (type: int) > null sort order: a > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 1 Data size: 13500 Basic stats: > COMPLETE Column stats: NONE > tag: -1 > TopN: 10 > TopN Hash Memory Usage: 0.1 > value expressions: _col1 (type: bigint) > auto parallelism: true > Execution mode: vectorized > Path -> Alias: > file:/user/hive/warehouse/test [test] > Path -> Partition: > file:/user/hive/warehouse/test > Partition > base file name: test > input format: org.apache.hadoop.mapred.TextInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > properties: > COLUMN_STATS_ACCURATE > {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}} > bucket_count -1 > bucketing_version 2 > column.name.delimiter , > columns id > columns.comments > columns.types int > file.inputformat org.apache.hadoop.mapred.TextInputFormat > file.outputformat > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > location file:/user/hive/warehouse/test > name default.test > numFiles 0 > numRows 0 > rawDataSize 0 > serialization.ddl struct test { i32 id} > serialization.format 1 > serialization.lib > 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > totalSize 0 > transient_lastDdlTime 1609730190 > serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe >
[jira] [Commented] (HIVE-25566) Show column constraints for "DESC FORMATTED TABLE"
[ https://issues.apache.org/jira/browse/HIVE-25566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421307#comment-17421307 ] Stamatis Zampetakis commented on HIVE-25566: [~soumyakanti.das] Can you include a sample query with before and after output in the description to better understand the benefit of this change? > Show column constraints for "DESC FORMATTED TABLE" > -- > > Key: HIVE-25566 > URL: https://issues.apache.org/jira/browse/HIVE-25566 > Project: Hive > Issue Type: Improvement >Reporter: Soumyakanti Das >Assignee: Soumyakanti Das >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently, column constraints are not shown with the data type of columns. > They are shown all together at the end, but showing them with the data type > will make the description more readable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25561) Killed task should not commit file.
[ https://issues.apache.org/jira/browse/HIVE-25561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421306#comment-17421306 ] Stamatis Zampetakis commented on HIVE-25561: [~zhengchenyu] Did you mean to write "duplicate file" instead of "duplicate line"? Are the contents of the files identical? > Killed task should not commit file. > --- > > Key: HIVE-25561 > URL: https://issues.apache.org/jira/browse/HIVE-25561 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 1.2.1, 2.3.8, 2.4.0 >Reporter: zhengchenyu >Assignee: zhengchenyu >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > For tez engine in our cluster, I found some duplicate line, especially when tez > speculation is enabled. In the partition dir, I found both 000002_0 and 000002_1 > exist. > It's a very low probability event. HIVE-10429 has fixed some bugs around > interrupts, but some exceptions were not caught. > In our cluster, the Task receives SIGTERM, then ClientFinalizer (a Hadoop class) is > called and the hdfs client closes. An exception is then raised, but abort may not > be set to true. > Then removeTempOrDuplicateFiles may fail because of inconsistency, and the duplicate > file will remain. > (Notes: the Driver first lists the dir, then the Task commits its file, then the Driver removes > duplicate files. It is an inconsistency case) -- This message was sent by Atlassian Jira (v8.3.4#803005)
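The race described above (driver lists the directory, a speculative attempt commits afterwards, then the driver deduplicates against its stale listing) can be sketched as follows. The file names and the dedup rule are simplified stand-ins for Hive's actual duplicate-removal logic, not its real code:

```python
def dedup(listing):
    """Keep only the highest attempt per task id, e.g. prefer
    000002_1 over 000002_0 (simplified duplicate removal)."""
    best = {}
    for name in listing:
        task, attempt = name.rsplit("_", 1)
        if task not in best or int(attempt) > int(best[task].rsplit("_", 1)[1]):
            best[task] = name
    return set(best.values())

def finalize(committed, snapshot):
    # The driver can only delete duplicates it saw in its snapshot listing.
    to_delete = snapshot - dedup(snapshot)
    return committed - to_delete

committed = {"000002_0", "000002_1"}   # both attempts ended up committing

# Consistent view: 000002_0 is removed, no duplicate remains.
assert finalize(committed, snapshot=committed) == {"000002_1"}

# Stale view: the listing happened before 000002_1 landed, so nothing is
# deleted and both files survive -> duplicated rows in the partition.
assert finalize(committed, snapshot={"000002_0"}) == {"000002_0", "000002_1"}
```

The proposed fix attacks the other side of the race: a killed task should never commit in the first place, so the second attempt's file never appears.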
[jira] [Work logged] (HIVE-24579) Incorrect Result For Groupby With Limit
[ https://issues.apache.org/jira/browse/HIVE-24579?focusedWorklogId=656073=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656073 ] ASF GitHub Bot logged work on HIVE-24579: - Author: ASF GitHub Bot Created on: 28/Sep/21 09:47 Start Date: 28/Sep/21 09:47 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on a change in pull request #2656: URL: https://github.com/apache/hive/pull/2656#discussion_r717408925 ## File path: ql/src/test/results/clientpositive/llap/groupby1_limit.q.out ## @@ -71,33 +71,34 @@ STAGE PLANS: mode: mergepartial outputColumnNames: _col0, _col1 Statistics: Num rows: 316 Data size: 30020 Basic stats: COMPLETE Column stats: COMPLETE -Limit - Number of rows: 5 - Statistics: Num rows: 5 Data size: 475 Basic stats: COMPLETE Column stats: COMPLETE - Reduce Output Operator -null sort order: -sort order: -Statistics: Num rows: 5 Data size: 475 Basic stats: COMPLETE Column stats: COMPLETE -TopN Hash Memory Usage: 0.1 -value expressions: _col0 (type: string), _col1 (type: double) +Reduce Output Operator Review comment: I missed that - most likely because the the row estimate was >100. In that case this doesn't seem to be a problem; however we should fix the stat estimate for the TNKO - could you open a ticket? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 656073) Time Spent: 1h (was: 50m) > Incorrect Result For Groupby With Limit > --- > > Key: HIVE-24579 > URL: https://issues.apache.org/jira/browse/HIVE-24579 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 2.3.7, 3.1.2, 4.0.0 >Reporter: Nemon Lou >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > {code:sql} > create table test(id int); > explain extended select id,count(*) from test group by id limit 10; > {code} > There is an TopN unexpectly for map phase, which casues incorrect result. > {code:sql} > STAGE PLANS: > Stage: Stage-1 > Tez > DagId: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5 > Edges: > Reducer 2 <- Map 1 (SIMPLE_EDGE) > DagName: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5 > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: test > Statistics: Num rows: 1 Data size: 13500 Basic stats: > COMPLETE Column stats: NONE > GatherStats: false > Select Operator > expressions: id (type: int) > outputColumnNames: id > Statistics: Num rows: 1 Data size: 13500 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > aggregations: count() > keys: id (type: int) > mode: hash > outputColumnNames: _col0, _col1 > Statistics: Num rows: 1 Data size: 13500 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > key expressions: _col0 (type: int) > null sort order: a > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 1 Data size: 13500 Basic stats: > COMPLETE Column stats: NONE > tag: -1 > TopN: 10 > TopN Hash Memory Usage: 0.1 > value expressions: _col1 (type: bigint) > auto parallelism: true > Execution mode: vectorized > Path -> Alias: > 
file:/user/hive/warehouse/test [test] > Path -> Partition: > file:/user/hive/warehouse/test > Partition > base file name: test > input format: org.apache.hadoop.mapred.TextInputFormat > output format: >
[jira] [Work logged] (HIVE-24579) Incorrect Result For Groupby With Limit
[ https://issues.apache.org/jira/browse/HIVE-24579?focusedWorklogId=656071=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656071 ] ASF GitHub Bot logged work on HIVE-24579: - Author: ASF GitHub Bot Created on: 28/Sep/21 09:40 Start Date: 28/Sep/21 09:40 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2656: URL: https://github.com/apache/hive/pull/2656#discussion_r717404007 ## File path: ql/src/test/results/clientpositive/llap/groupby1_limit.q.out ## @@ -71,33 +71,34 @@ STAGE PLANS: mode: mergepartial outputColumnNames: _col0, _col1 Statistics: Num rows: 316 Data size: 30020 Basic stats: COMPLETE Column stats: COMPLETE -Limit - Number of rows: 5 - Statistics: Num rows: 5 Data size: 475 Basic stats: COMPLETE Column stats: COMPLETE - Reduce Output Operator -null sort order: -sort order: -Statistics: Num rows: 5 Data size: 475 Basic stats: COMPLETE Column stats: COMPLETE -TopN Hash Memory Usage: 0.1 -value expressions: _col0 (type: string), _col1 (type: double) +Reduce Output Operator Review comment: But we still have TopNKey operator in the Mapper (both old and new plan) it filters out the majority of the rows. This query has the same issue like the example in the jira: it has gby with limit + aggregate function in the project: ``` SELECT src.key, sum(substr(src.value,5)) GROUP BY src.key LIMIT 5 ``` If no ordering is specified we may end up with incorrect aggregations. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 656071) Time Spent: 50m (was: 40m) > Incorrect Result For Groupby With Limit > --- > > Key: HIVE-24579 > URL: https://issues.apache.org/jira/browse/HIVE-24579 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 2.3.7, 3.1.2, 4.0.0 >Reporter: Nemon Lou >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > {code:sql} > create table test(id int); > explain extended select id,count(*) from test group by id limit 10; > {code} > There is an TopN unexpectly for map phase, which casues incorrect result. > {code:sql} > STAGE PLANS: > Stage: Stage-1 > Tez > DagId: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5 > Edges: > Reducer 2 <- Map 1 (SIMPLE_EDGE) > DagName: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5 > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: test > Statistics: Num rows: 1 Data size: 13500 Basic stats: > COMPLETE Column stats: NONE > GatherStats: false > Select Operator > expressions: id (type: int) > outputColumnNames: id > Statistics: Num rows: 1 Data size: 13500 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > aggregations: count() > keys: id (type: int) > mode: hash > outputColumnNames: _col0, _col1 > Statistics: Num rows: 1 Data size: 13500 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > key expressions: _col0 (type: int) > null sort order: a > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 1 Data size: 13500 Basic stats: > COMPLETE Column stats: NONE > tag: -1 > TopN: 10 > TopN Hash Memory Usage: 0.1 > value expressions: _col1 (type: bigint) > auto parallelism: true > Execution mode: vectorized > Path -> Alias: > 
file:/user/hive/warehouse/test [test] > Path -> Partition: > file:/user/hive/warehouse/test > Partition >
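The failure mode discussed in this thread — a TopN cut applied on the map side below a GROUP BY aggregate — can be sketched outside Hive. In the toy model below (plain Python, not Hive code), a key that survives one mapper's top-N cut but not another's ends up with an undercounted aggregate:

```python
from collections import Counter

def mapper(rows, top_n=None):
    partial = Counter(rows)                     # map-side hash GROUP BY
    if top_n is not None:                       # TopN pushed below the aggregate
        keep = sorted(partial)[:top_n]          # keep only the N smallest keys
        partial = Counter({k: partial[k] for k in keep})
    return partial

def reducer(partials):
    total = Counter()                           # merge partial aggregates
    for p in partials:
        total.update(p)
    return total

m1, m2 = [1, 2, 5], [5, 5, 6]
correct = reducer([mapper(m1), mapper(m2)])      # count for key 5 is 3
buggy = reducer([mapper(m1, 2), mapper(m2, 2)])  # mapper 1's top-2 drops key 5
assert correct[5] == 3 and buggy[5] == 2         # undercounted aggregate
```

This is why the reviewers distinguish a plain LIMIT (any rows will do) from GROUP BY plus LIMIT with an aggregate: the map-side cut must not discard partial aggregates for keys that may still reach the final result.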
[jira] [Work logged] (HIVE-25550) Increase the RM_PROGRESS column max length to fit metrics stat
[ https://issues.apache.org/jira/browse/HIVE-25550?focusedWorklogId=656067=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656067 ] ASF GitHub Bot logged work on HIVE-25550: - Author: ASF GitHub Bot Created on: 28/Sep/21 09:30 Start Date: 28/Sep/21 09:30 Worklog Time Spent: 10m Work Description: aasha commented on a change in pull request #2668: URL: https://github.com/apache/hive/pull/2668#discussion_r717396745 ## File path: metastore/scripts/upgrade/derby/058-HIVE-23516.derby.sql ## @@ -4,7 +4,7 @@ CREATE TABLE "APP"."REPLICATION_METRICS" ( "RM_POLICY" varchar(256) NOT NULL, "RM_DUMP_EXECUTION_ID" bigint NOT NULL, "RM_METADATA" varchar(4000), - "RM_PROGRESS" varchar(4000), + "RM_PROGRESS" varchar(24000), Review comment: these files are older ones. You can skip updating them. Only the scripts inside standalone-metastore should be updated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 656067) Remaining Estimate: 0h Time Spent: 10m > Increase the RM_PROGRESS column max length to fit metrics stat > -- > > Key: HIVE-25550 > URL: https://issues.apache.org/jira/browse/HIVE-25550 > Project: Hive > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Presently it fails with the following trace: > {noformat} > [[Event Name: EVENT_ALLOC_WRITE_ID; Total Number: 213; Total Time: 85347.0; > Mean: 400.6901408450704; Median: 392.0; Standard Deviation: > 33.99178239314741; Variance: 1155.4412702630862; Kurtosis: 83.69411620601193; > Skewness: 83.69411620601193; 25th Percentile: 384.0; 50th Percentile: 392.0; > 75th Percentile: 408.0; 90th Percentile: 417.0; Top 5 EventIds(EventId=Time) > {1498476=791, 1498872=533, 1497805=508, 1498808=500, 1499027=492};]]}"}]}" in > column ""RM_PROGRESS"" that has maximum length of 4000. Please correct your > data! > at > org.datanucleus.store.rdbms.mapping.datastore.CharRDBMSMapping.setString(CharRDBMSMapping.java:254) > ~[datanucleus-rdbms-4.1.19.jar:?] > at > org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.setString(SingleFieldMapping.java:180) > ~{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
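The overflow above is easy to reproduce: each tracked event contributes its own stat block, so the serialized metrics string grows linearly with the number of events and quickly passes the old 4000-character limit. A minimal Python sketch (the stat format below is abbreviated from the error trace, not Hive's exact serialization):

```python
def event_stat(event_id):
    # abbreviated per-event stat block, modeled on the error trace above
    return ("Event Name: EVENT_%d; Total Number: 213; Mean: 400.69; "
            "Median: 392.0; 90th Percentile: 417.0;" % event_id)

progress = "".join(event_stat(i) for i in range(50))
assert len(progress) > 4000    # overflows the old RM_PROGRESS varchar(4000)
assert len(progress) <= 24000  # fits the widened varchar(24000)
```

The review question about Oracle matters because column-size limits are dialect-specific, so the widened length has to be valid in every supported metastore schema, not just Derby.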
[jira] [Updated] (HIVE-25550) Increase the RM_PROGRESS column max length to fit metrics stat
[ https://issues.apache.org/jira/browse/HIVE-25550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25550: -- Labels: pull-request-available (was: ) > Increase the RM_PROGRESS column max length to fit metrics stat > -- > > Key: HIVE-25550 > URL: https://issues.apache.org/jira/browse/HIVE-25550 > Project: Hive > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Presently it fails with the following trace: > {noformat} > [[Event Name: EVENT_ALLOC_WRITE_ID; Total Number: 213; Total Time: 85347.0; > Mean: 400.6901408450704; Median: 392.0; Standard Deviation: > 33.99178239314741; Variance: 1155.4412702630862; Kurtosis: 83.69411620601193; > Skewness: 83.69411620601193; 25th Percentile: 384.0; 50th Percentile: 392.0; > 75th Percentile: 408.0; 90th Percentile: 417.0; Top 5 EventIds(EventId=Time) > {1498476=791, 1498872=533, 1497805=508, 1498808=500, 1499027=492};]]}"}]}" in > column ""RM_PROGRESS"" that has maximum length of 4000. Please correct your > data! > at > org.datanucleus.store.rdbms.mapping.datastore.CharRDBMSMapping.setString(CharRDBMSMapping.java:254) > ~[datanucleus-rdbms-4.1.19.jar:?] > at > org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.setString(SingleFieldMapping.java:180) > ~{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-22098) Data loss occurs when multiple tables are joined with different bucket_version
[ https://issues.apache.org/jira/browse/HIVE-22098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421271#comment-17421271 ] hengtao tantai edited comment on HIVE-22098 at 9/28/21, 9:24 AM: - hi [~brahmareddy] i found this issue in non transactional tables was (Author: zergtant): hi [~brahmareddy] i found this issue in non transactional > Data loss occurs when multiple tables are joined with different bucket_version > > > Key: HIVE-22098 > URL: https://issues.apache.org/jira/browse/HIVE-22098 > Project: Hive > Issue Type: Bug > Components: Operators >Affects Versions: 3.1.0, 3.1.2 >Reporter: GuangMing Lu >Priority: Blocker > Labels: data-loss, wrongresults > Attachments: HIVE-22098.1.patch, image-2019-08-12-18-45-15-771.png, > join_test.sql, table_a_data.orc, table_b_data.orc, table_c_data.orc > > > When tables with different bucketVersions are joined and the number of reducers is greater > than 2, the result is incorrect (*data loss*). > *Scenario 1*: Three tables join. The temporary result of joining table_a (the > first table) and table_b (the second table) is recorded as > tmp_a_b. When it joins with the third table, which has the bucket_version=2 that tables > created after hive-3.0.0 default to, the temporary data tmp_a_b is initialized > with bucketVersion=-1, and the ReduceSinkOperator is joined with bucketVersion=-1. In > the init method, the hash algorithm for the join column is selected > according to bucketVersion: if bucketVersion = 2 and it is not an acid > operation, the new hash algorithm is acquired; otherwise, the old > hash algorithm is acquired. Because the hash algorithms are inconsistent, the > data is partitioned differently. At the Reducer stage, data with the same key > cannot be paired, resulting in data loss. 
> *Scenario 2*: create two test tables: create table > table_bucketversion_1(col_1 string, col_2 string) TBLPROPERTIES > ('bucketing_version'='1'); table_bucketversion_2(col_1 string, col_2 string) > TBLPROPERTIES ('bucketing_version'='2'); > when table_bucketversion_1 is joined with table_bucketversion_2, part of the result > data is lost because the bucketVersion is different. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
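The mechanism described in both scenarios can be illustrated with a toy model. The two hash functions below are deliberate stand-ins, not Hive's real bucketing hashes; the point is only that two different hash algorithms partition the same keys to different reducers, so the two sides of a shuffle join never meet:

```python
def bucket_v1(key, n):
    # stand-in for the old (bucketing_version=1) hash, NOT Hive's real one
    return sum(ord(c) for c in key) % n

def bucket_v2(key, n):
    # stand-in for the new (bucketing_version=2) hash, NOT Hive's real one
    h = 0
    for c in key:
        h = (h * 31 + ord(c)) & 0x7FFFFFFF
    return h % n

n = 7  # number of reducers (> 2, as in the report)
mismatched = [k for k in ("a1", "b2", "c3") if bucket_v1(k, n) != bucket_v2(k, n)]
# each mismatched key is shuffled to different reducers by the two join sides,
# so its rows never land in the same reducer: silent data loss
assert mismatched  # e.g. "a1" goes to reducer 6 under v1 but reducer 4 under v2
```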
[jira] [Commented] (HIVE-22098) Data loss occurs when multiple tables are joined with different bucket_version
[ https://issues.apache.org/jira/browse/HIVE-22098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421271#comment-17421271 ] hengtao tantai commented on HIVE-22098: --- hi [~brahmareddy] i found this issue in non transactional > Data loss occurs when multiple tables are joined with different bucket_version > > > Key: HIVE-22098 > URL: https://issues.apache.org/jira/browse/HIVE-22098 > Project: Hive > Issue Type: Bug > Components: Operators >Affects Versions: 3.1.0, 3.1.2 >Reporter: GuangMing Lu >Priority: Blocker > Labels: data-loss, wrongresults > Attachments: HIVE-22098.1.patch, image-2019-08-12-18-45-15-771.png, > join_test.sql, table_a_data.orc, table_b_data.orc, table_c_data.orc > > > When tables with different bucketVersions are joined and the number of reducers is greater > than 2, the result is incorrect (*data loss*). > *Scenario 1*: Three tables join. The temporary result of joining table_a (the > first table) and table_b (the second table) is recorded as > tmp_a_b. When it joins with the third table, which has the bucket_version=2 that tables > created after hive-3.0.0 default to, the temporary data tmp_a_b is initialized > with bucketVersion=-1, and the ReduceSinkOperator is joined with bucketVersion=-1. In > the init method, the hash algorithm for the join column is selected > according to bucketVersion: if bucketVersion = 2 and it is not an acid > operation, the new hash algorithm is acquired; otherwise, the old > hash algorithm is acquired. Because the hash algorithms are inconsistent, the > data is partitioned differently. At the Reducer stage, data with the same key > cannot be paired, resulting in data loss. 
> *Scenario 2*: create two test tables: create table > table_bucketversion_1(col_1 string, col_2 string) TBLPROPERTIES > ('bucketing_version'='1'); table_bucketversion_2(col_1 string, col_2 string) > TBLPROPERTIES ('bucketing_version'='2'); > when table_bucketversion_1 is joined with table_bucketversion_2, part of the result > data is lost because the bucketVersion is different. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25559) to_unix_timestamp udf result incorrect
[ https://issues.apache.org/jira/browse/HIVE-25559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421266#comment-17421266 ] Stamatis Zampetakis commented on HIVE-25559: [~zengxl] there are many reported issues around the {{UNIX_TIMESTAMP}} function. Have you checked whether this has already been reported? I guess this was caused by HIVE-20007, HIVE-12192. This problem may also affect master (Hive 4); can you verify? > to_unix_timestamp udf result incorrect > -- > > Key: HIVE-25559 > URL: https://issues.apache.org/jira/browse/HIVE-25559 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 3.1.2 >Reporter: zengxl >Assignee: zengxl >Priority: Critical > Attachments: HIVE-25559.1.branch-3.1.2patch > > > When I use the *unix_timestamp* UDF, what it actually calls is the > *to_unix_timestamp* UDF. The returned result is incorrect. Here is my SQL: > {code:java} > // code placeholder > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/local/hive/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/local/hadoop-3.2.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. > SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] > Hive Session ID = 3a04a9cf-1fdb-4017-a4bb-14763a3163c7Logging initialized > using configuration in file:/usr/local/hive/conf/hive-log4j2.properties > Async: true > Hive Session ID = 92ca916b-cfde-43b5-bd86-10d50ff7d861 > Hive-on-MR is deprecated in Hive 2 and may not be available in the future > versions. Consider using a different execution engine (i.e. spark, tez) or > using Hive 1.X releases. 
> hive> select unix_timestamp('2021-09-24 00:00:00'); > OK > 1632441600 > Time taken: 3.729 seconds, Fetched: 1 row(s) > {code} > Looking at the GenericUDFToUnixTimeStamp class code, I found that the time zone > is fixed to {color:#de350b}UTC{color} instead of following the user's time zone. Time > zones vary between users; my time zone is {color:#de350b}Asia/Shanghai{color}. > Therefore, the function should use the user's time zone. Here is the code I > modified: > {code:java}
> // code placeholder
> SessionState ss = SessionState.get();
> String timeZoneStr = ss.getConf().get("hive.local.time.zone");
> if (timeZoneStr == null || timeZoneStr.trim().isEmpty() || timeZoneStr.toLowerCase().equals("local")) {
>   timeZoneStr = System.getProperty("user.timezone");
> }
> formatter.setTimeZone(TimeZone.getTimeZone(timeZoneStr));
> {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
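The reported behavior can be reproduced outside Hive: parsing the same wall-clock string with a hard-coded UTC zone versus the session's zone (Asia/Shanghai, UTC+8) shifts the epoch value by eight hours. A Python sketch of the two interpretations:

```python
from datetime import datetime, timezone, timedelta

def to_unix_timestamp(s, tz):
    # interpret a wall-clock string in the given zone and return epoch seconds
    return int(datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
               .replace(tzinfo=tz).timestamp())

utc = to_unix_timestamp("2021-09-24 00:00:00", timezone.utc)
shanghai = to_unix_timestamp("2021-09-24 00:00:00", timezone(timedelta(hours=8)))
assert utc == 1632441600           # the value the report got with hard-coded UTC
assert utc - shanghai == 8 * 3600  # Shanghai midnight is 8h earlier in epoch time
```

This is exactly the discrepancy in the transcript above: 1632441600 corresponds to UTC midnight, not midnight in the reporter's Asia/Shanghai session.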
[jira] [Work logged] (HIVE-25561) Killed task should not commit file.
[ https://issues.apache.org/jira/browse/HIVE-25561?focusedWorklogId=656061=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656061 ] ASF GitHub Bot logged work on HIVE-25561: - Author: ASF GitHub Bot Created on: 28/Sep/21 09:03 Start Date: 28/Sep/21 09:03 Worklog Time Spent: 10m Work Description: zhengchenyu commented on pull request #2674: URL: https://github.com/apache/hive/pull/2674#issuecomment-928998346 @abstractdog Can you help me review it, or give me some suggestions? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 656061) Time Spent: 20m (was: 10m) > Killed task should not commit file. > --- > > Key: HIVE-25561 > URL: https://issues.apache.org/jira/browse/HIVE-25561 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 1.2.1, 2.3.8, 2.4.0 >Reporter: zhengchenyu >Assignee: zhengchenyu >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > For the Tez engine in our cluster, I found some duplicate lines, especially when Tez > speculation is enabled. In a partition dir, I found that both 02_0 and 02_1 > exist. > It's a very low-probability event. HIVE-10429 fixed some bugs around > interrupts, but some exceptions were not caught. > In our cluster, the Task receives SIGTERM, then ClientFinalizer (a Hadoop class) is > called and the HDFS client closes. This raises an exception, but abort may not > be set to true. > Then removeTempOrDuplicateFiles may fail because of the inconsistency, and the duplicate > file will remain. > (Notes: the Driver first lists the dir, then the Task commits the file, then the Driver removes > duplicate files. This is an inconsistency window.) -- This message was sent by Atlassian Jira (v8.3.4#803005)
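The cleanup the reporter refers to (removeTempOrDuplicateFiles) essentially keeps one attempt per task; when both attempts of a speculated task get committed, rows are duplicated. A simplified stand-in (not Hive's implementation) using full-length attempt names of the `<task>_<attempt>` form:

```python
def dedupe_attempts(files):
    # keep only the highest attempt per task id, mimicking duplicate-file cleanup
    best = {}
    for f in files:
        task, attempt = f.rsplit("_", 1)
        if task not in best or int(attempt) > int(best[task].rsplit("_", 1)[1]):
            best[task] = f
    return sorted(best.values())

# both attempts of task 000002 were committed (the bug); cleanup keeps one
assert dedupe_attempts(["000002_0", "000002_1", "000001_0"]) == ["000001_0", "000002_1"]
```

The race in the notes is that this cleanup runs against a directory listing taken before the killed task's late commit, so the stale attempt can slip through.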
[jira] [Work logged] (HIVE-24579) Incorrect Result For Groupby With Limit
[ https://issues.apache.org/jira/browse/HIVE-24579?focusedWorklogId=656060=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656060 ] ASF GitHub Bot logged work on HIVE-24579: - Author: ASF GitHub Bot Created on: 28/Sep/21 08:59 Start Date: 28/Sep/21 08:59 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on a change in pull request #2656: URL: https://github.com/apache/hive/pull/2656#discussion_r717361733 ## File path: ql/src/test/results/clientpositive/llap/groupby1_limit.q.out ## @@ -71,33 +71,34 @@ STAGE PLANS: mode: mergepartial outputColumnNames: _col0, _col1 Statistics: Num rows: 316 Data size: 30020 Basic stats: COMPLETE Column stats: COMPLETE -Limit - Number of rows: 5 - Statistics: Num rows: 5 Data size: 475 Basic stats: COMPLETE Column stats: COMPLETE - Reduce Output Operator -null sort order: -sort order: -Statistics: Num rows: 5 Data size: 475 Basic stats: COMPLETE Column stats: COMPLETE -TopN Hash Memory Usage: 0.1 -value expressions: _col0 (type: string), _col1 (type: double) +Reduce Output Operator Review comment: we lost the `Limit` operator from here - as a result we will be shuffling all input rows. I think this could become more costly for larger tables than the old plan. I don't see TopN hash enabled on the reduce operator - which could possibly save the day in this case; why did we loose that as well? ## File path: ql/src/test/results/clientpositive/llap/limit_pushdown.q.out ## @@ -1075,6 +1072,13 @@ STAGE PLANS: Map-reduce partition columns: _col0 (type: string) Statistics: Num rows: 316 Data size: 30020 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col1 (type: bigint) +Execution mode: vectorized, llap +LLAP IO: all inputs +Map 3 +Map Operator Tree: +TableScan + alias: src + Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE Top N Key Operator Review comment: in the old plan: did we have 2 TopN key operators in this plan which are equal? 
this is unrelated to this patch; but we may have an issue with its comparision - and because of that SWO is not able to simplify them -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 656060) Time Spent: 40m (was: 0.5h) > Incorrect Result For Groupby With Limit > --- > > Key: HIVE-24579 > URL: https://issues.apache.org/jira/browse/HIVE-24579 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 2.3.7, 3.1.2, 4.0.0 >Reporter: Nemon Lou >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > {code:sql} > create table test(id int); > explain extended select id,count(*) from test group by id limit 10; > {code} > There is an TopN unexpectly for map phase, which casues incorrect result. > {code:sql} > STAGE PLANS: > Stage: Stage-1 > Tez > DagId: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5 > Edges: > Reducer 2 <- Map 1 (SIMPLE_EDGE) > DagName: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5 > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: test > Statistics: Num rows: 1 Data size: 13500 Basic stats: > COMPLETE Column stats: NONE > GatherStats: false > Select Operator > expressions: id (type: int) > outputColumnNames: id > Statistics: Num rows: 1 Data size: 13500 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > aggregations: count() > keys: id (type: int) > mode: hash > outputColumnNames: _col0, _col1 > Statistics: Num rows: 1 Data size: 13500 Basic stats: >
[jira] [Resolved] (HIVE-25558) create two tables and want to make some partitions on the table ,joins,union also
[ https://issues.apache.org/jira/browse/HIVE-25558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis resolved HIVE-25558. Resolution: Incomplete Closing this as incomplete. The summary is not enough to understand the context and the description is empty, so it is impossible to understand the problem. [~shivanagaraju] If it is a question, please send an email to the user@hive list with adequate information. If you would like to report a bug or a feature, make sure the summary is clear and the description has all the necessary details. > create two tables and want to make some partitions on the table ,joins,union > also > -- > > Key: HIVE-25558 > URL: https://issues.apache.org/jira/browse/HIVE-25558 > Project: Hive > Issue Type: Bug > Components: Clients >Affects Versions: 3.1.1 >Reporter: shiva >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25557) Hive 3.1.2 with Tez is slow to count data in Parquet format
[ https://issues.apache.org/jira/browse/HIVE-25557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421247#comment-17421247 ] Stamatis Zampetakis commented on HIVE-25557: I am not sure I understand whether the problem is in Tez, Parquet, or the combination. Is the COUNT query fast with MR and Parquet? Is the COUNT query fast with Tez and another format, e.g., ORC? Please also include the plans ({{EXPLAIN}}) for the queries you are testing. > Hive 3.1.2 with Tez is slow to count data in Parquet format > > > Key: HIVE-25557 > URL: https://issues.apache.org/jira/browse/HIVE-25557 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.1.2 > Environment: Tez *0.10.1* >Reporter: katty he >Priority: Major > > Recently I tested a SQL like select count(*) from table in Hive 3.1.2 with > Tez; the table is in Parquet format. Normally, when counting, the query > engine can read metadata instead of reading the full data, but in my case > Tez cannot get the count from metadata only: it reads the data, so it's slow. > When counting 2 billion rows, Tez takes 500s, plus 60s to initialize. > Is that a problem? -- This message was sent by Atlassian Jira (v8.3.4#803005)
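The metadata-only shortcut the reporter expects works by summing per-file row counts recorded in the Parquet footers instead of scanning row groups. Sketched here with a plain dict standing in for real footer metadata (not Hive's or Parquet's actual API):

```python
# file -> footer metadata; a stand-in for what Parquet footers record per file
footers = {
    "part-00000.parquet": {"num_rows": 1_000_000_000},
    "part-00001.parquet": {"num_rows": 1_000_000_000},
}

def fast_count(footers):
    # count(*) answered from footer row counts alone, no row-group scan needed
    return sum(meta["num_rows"] for meta in footers.values())

assert fast_count(footers) == 2_000_000_000
```

When the engine falls back to scanning actual rows instead, the runtime scales with data size, which matches the 500-second count reported above.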
[jira] [Work logged] (HIVE-25538) CommitTxn replay failing during incremental run
[ https://issues.apache.org/jira/browse/HIVE-25538?focusedWorklogId=656041=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656041 ] ASF GitHub Bot logged work on HIVE-25538: - Author: ASF GitHub Bot Created on: 28/Sep/21 08:10 Start Date: 28/Sep/21 08:10 Worklog Time Spent: 10m Work Description: ayushtkn commented on a change in pull request #2655: URL: https://github.com/apache/hive/pull/2655#discussion_r717331035 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java ## @@ -401,6 +404,18 @@ private boolean isSourceFileMismatch(FileSystem sourceFs, ReplChangeManager.File return false; } + @VisibleForTesting + private void runTestOnlyExecutions() throws IOException { Review comment: Yahh, My first try was to do so, I thought of using PowerMock, But MiniDfs has issue with it. Which is part of Hadoop, We can't bother that. The most I could pull out is a Callable into the test. So to avoid the delete or FS operations here, and in case in future that can be used later as well. Let me know if there is any other way out. :-) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 656041) Time Spent: 1h (was: 50m) > CommitTxn replay failing during incremental run > --- > > Key: HIVE-25538 > URL: https://issues.apache.org/jira/browse/HIVE-25538 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Critical > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > CommitTxn Fails during incremental run, in case the source file is deleted > post copy & before checksum validation. 
> {noformat} > 2021-09-21T07:53:40,898 ERROR [TThreadPoolServer WorkerProcess-%d] > thrift.ProcessFunction: Internal error processing commit_txn > org.apache.thrift.TException: > /warehouse1/replicated_testreplcommittransactiononsourcedelete_1632235978675.db/testreplcommittransactiononsourcedelete/load_date=2016-03-01/delta_002_002_ > (is not a directory) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.resolvePath(FSDirectory.java:677) > at > org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:151) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1927) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:738) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:424) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675) > at > org.apache.hadoop.hive.metastore.HMSHandler.commit_txn(HMSHandler.java:8652) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at sun.reflect.GeneratedMethodAccessor118.invoke(Unknown Source) ~[?:?] 
> at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_261] > at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_261] > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at com.sun.proxy.$Proxy55.commit_txn(Unknown Source) ~[?:?] > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$commit_txn.getResult(ThriftHiveMetastore.java:23159) >
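The race in the description — the source file deleted after the copy but before checksum validation — suggests validating against a checksum recorded before the copy and tolerating a vanished source. A hedged sketch of that idea (hypothetical helper names, not the actual CopyUtils fix):

```python
import hashlib
import os
import tempfile

def digest(path):
    # checksum of a local file; stands in for an HDFS file checksum
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

def validate_copy(src, dst, recorded):
    # if the source vanished after the copy, fall back to the pre-copy checksum
    # instead of failing the whole commit-txn replay
    if not os.path.exists(src):
        return digest(dst) == recorded
    return digest(src) == digest(dst)

# simulate: record checksum, copy, delete source, then validate
d = tempfile.mkdtemp()
src, dst = os.path.join(d, "src"), os.path.join(d, "dst")
open(src, "wb").write(b"delta_002")
recorded = digest(src)                # checksum taken before the copy
open(dst, "wb").write(b"delta_002")   # the copy itself
os.remove(src)                        # source deleted before validation (the race)
assert validate_copy(src, dst, recorded)
```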
[jira] [Work logged] (HIVE-25538) CommitTxn replay failing during incremental run
[ https://issues.apache.org/jira/browse/HIVE-25538?focusedWorklogId=656039=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656039 ] ASF GitHub Bot logged work on HIVE-25538: - Author: ASF GitHub Bot Created on: 28/Sep/21 08:08 Start Date: 28/Sep/21 08:08 Worklog Time Spent: 10m Work Description: ayushtkn commented on a change in pull request #2655: URL: https://github.com/apache/hive/pull/2655#discussion_r717329217 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java ## @@ -65,6 +65,8 @@ private final String copyAsUser; private FileSystem destinationFs; private final int maxParallelCopyTask; + @VisibleForTesting Review comment: Functionality wise, I think NO. It is for the devs most probably. #Copied -> The point of an annotation is that its convention and could be used in static code analysis, whereas a simple comment could not. It serves the same purpose as the normal annotations like LimitedPrivate,The InterfaceStability ones ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosIncrementalLoadAcidTables.java ## @@ -243,6 +248,64 @@ public void testReplCM() throws Throwable { Lists.newArrayList(result, result)); } + @Test + public void testReplCommitTransactionOnSourceDeleteORC() throws Throwable { +// Run test with ORC format & with transactional true. +testReplCommitTransactionOnSourceDelete("STORED AS ORC", "'transactional'='true'"); + } + + @Test + public void testReplCommitTransactionOnSourceDeleteText() throws Throwable { +// Run test with TEXT format & with transactional true. Review comment: Yeps, Thanx Corrected -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 656039) Time Spent: 50m (was: 40m) > CommitTxn replay failing during incremental run > --- > > Key: HIVE-25538 > URL: https://issues.apache.org/jira/browse/HIVE-25538 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Critical > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > CommitTxn Fails during incremental run, in case the source file is deleted > post copy & before checksum validation. > {noformat} > 2021-09-21T07:53:40,898 ERROR [TThreadPoolServer WorkerProcess-%d] > thrift.ProcessFunction: Internal error processing commit_txn > org.apache.thrift.TException: > /warehouse1/replicated_testreplcommittransactiononsourcedelete_1632235978675.db/testreplcommittransactiononsourcedelete/load_date=2016-03-01/delta_002_002_ > (is not a directory) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.resolvePath(FSDirectory.java:677) > at > org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:151) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1927) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:738) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:424) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869) > 
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675) > at > org.apache.hadoop.hive.metastore.HMSHandler.commit_txn(HMSHandler.java:8652) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at sun.reflect.GeneratedMethodAccessor118.invoke(Unknown Source) ~[?:?] > at >
[jira] [Work logged] (HIVE-25538) CommitTxn replay failing during incremental run
[ https://issues.apache.org/jira/browse/HIVE-25538?focusedWorklogId=656031=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656031 ] ASF GitHub Bot logged work on HIVE-25538: - Author: ASF GitHub Bot Created on: 28/Sep/21 07:53 Start Date: 28/Sep/21 07:53 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #2655: URL: https://github.com/apache/hive/pull/2655#discussion_r717313439 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosIncrementalLoadAcidTables.java ## @@ -243,6 +248,64 @@ public void testReplCM() throws Throwable { Lists.newArrayList(result, result)); } + @Test + public void testReplCommitTransactionOnSourceDeleteORC() throws Throwable { +// Run test with ORC format & with transactional true. +testReplCommitTransactionOnSourceDelete("STORED AS ORC", "'transactional'='true'"); + } + + @Test + public void testReplCommitTransactionOnSourceDeleteText() throws Throwable { +// Run test with TEXT format & with transactional true. Review comment: false? ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java ## @@ -401,6 +404,18 @@ private boolean isSourceFileMismatch(FileSystem sourceFs, ReplChangeManager.File return false; } + @VisibleForTesting + private void runTestOnlyExecutions() throws IOException { Review comment: Wondering if this logic can be. moved to test itself ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java ## @@ -65,6 +65,8 @@ private final String copyAsUser; private FileSystem destinationFs; private final int maxParallelCopyTask; + @VisibleForTesting Review comment: If the method is public does the annotation VisibleForTesting have any impact? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 656031)
Time Spent: 40m (was: 0.5h)

> CommitTxn replay failing during incremental run
> ---
>
> Key: HIVE-25538
> URL: https://issues.apache.org/jira/browse/HIVE-25538
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Ayush Saxena
> Assignee: Ayush Saxena
> Priority: Critical
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> CommitTxn fails during an incremental run when the source file is deleted after the copy and before checksum validation.
> {noformat}
> 2021-09-21T07:53:40,898 ERROR [TThreadPoolServer WorkerProcess-%d] thrift.ProcessFunction: Internal error processing commit_txn
> org.apache.thrift.TException: /warehouse1/replicated_testreplcommittransactiononsourcedelete_1632235978675.db/testreplcommittransactiononsourcedelete/load_date=2016-03-01/delta_002_002_ (is not a directory)
>   at org.apache.hadoop.hdfs.server.namenode.FSDirectory.resolvePath(FSDirectory.java:677)
>   at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:151)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1927)
>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:738)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:424)
>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
>   at org.apache.hadoop.hive.metastore.HMSHandler.commit_txn(HMSHandler.java:8652) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at
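The review question above about @VisibleForTesting on a public (or private) method can be illustrated with a minimal sketch. Everything here is hypothetical: the annotation is a local stand-in for Guava's com.google.common.annotations.VisibleForTesting (declared so the sketch compiles without the dependency), and the class and method names are illustrative, not Hive's actual code.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

class CopyUtilsSketch {

    // Local stand-in for Guava's annotation: SOURCE retention, so it has no
    // runtime effect at all. It is documentation for readers and static
    // analyzers, nothing more.
    @Retention(RetentionPolicy.SOURCE)
    @Target({ElementType.METHOD, ElementType.FIELD})
    @interface VisibleForTesting {}

    private int testOnlyRuns = 0;

    // The conventional use is on a *package-private* member: a `private`
    // member cannot be reached from a test anyway, and on a `public` member
    // the annotation documents nothing enforceable. Package-private lets a
    // test in the same package call it while keeping it out of the public API.
    @VisibleForTesting
    void runTestOnlyExecutions() {
        testOnlyRuns++;
    }

    int testOnlyRunCount() {
        return testOnlyRuns;
    }

    public static void main(String[] args) {
        CopyUtilsSketch utils = new CopyUtilsSketch();
        utils.runTestOnlyExecutions(); // legal from same-package test code
        System.out.println(utils.testOnlyRunCount()); // prints 1
    }
}
```

This is why the reviewer's two comments point in the same direction: `@VisibleForTesting private` is self-contradictory, and on a `public` member the annotation is inert, so widening to package-private (or moving the logic into the test) is the usual resolution.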
[jira] [Work logged] (HIVE-25538) CommitTxn replay failing during incremental run
[ https://issues.apache.org/jira/browse/HIVE-25538?focusedWorklogId=656027&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656027 ]

ASF GitHub Bot logged work on HIVE-25538:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 28/Sep/21 07:47
Start Date: 28/Sep/21 07:47
Worklog Time Spent: 10m

Work Description: pkumarsinha commented on a change in pull request #2655:
URL: https://github.com/apache/hive/pull/2655#discussion_r716418809

## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosIncrementalLoadAcidTables.java
@@ -243,6 +248,53 @@ public void testReplCM() throws Throwable {
     Lists.newArrayList(result, result));
   }

+  @Test
+  public void testReplCommitTransactionOnSourceDelete() throws Throwable {
+    String tableName = "testReplCommitTransactionOnSourceDelete";
+    String[] result = new String[] { "5" };
+
+    // Do a bootstrap dump.
+    WarehouseInstance.Tuple bootStrapDump = primary.dump(primaryDbName);
+    replica.load(replicatedDbName, primaryDbName).run("REPL STATUS " + replicatedDbName)
+        .verifyResult(bootStrapDump.lastReplicationId);
+
+    // Add some data to the table & do an incremental dump.
+    ReplicationTestUtils.insertRecords(primary, primaryDbName, primaryDbNameExtra, tableName, null, false,
+        ReplicationTestUtils.OperationType.REPL_TEST_ACID_INSERT);
+    WarehouseInstance.Tuple incrementalDump = primary.dump(primaryDbName);

Review comment: Can you please add tables with the following properties: ORC format (I think covered), bucketed, and text input format? All of these tables should have a drop-table use case like the one you are targeting now.
Issue Time Tracking
-------------------

Worklog Id: (was: 656027)
Time Spent: 0.5h (was: 20m)

> CommitTxn replay failing during incremental run
> ---
>
> Key: HIVE-25538
> URL: https://issues.apache.org/jira/browse/HIVE-25538
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Ayush Saxena
> Assignee: Ayush Saxena
> Priority: Critical
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> CommitTxn fails during an incremental run when the source file is deleted after the copy and before checksum validation.
> {noformat}
> 2021-09-21T07:53:40,898 ERROR [TThreadPoolServer WorkerProcess-%d] thrift.ProcessFunction: Internal error processing commit_txn
> org.apache.thrift.TException: /warehouse1/replicated_testreplcommittransactiononsourcedelete_1632235978675.db/testreplcommittransactiononsourcedelete/load_date=2016-03-01/delta_002_002_ (is not a directory)
>   at org.apache.hadoop.hdfs.server.namenode.FSDirectory.resolvePath(FSDirectory.java:677)
>   at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:151)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1927)
>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:738)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:424)
>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
>   at org.apache.hadoop.hive.metastore.HMSHandler.commit_txn(HMSHandler.java:8652) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at sun.reflect.GeneratedMethodAccessor118.invoke(Unknown Source) ~[?:?]
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_261]
>   at
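The race this ticket describes — the source file vanishing (e.g. via a drop on the primary) after the copy completes but before checksum validation — can be sketched in a few lines. This is a hypothetical illustration only: the class, the length-based check standing in for the real checksum, and the "fall back to the change-manager copy" comment mirror the shape of CopyUtils.isSourceFileMismatch, not Hive's actual implementation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

class SourceMismatchSketch {

    // Returns true only when the source demonstrably differs from what was
    // copied. A vanished source cannot prove a mismatch, so we return false
    // and let replay fall back to the change-manager (CM) copy instead of
    // aborting the whole commit-txn replay.
    static boolean isSourceFileMismatch(Path source, long expectedLength) {
        try {
            return Files.size(source) != expectedLength;
        } catch (NoSuchFileException e) {
            // Source deleted between copy and validation: not a mismatch.
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path delta = Files.createTempFile("delta", ".orc");
        Files.write(delta, new byte[] {1, 2, 3});
        System.out.println(isSourceFileMismatch(delta, 3L)); // false: lengths match
        Files.delete(delta);
        System.out.println(isSourceFileMismatch(delta, 3L)); // false: source gone
    }
}
```

The design choice worth noting is that a missing source is treated as "no evidence of mismatch" rather than as an error, which is exactly why the stack trace's hard failure on the deleted delta directory is a bug rather than expected behavior.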
[jira] [Updated] (HIVE-25565) Materialized view Rebuild issue Aws EMR
[ https://issues.apache.org/jira/browse/HIVE-25565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vipin updated HIVE-25565:
-------------------------

Description:
We have materialized views built on top of Hudi tables which are hive-sync'd. Hive uses AWS Glue for its metastore catalog. We run into an issue whenever we try to *rebuild* a Hive materialized view. Note that creating materialized views works fine; only the rebuild fails. The rebuild does appear to succeed behind the scenes, but it throws an exception that causes the EMR step to fail. Can anyone guide us on any config changes we need to make? Any help would be great.

The stack trace of the exception:

{quote}
FAILED: Hive Internal Error: org.apache.hadoop.hive.ql.metadata.HiveException(Error while invoking FailureHook. hooks: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.reexec.ReExecutionOverlayPlugin$LocalHook.run(ReExecutionOverlayPlugin.java:45)
  at org.apache.hadoop.hive.ql.HookRunner.invokeGeneralHook(HookRunner.java:296)
  at org.apache.hadoop.hive.ql.HookRunner.runFailureHooks(HookRunner.java:283)
  at org.apache.hadoop.hive.ql.Driver.invokeFailureHooks(Driver.java:2616)
  at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2386)
  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703)
  at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
  at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:224)
  at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
  at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:316)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
  at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:330)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748))
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while invoking FailureHook. hooks:
> java.lang.NullPointerException
>   at org.apache.hadoop.hive.ql.reexec.ReExecutionOverlayPlugin$LocalHook.run(ReExecutionOverlayPlugin.java:45)
>   at org.apache.hadoop.hive.ql.HookRunner.invokeGeneralHook(HookRunner.java:296)
>   at org.apache.hadoop.hive.ql.HookRunner.runFailureHooks(HookRunner.java:283)
>   at org.apache.hadoop.hive.ql.Driver.invokeFailureHooks(Driver.java:2616)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2386)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703)
>   at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
>   at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:224)
>   at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>   at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:316)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>   at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:330)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
>
>   at org.apache.hadoop.hive.ql.HookRunner.invokeGeneralHook(HookRunner.java:302)
>   at org.apache.hadoop.hive.ql.HookRunner.runFailureHooks(HookRunner.java:283)
>   at org.apache.hadoop.hive.ql.Driver.invokeFailureHooks(Driver.java:2616)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2386)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703)
>   at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
>   at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:224)
>   at
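The trace above shows a second-order failure: the failure hook itself throws a NullPointerException (inside ReExecutionOverlayPlugin$LocalHook.run), so the hook's crash surfaces instead of the query's original error. A minimal, hypothetical sketch of the defensive pattern a hook runner can use — the interface and names here are illustrative, not Hive's actual HookRunner API:

```java
import java.util.List;
import java.util.Objects;

class HookRunnerSketch {

    interface FailureHook {
        void run(String queryError); // a hook may itself throw unchecked exceptions
    }

    // Run every failure hook, but never let a broken hook replace the real
    // error: catch and log the hook's own exception, then return the
    // original failure so the caller reports the right thing.
    static String runFailureHooks(List<FailureHook> hooks, String queryError) {
        Objects.requireNonNull(queryError, "queryError");
        for (FailureHook hook : hooks) {
            try {
                hook.run(queryError);
            } catch (RuntimeException hookFailure) {
                System.err.println("Failure hook threw, ignoring: " + hookFailure);
            }
        }
        return queryError; // the original failure survives
    }

    public static void main(String[] args) {
        // Simulates a hook that NPEs because it never received its context.
        FailureHook broken = err -> { throw new NullPointerException("no plugin context"); };
        String surfaced = runFailureHooks(List.of(broken), "rebuild failed");
        System.out.println(surfaced); // prints: rebuild failed
    }
}
```

With that guard in place, the user would see the materialized-view rebuild's real error rather than the hook's NullPointerException, which is what makes this bug report hard to diagnose as filed.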