[jira] [Updated] (HIVE-27669) Hive Acid CTAS fails incremental if no of rows inserted is > INT_MAX
     [ https://issues.apache.org/jira/browse/HIVE-27669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harshal Patel updated HIVE-27669:
---------------------------------
    Status: Patch Available  (was: In Progress)

https://github.com/apache/hive/pull/4665

> Hive Acid CTAS fails incremental if no of rows inserted is > INT_MAX
> --------------------------------------------------------------------
>
>                 Key: HIVE-27669
>                 URL: https://issues.apache.org/jira/browse/HIVE-27669
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Harshal Patel
>            Assignee: Harshal Patel
>            Priority: Major
>
> * If a table is created using CTAS with more than INT_MAX rows, Beeline swallows the thrown error.
> * Because replication uses the same infrastructure, it should do the same instead of failing with NumberFormatException.
>
> *Note:* This happens consistently in the customer's environment, but we have not been able to reproduce it. So we have gone through the whole code flow and handled the error accordingly.
>
> Error message during incremental replication:
> {code:java}
> 4:12:03.230 PM INFO Driver [Scheduled Query Executor(schedule:repl_sample_acid_1, execution_id:49625)]: Starting task [Stage-10066:REPL_STATE_LOG] in serial mode
> 4:12:03.231 PM INFO ReplState [Scheduled Query Executor(schedule:repl_sample_acid_1, execution_id:49625)]: REPL::EVENT_LOAD: {"dbName":"sample","eventId":"50442182","eventType":"EVENT_ALLOC_WRITE_ID","eventsLoadProgress":"2443/20424","loadTime":1687187523,"eventDuration":"159 ms"}
> 4:12:03.231 PM INFO Driver [Scheduled Query Executor(schedule:repl_sample_acid_1, execution_id:49625)]: Starting task [Stage-10067:COLUMNSTATS] in serial mode
> 4:12:03.488 PM INFO Driver [Scheduled Query Executor(schedule:repl_sample_acid_1, execution_id:49625)]: Starting task [Stage-10068:DEPENDENCY_COLLECTION] in serial mode
> 4:12:03.488 PM INFO Driver [Scheduled Query Executor(schedule:repl_sample_acid_1, execution_id:49625)]: Starting task [Stage-10069:DDL] in serial mode
> 4:12:03.504 PM INFO Driver [Scheduled Query Executor(schedule:repl_sample_acid_1, execution_id:49625)]: Starting task [Stage-10070:REPL_STATE_LOG] in serial mode
> 4:12:03.504 PM INFO ReplState [Scheduled Query Executor(schedule:repl_sample_acid_1, execution_id:49625)]: REPL::EVENT_LOAD: {"dbName":"sample","eventId":"50442183","eventType":"EVENT_UPDATE_TABLE_COL_STAT","eventsLoadProgress":"2444/20424","loadTime":1687187523,"eventDuration":"273 ms"}
> 4:12:03.504 PM INFO Driver [Scheduled Query Executor(schedule:repl_sample_acid_1, execution_id:49625)]: Starting task [Stage-10071:DDL] in serial mode
> 4:12:03.596 PM ERROR Task [Scheduled Query Executor(schedule:repl_sample_acid_1, execution_id:49625)]: Failed
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. java.lang.NumberFormatException: For input string: "5744479373"
>     at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:854) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
>     at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.createTableReplaceMode(CreateTableOperation.java:127) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
>     at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:90) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
>     at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:82) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
>     at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
>     at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
>     at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
>     at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:772) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:511) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:505) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
[jira] [Updated] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id
     [ https://issues.apache.org/jira/browse/HIVE-21213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-21213:
------------------------------
    Release Note: Merged. Thanks.
      Resolution: Fixed
          Status: Resolved  (was: Patch Available)

> Acid table bootstrap replication needs to handle directory created by compaction with txn id
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-21213
>                 URL: https://issues.apache.org/jira/browse/HIVE-21213
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, HiveServer2, repl
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-21213.01.patch, HIVE-21213.02.patch, HIVE-21213.03.patch, HIVE-21213.04.patch, HIVE-21213.05.patch
>
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> The current implementation of compaction uses the txn id in the directory name. This isolates queries from reading the directory until compaction has finished, and it avoids the compactor marking used earlier. During bootstrap replication, a directory is copied as is, with the same name, from the source cluster to the destination cluster. A directory created by compaction with a txn id cannot be copied that way, because the txn list at the target may differ from the one at the source: a txn id that is valid at the source may be an aborted txn at the target. Conversion logic is therefore required to create a new directory with a valid txn at the target and dump the data into the newly created directory.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
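The renaming step described above could look roughly like the following. This is only a sketch: it assumes the compactor appends a suffix of the form `_v<txnId>` (zero-padded to seven digits) to the directory name, which is a guess at the layout and is not stated in the issue. The class `CompactedDirName` and its methods are hypothetical, not Hive APIs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CompactedDirName {
    // Assumed naming: "<prefix>_v<txnId>"; the real layout may differ.
    private static final Pattern TXN_SUFFIX = Pattern.compile("^(.*)_v(\\d+)$");

    /** Returns true when the directory name carries a txn id suffix. */
    static boolean hasTxnSuffix(String dirName) {
        return TXN_SUFFIX.matcher(dirName).matches();
    }

    /** Rewrites the txn id suffix to a txn id that is valid at the target cluster. */
    static String withTargetTxn(String dirName, long targetTxnId) {
        Matcher m = TXN_SUFFIX.matcher(dirName);
        if (!m.matches()) {
            return dirName; // no txn id in the name, so it can be copied as is
        }
        return String.format("%s_v%07d", m.group(1), targetTxnId);
    }

    public static void main(String[] args) {
        System.out.println(withTargetTxn("base_0000005_v0000042", 7));
        System.out.println(withTargetTxn("delta_0000001_0000003", 7));
    }
}
```

Names without the suffix pass through unchanged, which matches the description's point that only compaction-created directories need conversion.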
[jira] [Updated] (HIVE-27669) Hive Acid CTAS fails incremental if no of rows inserted is > INT_MAX
     [ https://issues.apache.org/jira/browse/HIVE-27669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harshal Patel updated HIVE-27669:
---------------------------------
    Description: 
* If a table is created using CTAS with more than INT_MAX rows, Beeline swallows the thrown error.
* Because replication uses the same infrastructure, it should do the same instead of failing with NumberFormatException.

*Note:* This happens consistently in the customer's environment, but we have not been able to reproduce it. So we have gone through the whole code flow and handled the error accordingly.

Error message during incremental replication:
{code:java}
4:12:03.230 PM INFO Driver [Scheduled Query Executor(schedule:repl_sample_acid_1, execution_id:49625)]: Starting task [Stage-10066:REPL_STATE_LOG] in serial mode
4:12:03.231 PM INFO ReplState [Scheduled Query Executor(schedule:repl_sample_acid_1, execution_id:49625)]: REPL::EVENT_LOAD: {"dbName":"sample","eventId":"50442182","eventType":"EVENT_ALLOC_WRITE_ID","eventsLoadProgress":"2443/20424","loadTime":1687187523,"eventDuration":"159 ms"}
4:12:03.231 PM INFO Driver [Scheduled Query Executor(schedule:repl_sample_acid_1, execution_id:49625)]: Starting task [Stage-10067:COLUMNSTATS] in serial mode
4:12:03.488 PM INFO Driver [Scheduled Query Executor(schedule:repl_sample_acid_1, execution_id:49625)]: Starting task [Stage-10068:DEPENDENCY_COLLECTION] in serial mode
4:12:03.488 PM INFO Driver [Scheduled Query Executor(schedule:repl_sample_acid_1, execution_id:49625)]: Starting task [Stage-10069:DDL] in serial mode
4:12:03.504 PM INFO Driver [Scheduled Query Executor(schedule:repl_sample_acid_1, execution_id:49625)]: Starting task [Stage-10070:REPL_STATE_LOG] in serial mode
4:12:03.504 PM INFO ReplState [Scheduled Query Executor(schedule:repl_sample_acid_1, execution_id:49625)]: REPL::EVENT_LOAD: {"dbName":"sample","eventId":"50442183","eventType":"EVENT_UPDATE_TABLE_COL_STAT","eventsLoadProgress":"2444/20424","loadTime":1687187523,"eventDuration":"273 ms"}
4:12:03.504 PM INFO Driver [Scheduled Query Executor(schedule:repl_sample_acid_1, execution_id:49625)]: Starting task [Stage-10071:DDL] in serial mode
4:12:03.596 PM ERROR Task [Scheduled Query Executor(schedule:repl_sample_acid_1, execution_id:49625)]: Failed
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. java.lang.NumberFormatException: For input string: "5744479373"
    at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:854) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
    at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.createTableReplaceMode(CreateTableOperation.java:127) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
    at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:90) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
    at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:82) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
    at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
    at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
    at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
    at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:772) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:511) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:505) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:232) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
    at org.apache.hadoop.hive.ql.scheduled.ScheduledQueryExecutionService$ScheduledQueryExecutor.processQuery(ScheduledQueryExecutionService.java:240) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
    at org.apache.hadoop.hive.ql.scheduled.ScheduledQueryExecutionService$ScheduledQueryExecutor.run(ScheduledQueryExecutionService.java:193)
[jira] [Work started] (HIVE-27669) Hive Acid CTAS fails incremental if no of rows inserted is > INT_MAX
     [ https://issues.apache.org/jira/browse/HIVE-27669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HIVE-27669 started by Harshal Patel.
--------------------------------------------
> Hive Acid CTAS fails incremental if no of rows inserted is > INT_MAX
> --------------------------------------------------------------------
>
>                 Key: HIVE-27669
>                 URL: https://issues.apache.org/jira/browse/HIVE-27669
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Harshal Patel
>            Assignee: Harshal Patel
>            Priority: Major
>
> * If a table is created using CTAS with more than INT_MAX rows, Beeline swallows the thrown error.
> * Because replication uses the same infrastructure, it should do the same instead of failing with NumberFormatException.
>
> *Note:* This happens consistently in the customer's environment, but we have not been able to reproduce it. So we have gone through the whole code flow and handled the error accordingly.
>
> Error message during incremental replication:
> {code:java}
> 4:12:03.230 PM INFO Driver [Scheduled Query Executor(schedule:repl_sample_acid_1, execution_id:49625)]: Starting task [Stage-10066:REPL_STATE_LOG] in serial mode
> 4:12:03.231 PM INFO ReplState [Scheduled Query Executor(schedule:repl_sample_acid_1, execution_id:49625)]: REPL::EVENT_LOAD: {"dbName":"sample","eventId":"50442182","eventType":"EVENT_ALLOC_WRITE_ID","eventsLoadProgress":"2443/20424","loadTime":1687187523,"eventDuration":"159 ms"}
> 4:12:03.231 PM INFO Driver [Scheduled Query Executor(schedule:repl_sample_acid_1, execution_id:49625)]: Starting task [Stage-10067:COLUMNSTATS] in serial mode
> 4:12:03.488 PM INFO Driver [Scheduled Query Executor(schedule:repl_sample_acid_1, execution_id:49625)]: Starting task [Stage-10068:DEPENDENCY_COLLECTION] in serial mode
> 4:12:03.488 PM INFO Driver [Scheduled Query Executor(schedule:repl_sample_acid_1, execution_id:49625)]: Starting task [Stage-10069:DDL] in serial mode
> 4:12:03.504 PM INFO Driver [Scheduled Query Executor(schedule:repl_sample_acid_1, execution_id:49625)]: Starting task [Stage-10070:REPL_STATE_LOG] in serial mode
> 4:12:03.504 PM INFO ReplState [Scheduled Query Executor(schedule:repl_sample_acid_1, execution_id:49625)]: REPL::EVENT_LOAD: {"dbName":"sample","eventId":"50442183","eventType":"EVENT_UPDATE_TABLE_COL_STAT","eventsLoadProgress":"2444/20424","loadTime":1687187523,"eventDuration":"273 ms"}
> 4:12:03.504 PM INFO Driver [Scheduled Query Executor(schedule:repl_sample_acid_1, execution_id:49625)]: Starting task [Stage-10071:DDL] in serial mode
> 4:12:03.596 PM ERROR Task [Scheduled Query Executor(schedule:repl_sample_acid_1, execution_id:49625)]: Failed
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. java.lang.NumberFormatException: For input string: "5744479373"
>     at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:854) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
>     at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.createTableReplaceMode(CreateTableOperation.java:127) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
>     at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:90) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
>     at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:82) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
>     at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
>     at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
>     at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
>     at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:772) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:511) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:505) ~[hive-exec-3.1.3000.7.1.8.15-5.jar:3.1.3000.7.1.8.15-5]
>     at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
[jira] [Created] (HIVE-27669) Hive Acid CTAS fails incremental if no of rows inserted is > INT_MAX
Harshal Patel created HIVE-27669:
------------------------------------

             Summary: Hive Acid CTAS fails incremental if no of rows inserted is > INT_MAX
                 Key: HIVE-27669
                 URL: https://issues.apache.org/jira/browse/HIVE-27669
             Project: Hive
          Issue Type: Bug
            Reporter: Harshal Patel
            Assignee: Harshal Patel


* If a table is created using CTAS with more than INT_MAX rows, Beeline swallows the thrown error.
* Because replication uses the same infrastructure, it should do the same instead of failing with NumberFormatException.

*Note:* This happens consistently in the customer's environment, but we have not been able to reproduce it. So we have gone through the whole code flow and handled the error accordingly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
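For readers unfamiliar with the failure mode: a row count above INT_MAX (2147483647) cannot be parsed as a 32-bit integer, so `Integer.parseInt` throws NumberFormatException, while `Long.parseLong` accepts it. The sketch below only illustrates that JDK behaviour with the value from the reported stack trace; the class `RowCountParse` and its helpers are hypothetical and are not taken from the Hive patch.

```java
class RowCountParse {
    /** Parses a numeric statistic such as a row count without assuming it fits in 32 bits. */
    static long parseRowCount(String raw) {
        return Long.parseLong(raw.trim());
    }

    /** Returns true when the value is a valid long but lies outside the int range. */
    static boolean overflowsInt(String raw) {
        long value = parseRowCount(raw);
        return value > Integer.MAX_VALUE || value < Integer.MIN_VALUE;
    }

    public static void main(String[] args) {
        String rowCount = "5744479373"; // value reported in the NumberFormatException
        try {
            Integer.parseInt(rowCount);
            System.out.println("int parse succeeded");
        } catch (NumberFormatException e) {
            System.out.println("int parse failed: " + e.getMessage());
        }
        System.out.println("long parse: " + parseRowCount(rowCount));
        System.out.println("overflows int: " + overflowsInt(rowCount));
    }
}
```

Whether the actual fix widens the type or catches the exception is not stated in the issue; the description only says the error was "handled".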
[jira] [Commented] (HIVE-27664) AlterTableSetLocationAnalyzer threw a confusing exception "Cannot connect to namenode"
    [ https://issues.apache.org/jira/browse/HIVE-27664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17761913#comment-17761913 ]

xiongyinke commented on HIVE-27664:
-----------------------------------

[~daijy] Hi daijy, could you help me take a look at this? The PR is https://github.com/apache/hive/pull/4651 . Best wishes!

> AlterTableSetLocationAnalyzer threw a confusing exception "Cannot connect to namenode"
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-27664
>                 URL: https://issues.apache.org/jira/browse/HIVE-27664
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 4.0.0-beta-1
>            Reporter: xiongyinke
>            Assignee: xiongyinke
>            Priority: Major
>
> @Override
> protected void analyzeCommand(TableName tableName, Map<String, String> partitionSpec, ASTNode command)
>     throws SemanticException {
>   String newLocation = unescapeSQLString(command.getChild(0).getText());
>   try {
>     // To make sure host/port pair is valid, the status of the location does not matter
>     FileSystem.get(new URI(newLocation), conf).getFileStatus(new Path(newLocation));
>   } catch (FileNotFoundException e) {
>     // Only check host/port pair is valid, whether the file exist or not does not matter
>   } catch (Exception e) {
>     throw new SemanticException("Cannot connect to namenode, please check if host/port pair for " + newLocation +
>         " is valid", e);
>   }
>
> When the call
> "FileSystem.get(new URI(newLocation), conf).getFileStatus(new Path(newLocation))"
> throws a "Permission denied" exception, the Beeline client receives the confusing exception "Cannot connect to namenode, please check if host/port pair for". In reality, the problem is not with the namenode: every exception other than FileNotFoundException is reported with the same message.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
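One way to avoid the misleading message is to choose the text by exception type instead of using a single catch-all. The following is a minimal sketch of that idea, not the change in the linked PR. It uses JDK exception classes as stand-ins (Hadoop reports permission failures with its own exception types), and the class `LocationCheckMessages` and the message wording other than the original "Cannot connect to namenode" text are hypothetical.

```java
import java.io.FileNotFoundException;
import java.net.ConnectException;
import java.net.UnknownHostException;
import java.nio.file.AccessDeniedException;

class LocationCheckMessages {
    /**
     * Maps a failure from the location probe to a user-facing message.
     * Returns null when the failure can be ignored because the path simply does not exist.
     */
    static String describe(Exception e, String newLocation) {
        if (e instanceof FileNotFoundException) {
            return null; // only the host/port pair matters, not whether the path exists
        }
        if (e instanceof AccessDeniedException) {
            return "Permission denied while checking " + newLocation;
        }
        if (e instanceof ConnectException || e instanceof UnknownHostException) {
            return "Cannot connect to namenode, please check if host/port pair for "
                + newLocation + " is valid";
        }
        // Anything else: report the underlying cause instead of guessing.
        return "Failed to validate location " + newLocation + ": " + e.getMessage();
    }

    public static void main(String[] args) {
        String location = "hdfs://nn.example:8020/warehouse/t"; // placeholder location
        System.out.println(describe(new AccessDeniedException("/warehouse/t"), location));
        System.out.println(describe(new ConnectException("Connection refused"), location));
    }
}
```

The design point is only that connectivity errors and permission errors get different messages, so the client is not sent looking at the namenode for a permissions problem.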
[jira] [Updated] (HIVE-27662) Incorrect parsing of nested complex types containing map during vectorized text processing
     [ https://issues.apache.org/jira/browse/HIVE-27662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghav Aggarwal updated HIVE-27662:
-----------------------------------
    Description: 
When a text table is read with vectorization on and hive.fetch.task.conversion set to none, delimiters are parsed incorrectly in nested complex types containing a map. For example, if a column's schema is like: map then the \u0004 char appears in the output. Here is an example:

Sample q file:
{code:java}
set hive.fetch.task.conversion=none;
set hive.vectorized.execution.enabled=true;
create EXTERNAL table `table4` as
select
  'bob' as name,
  map(
    "Map_Key1", named_struct('Id', 'Id_Value1', 'Name', 'Name_Value1'),
    "Map_Key2", named_struct('Id', 'Id_Value2', 'Name', 'Name_Value2')
  ) as testmarks;
select * from table4;
set hive.vectorized.execution.enabled=false;
select * from table4;
{code}
Output of 1st select statement:
{code:java}
bob· {"Map_Key1":{"id":"Id_Value1\u0004Name_Value1","name":null},"Map_Key2":{"id":"Id_Value2\u0004Name_Value2","name":null}}
{code}
Output of 2nd select statement:
{code:java}
bob· {"Map_Key1":{"id":"Id_Value1","name":"Name_Value1"},"Map_Key2":{"id":"Id_Value2","name":"Name_Value2"}}
{code}
The MAP complex type does not handle the case where it contains a nested complex type such as STRUCT, ARRAY, or UNION.

*To reproduce this issue:*
*mvn test -Dtest=TestCliDriver -Pitests -Dqfile=`qfile_name` -pl itests/qtest -Dtest.output.overwrite*

  was:
When a text table is read with vectorization on and hive.fetch.task.conversion set to none, delimiters are parsed incorrectly in nested complex types containing a map. For example, if a column's schema is like: map then the \u0004 char appears in the output. Here is an example:

Sample q file:
{code:java}
set hive.fetch.task.conversion=none;
set hive.vectorized.execution.enabled=true;
create EXTERNAL table `table4` as
select
  'bob' as name,
  map(
    "Map_Key1", named_struct('Id', 'Id_Value1', 'Name', 'Name_Value1'),
    "Map_Key2", named_struct('Id', 'Id_Value2', 'Name', 'Name_Value2')
  ) as testmarks;
select * from table4;
set hive.vectorized.execution.enabled=false;
select * from table4;
{code}
Output of 1st select statement:
{code:java}
bob· {"Map_Key1":{"id":"Id_Value1\u0004Name_Value1","name":null},"Map_Key2":{"id":"Id_Value2\u0004Name_Value2","name":null}}
{code}
Output of 2nd select statement:
{code:java}
bob· {"Map_Key1":{"id":"Id_Value1","name":"Name_Value1"},"Map_Key2":{"id":"Id_Value2","name":"Name_Value2"}}
{code}
The MAP complex type does not handle the case where it contains a nested complex type such as STRUCT, ARRAY, or UNION.


> Incorrect parsing of nested complex types containing map during vectorized text processing
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-27662
>                 URL: https://issues.apache.org/jira/browse/HIVE-27662
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>            Reporter: Raghav Aggarwal
>            Assignee: Raghav Aggarwal
>            Priority: Major
>
> When a text table is read with vectorization on and hive.fetch.task.conversion set to none, delimiters are parsed incorrectly in nested complex types containing a map. For example, if a column's schema is like: map then the \u0004 char appears in the output. Here is an example:
>
> Sample q file:
> {code:java}
> set hive.fetch.task.conversion=none;
> set hive.vectorized.execution.enabled=true;
> create EXTERNAL table `table4` as
> select
>   'bob' as name,
>   map(
>     "Map_Key1", named_struct('Id', 'Id_Value1', 'Name', 'Name_Value1'),
>     "Map_Key2", named_struct('Id', 'Id_Value2', 'Name', 'Name_Value2')
>   ) as testmarks;
> select * from table4;
> set hive.vectorized.execution.enabled=false;
> select * from table4;
> {code}
> Output of 1st select statement:
> {code:java}
> bob· {"Map_Key1":{"id":"Id_Value1\u0004Name_Value1","name":null},"Map_Key2":{"id":"Id_Value2\u0004Name_Value2","name":null}}
> {code}
> Output of 2nd select statement:
> {code:java}
> bob· {"Map_Key1":{"id":"Id_Value1","name":"Name_Value1"},"Map_Key2":{"id":"Id_Value2","name":"Name_Value2"}}
> {code}
>
> The MAP complex type does not handle the case where it contains a nested complex type such as STRUCT, ARRAY, or UNION.
>
> *To reproduce this issue:*
> *mvn test -Dtest=TestCliDriver -Pitests -Dqfile=`qfile_name` -pl itests/qtest
[jira] [Resolved] (HIVE-27605) Backport of HIVE-19661 : switch Hive UDFs to use Re2J regex engine
[ https://issues.apache.org/jira/browse/HIVE-27605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan resolved HIVE-27605. - Fix Version/s: 3.2.0 Resolution: Fixed > Backport of HIVE-19661 : switch Hive UDFs to use Re2J regex engine > -- > > Key: HIVE-27605 > URL: https://issues.apache.org/jira/browse/HIVE-27605 > Project: Hive > Issue Type: Sub-task >Reporter: Aman Raj >Assignee: Aman Raj >Priority: Major > Labels: pull-request-available > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27662) Incorrect parsing of nested complex types containing map during vectorized text processing
     [ https://issues.apache.org/jira/browse/HIVE-27662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghav Aggarwal updated HIVE-27662:
-----------------------------------
    Description: 
When a text table is read with vectorization on and hive.fetch.task.conversion set to none, delimiters are parsed incorrectly in nested complex types containing a map. For example, if a column's schema is like: map then the \u0004 char appears in the output. Here is an example:

Sample q file:
{code:java}
set hive.fetch.task.conversion=none;
set hive.vectorized.execution.enabled=true;
create EXTERNAL table `table4` as
select
  'bob' as name,
  map(
    "Map_Key1", named_struct('Id', 'Id_Value1', 'Name', 'Name_Value1'),
    "Map_Key2", named_struct('Id', 'Id_Value2', 'Name', 'Name_Value2')
  ) as testmarks;
select * from table4;
set hive.vectorized.execution.enabled=false;
select * from table4;
{code}
Output of 1st select statement:
{code:java}
bob· {"Map_Key1":{"id":"Id_Value1\u0004Name_Value1","name":null},"Map_Key2":{"id":"Id_Value2\u0004Name_Value2","name":null}}
{code}
Output of 2nd select statement:
{code:java}
bob· {"Map_Key1":{"id":"Id_Value1","name":"Name_Value1"},"Map_Key2":{"id":"Id_Value2","name":"Name_Value2"}}
{code}
The MAP complex type does not handle the case where it contains a nested complex type such as STRUCT, ARRAY, or UNION.

  was:
When a text table is read with vectorization on and hive.fetch.task.conversion set to none, delimiters are parsed incorrectly in nested complex types containing a map. For example, if a column's schema is like: map then the \u0004 char appears in the output. Here is an example:

Sample q file:
{code:java}
set hive.fetch.task.conversion=none;
set hive.vectorized.execution.enabled=true;
create EXTERNAL table `table4` as
select
  'bob' as name,
  map(
    "Map_Key1", named_struct('Id', 'Id_Value1', 'Name', 'Name_Value1'),
    "Map_Key2", named_struct('Id', 'Id_Value2', 'Name', 'Name_Value2')
  ) as testmarks;
select * from table4;
set hive.vectorized.execution.enabled=false;
select * from table4;
{code}
Output of 1st select statement:
{code:java}
bob· {"Map_Key1":{"id":"Id_Value1\u0004Name_Value1","name":null},"Map_Key2":{"id":"Id_Value2\u0004Name_Value2","name":null}}
{code}
Output of 2nd select statement:
{code:java}
bob· {"Map_Key1":{"id":"Id_Value1","name":"Name_Value1"},"Map_Key2":{"id":"Id_Value2","name":"Name_Value2"}}
{code}
The MAP complex type does not handle the case where it contains a nested complex type such as STRUCT, ARRAY, or UNION.


> Incorrect parsing of nested complex types containing map during vectorized text processing
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-27662
>                 URL: https://issues.apache.org/jira/browse/HIVE-27662
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>            Reporter: Raghav Aggarwal
>            Assignee: Raghav Aggarwal
>            Priority: Major
>
> When a text table is read with vectorization on and hive.fetch.task.conversion set to none, delimiters are parsed incorrectly in nested complex types containing a map. For example, if a column's schema is like: map then the \u0004 char appears in the output. Here is an example:
>
> Sample q file:
> {code:java}
> set hive.fetch.task.conversion=none;
> set hive.vectorized.execution.enabled=true;
> create EXTERNAL table `table4` as
> select
>   'bob' as name,
>   map(
>     "Map_Key1", named_struct('Id', 'Id_Value1', 'Name', 'Name_Value1'),
>     "Map_Key2", named_struct('Id', 'Id_Value2', 'Name', 'Name_Value2')
>   ) as testmarks;
> select * from table4;
> set hive.vectorized.execution.enabled=false;
> select * from table4;
> {code}
> Output of 1st select statement:
> {code:java}
> bob· {"Map_Key1":{"id":"Id_Value1\u0004Name_Value1","name":null},"Map_Key2":{"id":"Id_Value2\u0004Name_Value2","name":null}}
> {code}
> Output of 2nd select statement:
> {code:java}
> bob· {"Map_Key1":{"id":"Id_Value1","name":"Name_Value1"},"Map_Key2":{"id":"Id_Value2","name":"Name_Value2"}}
> {code}
>
> The MAP complex type does not handle the case where it contains a nested complex type such as STRUCT, ARRAY, or UNION.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
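The faulty output above (the whole struct text in `id`, `null` in `name`) is what one would see if the struct nested inside the map value were split with the wrong separator. The sketch below illustrates that mechanism only; it is not Hive's vectorized deserializer and not a diagnosis of the actual bug. It assumes the default text-format convention that each nesting level uses the next control character (\u0001, \u0002, \u0003, \u0004 and so on) and that a map consumes two levels, one between entries and one between key and value. The class `NestedDelimiterSketch` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class NestedDelimiterSketch {
    // Assumed default separators, indexed by nesting level (index 0 separates columns).
    static final char[] SEPARATORS = {'\u0001', '\u0002', '\u0003', '\u0004', '\u0005'};

    private static String[] splitAt(String text, int level, int limit) {
        return text.split(Pattern.quote(String.valueOf(SEPARATORS[level])), limit);
    }

    /** Splits a struct's text into fieldCount fields; missing fields become null. */
    static String[] parseStruct(String text, int level, int fieldCount) {
        String[] parts = splitAt(text, level, -1);
        String[] fields = new String[fieldCount];
        for (int i = 0; i < fieldCount; i++) {
            fields[i] = i < parts.length ? parts[i] : null;
        }
        return fields;
    }

    /** Parses a map of two-field structs; valueLevel is the separator level used for the structs. */
    static Map<String, String[]> parseMapOfStructs(String text, int level, int valueLevel) {
        Map<String, String[]> result = new LinkedHashMap<>();
        for (String entry : splitAt(text, level, -1)) {
            String[] kv = splitAt(entry, level + 1, 2);
            result.put(kv[0], parseStruct(kv[1], valueLevel, 2));
        }
        return result;
    }

    public static void main(String[] args) {
        String column = "Map_Key1\u0003Id_Value1\u0004Name_Value1"
            + "\u0002Map_Key2\u0003Id_Value2\u0004Name_Value2";
        // Struct fields split two levels below the map entries: both fields are recovered.
        String[] good = parseMapOfStructs(column, 1, 3).get("Map_Key1");
        // Struct fields split with the map's key/value separator: the second field is lost.
        String[] bad = parseMapOfStructs(column, 1, 2).get("Map_Key1");
        System.out.println(good[0] + " | " + good[1]);
        System.out.println(bad[0].replace("\u0004", "\\u0004") + " | " + bad[1]);
    }
}
```

With the correct level the struct yields `Id_Value1` and `Name_Value1`; with the wrong one the first field keeps the embedded \u0004 and the second is null, matching the shape of the vectorized output in the report.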
[jira] [Updated] (HIVE-27662) Incorrect parsing of nested complex types containing map during vectorized text processing
     [ https://issues.apache.org/jira/browse/HIVE-27662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghav Aggarwal updated HIVE-27662:
-----------------------------------
    Description: 
When a text table is read with vectorization on and hive.fetch.task.conversion set to none, delimiters are parsed incorrectly in nested complex types containing a map. For example, if a column's schema is like: map then the \u0004 char appears in the output. Here is an example:

Sample q file:
{code:java}
set hive.fetch.task.conversion=none;
set hive.vectorized.execution.enabled=true;
create EXTERNAL table `table4` as
select
  'bob' as name,
  map(
    "Map_Key1", named_struct('Id', 'Id_Value1', 'Name', 'Name_Value1'),
    "Map_Key2", named_struct('Id', 'Id_Value2', 'Name', 'Name_Value2')
  ) as testmarks;
select * from table4;
set hive.vectorized.execution.enabled=false;
select * from table4;
{code}
Output of 1st select statement:
{code:java}
bob· {"Map_Key1":{"id":"Id_Value1\u0004Name_Value1","name":null},"Map_Key2":{"id":"Id_Value2\u0004Name_Value2","name":null}}
{code}
Output of 2nd select statement:
{code:java}
bob· {"Map_Key1":{"id":"Id_Value1","name":"Name_Value1"},"Map_Key2":{"id":"Id_Value2","name":"Name_Value2"}}
{code}
The MAP complex type does not handle the case where it contains a nested complex type such as STRUCT, ARRAY, or UNION.

  was:
When a text table is read with vectorization on and hive.fetch.task.conversion set to none, delimiters are parsed incorrectly in nested complex types containing a map. For example, if a column's schema is like: map then the \u0004 char appears in the output. Here is an example:

Sample q file:
{code:java}
set hive.fetch.task.conversion=none;
set hive.vectorized.execution.enabled=true;
create EXTERNAL table `table6` as
select
  'bob' as name,
  MAP(
    "Key1", ARRAY(1, 2, 3),
    "Key2", ARRAY(4, 5, 6)
  ) as testmarks;
select * from table6;
set hive.vectorized.execution.enabled=false;
select * from table6;
{code}
Output of 1st select statement:
{code:java}
bob· {"Key1":null,"Key2":null}
{code}
Output of 2nd select statement:
{code:java}
bob· {"Key1":[1,2,3],"Key2":[4,5,6]}
{code}
The MAP complex type does not handle the case where it contains a nested complex type such as STRUCT, ARRAY, or UNION.


> Incorrect parsing of nested complex types containing map during vectorized text processing
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-27662
>                 URL: https://issues.apache.org/jira/browse/HIVE-27662
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>            Reporter: Raghav Aggarwal
>            Assignee: Raghav Aggarwal
>            Priority: Major
>
> When a text table is read with vectorization on and hive.fetch.task.conversion set to none, delimiters are parsed incorrectly in nested complex types containing a map. For example, if a column's schema is like: map then the \u0004 char appears in the output. Here is an example:
>
> Sample q file:
> {code:java}
> set hive.fetch.task.conversion=none;
> set hive.vectorized.execution.enabled=true;
> create EXTERNAL table `table4` as
> select
>   'bob' as name,
>   map(
>     "Map_Key1", named_struct('Id', 'Id_Value1', 'Name', 'Name_Value1'),
>     "Map_Key2", named_struct('Id', 'Id_Value2', 'Name', 'Name_Value2')
>   ) as testmarks;
> select * from table4;
> set hive.vectorized.execution.enabled=false;
> select * from table4;
> {code}
> Output of 1st select statement:
> {code:java}
> bob· {"Map_Key1":{"id":"Id_Value1\u0004Name_Value1","name":null},"Map_Key2":{"id":"Id_Value2\u0004Name_Value2","name":null}}
> {code}
> Output of 2nd select statement:
> {code:java}
> bob· {"Map_Key1":{"id":"Id_Value1","name":"Name_Value1"},"Map_Key2":{"id":"Id_Value2","name":"Name_Value2"}}
> {code}
>
> The MAP complex type does not handle the case where it contains a nested complex type such as STRUCT, ARRAY, or UNION.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (HIVE-27662) Incorrect parsing of nested complex types containing map during vectorized text processing
     [ https://issues.apache.org/jira/browse/HIVE-27662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghav Aggarwal updated HIVE-27662:
-----------------------------------
    Description: 
When a text table is read with vectorization on and hive.fetch.task.conversion set to none, delimiters are parsed incorrectly in nested complex types containing a map. For example, if a column's schema is like: map then the \u0004 char appears in the output. Here is an example:

Sample q file:
{code:java}
set hive.fetch.task.conversion=none;
set hive.vectorized.execution.enabled=true;
create EXTERNAL table `table6` as
select
  'bob' as name,
  MAP(
    "Key1", ARRAY(1, 2, 3),
    "Key2", ARRAY(4, 5, 6)
  ) as testmarks;
select * from table6;
set hive.vectorized.execution.enabled=false;
select * from table6;
{code}
Output of 1st select statement:
{code:java}
bob· {"Key1":null,"Key2":null}
{code}
Output of 2nd select statement:
{code:java}
bob· {"Key1":[1,2,3],"Key2":[4,5,6]}
{code}
The MAP complex type does not handle the case where it contains a nested complex type such as STRUCT, ARRAY, or UNION.

  was:
When a text table is read with vectorization on and hive.fetch.task.conversion set to none, delimiters are parsed incorrectly in nested complex types containing a map. For example, if a column's schema is like: map then the \u0004 char appears in the output. Here is an example:

Sample q file:
{code:java}
set hive.fetch.task.conversion=none;
set hive.vectorized.execution.enabled=true;
create EXTERNAL table `table6` as
select
  'bob' as name,
  MAP(
    "Key1", ARRAY(1, 2, 3),
    "Key2", ARRAY(4, 5, 6)
  ) as testmarks;
select * from table6;
set hive.vectorized.execution.enabled=false;
select * from table6;
{code}
Output of 1st select statement:
{code:java}
bob· {"Key1":null,"Key2":null}
{code}
Output of 2nd select statement:
{code:java}
bob· {"Key1":[1,2,3],"Key2":[4,5,6]}
{code}


> Incorrect parsing of nested complex types containing map during vectorized text processing
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-27662
>                 URL: https://issues.apache.org/jira/browse/HIVE-27662
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>            Reporter: Raghav Aggarwal
>            Assignee: Raghav Aggarwal
>            Priority: Major
>
> When a text table is read with vectorization on and hive.fetch.task.conversion set to none, delimiters are parsed incorrectly in nested complex types containing a map. For example, if a column's schema is like: map then the \u0004 char appears in the output. Here is an example:
>
> Sample q file:
> {code:java}
> set hive.fetch.task.conversion=none;
> set hive.vectorized.execution.enabled=true;
> create EXTERNAL table `table6` as
> select
>   'bob' as name,
>   MAP(
>     "Key1", ARRAY(1, 2, 3),
>     "Key2", ARRAY(4, 5, 6)
>   ) as testmarks;
> select * from table6;
> set hive.vectorized.execution.enabled=false;
> select * from table6;
> {code}
> Output of 1st select statement:
> {code:java}
> bob· {"Key1":null,"Key2":null}
> {code}
> Output of 2nd select statement:
> {code:java}
> bob· {"Key1":[1,2,3],"Key2":[4,5,6]}
> {code}
>
> The MAP complex type does not handle the case where it contains a nested complex type such as STRUCT, ARRAY, or UNION.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (HIVE-27662) Incorrect parsing of nested complex types containing map during vectorized text processing
[ https://issues.apache.org/jira/browse/HIVE-27662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghav Aggarwal updated HIVE-27662:
---
    Description:
When reading a text table with vectorization on and hive.fetch.task.conversion set to none, delimiters are parsed incorrectly in nested complex types containing a map. For example, if a column's schema is like: map then the \u0004 char appears in the output. Here is an example:

Sample q file:
{code:java}
set hive.fetch.task.conversion=none;
set hive.vectorized.execution.enabled=true;
create EXTERNAL table `table6` as
select
  'bob' as name,
  MAP(
    "Key1", ARRAY(1, 2, 3),
    "Key2", ARRAY(4, 5, 6)
  ) as testmarks;
select * from table6;
set hive.vectorized.execution.enabled=false;
select * from table6;
{code}
Output of 1st select statement:
{code:java}
bob· {"Key1":null,"Key2":null}
{code}
Output of 2nd select statement:
{code:java}
bob· {"Key1":[1,2,3],"Key2":[4,5,6]}
{code}

  was:
When reading a text table with vectorization on and hive.fetch.task.conversion set to none, delimiters are parsed incorrectly in nested complex types containing a map. For example, if a column's schema is like: map then the \u0004 char appears in the output. Here is an example:

> Incorrect parsing of nested complex types containing map during vectorized
> text processing
> --
>
>                 Key: HIVE-27662
>                 URL: https://issues.apache.org/jira/browse/HIVE-27662
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>            Reporter: Raghav Aggarwal
>            Assignee: Raghav Aggarwal
>            Priority: Major
>
> When reading a text table with vectorization on and
> hive.fetch.task.conversion set to none, delimiters are parsed incorrectly
> in nested complex types containing a map. For example, if a column's schema
> is like: map then the \u0004 char appears in the output. Here is an example:
> Sample q file:
>
> {code:java}
> set hive.fetch.task.conversion=none;
> set hive.vectorized.execution.enabled=true;
> create EXTERNAL table `table6` as
> select
>   'bob' as name,
>   MAP(
>     "Key1", ARRAY(1, 2, 3),
>     "Key2", ARRAY(4, 5, 6)
>   ) as testmarks;
> select * from table6;
> set hive.vectorized.execution.enabled=false;
> select * from table6; {code}
> Output of 1st select statement:
> {code:java}
> bob· {"Key1":null,"Key2":null} {code}
> Output of 2nd select statement:
> {code:java}
> bob· {"Key1":[1,2,3],"Key2":[4,5,6]} {code}
>
[jira] [Updated] (HIVE-27662) Incorrect parsing of nested complex types containing map during vectorized text processing
[ https://issues.apache.org/jira/browse/HIVE-27662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghav Aggarwal updated HIVE-27662:
---
    Description: When reading a text table with vectorization on and hive.fetch.task.conversion set to none, delimiters are parsed incorrectly in nested complex types containing a map. For example, if a column's schema is like: map then the \u0004 char appears in the output. Here is an example:

  was: When reading data in text file format (with vectorization on) that contains multiple delimiters such as ^A, ^B, ^C, ^D (i.e. \u0001, \u0002, \u0003, \u0004), the data is parsed incorrectly, which leads to incorrect results.

> Incorrect parsing of nested complex types containing map during vectorized
> text processing
> --
>
>                 Key: HIVE-27662
>                 URL: https://issues.apache.org/jira/browse/HIVE-27662
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>            Reporter: Raghav Aggarwal
>            Assignee: Raghav Aggarwal
>            Priority: Major
>
> When reading a text table with vectorization on and
> hive.fetch.task.conversion set to none, delimiters are parsed incorrectly
> in nested complex types containing a map. For example, if a column's schema
> is like: map then the \u0004 char appears in the output. Here is an example:
>
[jira] [Updated] (HIVE-27662) Incorrect parsing of nested complex types containing map during vectorized text processing
[ https://issues.apache.org/jira/browse/HIVE-27662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghav Aggarwal updated HIVE-27662:
---
    Summary: Incorrect parsing of nested complex types containing map during vectorized text processing  (was: Incorrect parsing of complex type during vectorized text processing of data having multiple delimiters)

> Incorrect parsing of nested complex types containing map during vectorized
> text processing
> --
>
>                 Key: HIVE-27662
>                 URL: https://issues.apache.org/jira/browse/HIVE-27662
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>            Reporter: Raghav Aggarwal
>            Assignee: Raghav Aggarwal
>            Priority: Major
>
> When reading data in text file format (with vectorization on) that
> contains multiple delimiters such as ^A, ^B, ^C, ^D (i.e. \u0001, \u0002,
> \u0003, \u0004), the data is parsed incorrectly, which leads to incorrect
> results.
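To make the role of the multiple delimiters concrete, here is a small Python sketch of how a row with a nested column might be laid out in a text file. It is an illustration under stated assumptions, not Hive's serializer: it assumes the default layout in which ^A separates columns, each deeper nesting level uses the next control character, and a map uses two levels (one between entries, one between key and value). The helper names `serialize` and `serialize_row` are hypothetical.

```python
# Assumed default separators by nesting level: ^A, ^B, ^C, ^D, ^E.
SEPARATORS = ["\u0001", "\u0002", "\u0003", "\u0004", "\u0005"]


def serialize(value, level):
    """Serialize a Python value using the separator for its nesting level."""
    if isinstance(value, dict):
        # A map takes two levels: `level` between entries,
        # `level + 1` between a key and its value.
        return SEPARATORS[level].join(
            serialize(k, level + 2) + SEPARATORS[level + 1] + serialize(v, level + 2)
            for k, v in value.items()
        )
    if isinstance(value, (list, tuple)):
        # An array (or struct) takes one level between its elements.
        return SEPARATORS[level].join(serialize(v, level + 1) for v in value)
    return str(value)


def serialize_row(columns):
    """Join columns with ^A; each column's content starts at level 1."""
    return SEPARATORS[0].join(serialize(c, 1) for c in columns)


# A row with a string column and a map-of-arrays column.
row = serialize_row(["bob", {"Key1": [1, 2, 3], "Key2": [4, 5, 6]}])
```

Under these assumptions the array elements inside the map end up separated by \u0004 rather than \u0002, so a reader has to account for the two levels a map consumes before it can split the nested value correctly.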