[jira] [Updated] (HIVE-19886) Logs may be directed to 2 files if --hiveconf hive.log.file is used
[ https://issues.apache.org/jira/browse/HIVE-19886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-19886:
----------------------------------
    Labels: pull-request-available  (was: )

> Logs may be directed to 2 files if --hiveconf hive.log.file is used
> -------------------------------------------------------------------
>
>                 Key: HIVE-19886
>                 URL: https://issues.apache.org/jira/browse/HIVE-19886
>             Project: Hive
>          Issue Type: Bug
>          Components: Logging
>    Affects Versions: 3.1.0, 4.0.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Jaume M
>            Priority: Major
>              Labels: pull-request-available
>
> The hive launch script explicitly specifies the log4j2 configuration file to
> use. The main() methods in HiveServer2 and HiveMetastore reconfigure the
> logger based on user input via --hiveconf hive.log.file. This may cause logs
> to end up in 2 different files: initial logs go to the file specified in
> hive-log4j2.properties, and after logger reconfiguration the rest of the
> logs go to the file specified via --hiveconf hive.log.file.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
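The split described above can be simulated without log4j2 at all. The sketch below is a minimal pure-Java illustration, not Hive's actual code: StringWriters stand in for the two log files, and reassigning the active sink stands in for the logger reconfiguration done in main().

```java
import java.io.StringWriter;

// Minimal sketch of the bug described above: lines logged before the logger
// is reconfigured land in one file, lines logged after land in another.
public class SplitLogSketch {
    static StringWriter active;

    static void log(String msg) { active.append(msg).append('\n'); }

    static String[] run() {
        StringWriter defaultLog = new StringWriter();   // file named in hive-log4j2.properties
        StringWriter overrideLog = new StringWriter();  // file from --hiveconf hive.log.file

        active = defaultLog;
        log("startup message");            // emitted before main() reconfigures logging

        active = overrideLog;              // the reconfiguration point in main()
        log("post-reconfiguration message");

        return new String[] { defaultLog.toString(), overrideLog.toString() };
    }

    public static void main(String[] args) {
        String[] logs = run();
        System.out.println("default log file : " + logs[0].trim());
        System.out.println("override log file: " + logs[1].trim());
    }
}
```

The fix direction implied by the issue is to make both phases agree on a single target file before any logging happens.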
[jira] [Updated] (HIVE-19831) Hiveserver2 should skip doAuth checks for CREATE DATABASE/TABLE if database/table already exists
[ https://issues.apache.org/jira/browse/HIVE-19831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-19831:
----------------------------------
    Labels: pull-request-available  (was: )

> Hiveserver2 should skip doAuth checks for CREATE DATABASE/TABLE if
> database/table already exists
> ------------------------------------------------------------------
>
>                 Key: HIVE-19831
>                 URL: https://issues.apache.org/jira/browse/HIVE-19831
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>    Affects Versions: 1.2.1, 2.1.0
>            Reporter: Rajkumar Singh
>            Priority: Minor
>              Labels: pull-request-available
>
> With sqlstdauth on, CREATE DATABASE IF NOT EXISTS takes too long if there
> are too many objects inside the database directory. Hive should not run the
> doAuth checks for all the objects within the database if the database
> already exists.
[jira] [Commented] (HIVE-19831) Hiveserver2 should skip doAuth checks for CREATE DATABASE/TABLE if database/table already exists
[ https://issues.apache.org/jira/browse/HIVE-19831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510293#comment-16510293 ]

ASF GitHub Bot commented on HIVE-19831:
---------------------------------------

GitHub user rajkrrsingh opened a pull request:

    https://github.com/apache/hive/pull/372

    HIVE-19831: Hiveserver2 should skip doAuth checks for CREATE DATABASE…

    Hiveserver2 should skip doAuth checks for CREATE DATABASE/TABLE if
    database/table already exists. The proposed change will skip the
    authorization check if the database already exists.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rajkrrsingh/hive HIVE-19831

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hive/pull/372.patch

To close this pull request, make a commit to your master/trunk branch with
(at least) the following in the commit message:

    This closes #372

----
commit f62c4bbaf9d3bdfd762492fc3fc49772ce8b625a
Author: Rajkumar singh
Date:   2018-06-12T22:02:41Z

    HIVE-19831: Hiveserver2 should skip doAuth checks for CREATE
    DATABASE/TABLE if database/table already exists

> Hiveserver2 should skip doAuth checks for CREATE DATABASE/TABLE if
> database/table already exists
> ------------------------------------------------------------------
>
>                 Key: HIVE-19831
>                 URL: https://issues.apache.org/jira/browse/HIVE-19831
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>    Affects Versions: 1.2.1, 2.1.0
>            Reporter: Rajkumar Singh
>            Priority: Minor
>              Labels: pull-request-available
>
> With sqlstdauth on, CREATE DATABASE IF NOT EXISTS takes too long if there
> are too many objects inside the database directory. Hive should not run the
> doAuth checks for all the objects within the database if the database
> already exists.
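The proposed short-circuit can be sketched in a few lines. This is an illustrative sketch, not Hive's actual code: the names (`databaseExists`, `runDoAuthChecks`) are hypothetical, and a `Set` stands in for the catalog. The point is that when the database already exists, CREATE DATABASE IF NOT EXISTS is a no-op, so the expensive per-object authorization walk can be skipped.

```java
import java.util.Set;

// Hedged sketch of the proposed change: skip the authorization walk entirely
// when the CREATE is a no-op because the database already exists.
public class CreateDbAuthSketch {
    static int authChecksRun = 0;   // counts invocations of the expensive check

    static boolean databaseExists(Set<String> catalog, String db) {
        return catalog.contains(db);
    }

    static void runDoAuthChecks(String db) { authChecksRun++; } // per-object walk (expensive)

    static void createDatabaseIfNotExists(Set<String> catalog, String db) {
        if (databaseExists(catalog, db)) {
            return;                 // proposed: nothing will be created, so skip doAuth
        }
        runDoAuthChecks(db);        // authorize only when something will actually be created
        catalog.add(db);
    }
}
```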
[jira] [Updated] (HIVE-16520) Cache hive metadata in metastore
[ https://issues.apache.org/jira/browse/HIVE-16520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-16520:
----------------------------------
    Labels: TODOC3.0 pull-request-available  (was: TODOC3.0)

> Cache hive metadata in metastore
> --------------------------------
>
>                 Key: HIVE-16520
>                 URL: https://issues.apache.org/jira/browse/HIVE-16520
>             Project: Hive
>          Issue Type: New Feature
>          Components: Metastore
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>            Priority: Major
>              Labels: TODOC3.0, pull-request-available
>             Fix For: 3.0.0
>
>         Attachments: HIVE-16520-1.patch, HIVE-16520-proto-2.patch,
> HIVE-16520-proto.patch, HIVE-16520.2.patch, HIVE-16520.3.patch,
> HIVE-16520.4.patch
>
> During Hive 2 benchmarks, we found that Hive metastore operations take a
> lot of time and thus slow down Hive compilation. In some extreme cases,
> they take much longer than the actual query run time. In particular, we
> found that the latency of a cloud DB is very high and 90% of total query
> runtime is spent waiting for metastore SQL database operations. Based on
> this observation, metastore operation performance would be greatly enhanced
> by a memory structure which caches the database query results.
[jira] [Commented] (HIVE-16520) Cache hive metadata in metastore
[ https://issues.apache.org/jira/browse/HIVE-16520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510342#comment-16510342 ]

ASF GitHub Bot commented on HIVE-16520:
---------------------------------------

Github user daijyc closed the pull request at:

    https://github.com/apache/hive/pull/173

> Cache hive metadata in metastore
> --------------------------------
>
>                 Key: HIVE-16520
>                 URL: https://issues.apache.org/jira/browse/HIVE-16520
>             Project: Hive
>          Issue Type: New Feature
>          Components: Metastore
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>            Priority: Major
>              Labels: TODOC3.0, pull-request-available
>             Fix For: 3.0.0
>
>         Attachments: HIVE-16520-1.patch, HIVE-16520-proto-2.patch,
> HIVE-16520-proto.patch, HIVE-16520.2.patch, HIVE-16520.3.patch,
> HIVE-16520.4.patch
>
> During Hive 2 benchmarks, we found that Hive metastore operations take a
> lot of time and thus slow down Hive compilation. In some extreme cases,
> they take much longer than the actual query run time. In particular, we
> found that the latency of a cloud DB is very high and 90% of total query
> runtime is spent waiting for metastore SQL database operations. Based on
> this observation, metastore operation performance would be greatly enhanced
> by a memory structure which caches the database query results.
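The core idea of the feature, a memory structure absorbing repeated metadata queries, can be sketched with a read-through cache. This is an illustrative sketch only, not Hive's CachedStore implementation: the loader function stands in for a SQL round trip, and counting its invocations shows the cache doing its job.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hedged sketch of the caching idea described above: a read-through map in
// front of the metastore's SQL database, so repeated lookups skip the
// (potentially high-latency) database round trip.
public class MetadataCacheSketch {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> dbLoader;   // stands in for a JDBC query
    int dbHits = 0;                                    // number of actual DB round trips

    MetadataCacheSketch(Function<String, String> dbLoader) { this.dbLoader = dbLoader; }

    String getTableMetadata(String table) {
        // Only consult the database on a cache miss; remember the result.
        return cache.computeIfAbsent(table, t -> { dbHits++; return dbLoader.apply(t); });
    }
}
```

A real implementation additionally has to handle invalidation when metadata changes, which is where most of the complexity in the actual patches lives.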
[jira] [Commented] (HIVE-19880) Repl Load to return recoverable vs non-recoverable error codes
[ https://issues.apache.org/jira/browse/HIVE-19880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511389#comment-16511389 ]

ASF GitHub Bot commented on HIVE-19880:
---------------------------------------

GitHub user maheshk114 opened a pull request:

    https://github.com/apache/hive/pull/374

    HIVE-19880 : Repl Load to return recoverable vs non-recoverable error…

    …

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maheshk114/hive BUG-103748

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hive/pull/374.patch

To close this pull request, make a commit to your master/trunk branch with
(at least) the following in the commit message:

    This closes #374

----
commit baaed4a0a3ac4cefe4e8069710bc32a6e3f2c59a
Author: Mahesh Kumar Behera
Date:   2018-06-13T13:19:44Z

    HIVE-19880 : Repl Load to return recoverable vs non-recoverable error codes

> Repl Load to return recoverable vs non-recoverable error codes
> --------------------------------------------------------------
>
>                 Key: HIVE-19880
>                 URL: https://issues.apache.org/jira/browse/HIVE-19880
>             Project: Hive
>          Issue Type: Task
>          Components: repl
>    Affects Versions: 3.1.0, 4.0.0
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.1.0, 4.0.0
>
>         Attachments: HIVE-19880.01.patch
>
> To enable bootstrap of large databases, the application has to have the
> ability to keep retrying the bootstrap load till it encounters a fatal
> error. Whether an error is fatal or not will be decided by Hive, and
> communicated to the application via error codes. So there should be
> different error codes for recoverable vs non-recoverable failures, which
> should be propagated to the application as part of running the REPL LOAD
> command.
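The retry contract the issue describes can be sketched from the application's side. This is an illustrative sketch, not Hive's actual error-code assignments: the recoverable code range below is hypothetical, and `runWithRetry` stands in for an application driving REPL LOAD.

```java
// Hedged sketch of the contract described above: REPL LOAD surfaces an error
// code, and the application keeps retrying only while the code falls in the
// recoverable band. The band boundaries here are illustrative, not Hive's.
public class ReplRetrySketch {
    static final int RECOVERABLE_MIN = 20000, RECOVERABLE_MAX = 29999;

    static boolean isRecoverable(int errorCode) {
        return errorCode >= RECOVERABLE_MIN && errorCode <= RECOVERABLE_MAX;
    }

    // Simulates retrying a bootstrap load: each element of `outcomes` is the
    // error code of one attempt (0 == success). Returns attempts made before
    // succeeding or hitting a fatal (non-recoverable) code.
    static int runWithRetry(int[] outcomes) {
        int attempts = 0;
        for (int code : outcomes) {
            attempts++;
            if (code == 0 || !isRecoverable(code)) break;   // stop on success or fatal error
        }
        return attempts;
    }
}
```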
[jira] [Commented] (HIVE-19881) Allow metadata dump for database which are not source of replication
[ https://issues.apache.org/jira/browse/HIVE-19881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511387#comment-16511387 ]

ASF GitHub Bot commented on HIVE-19881:
---------------------------------------

GitHub user maheshk114 opened a pull request:

    https://github.com/apache/hive/pull/373

    HIVE-19881 : Allow metadata dump for database which are not source of replication

    …

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maheshk114/hive BUG-105280

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hive/pull/373.patch

To close this pull request, make a commit to your master/trunk branch with
(at least) the following in the commit message:

    This closes #373

----
commit 36f6cd4b56bf36a1f5217406bd62de03ab338dda
Author: Mahesh Kumar Behera
Date:   2018-06-13T15:33:33Z

    HIVE-19881 : Allow metadata dump for database which are not source of replication

> Allow metadata dump for database which are not source of replication
> --------------------------------------------------------------------
>
>                 Key: HIVE-19881
>                 URL: https://issues.apache.org/jira/browse/HIVE-19881
>             Project: Hive
>          Issue Type: Task
>          Components: repl
>    Affects Versions: 3.1.0, 4.0.0
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.1.0, 4.0.0
>
>         Attachments: HIVE-19881.01..patch
>
> If the dump is metadata-only, then allow the dump even if the DB is not a
> source of replication.
[jira] [Updated] (HIVE-19880) Repl Load to return recoverable vs non-recoverable error codes
[ https://issues.apache.org/jira/browse/HIVE-19880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-19880:
----------------------------------
    Labels: pull-request-available  (was: )

> Repl Load to return recoverable vs non-recoverable error codes
> --------------------------------------------------------------
>
>                 Key: HIVE-19880
>                 URL: https://issues.apache.org/jira/browse/HIVE-19880
>             Project: Hive
>          Issue Type: Task
>          Components: repl
>    Affects Versions: 3.1.0, 4.0.0
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.1.0, 4.0.0
>
>         Attachments: HIVE-19880.01.patch
>
> To enable bootstrap of large databases, the application has to have the
> ability to keep retrying the bootstrap load till it encounters a fatal
> error. Whether an error is fatal or not will be decided by Hive, and
> communicated to the application via error codes. So there should be
> different error codes for recoverable vs non-recoverable failures, which
> should be propagated to the application as part of running the REPL LOAD
> command.
[jira] [Updated] (HIVE-19881) Allow metadata dump for database which are not source of replication
[ https://issues.apache.org/jira/browse/HIVE-19881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-19881:
----------------------------------
    Labels: pull-request-available  (was: )

> Allow metadata dump for database which are not source of replication
> --------------------------------------------------------------------
>
>                 Key: HIVE-19881
>                 URL: https://issues.apache.org/jira/browse/HIVE-19881
>             Project: Hive
>          Issue Type: Task
>          Components: repl
>    Affects Versions: 3.1.0, 4.0.0
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.1.0, 4.0.0
>
>         Attachments: HIVE-19881.01..patch
>
> If the dump is metadata-only, then allow the dump even if the DB is not a
> source of replication.
[jira] [Commented] (HIVE-19739) Bootstrap REPL LOAD to use checkpoints to validate and skip the loaded data/metadata.
[ https://issues.apache.org/jira/browse/HIVE-19739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16512614#comment-16512614 ]

ASF GitHub Bot commented on HIVE-19739:
---------------------------------------

Github user sankarh closed the pull request at:

    https://github.com/apache/hive/pull/366

> Bootstrap REPL LOAD to use checkpoints to validate and skip the loaded
> data/metadata.
> ----------------------------------------------------------------------
>
>                 Key: HIVE-19739
>                 URL: https://issues.apache.org/jira/browse/HIVE-19739
>             Project: Hive
>          Issue Type: Sub-task
>          Components: HiveServer2, repl
>    Affects Versions: 3.0.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: DR, pull-request-available, replication
>             Fix For: 4.0.0
>
>         Attachments: HIVE-19739.01-branch-3.patch, HIVE-19739.01.patch,
> HIVE-19739.02.patch, HIVE-19739.03.patch, HIVE-19739.04.patch
>
> Currently, bootstrap REPL LOAD adds checkpoint identifiers to
> DB/table/partition object properties once the data/metadata related to the
> object is successfully loaded.
> If the DB exists and is not empty, we currently throw an exception. But
> this needs to be supported for the retry scenario after a failure.
> If there is a retry of bootstrap load using the same dump, then instead of
> throwing an error, we should check whether any of the tables/partitions are
> completely loaded using the checkpoint identifiers. If yes, then skip them;
> otherwise drop/create them again.
> If the bootstrap load is performed using a different dump, then it should
> throw an exception.
> Allow bootstrap on an empty DB only if the checkpoint property is not set.
> Also, if bootstrap load has completed on the target DB, then a bootstrap
> retry shouldn't be allowed at all.
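The decision table in the description (skip on same dump, reload if never completed, fail on a different dump) can be sketched as a small function. This is an illustrative sketch, not Hive's code: the checkpoint property key is a hypothetical placeholder, and a plain map stands in for table properties.

```java
import java.util.Map;

// Hedged sketch of the checkpoint scheme described above: each fully loaded
// object carries the dump directory path as a checkpoint property, so a
// retried bootstrap load with the same dump skips it, while a different
// dump is rejected.
public class BootstrapCheckpointSketch {
    static final String CKPT_PROP = "repl.ckpt.key";   // hypothetical property key

    enum Action { LOAD, SKIP, FAIL }

    static Action decide(Map<String, String> objectProps, String dumpDir) {
        String ckpt = objectProps.get(CKPT_PROP);
        if (ckpt == null) return Action.LOAD;          // never fully loaded: (re)load it
        if (ckpt.equals(dumpDir)) return Action.SKIP;  // same dump, already complete: skip
        return Action.FAIL;                            // different dump: throw an exception
    }
}
```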
[jira] [Commented] (HIVE-19723) Arrow serde: "Unsupported data type: Timestamp(NANOSECOND, null)"
[ https://issues.apache.org/jira/browse/HIVE-19723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507617#comment-16507617 ]

ASF GitHub Bot commented on HIVE-19723:
---------------------------------------

Github user pudidic closed the pull request at:

    https://github.com/apache/hive/pull/369

> Arrow serde: "Unsupported data type: Timestamp(NANOSECOND, null)"
> -----------------------------------------------------------------
>
>                 Key: HIVE-19723
>                 URL: https://issues.apache.org/jira/browse/HIVE-19723
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Teddy Choi
>            Assignee: Teddy Choi
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.1.0
>
>         Attachments: HIVE-19723.1.patch, HIVE-19723.3.patch,
> HIVE-19723.4.patch, HIVE-19732.2.patch
>
> Spark's Arrow support only provides Timestamp at MICROSECOND granularity.
> Spark 2.3.0 won't accept NANOSECOND. Switch it back to MICROSECOND. The
> unit test org.apache.hive.jdbc.TestJdbcWithMiniLlapArrow will just need to
> change the assertion to test microseconds. And we'll need to add this to
> the documentation on supported data types.
[jira] [Commented] (HIVE-19815) Repl dump should not propagate the checkpoint and repl source properties
[ https://issues.apache.org/jira/browse/HIVE-19815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507694#comment-16507694 ]

ASF GitHub Bot commented on HIVE-19815:
---------------------------------------

Github user sankarh closed the pull request at:

    https://github.com/apache/hive/pull/367

> Repl dump should not propagate the checkpoint and repl source properties
> ------------------------------------------------------------------------
>
>                 Key: HIVE-19815
>                 URL: https://issues.apache.org/jira/browse/HIVE-19815
>             Project: Hive
>          Issue Type: Sub-task
>          Components: HiveServer2, repl
>    Affects Versions: 3.1.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.1.0, 4.0.0
>
>         Attachments: HIVE-19815.01.patch, HIVE-19815.02.patch
>
> For replication scenarios of A -> B -> C, the repl dump on B should not
> include the checkpoint property when dumping out table information.
> Alter table/partition events during incremental replication should not
> propagate this as well.
> It also should not propagate the DB-level parameters set internally by
> replication.
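The fix amounts to filtering replication-internal keys out of object properties before they are written to the dump. The sketch below is illustrative only: the property names are hypothetical placeholders, not necessarily the keys Hive uses, and a plain map stands in for table properties.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hedged sketch of the fix described above: when the intermediate cluster
// (B in an A -> B -> C chain) dumps table/partition objects, it strips the
// replication-internal properties so they are not propagated downstream.
public class DumpPropertyFilterSketch {
    static final Set<String> REPL_INTERNAL_PROPS = new HashSet<>(
        Arrays.asList("repl.ckpt.key", "repl.source.for"));   // hypothetical keys

    static Map<String, String> forDump(Map<String, String> objectProps) {
        Map<String, String> out = new HashMap<>(objectProps);
        out.keySet().removeAll(REPL_INTERNAL_PROPS);   // do not propagate internal state
        return out;                                    // everything else is dumped as-is
    }
}
```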
[jira] [Commented] (HIVE-19853) Arrow serializer needs to create a TimeStampMicroTZVector instead of TimeStampMicroVector
[ https://issues.apache.org/jira/browse/HIVE-19853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507849#comment-16507849 ]

ASF GitHub Bot commented on HIVE-19853:
---------------------------------------

GitHub user pudidic opened a pull request:

    https://github.com/apache/hive/pull/371

    HIVE-19853: Arrow serializer needs to create a TimeStampMicroTZVector…

    … instead of TimeStampMicroVector

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pudidic/hive HIVE-19853

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hive/pull/371.patch

To close this pull request, make a commit to your master/trunk branch with
(at least) the following in the commit message:

    This closes #371

----
commit f785b6d5603d94b126a9611b4a583e4803dd54f7
Author: Teddy Choi
Date:   2018-06-11T09:29:57Z

    HIVE-19853: Arrow serializer needs to create a TimeStampMicroTZVector
    instead of TimeStampMicroVector

> Arrow serializer needs to create a TimeStampMicroTZVector instead of
> TimeStampMicroVector
> --------------------------------------------------------------------
>
>                 Key: HIVE-19853
>                 URL: https://issues.apache.org/jira/browse/HIVE-19853
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Teddy Choi
>            Assignee: Teddy Choi
>            Priority: Major
>              Labels: pull-request-available
>
>         Attachments: HIVE-19853.1.patch
>
> HIVE-19723 changed nanosecond to microsecond in Arrow serialization.
> However, it needs to be microsecond with time zone.
[jira] [Updated] (HIVE-19853) Arrow serializer needs to create a TimeStampMicroTZVector instead of TimeStampMicroVector
[ https://issues.apache.org/jira/browse/HIVE-19853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-19853:
----------------------------------
    Labels: pull-request-available  (was: )

> Arrow serializer needs to create a TimeStampMicroTZVector instead of
> TimeStampMicroVector
> --------------------------------------------------------------------
>
>                 Key: HIVE-19853
>                 URL: https://issues.apache.org/jira/browse/HIVE-19853
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Teddy Choi
>            Assignee: Teddy Choi
>            Priority: Major
>              Labels: pull-request-available
>
>         Attachments: HIVE-19853.1.patch
>
> HIVE-19723 changed nanosecond to microsecond in Arrow serialization.
> However, it needs to be microsecond with time zone.
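The granularity issue behind HIVE-19723 and HIVE-19853 is that Spark's Arrow support reads timestamps at microsecond (not nanosecond) precision, so nanosecond values must be truncated before serialization. The sketch below illustrates that truncation with `java.time` only; it is not Hive's serializer, whose actual fix is to populate Arrow's TimeStampMicroTZVector (microseconds plus a time zone) instead of TimeStampMicroVector.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hedged illustration of the precision change: convert a nanosecond-precision
// Instant to the epoch-microseconds long that a microsecond-granularity
// Arrow timestamp vector would store.
public class MicrosTimestampSketch {
    static long toEpochMicros(Instant t) {
        Instant truncated = t.truncatedTo(ChronoUnit.MICROS);  // drop sub-microsecond part
        return truncated.getEpochSecond() * 1_000_000L
                + truncated.getNano() / 1_000L;
    }
}
```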
[jira] [Updated] (HIVE-19725) Add ability to dump non-native tables in replication metadata dump
[ https://issues.apache.org/jira/browse/HIVE-19725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-19725:
----------------------------------
    Labels: Repl pull-request-available  (was: Repl)

> Add ability to dump non-native tables in replication metadata dump
> ------------------------------------------------------------------
>
>                 Key: HIVE-19725
>                 URL: https://issues.apache.org/jira/browse/HIVE-19725
>             Project: Hive
>          Issue Type: Task
>          Components: repl
>    Affects Versions: 3.0.0, 3.1.0, 4.0.0
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>              Labels: Repl, pull-request-available
>             Fix For: 3.1.0, 3.0.1, 4.0.0
>
>         Attachments: HIVE-19725.01.patch
>
> If hive.repl.dump.metadata.only is set to true, allow dumping non-native
> tables also. This will be used by DAS.
> Data dump for non-native tables should never be allowed.
[jira] [Commented] (HIVE-19725) Add ability to dump non-native tables in replication metadata dump
[ https://issues.apache.org/jira/browse/HIVE-19725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493102#comment-16493102 ]

ASF GitHub Bot commented on HIVE-19725:
---------------------------------------

GitHub user maheshk114 opened a pull request:

    https://github.com/apache/hive/pull/361

    HIVE-19725 : Add ability to dump non-native tables in replication metadata dump

    If hive.repl.dump.metadata.only is set to true, allow dumping non-native
    tables also. This will be used by DAS.
    Data dump for non-native tables should never be allowed.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maheshk114/hive BUG-103509

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hive/pull/361.patch

To close this pull request, make a commit to your master/trunk branch with
(at least) the following in the commit message:

    This closes #361

----
commit ecb7e9637660f69e1503bd8311359b0bb4bad543
Author: Mahesh Kumar Behera
Date:   2018-05-29T04:43:52Z

    HIVE-19725 : Add ability to dump non-native tables in replication metadata dump

> Add ability to dump non-native tables in replication metadata dump
> ------------------------------------------------------------------
>
>                 Key: HIVE-19725
>                 URL: https://issues.apache.org/jira/browse/HIVE-19725
>             Project: Hive
>          Issue Type: Task
>          Components: repl
>    Affects Versions: 3.0.0, 3.1.0, 4.0.0
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>              Labels: Repl, pull-request-available
>             Fix For: 3.1.0, 3.0.1, 4.0.0
>
>         Attachments: HIVE-19725.01.patch
>
> If hive.repl.dump.metadata.only is set to true, allow dumping non-native
> tables also. This will be used by DAS.
> Data dump for non-native tables should never be allowed.
[jira] [Commented] (HIVE-19499) Bootstrap REPL LOAD shall add tasks to create checkpoints for db/tables/partitions.
[ https://issues.apache.org/jira/browse/HIVE-19499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491724#comment-16491724 ]

ASF GitHub Bot commented on HIVE-19499:
---------------------------------------

Github user sankarh closed the pull request at:

    https://github.com/apache/hive/pull/352

> Bootstrap REPL LOAD shall add tasks to create checkpoints for
> db/tables/partitions.
> -------------------------------------------------------------
>
>                 Key: HIVE-19499
>                 URL: https://issues.apache.org/jira/browse/HIVE-19499
>             Project: Hive
>          Issue Type: Sub-task
>          Components: HiveServer2, repl
>    Affects Versions: 3.0.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: DR, pull-request-available, replication
>             Fix For: 3.1.0
>
>         Attachments: HIVE-19499.01.patch, HIVE-19499.02.patch
>
> Currently, bootstrap REPL LOAD expects the target database to be empty or
> non-existent to start the bootstrap load.
> But this adds overhead when there is a failure in between bootstrap load,
> and there is no way to resume it from where it failed. So checkpoints need
> to be created in tables/partitions to skip the completely loaded objects.
> Use the fully qualified path of the dump directory as a checkpoint
> identifier. This should be added to the table/partition properties in Hive
> via a task, as the last task in the DAG for table/partition creation.
[jira] [Commented] (HIVE-19661) switch Hive UDFs to use Re2J regex engine
[ https://issues.apache.org/jira/browse/HIVE-19661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494371#comment-16494371 ]

ASF GitHub Bot commented on HIVE-19661:
---------------------------------------

Github user rajkrrsingh closed the pull request at:

    https://github.com/apache/hive/pull/358

> switch Hive UDFs to use Re2J regex engine
> -----------------------------------------
>
>                 Key: HIVE-19661
>                 URL: https://issues.apache.org/jira/browse/HIVE-19661
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Rajkumar Singh
>            Assignee: Rajkumar Singh
>            Priority: Major
>              Labels: pull-request-available
>
>         Attachments: HIVE-19661.patch
>
> The Java regex engine can be very slow in some cases, e.g.
> https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8203458
[jira] [Commented] (HIVE-19661) switch Hive UDFs to use Re2J regex engine
[ https://issues.apache.org/jira/browse/HIVE-19661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494377#comment-16494377 ]

ASF GitHub Bot commented on HIVE-19661:
---------------------------------------

GitHub user rajkrrsingh opened a pull request:

    https://github.com/apache/hive/pull/362

    HIVE-19661: switch Hive UDFs to use Re2J regex engine.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rajkrrsingh/hive HIVE-19661

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hive/pull/362.patch

To close this pull request, make a commit to your master/trunk branch with
(at least) the following in the commit message:

    This closes #362

----
commit e3280ec23c7ec4a4a69197a776e8cc1b32c53630
Author: Rajkumar singh
Date:   2018-05-29T22:14:51Z

    HIVE-19661: switch Hive UDFs to use Re2J regex engine.

> switch Hive UDFs to use Re2J regex engine
> -----------------------------------------
>
>                 Key: HIVE-19661
>                 URL: https://issues.apache.org/jira/browse/HIVE-19661
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Rajkumar Singh
>            Assignee: Rajkumar Singh
>            Priority: Major
>              Labels: pull-request-available
>
>         Attachments: HIVE-19661.patch
>
> The Java regex engine can be very slow in some cases, e.g.
> https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8203458
[jira] [Updated] (HIVE-19661) switch Hive UDFs to use Re2J regex engine
[ https://issues.apache.org/jira/browse/HIVE-19661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-19661:
----------------------------------
    Labels: pull-request-available  (was: )

> switch Hive UDFs to use Re2J regex engine
> -----------------------------------------
>
>                 Key: HIVE-19661
>                 URL: https://issues.apache.org/jira/browse/HIVE-19661
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rajkumar Singh
>            Assignee: Rajkumar Singh
>            Priority: Major
>              Labels: pull-request-available
>
> The Java regex engine can be very slow in some cases, e.g.
> https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8203458
[jira] [Commented] (HIVE-19661) switch Hive UDFs to use Re2J regex engine
[ https://issues.apache.org/jira/browse/HIVE-19661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491923#comment-16491923 ]

ASF GitHub Bot commented on HIVE-19661:
---------------------------------------

GitHub user rajkrrsingh opened a pull request:

    https://github.com/apache/hive/pull/358

    HIVE-19661 : switch Hive UDFs to use Re2J regex engine

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rajkrrsingh/hive HIVE-19661

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hive/pull/358.patch

To close this pull request, make a commit to your master/trunk branch with
(at least) the following in the commit message:

    This closes #358

----
commit 41cff8d9c5d4ffed7d75388013a2618e787ef7fc
Author: Rajkumar singh
Date:   2018-05-27T05:18:04Z

    HIVE-19661 : switch Hive UDFs to use Re2J regex engine

commit da1a73b40896f84b920db3c90212fa3bbf375a95
Author: Rajkumar singh
Date:   2018-05-27T05:19:35Z

    HIVE-19661 : switch Hive UDFs to use Re2J regex engine

> switch Hive UDFs to use Re2J regex engine
> -----------------------------------------
>
>                 Key: HIVE-19661
>                 URL: https://issues.apache.org/jira/browse/HIVE-19661
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rajkumar Singh
>            Assignee: Rajkumar Singh
>            Priority: Major
>              Labels: pull-request-available
>
> The Java regex engine can be very slow in some cases, e.g.
> https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8203458
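From a UDF's point of view, the switch is small because RE2/J (`com.google.re2j`) deliberately mirrors the `Pattern`/`Matcher` API of `java.util.regex`; for code shaped like the sketch below, the change is essentially swapping the imports, after which matching runs in linear time with no catastrophic backtracking. This is an illustrative sketch, not Hive's UDF code, and it is shown with `java.util.regex` so it runs without the re2j dependency.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hedged sketch of a regex-based UDF helper. With RE2/J, the imports above
// would become com.google.re2j.Pattern / com.google.re2j.Matcher; the call
// sites below stay the same.
public class RegexUdfSketch {
    static boolean matchesFully(String value, String regex) {
        Pattern p = Pattern.compile(regex);   // a real UDF would cache the compiled pattern
        Matcher m = p.matcher(value);
        return m.matches();
    }
}
```

Note that RE2/J supports a subset of Java's regex syntax (no backreferences, for example), which is part of the trade-off behind the switch.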
[jira] [Updated] (HIVE-19776) HiveServer2.startHiveServer2 retries of start has concurrency issues
[ https://issues.apache.org/jira/browse/HIVE-19776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-19776:
----------------------------------
    Labels: pull-request-available  (was: )

> HiveServer2.startHiveServer2 retries of start has concurrency issues
> --------------------------------------------------------------------
>
>                 Key: HIVE-19776
>                 URL: https://issues.apache.org/jira/browse/HIVE-19776
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>            Priority: Major
>              Labels: pull-request-available
>
> HS2 starts the thrift binary/http servers in the background while it
> proceeds to do other setup (e.g. creating zookeeper entries). If there is a
> ZK error and it attempts to stop and start in the retry loop within
> HiveServer2.startHiveServer2, the retry fails because the thrift server
> doesn't get stopped if it was still being initialized.
> The thrift server initialization and stopping need to be synchronized.
[jira] [Commented] (HIVE-19776) HiveServer2.startHiveServer2 retries of start has concurrency issues
[ https://issues.apache.org/jira/browse/HIVE-19776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499205#comment-16499205 ]

ASF GitHub Bot commented on HIVE-19776:
---------------------------------------

GitHub user thejasmn opened a pull request:

    https://github.com/apache/hive/pull/363

    HIVE-19776 1.patch

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/thejasmn/hive HIVE-19776

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hive/pull/363.patch

To close this pull request, make a commit to your master/trunk branch with
(at least) the following in the commit message:

    This closes #363

----
commit 85c7505d3719415fb39707c43053fcf19f4fe838
Author: Thejas M Nair
Date:   2018-06-02T19:46:08Z

    HIVE-19776 1.patch

> HiveServer2.startHiveServer2 retries of start has concurrency issues
> --------------------------------------------------------------------
>
>                 Key: HIVE-19776
>                 URL: https://issues.apache.org/jira/browse/HIVE-19776
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>            Priority: Major
>              Labels: pull-request-available
>
> HS2 starts the thrift binary/http servers in the background while it
> proceeds to do other setup (e.g. creating zookeeper entries). If there is a
> ZK error and it attempts to stop and start in the retry loop within
> HiveServer2.startHiveServer2, the retry fails because the thrift server
> doesn't get stopped if it was still being initialized.
> The thrift server initialization and stopping need to be synchronized.
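The race the issue describes, stop() running while start() is still initializing so the half-initialized server never gets shut down, is the classic case for serializing lifecycle transitions on one lock. The sketch below is illustrative, not HiveServer2's code: the comments mark where real transport setup/teardown would go.

```java
// Hedged sketch of the synchronization the issue calls for: start() and
// stop() share one lock, so stop() can never observe a server that is only
// half initialized, and a retrying caller sees fully-started or
// fully-stopped states only.
public class ThriftServerLifecycleSketch {
    private final Object lifecycleLock = new Object();
    private boolean started = false;

    void start() {
        synchronized (lifecycleLock) {   // initialization cannot interleave with stop()
            // ... bind thrift binary/http transports here ...
            started = true;
        }
    }

    void stop() {
        synchronized (lifecycleLock) {   // blocks until any in-flight start() finishes
            // ... close transports here ...
            started = false;
        }
    }

    boolean isStarted() {
        synchronized (lifecycleLock) { return started; }
    }
}
```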
[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
[ https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501408#comment-16501408 ] ASF GitHub Bot commented on HIVE-16391: --- GitHub user jerryshao opened a pull request: https://github.com/apache/hive/pull/364 HIVE-16391: Add a new classifier for hive-exec to be used by Spark This fix adding a new classifier for hive-exec artifact (`core-spark`), which is specifically used for Spark. Details in [SPARK-20202](https://issues.apache.org/jira/browse/SPARK-20202). This is because original hive-exec packages many transitive dependencies into shaded jar without relocation, this makes conflicts in Spark. Spark only needs to relocate protobuf and kryo jar. So here propose to add a new classifier to generate a new artifact only for Spark. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jerryshao/hive 1.2-spark-fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/364.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #364 commit bb27b260d82fa0a77d9fea3c123f2af8f1ea88aa Author: jerryshao Date: 2018-06-05T06:59:37Z HIVE-16391: Add a new classifier for hive-exec to be used by Spark > Publish proper Hive 1.2 jars (without including all dependencies in uber jar) > - > > Key: HIVE-16391 > URL: https://issues.apache.org/jira/browse/HIVE-16391 > Project: Hive > Issue Type: Task > Components: Build Infrastructure >Reporter: Reynold Xin >Priority: Major > Labels: pull-request-available > > Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the > only change in the fork is to work around the issue that Hive publishes only > two sets of jars: one set with no dependency declared, and another with all > the dependencies included in the published uber jar. 
That is to say, Hive > doesn't publish a set of jars with the proper dependencies declared. > There is general consensus on both sides that we should remove the forked > Hive. > The change in the forked version is recorded here > https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2 > Note that the fork in the past included other fixes but those have all become > unnecessary. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
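The classifier approach from the pull request can be sketched at the build level. This is an illustrative fragment only, not the actual Hive pom change: the `core-spark` classifier name comes from the PR description, while the relocation package patterns and shaded prefixes are assumptions.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <!-- Attach the shaded jar under its own classifier instead of
         replacing the main artifact. -->
    <shadedArtifactAttached>true</shadedArtifactAttached>
    <shadedClassifierName>core-spark</shadedClassifierName>
    <relocations>
      <!-- Per the PR description, Spark only needs protobuf and kryo
           relocated; the patterns below are illustrative. -->
      <relocation>
        <pattern>com.google.protobuf</pattern>
        <shadedPattern>org.apache.hive.shaded.protobuf</shadedPattern>
      </relocation>
      <relocation>
        <pattern>com.esotericsoftware.kryo</pattern>
        <shadedPattern>org.apache.hive.shaded.kryo</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```

Consumers such as Spark could then depend on the artifact with `<classifier>core-spark</classifier>` instead of the full uber jar.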
[jira] [Updated] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
[ https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-16391: -- Labels: pull-request-available (was: ) > Publish proper Hive 1.2 jars (without including all dependencies in uber jar) > - > > Key: HIVE-16391 > URL: https://issues.apache.org/jira/browse/HIVE-16391 > Project: Hive > Issue Type: Task > Components: Build Infrastructure >Reporter: Reynold Xin >Priority: Major > Labels: pull-request-available > > Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the > only change in the fork is to work around the issue that Hive publishes only > two sets of jars: one set with no dependency declared, and another with all > the dependencies included in the published uber jar. That is to say, Hive > doesn't publish a set of jars with the proper dependencies declared. > There is general consensus on both sides that we should remove the forked > Hive. > The change in the forked version is recorded here > https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2 > Note that the fork in the past included other fixes but those have all become > unnecessary. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-19812) Disable external table replication by default via a configuration property
[ https://issues.apache.org/jira/browse/HIVE-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502940#comment-16502940 ] ASF GitHub Bot commented on HIVE-19812: --- GitHub user maheshk114 opened a pull request: https://github.com/apache/hive/pull/365 HIVE-19812 : Disable external table replication by default via a configuration property Use a hive config property to allow external table replication, and set it by default to prevent external table replication. For metadata-only dumps, hive repl always exports metadata for external tables. You can merge this pull request into a Git repository by running: $ git pull https://github.com/maheshk114/hive BUG-104223 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/365.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #365 commit 45ef2edd4a82276e16e2a60d49e655210f3b4c21 Author: Mahesh Kumar Behera Date: 2018-06-06T04:05:28Z HIVE-19812 : Disable external table replication by default via a configuration property > Disable external table replication by default via a configuration property > -- > > Key: HIVE-19812 > URL: https://issues.apache.org/jira/browse/HIVE-19812 > Project: Hive > Issue Type: Task > Components: repl >Affects Versions: 3.1.0, 4.0.0 >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Fix For: 3.1.0, 4.0.0 > > > Use a hive config property to allow external table replication, and set this > property by default to prevent external table replication. > For metadata-only dumps, hive repl always exports metadata for external tables. > > REPL_DUMP_EXTERNAL_TABLES("hive.repl.dump.include.external.tables", false, > "Indicates if repl dump should include information about external tables. It > should be \n" > + "used in conjunction with 'hive.repl.dump.metadata.only' set to false. 
if > 'hive.repl.dump.metadata.only' \n" > + " is set to true then this config parameter has no effect as external table > meta data is flushed \n" > + " always by default.") > This should be done for only replication dump and not for export -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-19812) Disable external table replication by default via a configuration property
[ https://issues.apache.org/jira/browse/HIVE-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-19812: -- Labels: pull-request-available (was: ) > Disable external table replication by default via a configuration property > -- > > Key: HIVE-19812 > URL: https://issues.apache.org/jira/browse/HIVE-19812 > Project: Hive > Issue Type: Task > Components: repl >Affects Versions: 3.1.0, 4.0.0 >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Fix For: 3.1.0, 4.0.0 > > > use a hive config property to allow external table replication. set this > property by default to prevent external table replication. > for metadata only hive repl always export metadata for external tables. > > REPL_DUMP_EXTERNAL_TABLES("hive.repl.dump.include.external.tables", false, > "Indicates if repl dump should include information about external tables. It > should be \n" > + "used in conjunction with 'hive.repl.dump.metadata.only' set to false. if > 'hive.repl.dump.metadata.only' \n" > + " is set to true then this config parameter has no effect as external table > meta data is flushed \n" > + " always by default.") > This should be done for only replication dump and not for export -- This message was sent by Atlassian JIRA (v7.6.3#76005)
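The interaction between the two configs described above can be sketched as a small decision function. This is a hypothetical helper, not Hive's actual repl dump code; the class and method names are made up for illustration.

```java
// Sketch of the decision implied by the HIVE-19812 description:
// external tables are included in a dump only when explicitly enabled,
// except that metadata-only dumps always flush external table metadata.
public class ReplDumpExternalTableSketch {
    static boolean shouldDumpExternalTable(boolean metadataOnly,
                                           boolean includeExternalTables) {
        if (metadataOnly) {
            // 'hive.repl.dump.include.external.tables' has no effect here:
            // metadata is always dumped for external tables.
            return true;
        }
        // Full dumps honor the flag, which defaults to false.
        return includeExternalTables;
    }
}
```

With the default (`false`) flag, a full dump would simply skip external tables, which is the disabled-by-default behavior the issue asks for.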
[jira] [Commented] (HIVE-19739) Bootstrap REPL LOAD to use checkpoints to validate and skip the loaded data/metadata.
[ https://issues.apache.org/jira/browse/HIVE-19739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502943#comment-16502943 ] ASF GitHub Bot commented on HIVE-19739: --- GitHub user sankarh opened a pull request: https://github.com/apache/hive/pull/366 HIVE-19739: Bootstrap REPL LOAD to use checkpoints to validate and skip the loaded data/metadata. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sankarh/hive HIVE-19739 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/366.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #366 commit 89d6d9957c200ebad3a78a38400ea70efa5abdc2 Author: Sankar Hariappan Date: 2018-06-04T21:35:38Z HIVE-19739: Bootstrap REPL LOAD to use checkpoints to validate and skip the loaded data/metadata. > Bootstrap REPL LOAD to use checkpoints to validate and skip the loaded > data/metadata. > - > > Key: HIVE-19739 > URL: https://issues.apache.org/jira/browse/HIVE-19739 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan >Priority: Major > Labels: DR, pull-request-available, replication > Fix For: 4.0.0 > > > Currently, bootstrap REPL LOAD adds checkpoint identifiers in > DB/table/partition object properties once the data/metadata related to the > object is successfully loaded. > If the DB exists and is not empty, then currently we throw an exception, but > we need to support this for the retry scenario after a failure. > If there is a retry of bootstrap load using the same dump, then instead of > throwing an error, we should check whether any of the tables/partitions are > completely loaded using the checkpoint identifiers. If yes, then skip them; > otherwise drop/create them again. 
> If the bootstrap load is performed using a different dump, then it should throw > an exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
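The retry rules above can be sketched as a three-way decision per table/partition. This is a hypothetical sketch of the described logic, not the code in the pull request; the names are invented for illustration.

```java
// Sketch of the HIVE-19739 retry decision: a checkpoint identifier
// (the dump id) is stamped on each object once it is fully loaded.
public class BootstrapLoadCheckpointSketch {
    enum Action { SKIP, RELOAD, FAIL }

    // objectCheckpoint: dump id recorded on the object when it was fully
    // loaded, or null if the object was never completely loaded.
    static Action onRetry(String currentDumpId, String objectCheckpoint) {
        if (objectCheckpoint == null) {
            return Action.RELOAD;   // partially loaded: drop and create again
        }
        if (objectCheckpoint.equals(currentDumpId)) {
            return Action.SKIP;     // already fully loaded from this dump
        }
        return Action.FAIL;         // retry with a different dump: error out
    }
}
```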
[jira] [Updated] (HIVE-19569) alter table db1.t1 rename db2.t2 generates MetaStoreEventListener.onDropTable()
[ https://issues.apache.org/jira/browse/HIVE-19569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-19569: -- Labels: pull-request-available (was: ) > alter table db1.t1 rename db2.t2 generates > MetaStoreEventListener.onDropTable() > --- > > Key: HIVE-19569 > URL: https://issues.apache.org/jira/browse/HIVE-19569 > Project: Hive > Issue Type: Bug > Components: Metastore, Standalone Metastore, Transactions >Affects Versions: 3.0.0 >Reporter: Eugene Koifman >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Attachments: HIVE-19569.01.patch > > > When renaming a table within the same DB, this operation causes > {{MetaStoreEventListener.onAlterTable()}} to fire but when changing DB name > for a table it causes {{MetaStoreEventListener.onDropTable()}} + > {{MetaStoreEventListener.onCreateTable()}}. > The files from original table are moved to new table location. > This creates confusing semantics since any logic in {{onDropTable()}} doesn't > know about the larger context, i.e. that there will be a matching > {{onCreateTable()}}. > In particular, this causes a problem for Acid tables since files moved from > old table use WriteIDs that are not meaningful with the context of new table. > Current implementation is due to replication. This should ideally be changed > to raise a "not supported" error for tables that are marked for replication. > cc [~sankarh] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-19569) alter table db1.t1 rename db2.t2 generates MetaStoreEventListener.onDropTable()
[ https://issues.apache.org/jira/browse/HIVE-19569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16504187#comment-16504187 ] ASF GitHub Bot commented on HIVE-19569: --- GitHub user maheshk114 opened a pull request: https://github.com/apache/hive/pull/368 HIVE-19569 : alter table db1.t1 rename db2.t2 generates MetaStoreEventListener.onDropTable() changed create/drop table to alter table event You can merge this pull request into a Git repository by running: $ git pull https://github.com/maheshk114/hive BUG-104447 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/368.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #368 commit 498da7cfb697309a30cbfc723a708bfaaa9ef29e Author: Mahesh Kumar Behera Date: 2018-06-06T04:57:39Z HIVE-19569 : alter table db1.t1 rename db2.t2 generates MetaStoreEventListener.onDropTable() > alter table db1.t1 rename db2.t2 generates > MetaStoreEventListener.onDropTable() > --- > > Key: HIVE-19569 > URL: https://issues.apache.org/jira/browse/HIVE-19569 > Project: Hive > Issue Type: Bug > Components: Metastore, Standalone Metastore, Transactions >Affects Versions: 3.0.0 >Reporter: Eugene Koifman >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Attachments: HIVE-19569.01.patch > > > When renaming a table within the same DB, this operation causes > {{MetaStoreEventListener.onAlterTable()}} to fire but when changing DB name > for a table it causes {{MetaStoreEventListener.onDropTable()}} + > {{MetaStoreEventListener.onCreateTable()}}. > The files from original table are moved to new table location. > This creates confusing semantics since any logic in {{onDropTable()}} doesn't > know about the larger context, i.e. that there will be a matching > {{onCreateTable()}}. 
> In particular, this causes a problem for Acid tables since files moved from > old table use WriteIDs that are not meaningful with the context of new table. > Current implementation is due to replication. This should ideally be changed > to raise a "not supported" error for tables that are marked for replication. > cc [~sankarh] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
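The event semantics described above, and the change the pull request makes, can be sketched as a decision function. This is a hypothetical illustration, not Hive's actual listener dispatch code.

```java
// Sketch of HIVE-19569: before the fix, a cross-database rename surfaced
// as onDropTable() + onCreateTable(); the PR changes it to a single
// alter-table event, matching the same-database rename.
public class RenameEventSketch {
    enum Event { ON_ALTER_TABLE, ON_DROP_AND_CREATE }

    static Event eventFor(String srcDb, String dstDb, boolean patched) {
        if (srcDb.equalsIgnoreCase(dstDb) || patched) {
            return Event.ON_ALTER_TABLE;
        }
        return Event.ON_DROP_AND_CREATE;
    }
}
```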
[jira] [Updated] (HIVE-19708) Repl copy retrying with cm path even if the failure is due to network issue
[ https://issues.apache.org/jira/browse/HIVE-19708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-19708: -- Labels: pull-request-available (was: ) > Repl copy retrying with cm path even if the failure is due to network issue > --- > > Key: HIVE-19708 > URL: https://issues.apache.org/jira/browse/HIVE-19708 > Project: Hive > Issue Type: Task > Components: Hive, HiveServer2, repl >Affects Versions: 3.1.0 >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Fix For: 3.1.0 > > Attachments: HIVE-19708.01.patch, HIVE-19708.02.patch > > > * During repl load > ** for filesystem based copying of file if the copy fails due to a > connection error to source Name Node, we should recreate the filesystem > object. > ** the retry logic for local file copy should be triggered using the > original source file path ( and not the CM root path ) since failure can be > due to network issues between DFSClient and NN. > * When listing files in tables / partition to include them in _files, we > should add retry logic when failure occurs. FileSystem object here also > should be recreated since the existing one might be in inconsistent state. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-19708) Repl copy retrying with cm path even if the failure is due to network issue
[ https://issues.apache.org/jira/browse/HIVE-19708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16492284#comment-16492284 ] ASF GitHub Bot commented on HIVE-19708: --- GitHub user maheshk114 opened a pull request: https://github.com/apache/hive/pull/359 HIVE-19708 : Repl copy retrying with cm path even if the failure is d… You can merge this pull request into a Git repository by running: $ git pull https://github.com/maheshk114/hive BUG-102280 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/359.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #359 commit fc7377c8f265402f2bdc19ef79f0cf94b4fa5c44 Author: Mahesh Kumar Behera Date: 2018-05-25T03:43:52Z HIVE-19708 : Repl copy retrying with cm path even if the failure is due to network issue > Repl copy retrying with cm path even if the failure is due to network issue > --- > > Key: HIVE-19708 > URL: https://issues.apache.org/jira/browse/HIVE-19708 > Project: Hive > Issue Type: Task > Components: Hive, HiveServer2, repl >Affects Versions: 3.1.0 >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Fix For: 3.1.0 > > Attachments: HIVE-19708.01.patch, HIVE-19708.02.patch > > > * During repl load > ** for filesystem based copying of file if the copy fails due to a > connection error to source Name Node, we should recreate the filesystem > object. > ** the retry logic for local file copy should be triggered using the > original source file path ( and not the CM root path ) since failure can be > due to network issues between DFSClient and NN. > * When listing files in tables / partition to include them in _files, we > should add retry logic when failure occurs. FileSystem object here also > should be recreated since the existing one might be in inconsistent state. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
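The recreate-and-retry pattern described above (a stale FileSystem handle may be in an inconsistent state, so each attempt should start from a fresh one) can be sketched generically. This is an illustrative helper, not the HIVE-19708 patch itself; the names are invented.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Sketch of retry-with-resource-recreation: instead of reusing a possibly
// inconsistent handle (e.g. a FileSystem object), obtain a fresh one on
// every attempt.
public class RetrySketch {
    static <R, T> T withRetries(Supplier<R> freshResource,
                                Function<R, T> action,
                                int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                // A new resource is created for each attempt.
                return action.apply(freshResource.get());
            } catch (RuntimeException e) {
                last = e; // transient failure: recreate and try again
            }
        }
        throw last; // all attempts exhausted
    }
}
```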
[jira] [Commented] (HIVE-19723) Arrow serde: "Unsupported data type: Timestamp(NANOSECOND, null)"
[ https://issues.apache.org/jira/browse/HIVE-19723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16492365#comment-16492365 ] ASF GitHub Bot commented on HIVE-19723: --- GitHub user pudidic opened a pull request: https://github.com/apache/hive/pull/360 HIVE-19723: Arrow serde: "Unsupported data type: Timestamp(NANOSECOND, null)" Spark's Arrow support only provides Timestamp at MICROSECOND granularity. Spark 2.3.0 won't accept NANOSECOND. Switch it back to MICROSECOND. The unit test org.apache.hive.jdbc.TestJdbcWithMiniLlapArrow will just need to change the assertion to test microsecond. And we'll need to add this to documentation on supported datatypes. You can merge this pull request into a Git repository by running: $ git pull https://github.com/pudidic/hive HIVE-19723 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/360.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #360 commit 53e4224e05b2b6d96b19451716c41ae1eae7df68 Author: Teddy Choi Date: 2018-05-28T06:56:07Z HIVE-19723: Arrow serde: "Unsupported data type: Timestamp(NANOSECOND, null)" > Arrow serde: "Unsupported data type: Timestamp(NANOSECOND, null)" > - > > Key: HIVE-19723 > URL: https://issues.apache.org/jira/browse/HIVE-19723 > Project: Hive > Issue Type: Bug >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-19723.1.patch > > > Spark's Arrow support only provides Timestamp at MICROSECOND granularity. > Spark 2.3.0 won't accept NANOSECOND. Switch it back to MICROSECOND. > The unit test org.apache.hive.jdbc.TestJdbcWithMiniLlapArrow will just need > to change the assertion to test microsecond. And we'll need to add this to > documentation on supported datatypes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-19723) Arrow serde: "Unsupported data type: Timestamp(NANOSECOND, null)"
[ https://issues.apache.org/jira/browse/HIVE-19723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-19723: -- Labels: pull-request-available (was: ) > Arrow serde: "Unsupported data type: Timestamp(NANOSECOND, null)" > - > > Key: HIVE-19723 > URL: https://issues.apache.org/jira/browse/HIVE-19723 > Project: Hive > Issue Type: Bug >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-19723.1.patch > > > Spark's Arrow support only provides Timestamp at MICROSECOND granularity. > Spark 2.3.0 won't accept NANOSECOND. Switch it back to MICROSECOND. > The unit test org.apache.hive.jdbc.TestJdbcWithMiniLlapArrow will just need > to change the assertion to test microsecond. And we'll need to add this to > documentation on supported datatypes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
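Switching from NANOSECOND to MICROSECOND granularity means truncating sub-microsecond digits. A minimal sketch of the unit conversion involved (illustrative only; not the serde code in the patch):

```java
import java.util.concurrent.TimeUnit;

// Sketch for HIVE-19723: Spark's Arrow support accepts timestamps only at
// MICROSECOND granularity, so nanosecond values must be truncated.
public class TimestampUnitSketch {
    static long toMicros(long epochNanos) {
        // Integer division drops the sub-microsecond remainder.
        return TimeUnit.NANOSECONDS.toMicros(epochNanos);
    }
}
```

This is also why the `TestJdbcWithMiniLlapArrow` assertion mentioned above has to compare at microsecond precision: the last three decimal digits of a nanosecond timestamp are lost.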
[jira] [Commented] (HIVE-17840) HiveMetaStore eats exception if transactionalListeners.notifyEvent fail
[ https://issues.apache.org/jira/browse/HIVE-17840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529716#comment-16529716 ] ASF GitHub Bot commented on HIVE-17840: --- GitHub user sankarh opened a pull request: https://github.com/apache/hive/pull/385 HIVE-17840: HiveMetaStore eats exception if transactionalListeners.notifyEvent fail. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sankarh/hive HIVE-17840 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/385.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #385 commit 0dbccfe51b83fa1f1842d96fb133be0c5bef4ebf Author: Sankar Hariappan Date: 2018-07-02T11:09:51Z HIVE-17840: HiveMetaStore eats exception if transactionalListeners.notifyEvent fail. > HiveMetaStore eats exception if transactionalListeners.notifyEvent fail > --- > > Key: HIVE-17840 > URL: https://issues.apache.org/jira/browse/HIVE-17840 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Daniel Dai >Assignee: Sankar Hariappan >Priority: Major > Labels: pull-request-available > > For example, in add_partitions_core, if there's an exception in > MetaStoreListenerNotifier.notifyEvent(transactionalListeners,), > the transaction is rolled back but no exception is thrown. The client will assume the add > partition succeeded and take the positive path. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-17840) HiveMetaStore eats exception if transactionalListeners.notifyEvent fail
[ https://issues.apache.org/jira/browse/HIVE-17840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-17840: -- Labels: pull-request-available (was: ) > HiveMetaStore eats exception if transactionalListeners.notifyEvent fail > --- > > Key: HIVE-17840 > URL: https://issues.apache.org/jira/browse/HIVE-17840 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Daniel Dai >Assignee: Sankar Hariappan >Priority: Major > Labels: pull-request-available > > For example, in add_partitions_core, if there's an exception in > MetaStoreListenerNotifier.notifyEvent(transactionalListeners,), > the transaction is rolled back but no exception is thrown. The client will assume the add > partition succeeded and take the positive path. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-17593) DataWritableWriter strip spaces for CHAR type before writing, but predicate generator doesn't do same thing.
[ https://issues.apache.org/jira/browse/HIVE-17593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-17593: -- Labels: pull-request-available (was: ) > DataWritableWriter strip spaces for CHAR type before writing, but predicate > generator doesn't do same thing. > > > Key: HIVE-17593 > URL: https://issues.apache.org/jira/browse/HIVE-17593 > Project: Hive > Issue Type: Bug >Affects Versions: 2.3.0, 3.0.0 >Reporter: Junjie Chen >Assignee: Junjie Chen >Priority: Major > Labels: pull-request-available > Fix For: 3.1.0 > > Attachments: HIVE-17593.patch > > > DataWritableWriter strips spaces for the CHAR type before writing, but when > generating the predicate it does NOT do the same stripping, which could cause > missing data! > In the current version it doesn't cause missing data, since the predicate is not > properly pushed down to parquet due to HIVE-17261. > Please see ConvertAstTosearchArg.java: getTypes treats CHAR and STRING the > same, which builds a predicate with trailing spaces. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-17593) DataWritableWriter strip spaces for CHAR type before writing, but predicate generator doesn't do same thing.
[ https://issues.apache.org/jira/browse/HIVE-17593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528513#comment-16528513 ] ASF GitHub Bot commented on HIVE-17593: --- GitHub user cjjnjust opened a pull request: https://github.com/apache/hive/pull/383 HIVE-17593: DataWritableWriter strip spaces for CHAR type which cause… Parquet DataWritableWriter strips trailing spaces for the HiveChar type, which causes predicate push down to fail because ConvertAstToSearchArg constructs the predicate with trailing spaces. Actually, according to the HiveChar definition, it should contain the padded value. ParquetOutputFormat can handle trailing spaces through encoding. You can merge this pull request into a Git repository by running: $ git pull https://github.com/cjjnjust/hive HIVE-17593 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/383.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #383 commit 03230c732d657706c6a95f90e16ed5c81d411af7 Author: Chen, Junjie Date: 2018-06-29T23:32:52Z HIVE-17593: DataWritableWriter strip spaces for CHAR type which cause PPD not work > DataWritableWriter strip spaces for CHAR type before writing, but predicate > generator doesn't do same thing. > > > Key: HIVE-17593 > URL: https://issues.apache.org/jira/browse/HIVE-17593 > Project: Hive > Issue Type: Bug >Affects Versions: 2.3.0, 3.0.0 >Reporter: Junjie Chen >Assignee: Junjie Chen >Priority: Major > Labels: pull-request-available > Fix For: 3.1.0 > > Attachments: HIVE-17593.patch > > > DataWritableWriter strips spaces for the CHAR type before writing, but when > generating the predicate it does NOT do the same stripping, which could cause > missing data! > In the current version it doesn't cause missing data, since the predicate is not > properly pushed down to parquet due to HIVE-17261. 
> Please see ConvertAstTosearchArg.java, getTypes treats CHAR and STRING as > same which will build a predicate with tail spaces. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
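The mismatch described above can be demonstrated with a tiny sketch: the writer strips trailing spaces, while the predicate side compares against the fully padded CHAR literal, so an equality predicate can never match. This is an illustrative model of the behavior, not Hive's actual writer or search-argument code.

```java
// Sketch of the HIVE-17593 mismatch between the Parquet writer and the
// predicate generator for CHAR values.
public class CharPaddingSketch {
    // Writer side: trailing spaces of a CHAR value are stripped before
    // the value is written.
    static String stripTrailingSpaces(String padded) {
        int end = padded.length();
        while (end > 0 && padded.charAt(end - 1) == ' ') {
            end--;
        }
        return padded.substring(0, end);
    }

    // Predicate side: the CHAR(n) literal arrives padded to full length,
    // so an equality check against the stripped stored value fails.
    static boolean predicateMatches(String storedValue, String literal) {
        return storedValue.equals(literal);
    }
}
```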
[jira] [Updated] (HIVE-20025) Clean-up of event files created by HiveProtoLoggingHook.
[ https://issues.apache.org/jira/browse/HIVE-20025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-20025: -- Labels: Hive hooks pull-request-available (was: Hive hooks) > Clean-up of event files created by HiveProtoLoggingHook. > > > Key: HIVE-20025 > URL: https://issues.apache.org/jira/browse/HIVE-20025 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan >Priority: Major > Labels: Hive, hooks, pull-request-available > Fix For: 4.0.0 > > > Currently, HiveProtoLoggingHook writes event data to HDFS. The number of files > can grow very large. > Since the files are created under a folder with the date as part of the > path, hive should have a way to clean up data older than a certain configured > time / date. This can be a job that runs as infrequently as just > once a day. > This time should be set to 1 week by default. There should also be a sane upper > bound on the number of files so that when a large cluster generates a lot of files > during a spike, we don't make the cluster fall over. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-20025) Clean-up of event files created by HiveProtoLoggingHook.
[ https://issues.apache.org/jira/browse/HIVE-20025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529341#comment-16529341 ] ASF GitHub Bot commented on HIVE-20025: --- GitHub user sankarh opened a pull request: https://github.com/apache/hive/pull/384 HIVE-20025: Clean-up of event files created by HiveProtoLoggingHook. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sankarh/hive HIVE-20025 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/384.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #384 commit 52c24baa28ed305f3be2b47f6246ffede0f08e6e Author: Sankar Hariappan Date: 2018-07-01T17:18:06Z HIVE-20025: Clean-up of event files created by HiveProtoLoggingHook. > Clean-up of event files created by HiveProtoLoggingHook. > > > Key: HIVE-20025 > URL: https://issues.apache.org/jira/browse/HIVE-20025 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan >Priority: Major > Labels: Hive, hooks, pull-request-available > Fix For: 4.0.0 > > > Currently, HiveProtoLoggingHook write event data to hdfs. The number of files > can grow to very large numbers. > Since the files are created under a folder with Date being a part of the > path, hive should have a way to clean up data older than a certain configured > time / date. This can be a job that can run with as little frequency as just > once a day. > This time should be set to 1 week default. There should also be a sane upper > bound of # of files so that when a large cluster generates a lot of files > during a spike, we don't force the cluster fall over. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
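The retention rule sketched in the issue (delete event files older than a configured period, one week by default) boils down to a cutoff comparison that a daily cleanup job could apply per file. This is a hypothetical sketch, not the code in the pull request.

```java
import java.time.Duration;
import java.time.Instant;

// Sketch of the HIVE-20025 retention check: a periodic cleanup job would
// delete event files whose modification time falls before now - retention.
public class EventFileRetentionSketch {
    // Default retention suggested in the issue: one week.
    static final Duration RETENTION = Duration.ofDays(7);

    static boolean isExpired(Instant fileModTime, Instant now) {
        return fileModTime.isBefore(now.minus(RETENTION));
    }
}
```

The issue also asks for an upper bound on the total number of files; that would be an additional check on top of this age-based one.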
[jira] [Updated] (HIVE-19316) StatsTask fails due to ClassCastException
[ https://issues.apache.org/jira/browse/HIVE-19316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-19316: -- Labels: pull-request-available (was: ) > StatsTask fails due to ClassCastException > - > > Key: HIVE-19316 > URL: https://issues.apache.org/jira/browse/HIVE-19316 > Project: Hive > Issue Type: Bug > Components: Statistics >Reporter: Rui Li >Assignee: Jaume M >Priority: Major > Labels: pull-request-available > > The stack trace: > {noformat} > 2018-04-26T20:17:37,674 ERROR [pool-7-thread-11] > metastore.RetryingHMSHandler: java.lang.ClassCastException: > org.apache.hadoop.hive.metastore.api.LongColumnStatsData cannot be cast to > org.apache.hadoop.hive.metastore.columnstats.cache.LongColumnStatsDataInspector > at > org.apache.hadoop.hive.metastore.columnstats.merge.LongColumnStatsMerger.merge(LongColumnStatsMerger.java:30) > at > org.apache.hadoop.hive.metastore.utils.MetaStoreUtils.mergeColStats(MetaStoreUtils.java:1052) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.set_aggr_stats_for(HiveMetaStore.java:7202) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) > at com.sun.proxy.$Proxy26.set_aggr_stats_for(Unknown Source) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$set_aggr_stats_for.getResult(ThriftHiveMetastore.java:16795) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$set_aggr_stats_for.getResult(ThriftHiveMetastore.java:16779) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111) > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836) > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-19316) StatsTask fails due to ClassCastException
[ https://issues.apache.org/jira/browse/HIVE-19316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520623#comment-16520623 ] ASF GitHub Bot commented on HIVE-19316: --- GitHub user beltran opened a pull request: https://github.com/apache/hive/pull/378 HIVE-19316: StatsTask fails due to ClassCastException You can merge this pull request into a Git repository by running: $ git pull https://github.com/beltran/hive HIVE-19316 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/378.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #378 commit a9566b22761aa1da14585ce829e6d65f9d272f48 Author: Jaume Marhuenda Date: 2018-06-22T17:46:35Z HIVE-19316: StatsTask fails due to ClassCastException > StatsTask fails due to ClassCastException > - > > Key: HIVE-19316 > URL: https://issues.apache.org/jira/browse/HIVE-19316 > Project: Hive > Issue Type: Bug > Components: Statistics >Reporter: Rui Li >Assignee: Jaume M >Priority: Major > Labels: pull-request-available > > The stack trace: > {noformat} > 2018-04-26T20:17:37,674 ERROR [pool-7-thread-11] > metastore.RetryingHMSHandler: java.lang.ClassCastException: > org.apache.hadoop.hive.metastore.api.LongColumnStatsData cannot be cast to > org.apache.hadoop.hive.metastore.columnstats.cache.LongColumnStatsDataInspector > at > org.apache.hadoop.hive.metastore.columnstats.merge.LongColumnStatsMerger.merge(LongColumnStatsMerger.java:30) > at > org.apache.hadoop.hive.metastore.utils.MetaStoreUtils.mergeColStats(MetaStoreUtils.java:1052) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.set_aggr_stats_for(HiveMetaStore.java:7202) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) > at com.sun.proxy.$Proxy26.set_aggr_stats_for(Unknown Source) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$set_aggr_stats_for.getResult(ThriftHiveMetastore.java:16795) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$set_aggr_stats_for.getResult(ThriftHiveMetastore.java:16779) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111) > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836) > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
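The ClassCastException above arises when a plain LongColumnStatsData reaches code that assumes the richer LongColumnStatsDataInspector subclass. A minimal sketch of the failure mode and a defensive fix pattern, using hypothetical stand-in classes (LongStats / LongStatsInspector) rather than the actual Hive metastore types, whose constructors may differ:

```java
// Hypothetical stand-ins for LongColumnStatsData / LongColumnStatsDataInspector.
class LongStats {
    long lowValue;
    long highValue;
    LongStats(long low, long high) { lowValue = low; highValue = high; }
}

class LongStatsInspector extends LongStats {
    LongStatsInspector(long low, long high) { super(low, high); }
    LongStatsInspector(LongStats base) { super(base.lowValue, base.highValue); }
}

class StatsMerger {
    // Unsafe: throws ClassCastException when handed a plain LongStats,
    // analogous to the downcast in LongColumnStatsMerger.merge above.
    static LongStatsInspector unsafeCast(LongStats s) {
        return (LongStatsInspector) s;
    }

    // Defensive: cast only when the runtime type already matches,
    // otherwise wrap the plain object into the richer subclass.
    static LongStatsInspector toInspector(LongStats s) {
        if (s instanceof LongStatsInspector) {
            return (LongStatsInspector) s;
        }
        return new LongStatsInspector(s);
    }
}
```

This is only an illustration of the instanceof-guarded-cast pattern; the actual patch in PR #378 may resolve the mismatch differently.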
[jira] [Updated] (HIVE-19970) Replication dump has a NPE when table is empty
[ https://issues.apache.org/jira/browse/HIVE-19970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-19970: -- Labels: pull-request-available (was: ) > Replication dump has a NPE when table is empty > -- > > Key: HIVE-19970 > URL: https://issues.apache.org/jira/browse/HIVE-19970 > Project: Hive > Issue Type: Task > Components: repl >Affects Versions: 3.1.0, 4.0.0 >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Fix For: 3.1.0, 4.0.0 > > Attachments: HIVE-19970.01.patch > > > if table directory or partition directory is missing ..dump is throwing NPE > instead of file missing exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-19970) Replication dump has a NPE when table is empty
[ https://issues.apache.org/jira/browse/HIVE-19970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16521408#comment-16521408 ] ASF GitHub Bot commented on HIVE-19970: --- GitHub user maheshk114 opened a pull request: https://github.com/apache/hive/pull/379 HIVE-19970 : Replication dump has a NPE when table is empty You can merge this pull request into a Git repository by running: $ git pull https://github.com/maheshk114/hive BUG-105903 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/379.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #379 commit a1ad4c1d068ad1cd5ce6b8ec7170bff1b1a4f1b1 Author: Mahesh Kumar Behera Date: 2018-06-22T20:04:15Z HIVE-19970 : Replication dump has a NPE when table is empty > Replication dump has a NPE when table is empty > -- > > Key: HIVE-19970 > URL: https://issues.apache.org/jira/browse/HIVE-19970 > Project: Hive > Issue Type: Task > Components: repl >Affects Versions: 3.1.0, 4.0.0 >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Fix For: 3.1.0, 4.0.0 > > Attachments: HIVE-19970.01.patch > > > if table directory or partition directory is missing ..dump is throwing NPE > instead of file missing exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
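The complaint above is that a missing table or partition directory surfaces as a bare NullPointerException instead of a clear file-missing error. A hedged sketch of the desired behavior (the helper name and use of java.io.File are illustrative, not the actual repl dump code): File.listFiles() returns null for a nonexistent directory, and dereferencing that null is exactly the kind of NPE reported.

```java
import java.io.File;
import java.io.FileNotFoundException;

class DumpDirCheck {
    // Surface a clear "missing directory" error instead of letting the
    // null listing propagate as a NullPointerException downstream.
    static File[] listDataFiles(File dir) throws FileNotFoundException {
        File[] files = dir.listFiles(); // null when dir does not exist
        if (files == null) {
            throw new FileNotFoundException(
                "Table/partition directory missing: " + dir);
        }
        return files;
    }

    // Convenience used below: true iff the missing dir produces the
    // explicit exception rather than an NPE.
    static boolean failsWithClearError(File dir) {
        try {
            listDataFiles(dir);
            return false;
        } catch (FileNotFoundException e) {
            return true;
        }
    }
}
```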
[jira] [Updated] (HIVE-20001) With doas set to true, running select query as hrt_qa user on external table fails due to permission denied to read /warehouse/tablespace/managed directory.
[ https://issues.apache.org/jira/browse/HIVE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-20001: -- Labels: pull-request-available (was: ) > With doas set to true, running select query as hrt_qa user on external table > fails due to permission denied to read /warehouse/tablespace/managed > directory. > > > Key: HIVE-20001 > URL: https://issues.apache.org/jira/browse/HIVE-20001 > Project: Hive > Issue Type: Bug >Reporter: Jaume M >Assignee: Jaume M >Priority: Major > Labels: pull-request-available > Attachments: HIVE-20001.1.patch > > > Hive: With doas set to true, running select query as hrt_qa user on external > table fails due to permission denied to read /warehouse/tablespace/managed > directory. > Steps: > 1. Create a external table. > 2. Set doas to true. > 3. run select count(*) using user hrt_qa. > Table creation query. > {code} > beeline -n hrt_qa -p pwd -u > "jdbc:hive2://ctr-e138-1518143905142-375925-01-06.hwx.site:2181,ctr-e138-1518143905142-375925-01-05.hwx.site:2181,ctr-e138-1518143905142-375925-01-07.hwx.site:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@example.com;transportMode=http;httpPath=cliservice;ssl=true;sslTrustStore=/etc/security/serverKeys/hivetruststore.jks;trustStorePassword=changeit" > --outputformat=tsv -e "drop table if exists test_table purge; > create external table test_table(id int, age int) row format delimited fields > terminated by '|' stored as textfile; > load data inpath '/tmp/table1.dat' overwrite into table test_table; > {code} > select count(*) query execution fails > {code} > beeline -n hrt_qa -p pwd -u > 
"jdbc:hive2://ctr-e138-1518143905142-375925-01-06.hwx.site:2181,ctr-e138-1518143905142-375925-01-05.hwx.site:2181,ctr-e138-1518143905142-375925-01-07.hwx.site:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@example.com;transportMode=http;httpPath=cliservice;ssl=true;sslTrustStore=/etc/security/serverKeys/hivetruststore.jks;trustStorePassword=changeit" > --outputformat=tsv -e "select count(*) from test_table where age>30 and > id<10100;" > 2018-06-22 10:22:29,328|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|SLF4J: Class path contains > multiple SLF4J bindings. > 2018-06-22 10:22:29,330|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|SLF4J: See > http://www.slf4j.org/codes.html#multiple_bindings for an explanation. > 2018-06-22 10:22:29,335|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|SLF4J: Actual binding is of > type [org.apache.logging.slf4j.Log4jLoggerFactory] > 2018-06-22 10:22:31,408|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|Format tsv is deprecated, > please use tsv2 > 2018-06-22 10:22:31,529|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|Connecting to > jdbc:hive2://ctr-e138-1518143905142-375925-01-06.hwx.site:2181,ctr-e138-1518143905142-375925-01-05.hwx.site:2181,ctr-e138-1518143905142-375925-01-07.hwx.site:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@example.com;transportMode=http;httpPath=cliservice;ssl=true;sslTrustStore=/etc/security/serverKeys/hivetruststore.jks;trustStorePassword=changeit > 2018-06-22 10:22:32,031|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|18/06/22 10:22:32 [main]: > INFO jdbc.HiveConnection: Connected to > ctr-e138-1518143905142-375925-01-04.hwx.site:10001 > 2018-06-22 10:22:34,130|INFO|Thread-126|machine.py:111 - > 
tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|18/06/22 10:22:34 [main]: > WARN jdbc.HiveConnection: Failed to connect to > ctr-e138-1518143905142-375925-01-04.hwx.site:10001 > 2018-06-22 10:22:34,244|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|18/06/22 10:22:34 [main]: > WARN jdbc.HiveConnection: Could not open client transport with JDBC Uri: > jdbc:hive2://ctr-e138-1518143905142-375925-01-04.hwx.site:10001/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@example.com;transportMode=http;httpPath=cliservice;ssl=true;sslTrustStore=/etc/security/serverKeys/hivetruststore.jks;trustStorePassword=changeit: > Failed to open new session: > org.apache.hadoop.hive.ql.metadata.HiveException: > MetaException(message:java.security.AccessControlException: Permission > denied: user=hrt_qa, access=READ, >
[jira] [Commented] (HIVE-20001) With doas set to true, running select query as hrt_qa user on external table fails due to permission denied to read /warehouse/tablespace/managed directory.
[ https://issues.apache.org/jira/browse/HIVE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524295#comment-16524295 ] ASF GitHub Bot commented on HIVE-20001: --- GitHub user beltran opened a pull request: https://github.com/apache/hive/pull/380 HIVE-20001: With doas set to true, running select query as hrt_qa use… …r on external table fails due to permission denied to read /warehouse/tablespace/managed directory You can merge this pull request into a Git repository by running: $ git pull https://github.com/beltran/hive HIVE-20001 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/380.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #380 commit 3e9dd9a73ae9d33e2f291819b0e10e4296f2b568 Author: Jaume Marhuenda Date: 2018-06-26T22:42:14Z HIVE-20001: With doas set to true, running select query as hrt_qa user on external table fails due to permission denied to read /warehouse/tablespace/managed directory > With doas set to true, running select query as hrt_qa user on external table > fails due to permission denied to read /warehouse/tablespace/managed > directory. > > > Key: HIVE-20001 > URL: https://issues.apache.org/jira/browse/HIVE-20001 > Project: Hive > Issue Type: Bug >Reporter: Jaume M >Assignee: Jaume M >Priority: Major > Labels: pull-request-available > Attachments: HIVE-20001.1.patch > > > Hive: With doas set to true, running select query as hrt_qa user on external > table fails due to permission denied to read /warehouse/tablespace/managed > directory. > Steps: > 1. Create a external table. > 2. Set doas to true. > 3. run select count(*) using user hrt_qa. > Table creation query. 
> {code} > beeline -n hrt_qa -p pwd -u > "jdbc:hive2://ctr-e138-1518143905142-375925-01-06.hwx.site:2181,ctr-e138-1518143905142-375925-01-05.hwx.site:2181,ctr-e138-1518143905142-375925-01-07.hwx.site:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@example.com;transportMode=http;httpPath=cliservice;ssl=true;sslTrustStore=/etc/security/serverKeys/hivetruststore.jks;trustStorePassword=changeit" > --outputformat=tsv -e "drop table if exists test_table purge; > create external table test_table(id int, age int) row format delimited fields > terminated by '|' stored as textfile; > load data inpath '/tmp/table1.dat' overwrite into table test_table; > {code} > select count(*) query execution fails > {code} > beeline -n hrt_qa -p pwd -u > "jdbc:hive2://ctr-e138-1518143905142-375925-01-06.hwx.site:2181,ctr-e138-1518143905142-375925-01-05.hwx.site:2181,ctr-e138-1518143905142-375925-01-07.hwx.site:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@example.com;transportMode=http;httpPath=cliservice;ssl=true;sslTrustStore=/etc/security/serverKeys/hivetruststore.jks;trustStorePassword=changeit" > --outputformat=tsv -e "select count(*) from test_table where age>30 and > id<10100;" > 2018-06-22 10:22:29,328|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|SLF4J: Class path contains > multiple SLF4J bindings. > 2018-06-22 10:22:29,330|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|SLF4J: See > http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
> 2018-06-22 10:22:29,335|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|SLF4J: Actual binding is of > type [org.apache.logging.slf4j.Log4jLoggerFactory] > 2018-06-22 10:22:31,408|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|Format tsv is deprecated, > please use tsv2 > 2018-06-22 10:22:31,529|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|Connecting to > jdbc:hive2://ctr-e138-1518143905142-375925-01-06.hwx.site:2181,ctr-e138-1518143905142-375925-01-05.hwx.site:2181,ctr-e138-1518143905142-375925-01-07.hwx.site:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@example.com;transportMode=http;httpPath=cliservice;ssl=true;sslTrustStore=/etc/security/serverKeys/hivetruststore.jks;trustStorePassword=changeit > 2018-06-22 10:22:32,031|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|18/06/22 10:22:32 [main]: > INFO jdbc.HiveConnection: Connected to > ctr-e138-1518143905142-375925-01-04.hwx.site:10001 > 2018-06-22
[jira] [Commented] (HIVE-20001) With doas set to true, running select query as hrt_qa user on external table fails due to permission denied to read /warehouse/tablespace/managed directory.
[ https://issues.apache.org/jira/browse/HIVE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530503#comment-16530503 ] ASF GitHub Bot commented on HIVE-20001: --- GitHub user beltran opened a pull request: https://github.com/apache/hive/pull/389 HIVE-20001: With doas set to true, running select query as hrt_qa use… …r on external table fails due to permission denied to read /warehouse/tablespace/managed directory You can merge this pull request into a Git repository by running: $ git pull https://github.com/beltran/hive HIVE-20001-3 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/389.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #389 commit 764d76de31f83b2c5985ad29410135faf4e32998 Author: Jaume Marhuenda Date: 2018-07-01T03:47:15Z HIVE-20001: With doas set to true, running select query as hrt_qa user on external table fails due to permission denied to read /warehouse/tablespace/managed directory > With doas set to true, running select query as hrt_qa user on external table > fails due to permission denied to read /warehouse/tablespace/managed > directory. > > > Key: HIVE-20001 > URL: https://issues.apache.org/jira/browse/HIVE-20001 > Project: Hive > Issue Type: Bug >Reporter: Jaume M >Assignee: Jaume M >Priority: Major > Labels: pull-request-available > Attachments: HIVE-20001.1.patch, HIVE-20001.1.patch, > HIVE-20001.2.patch, HIVE-20001.3.patch > > > Hive: With doas set to true, running select query as hrt_qa user on external > table fails due to permission denied to read /warehouse/tablespace/managed > directory. > Steps: > 1. Create a external table. > 2. Set doas to true. > 3. run select count(*) using user hrt_qa. > Table creation query. 
> {code} > beeline -n hrt_qa -p pwd -u > "jdbc:hive2://ctr-e138-1518143905142-375925-01-06.hwx.site:2181,ctr-e138-1518143905142-375925-01-05.hwx.site:2181,ctr-e138-1518143905142-375925-01-07.hwx.site:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@example.com;transportMode=http;httpPath=cliservice;ssl=true;sslTrustStore=/etc/security/serverKeys/hivetruststore.jks;trustStorePassword=changeit" > --outputformat=tsv -e "drop table if exists test_table purge; > create external table test_table(id int, age int) row format delimited fields > terminated by '|' stored as textfile; > load data inpath '/tmp/table1.dat' overwrite into table test_table; > {code} > select count(*) query execution fails > {code} > beeline -n hrt_qa -p pwd -u > "jdbc:hive2://ctr-e138-1518143905142-375925-01-06.hwx.site:2181,ctr-e138-1518143905142-375925-01-05.hwx.site:2181,ctr-e138-1518143905142-375925-01-07.hwx.site:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@example.com;transportMode=http;httpPath=cliservice;ssl=true;sslTrustStore=/etc/security/serverKeys/hivetruststore.jks;trustStorePassword=changeit" > --outputformat=tsv -e "select count(*) from test_table where age>30 and > id<10100;" > 2018-06-22 10:22:29,328|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|SLF4J: Class path contains > multiple SLF4J bindings. > 2018-06-22 10:22:29,330|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|SLF4J: See > http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
> 2018-06-22 10:22:29,335|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|SLF4J: Actual binding is of > type [org.apache.logging.slf4j.Log4jLoggerFactory] > 2018-06-22 10:22:31,408|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|Format tsv is deprecated, > please use tsv2 > 2018-06-22 10:22:31,529|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|Connecting to > jdbc:hive2://ctr-e138-1518143905142-375925-01-06.hwx.site:2181,ctr-e138-1518143905142-375925-01-05.hwx.site:2181,ctr-e138-1518143905142-375925-01-07.hwx.site:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@example.com;transportMode=http;httpPath=cliservice;ssl=true;sslTrustStore=/etc/security/serverKeys/hivetruststore.jks;trustStorePassword=changeit > 2018-06-22 10:22:32,031|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|18/06/22 10:22:32 [main]: > INFO jdbc.HiveConnection: Connected to >
[jira] [Commented] (HIVE-20001) With doas set to true, running select query as hrt_qa user on external table fails due to permission denied to read /warehouse/tablespace/managed directory.
[ https://issues.apache.org/jira/browse/HIVE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530276#comment-16530276 ] ASF GitHub Bot commented on HIVE-20001: --- GitHub user beltran opened a pull request: https://github.com/apache/hive/pull/387 HIVE-20001: With doas set to true, running select query as hrt_qa use… …r on external table fails due to permission denied to read /warehouse/tablespace/managed directory Special attention to whether the appropriate create/upgrade sql scripts have been modified. You can merge this pull request into a Git repository by running: $ git pull https://github.com/beltran/hive HIVE-20001-2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/387.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #387 commit 7b722374af80172f947f78a80a11565d13ddd4a7 Author: Jaume Marhuenda Date: 2018-07-01T03:47:15Z HIVE-20001: With doas set to true, running select query as hrt_qa user on external table fails due to permission denied to read /warehouse/tablespace/managed directory > With doas set to true, running select query as hrt_qa user on external table > fails due to permission denied to read /warehouse/tablespace/managed > directory. > > > Key: HIVE-20001 > URL: https://issues.apache.org/jira/browse/HIVE-20001 > Project: Hive > Issue Type: Bug >Reporter: Jaume M >Assignee: Jaume M >Priority: Major > Labels: pull-request-available > Attachments: HIVE-20001.1.patch, HIVE-20001.1.patch > > > Hive: With doas set to true, running select query as hrt_qa user on external > table fails due to permission denied to read /warehouse/tablespace/managed > directory. > Steps: > 1. Create a external table. > 2. Set doas to true. > 3. run select count(*) using user hrt_qa. > Table creation query. 
> {code} > beeline -n hrt_qa -p pwd -u > "jdbc:hive2://ctr-e138-1518143905142-375925-01-06.hwx.site:2181,ctr-e138-1518143905142-375925-01-05.hwx.site:2181,ctr-e138-1518143905142-375925-01-07.hwx.site:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@example.com;transportMode=http;httpPath=cliservice;ssl=true;sslTrustStore=/etc/security/serverKeys/hivetruststore.jks;trustStorePassword=changeit" > --outputformat=tsv -e "drop table if exists test_table purge; > create external table test_table(id int, age int) row format delimited fields > terminated by '|' stored as textfile; > load data inpath '/tmp/table1.dat' overwrite into table test_table; > {code} > select count(*) query execution fails > {code} > beeline -n hrt_qa -p pwd -u > "jdbc:hive2://ctr-e138-1518143905142-375925-01-06.hwx.site:2181,ctr-e138-1518143905142-375925-01-05.hwx.site:2181,ctr-e138-1518143905142-375925-01-07.hwx.site:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@example.com;transportMode=http;httpPath=cliservice;ssl=true;sslTrustStore=/etc/security/serverKeys/hivetruststore.jks;trustStorePassword=changeit" > --outputformat=tsv -e "select count(*) from test_table where age>30 and > id<10100;" > 2018-06-22 10:22:29,328|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|SLF4J: Class path contains > multiple SLF4J bindings. > 2018-06-22 10:22:29,330|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|SLF4J: See > http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
> 2018-06-22 10:22:29,335|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|SLF4J: Actual binding is of > type [org.apache.logging.slf4j.Log4jLoggerFactory] > 2018-06-22 10:22:31,408|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|Format tsv is deprecated, > please use tsv2 > 2018-06-22 10:22:31,529|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|Connecting to > jdbc:hive2://ctr-e138-1518143905142-375925-01-06.hwx.site:2181,ctr-e138-1518143905142-375925-01-05.hwx.site:2181,ctr-e138-1518143905142-375925-01-07.hwx.site:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@example.com;transportMode=http;httpPath=cliservice;ssl=true;sslTrustStore=/etc/security/serverKeys/hivetruststore.jks;trustStorePassword=changeit > 2018-06-22 10:22:32,031|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|18/06/22 10:22:32 [main]: > INFO jdbc.HiveConnection:
[jira] [Commented] (HIVE-20057) For ALTER TABLE t SET TBLPROPERTIES ('EXTERNAL'='TRUE'); `TBL_TYPE` attribute change not reflecting for non-CAPS
[ https://issues.apache.org/jira/browse/HIVE-20057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530379#comment-16530379 ] ASF GitHub Bot commented on HIVE-20057: --- GitHub user animenon opened a pull request: https://github.com/apache/hive/pull/388 HIVE-20057: Fix Hive table conversion DESCRIBE table bug Fix for #HIVE-20057 Issue: `Table Type` wrongly shown as `MANAGED_TABLE` after converting table from MANAGED to EXTERNAL using ` ALTER TABLE t SET TBLPROPERTIES ('EXTERNAL'='True')` _(this is shown correctly only for `'EXTERNAL'='TRUE`)_ You can merge this pull request into a Git repository by running: $ git pull https://github.com/animenon/hive patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/388.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #388 commit 1a9674645c3b4e3080f5278f6bea3126b1cebbac Author: Anirudh Date: 2018-07-02T19:53:03Z HIVE-20057: Fix Hive table conversion DESCRIBE table bug `equals` to `equalsIgnoreCase` > For ALTER TABLE t SET TBLPROPERTIES ('EXTERNAL'='TRUE'); `TBL_TYPE` attribute > change not reflecting for non-CAPS > > > Key: HIVE-20057 > URL: https://issues.apache.org/jira/browse/HIVE-20057 > Project: Hive > Issue Type: Bug > Components: Standalone Metastore >Affects Versions: All Versions >Reporter: Anirudh >Assignee: Anirudh >Priority: Minor > Labels: pull-request-available > > Hive EXTERNAL table shown as MANAGED after conversion using > > ALTER TABLE t SET TBLPROPERTIES ('EXTERNAL'='True') > > The DESCRIBE FORMATTED shows: > Table Type: MANAGED_TABLE > Table Parameters: > EXTERNAL True > > This is actually a External table but shown wrongly as 'True' was used in > place of 'TRUE' in the ALTER statement. 
> Issue explained here: > [StackOverflow - Hive Table is MANAGED or > EXTERNAL|https://stackoverflow.com/questions/51103317/hive-table-is-managed-or-external/51142873#51142873] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
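The commit message above names the fix: a case-sensitive `equals` on the 'EXTERNAL' property value is replaced with `equalsIgnoreCase`, so 'True' and 'true' flip the table type just like 'TRUE'. A minimal illustration (the method names here are invented for the example, not the metastore's actual ones):

```java
class ExternalFlag {
    // Buggy behavior: only the exact string 'TRUE' is recognized,
    // so ALTER TABLE ... ('EXTERNAL'='True') leaves the table MANAGED.
    static boolean isExternalBuggy(String value) {
        return "TRUE".equals(value);
    }

    // Fixed behavior: any casing of 'true' converts the table.
    static boolean isExternalFixed(String value) {
        return "TRUE".equalsIgnoreCase(value);
    }
}
```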
[jira] [Updated] (HIVE-20057) For ALTER TABLE t SET TBLPROPERTIES ('EXTERNAL'='TRUE'); `TBL_TYPE` attribute change not reflecting for non-CAPS
[ https://issues.apache.org/jira/browse/HIVE-20057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-20057: -- Labels: pull-request-available (was: ) > For ALTER TABLE t SET TBLPROPERTIES ('EXTERNAL'='TRUE'); `TBL_TYPE` attribute > change not reflecting for non-CAPS > > > Key: HIVE-20057 > URL: https://issues.apache.org/jira/browse/HIVE-20057 > Project: Hive > Issue Type: Bug > Components: Standalone Metastore >Affects Versions: All Versions >Reporter: Anirudh >Assignee: Anirudh >Priority: Minor > Labels: pull-request-available > > Hive EXTERNAL table shown as MANAGED after conversion using > > ALTER TABLE t SET TBLPROPERTIES ('EXTERNAL'='True') > > The DESCRIBE FORMATTED shows: > Table Type: MANAGED_TABLE > Table Parameters: > EXTERNAL True > > This is actually a External table but shown wrongly as 'True' was used in > place of 'TRUE' in the ALTER statement. > Issue explained here: > [StakOverflow - Hive Table is MANAGED or > EXTERNAL|https://stackoverflow.com/questions/51103317/hive-table-is-managed-or-external/51142873#51142873] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-20001) With doas set to true, running select query as hrt_qa user on external table fails due to permission denied to read /warehouse/tablespace/managed directory.
[ https://issues.apache.org/jira/browse/HIVE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530360#comment-16530360 ] ASF GitHub Bot commented on HIVE-20001: --- Github user beltran closed the pull request at: https://github.com/apache/hive/pull/387 > With doas set to true, running select query as hrt_qa user on external table > fails due to permission denied to read /warehouse/tablespace/managed > directory. > > > Key: HIVE-20001 > URL: https://issues.apache.org/jira/browse/HIVE-20001 > Project: Hive > Issue Type: Bug >Reporter: Jaume M >Assignee: Jaume M >Priority: Major > Labels: pull-request-available > Attachments: HIVE-20001.1.patch, HIVE-20001.1.patch, > HIVE-20001.2.patch > > > Hive: With doas set to true, running select query as hrt_qa user on external > table fails due to permission denied to read /warehouse/tablespace/managed > directory. > Steps: > 1. Create a external table. > 2. Set doas to true. > 3. run select count(*) using user hrt_qa. > Table creation query. 
> {code} > beeline -n hrt_qa -p pwd -u > "jdbc:hive2://ctr-e138-1518143905142-375925-01-06.hwx.site:2181,ctr-e138-1518143905142-375925-01-05.hwx.site:2181,ctr-e138-1518143905142-375925-01-07.hwx.site:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@example.com;transportMode=http;httpPath=cliservice;ssl=true;sslTrustStore=/etc/security/serverKeys/hivetruststore.jks;trustStorePassword=changeit" > --outputformat=tsv -e "drop table if exists test_table purge; > create external table test_table(id int, age int) row format delimited fields > terminated by '|' stored as textfile; > load data inpath '/tmp/table1.dat' overwrite into table test_table; > {code} > select count(*) query execution fails > {code} > beeline -n hrt_qa -p pwd -u > "jdbc:hive2://ctr-e138-1518143905142-375925-01-06.hwx.site:2181,ctr-e138-1518143905142-375925-01-05.hwx.site:2181,ctr-e138-1518143905142-375925-01-07.hwx.site:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@example.com;transportMode=http;httpPath=cliservice;ssl=true;sslTrustStore=/etc/security/serverKeys/hivetruststore.jks;trustStorePassword=changeit" > --outputformat=tsv -e "select count(*) from test_table where age>30 and > id<10100;" > 2018-06-22 10:22:29,328|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|SLF4J: Class path contains > multiple SLF4J bindings. > 2018-06-22 10:22:29,330|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|SLF4J: See > http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
> 2018-06-22 10:22:29,335|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|SLF4J: Actual binding is of > type [org.apache.logging.slf4j.Log4jLoggerFactory] > 2018-06-22 10:22:31,408|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|Format tsv is deprecated, > please use tsv2 > 2018-06-22 10:22:31,529|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|Connecting to > jdbc:hive2://ctr-e138-1518143905142-375925-01-06.hwx.site:2181,ctr-e138-1518143905142-375925-01-05.hwx.site:2181,ctr-e138-1518143905142-375925-01-07.hwx.site:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@example.com;transportMode=http;httpPath=cliservice;ssl=true;sslTrustStore=/etc/security/serverKeys/hivetruststore.jks;trustStorePassword=changeit > 2018-06-22 10:22:32,031|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|18/06/22 10:22:32 [main]: > INFO jdbc.HiveConnection: Connected to > ctr-e138-1518143905142-375925-01-04.hwx.site:10001 > 2018-06-22 10:22:34,130|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|18/06/22 10:22:34 [main]: > WARN jdbc.HiveConnection: Failed to connect to > ctr-e138-1518143905142-375925-01-04.hwx.site:10001 > 2018-06-22 10:22:34,244|INFO|Thread-126|machine.py:111 - > tee_pipe()||b3a493ec-99be-483e-91fe-4b701ec27ebc|18/06/22 10:22:34 [main]: > WARN jdbc.HiveConnection: Could not open client transport with JDBC Uri: > jdbc:hive2://ctr-e138-1518143905142-375925-01-04.hwx.site:10001/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@example.com;transportMode=http;httpPath=cliservice;ssl=true;sslTrustStore=/etc/security/serverKeys/hivetruststore.jks;trustStorePassword=changeit: > Failed to open new session: > org.apache.hadoop.hive.ql.metadata.HiveException: >
[jira] [Updated] (HIVE-20052) Arrow serde should fill ArrowColumnVector(Decimal) with the given schema precision/scale
[ https://issues.apache.org/jira/browse/HIVE-20052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-20052: -- Labels: pull-request-available (was: ) > Arrow serde should fill ArrowColumnVector(Decimal) with the given schema > precision/scale > > > Key: HIVE-20052 > URL: https://issues.apache.org/jira/browse/HIVE-20052 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-20052.patch > > > Arrow serde should fill ArrowColumnVector with given precision and scale. > When it serializes negative values into Arrow, it throws exceptions that the > precision of the value is not same with the precision of Arrow decimal vector. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-20052) Arrow serde should fill ArrowColumnVector(Decimal) with the given schema precision/scale
[ https://issues.apache.org/jira/browse/HIVE-20052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530163#comment-16530163 ] ASF GitHub Bot commented on HIVE-20052: --- GitHub user pudidic opened a pull request: https://github.com/apache/hive/pull/386 HIVE-20052: Arrow serde should fill ArrowColumnVector(Decimal) with t… …he given schema precision/scale (Teddy Choi) You can merge this pull request into a Git repository by running: $ git pull https://github.com/pudidic/hive HIVE-20052 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/386.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #386 commit 574485f81609820601bb20557de50143ec56a0d7 Author: Teddy Choi Date: 2018-07-02T16:41:40Z HIVE-20052: Arrow serde should fill ArrowColumnVector(Decimal) with the given schema precision/scale (Teddy Choi) > Arrow serde should fill ArrowColumnVector(Decimal) with the given schema > precision/scale > > > Key: HIVE-20052 > URL: https://issues.apache.org/jira/browse/HIVE-20052 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-20052.patch > > > Arrow serde should fill ArrowColumnVector with given precision and scale. > When it serializes negative values into Arrow, it throws exceptions that the > precision of the value is not same with the precision of Arrow decimal vector. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-19340) Disable timeout of transactions opened by replication task at target cluster
[ https://issues.apache.org/jira/browse/HIVE-19340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16455740#comment-16455740 ] ASF GitHub Bot commented on HIVE-19340: --- GitHub user maheshk114 opened a pull request: https://github.com/apache/hive/pull/337 HIVE-19340 : Disable timeout of transactions opened by replication ta… The transactions opened by applying EVENT_OPEN_TXN should never be aborted automatically due to time-out. Aborting a transaction started by a replication task may lead to an inconsistent state at the target, which needs additional overhead to clean up. So, it is proposed to mark the transactions opened by a replication task as special ones that shouldn't be aborted if the heartbeat is lost. This helps ensure all ABORT and COMMIT events will always find the corresponding txn at the target to operate on. You can merge this pull request into a Git repository by running: $ git pull https://github.com/maheshk114/hive BUG-92700 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/337.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #337 commit 317d29c8455ad8aaccf1689c66d79f7bab41cde7 Author: Mahesh Kumar Behera Date: 2018-04-27T03:24:08Z HIVE-19340 : Disable timeout of transactions opened by replication task at target cluster > Disable timeout of transactions opened by replication task at target cluster > > > Key: HIVE-19340 > URL: https://issues.apache.org/jira/browse/HIVE-19340 > Project: Hive > Issue Type: Sub-task > Components: repl, Transactions >Affects Versions: 3.0.0 >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: ACID, DR, pull-request-available, replication > Fix For: 3.0.0 > > Attachments: HIVE-19340.01.patch > > > The transactions opened by applying EVENT_OPEN_TXN should never be aborted > automatically due to time-out.
Aborting a transaction started by a replication > task may lead to an inconsistent state at the target, which needs additional > overhead to clean up. So, it is proposed to mark the transactions opened by a > replication task as special ones that shouldn't be aborted if the heartbeat is > lost. This helps ensure all ABORT and COMMIT events will always find the > corresponding txn at the target to operate on. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
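The policy proposed above — exempt replication-opened transactions from the heartbeat reaper — can be sketched with a toy transaction model (illustrative only, not Hive's actual TxnHandler code or API):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: a heartbeat reaper that aborts transactions idle past the
// timeout, but skips transactions flagged as opened by a replication task, so the
// later COMMIT/ABORT replication events always find their txn still open.
public class TxnReaper {
    public static class Txn {
        public final long id;
        public final long lastHeartbeatMs;
        public final boolean openedByReplication;
        public Txn(long id, long lastHeartbeatMs, boolean openedByReplication) {
            this.id = id;
            this.lastHeartbeatMs = lastHeartbeatMs;
            this.openedByReplication = openedByReplication;
        }
    }

    /** Returns the ids of transactions that should be aborted for missing heartbeats. */
    public static List<Long> txnsToAbort(List<Txn> open, long nowMs, long timeoutMs) {
        List<Long> abort = new ArrayList<>();
        for (Txn t : open) {
            if (t.openedByReplication) {
                continue; // repl txns stay open until their COMMIT/ABORT event arrives
            }
            if (nowMs - t.lastHeartbeatMs > timeoutMs) {
                abort.add(t.id); // ordinary txn with a lost heartbeat
            }
        }
        return abort;
    }
}
```

With a 500 ms timeout at time 1000, an ordinary txn last seen at 0 is reaped, a replication txn last seen at 0 is kept, and an ordinary txn last seen at 900 is kept.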
[jira] [Commented] (HIVE-19130) NPE is thrown when REPL LOAD applied drop partition event.
[ https://issues.apache.org/jira/browse/HIVE-19130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463378#comment-16463378 ] ASF GitHub Bot commented on HIVE-19130: --- Github user sankarh closed the pull request at: https://github.com/apache/hive/pull/332 > NPE is thrown when REPL LOAD applied drop partition event. > -- > > Key: HIVE-19130 > URL: https://issues.apache.org/jira/browse/HIVE-19130 > Project: Hive > Issue Type: Bug > Components: HiveServer2, repl >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan >Priority: Major > Labels: DR, Replication, pull-request-available > Fix For: 3.0.0 > > Attachments: HIVE-19130.01.patch > > > During incremental replication, if we split the events batch as follows, then > the REPL LOAD on second batch throws NPE. > Batch-1: CREATE_TABLE(t1) -> ADD_PARTITION(t1.p1) -> DROP_PARTITION (t1.p1) > Batch-2: DROP_TABLE(t1) -> CREATE_TABLE(t1) -> ADD_PARTITION(t1.p1) -> > DROP_PARTITION (t1.p1) > {code} > 2018-04-05 16:20:36,531 ERROR [HiveServer2-Background-Pool: Thread-107044]: > metadata.Hive (Hive.java:getTable(1219)) - Table catalog_sales_new not found: > new5_tpcds_real_bin_partitioned_orc_1000.catalog_sales_new table not found > 2018-04-05 16:20:36,538 ERROR [HiveServer2-Background-Pool: Thread-107044]: > exec.DDLTask (DDLTask.java:failed(540)) - > org.apache.hadoop.hive.ql.metadata.HiveException > at > org.apache.hadoop.hive.ql.exec.DDLTask.dropPartitions(DDLTask.java:4016) > at > org.apache.hadoop.hive.ql.exec.DDLTask.dropTableOrPartitions(DDLTask.java:3983) > at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:341) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:162) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1765) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1506) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1303) 
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1170) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1165) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:197) > at > org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:76) > at > org.apache.hive.service.cli.operation.SQLOperation$2$1.run(SQLOperation.java:255) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) > at > org.apache.hive.service.cli.operation.SQLOperation$2.run(SQLOperation.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.metadata.Hive.getPartitionsByExpr(Hive.java:2613) > at > org.apache.hadoop.hive.ql.exec.DDLTask.dropPartitions(DDLTask.java:4008) > ... 23 more > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18988) Support bootstrap replication of ACID tables
[ https://issues.apache.org/jira/browse/HIVE-18988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463377#comment-16463377 ] ASF GitHub Bot commented on HIVE-18988: --- Github user sankarh closed the pull request at: https://github.com/apache/hive/pull/331 > Support bootstrap replication of ACID tables > > > Key: HIVE-18988 > URL: https://issues.apache.org/jira/browse/HIVE-18988 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan >Priority: Major > Labels: ACID, DR, pull-request-available, replication > Fix For: 3.0.0, 3.1.0 > > Attachments: HIVE-18988.01-branch-3.patch, HIVE-18988.01.patch, > HIVE-18988.02.patch, HIVE-18988.03.patch, HIVE-18988.04.patch, > HIVE-18988.05.patch, HIVE-18988.06.patch, HIVE-18988.07.patch > > > Bootstrapping of ACID tables needs special handling to replicate a stable > state of data. > - If the ACID feature is enabled, then perform the bootstrap dump for ACID tables > within a read txn. > -> Dump table/partition metadata. > -> Get the list of valid data files for a table using the same logic as a read txn > does. > -> Dump the latest ValidWriteIdList as per the current read txn. > - Set the valid last replication state such that it doesn't miss any open > txn started after triggering the bootstrap dump. > - If any on-going txns were opened before triggering the bootstrap dump, > then it is not guaranteed that an open_txn event was captured for them. > Also, if these txns were opened for a streaming ingest case, then the dumped ACID > table data may include data of open txns, which impacts snapshot isolation at the > target. To avoid that, the bootstrap dump should wait for a timeout (new > configuration: hive.repl.bootstrap.dump.open.txn.timeout). After the timeout, > just force-abort those txns and continue. > - If any force-aborted txns belong to a streaming ingest case, then the dumped > ACID table data may have aborted data too.
So, it is necessary to replicate > the aborted write ids to the target to mark that data invalid for any readers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
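The wait-then-abort step described above boils down to one decision after the `hive.repl.bootstrap.dump.open.txn.timeout` expires: any transaction that was open before the dump was triggered and is still open gets force-aborted. A minimal sketch of that selection (illustrative names, not the real Hive code):

```java
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch: after the bootstrap-dump wait times out, force-abort
// exactly those txns that predate the dump trigger and never closed, so the
// dump sees a stable snapshot. Their write ids would then be replicated as
// aborted so readers at the target treat that data as invalid.
public class BootstrapDumpGate {
    public static Set<Long> forceAbortAfterTimeout(Set<Long> openBeforeDump, Set<Long> stillOpen) {
        Set<Long> toAbort = new TreeSet<>(openBeforeDump);
        toAbort.retainAll(stillOpen); // intersection: opened before dump AND still open
        return toAbort;
    }
}
```

Txns opened after the dump trigger are deliberately excluded: their open_txn events are guaranteed to be captured by incremental replication.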
[jira] [Commented] (HIVE-18864) ValidWriteIdList snapshot seems incorrect if obtained after allocating writeId by current transaction.
[ https://issues.apache.org/jira/browse/HIVE-18864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463376#comment-16463376 ] ASF GitHub Bot commented on HIVE-18864: --- Github user sankarh closed the pull request at: https://github.com/apache/hive/pull/316 > ValidWriteIdList snapshot seems incorrect if obtained after allocating > writeId by current transaction. > -- > > Key: HIVE-18864 > URL: https://issues.apache.org/jira/browse/HIVE-18864 > Project: Hive > Issue Type: Sub-task > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan >Priority: Major > Labels: ACID, pull-request-available > Fix For: 3.0.0 > > Attachments: HIVE-18864.01.patch, HIVE-18864.02.patch > > > For multi-statement txns, it is possible that a write on a table happens after > a read. Let's see the below scenario. > # Committed txn=9 writes on table T1 with writeId=5. > # Open txn=10. ValidTxnList(open:null, txn_HWM=10). > # Read table T1 from txn=10. ValidWriteIdList(open:null, write_HWM=5). > # Open txn=11, which writes on table T1 with writeId=6. > # Read table T1 from txn=10. ValidWriteIdList(open:null, write_HWM=5). > # Write table T1 from txn=10 with writeId=7. > # Read table T1 from txn=10. *ValidWriteIdList(open:null, > write_HWM=7)*. – This read will be able to see rows added by txn=11, which is > still open. > So, the open/aborted list of > ValidWriteIdList needs to be rebuilt based on txn_HWM. Any writeId allocated by txnId > txn_HWM > should be marked as open. In this example, *ValidWriteIdList(open:6, > write_HWM=7)* should be generated. > cc [~ekoifman], [~thejas] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
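The proposed rebuild rule above is simple enough to state in a few lines of code: any write id allocated by a txn above the snapshot's txn high watermark must be marked open. A sketch under an illustrative model (a plain `writeId -> txnId` map, not Hive's actual ValidWriteIdList implementation):

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Illustrative sketch of the rebuild rule: given the txn high watermark of the
// reader's snapshot and the allocating txn of each write id, every write id
// allocated by a txn the snapshot cannot see (txnId > txn_HWM) is open.
public class WriteIdSnapshot {
    public static SortedSet<Long> openWriteIds(long txnHwm, Map<Long, Long> writeIdToTxnId) {
        SortedSet<Long> open = new TreeSet<>();
        for (Map.Entry<Long, Long> e : writeIdToTxnId.entrySet()) {
            if (e.getValue() > txnHwm) {
                open.add(e.getKey()); // allocated by a txn this snapshot cannot see
            }
        }
        return open;
    }
}
```

Replaying the scenario from the ticket: with txn_HWM=10 and write ids 5, 6, 7 allocated by txns 9, 11, 10 respectively, only writeId=6 is marked open — matching the expected ValidWriteIdList(open:6, write_HWM=7).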
[jira] [Updated] (HIVE-16480) ORC file with empty array and array fails to read
[ https://issues.apache.org/jira/browse/HIVE-16480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-16480: -- Labels: pull-request-available (was: ) > ORC file with empty array and array fails to read > > > Key: HIVE-16480 > URL: https://issues.apache.org/jira/browse/HIVE-16480 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.1 >Reporter: David Capwell >Assignee: Owen O'Malley > Labels: pull-request-available > > We have a schema that has a array in it. We were unable to read this > file and digging into ORC it seems that the issue is when the array is empty. > Here is the stack trace > {code:title=EmptyList.log|borderStyle=solid} > ERROR 2017-04-19 09:29:17,075 [main] [EmptyList] [line 56] Failed to work > with type float > java.io.IOException: Error reading file: > /var/folders/t8/t5x1031d7mn17f6xpwnkkv_4gn/T/1492619355819-0/file-float.orc > at > org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1052) > ~[hive-orc-2.1.1.jar:2.1.1] > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.nextBatch(RecordReaderImpl.java:135) > ~[hive-exec-2.1.1.jar:2.1.1] > at EmptyList.emptyList(EmptyList.java:49) ~[test-classes/:na] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[na:1.8.0_121] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[na:1.8.0_121] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[na:1.8.0_121] > at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121] > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > [junit-4.12.jar:4.12] > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > [junit-4.12.jar:4.12] > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > [junit-4.12.jar:4.12] > at > 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > [junit-4.12.jar:4.12] > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > [junit-4.12.jar:4.12] > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > [junit-4.12.jar:4.12] > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > [junit-4.12.jar:4.12] > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > [junit-4.12.jar:4.12] > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > [junit-4.12.jar:4.12] > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > [junit-4.12.jar:4.12] > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > [junit-4.12.jar:4.12] > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > [junit-4.12.jar:4.12] > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > [junit-4.12.jar:4.12] > at org.junit.runner.JUnitCore.run(JUnitCore.java:137) [junit-4.12.jar:4.12] > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > [junit-rt.jar:na] > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51) > [junit-rt.jar:na] > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:237) > [junit-rt.jar:na] > at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > [junit-rt.jar:na] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[na:1.8.0_121] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[na:1.8.0_121] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[na:1.8.0_121] > at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121] > at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147) > [idea_rt.jar:na] > Caused by: 
java.io.EOFException: Read past EOF for compressed stream Stream > for column 1 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0 > at > org.apache.orc.impl.SerializationUtils.readFully(SerializationUtils.java:118) > ~[hive-orc-2.1.1.jar:2.1.1] > at > org.apache.orc.impl.SerializationUtils.readFloat(SerializationUtils.java:78) > ~[hive-orc-2.1.1.jar:2.1.1] > at > org.apache.orc.impl.TreeReaderFactory$FloatTreeReader.nextVector(TreeReaderFactory.java:619) > ~[hive-orc-2.1.1.jar:2.1.1] > at > org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902) > ~[hive-orc-2.1.1.jar:2.1.1] > at > org.apache.orc.impl.TreeReaderFactory$TreeReader.nextBatch(TreeReaderFactory.java:154) >
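The stack trace above shows `FloatTreeReader.nextVector` reading past EOF on a zero-length DATA stream: an empty list still triggers a read of its element column, whose stream contains nothing. The class of fix (as the linked PR's title suggests, empty batches should not touch the stream) can be sketched with an illustrative fixed-width reader — this is not the actual TreeReaderFactory code, and the byte layout here is only an example:

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Illustrative sketch: a fixed-width float column reader that returns early for
// an empty batch. Without the count == 0 guard, reading from a zero-length
// stream would raise EOF even though no values are actually needed.
public class FloatColumnReader {
    public static float[] readFloats(InputStream in, int count) {
        float[] out = new float[count];
        if (count == 0) {
            return out; // empty batch: never touch the (possibly empty) stream
        }
        try {
            byte[] buf = new byte[4];
            for (int i = 0; i < count; i++) {
                int off = 0;
                while (off < 4) {
                    int n = in.read(buf, off, 4 - off);
                    if (n < 0) {
                        throw new EOFException("Read past EOF at value " + i);
                    }
                    off += n;
                }
                // assemble a little-endian 32-bit float (layout chosen for illustration)
                int bits = ((buf[3] & 0xff) << 24) | ((buf[2] & 0xff) << 16)
                         | ((buf[1] & 0xff) << 8) | (buf[0] & 0xff);
                out[i] = Float.intBitsToFloat(bits);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

Asking this reader for zero floats from an empty stream succeeds, whereas a version without the guard would hit the same `Read past EOF` failure as the trace above.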
[jira] [Commented] (HIVE-16480) ORC file with empty array and array fails to read
[ https://issues.apache.org/jira/browse/HIVE-16480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16304707#comment-16304707 ] ASF GitHub Bot commented on HIVE-16480: --- GitHub user omalley opened a pull request: https://github.com/apache/hive/pull/285 HIVE-16480 (ORC-285) Empty vector batches of floats or doubles gets EOFException. Signed-off-by: Owen O'Malley You can merge this pull request into a Git repository by running: $ git pull https://github.com/omalley/hive hive-16480-2.1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/285.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #285 commit 43d7fe2f0fc9baeb311814da1f7a65cfd546145b Author: Owen O'Malley Date: 2017-12-27T17:45:25Z HIVE-16480 (ORC-285) Empty vector batches of floats or doubles gets EOFException. Signed-off-by: Owen O'Malley > ORC file with empty array and array fails to read > > > Key: HIVE-16480 > URL: https://issues.apache.org/jira/browse/HIVE-16480 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.1 >Reporter: David Capwell >Assignee: Owen O'Malley > Labels: pull-request-available > > We have a schema that has an array in it. We were unable to read this > file, and digging into ORC it seems that the issue is when the array is empty.
> Here is the stack trace > {code:title=EmptyList.log|borderStyle=solid} > ERROR 2017-04-19 09:29:17,075 [main] [EmptyList] [line 56] Failed to work > with type float > java.io.IOException: Error reading file: > /var/folders/t8/t5x1031d7mn17f6xpwnkkv_4gn/T/1492619355819-0/file-float.orc > at > org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1052) > ~[hive-orc-2.1.1.jar:2.1.1] > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.nextBatch(RecordReaderImpl.java:135) > ~[hive-exec-2.1.1.jar:2.1.1] > at EmptyList.emptyList(EmptyList.java:49) ~[test-classes/:na] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[na:1.8.0_121] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[na:1.8.0_121] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[na:1.8.0_121] > at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121] > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > [junit-4.12.jar:4.12] > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > [junit-4.12.jar:4.12] > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > [junit-4.12.jar:4.12] > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > [junit-4.12.jar:4.12] > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > [junit-4.12.jar:4.12] > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > [junit-4.12.jar:4.12] > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > [junit-4.12.jar:4.12] > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > [junit-4.12.jar:4.12] > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > [junit-4.12.jar:4.12] > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > 
[junit-4.12.jar:4.12] > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > [junit-4.12.jar:4.12] > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > [junit-4.12.jar:4.12] > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > [junit-4.12.jar:4.12] > at org.junit.runner.JUnitCore.run(JUnitCore.java:137) [junit-4.12.jar:4.12] > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > [junit-rt.jar:na] > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51) > [junit-rt.jar:na] > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:237) > [junit-rt.jar:na] > at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > [junit-rt.jar:na] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[na:1.8.0_121] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[na:1.8.0_121] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[na:1.8.0_121] > at
[jira] [Commented] (HIVE-17580) Remove dependency of get_fields_with_environment_context API to serde
[ https://issues.apache.org/jira/browse/HIVE-17580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16314309#comment-16314309 ] ASF GitHub Bot commented on HIVE-17580: --- GitHub user vihangk1 opened a pull request: https://github.com/apache/hive/pull/287 HIVE-17580 : Remove standalone-metastore's dependency with serdes Removing the dependency on serdes for the metastore requires a series of changes. I have created multiple commits which hopefully will be easier to review. Each major commit has a descriptive commit message to give a high-level idea of what the change is doing. There are still some bits which need to be completed, but it would be good to get a review. Overview of all the changes done: 1. Creates a new module called serde-api under storage-api, as discussed. Although I think we can keep it separate as well. 2. Moved List, Map, Struct, Constant, Primitive, Union ObjectInspectors to serde-api. 3. Moved PrimitiveTypeInfo, PrimitiveTypeEntry and TypeInfo to serde-api. 4. Moved TypeInfoParser, TypeInfoFactory to serde-api. 5. Added a new class which reads the avro storage schema by copying the code from AvroSerde and AvroSerdeUtils. The parsing is done such that the String value is first converted into TypeInfos and then into FieldSchemas, bypassing the need for ObjectInspectors. In theory we could get rid of TypeInfos as well, but that path was getting too difficult with a lot of duplicate code between Hive and the metastore. 6. Introduces a default storage schema reader. I noticed that most of the serdes use the same logic to parse the metadata information. This code should be refactored to a common place instead of having many copies (one in standalone hms and another set in multiple serdes). 7. Moved HiveChar, HiveVarchar, HiveCharWritable, HiveVarcharWritable to storage-api. I noticed that HiveDecimal is already in storage-api.
It probably makes sense to move the other primitive types (timestamp, interval etc)to storage-api as well but it requires storage-api to be upgraded to Java 8. 8. Adds a basic test for the schema reader. I plan to add more tests as this code is reviewed. You can merge this pull request into a Git repository by running: $ git pull https://github.com/vihangk1/hive vihangk1_HIVE-17580 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/287.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #287 commit bbfb7dc44904db74a840167c02b07f50a6010b69 Author: Vihang KarajgaonkarDate: 2017-11-09T00:52:39Z HIVE-17580 : Remove dependency of get_fields_with_environment_context API to serde commit d54879845eff10c19bc17bda9e09dda16f6fa295 Author: Vihang Karajgaonkar Date: 2017-11-29T00:54:23Z Moved List, Map, Struct OI to storage-api commit a12d6c7ba3de598c6b6f75da1bd4efcac43036b1 Author: Vihang Karajgaonkar Date: 2017-11-29T04:19:39Z Moved ConstantObjectInspector PrimitiveObjectInspector and UnionObjectInspector commit 13fb832fc2d51958e75d5e609f6781f87449aed8 Author: Vihang Karajgaonkar Date: 2017-12-28T01:25:59Z Moved PrimitiveTypeInfo to serde-api In order to move PrimitiveTypeInfo we need to move the PrimitiveTypeEntry as well. PrimitiveTypeEntry depends on PrimitiveObjectInspectorUtils which cannot be pulled into serde-api. Hence the static final maps are moved to PrimitiveEntry and we provide static access methods to these maps along with the register method to add the key value pairs in the maps commit 9cbc789fd3f4ced7ce66a7313c451b75a154976f Author: Vihang Karajgaonkar Date: 2017-12-28T20:51:39Z Moved the other TypeInfos to serde-api In order to move the other TypeInfo classes to serde-api we need to move the serdeConstants.java as well. This is a thrift generated class. This commit copies the serde.thrift instead of moving. 
The only reason I did not move it is in case of backwards compatibility reasons (in case someone is using the thrift file location to do something). If it is okay to move serde.thrift from serde module to serde-api we can delete it from serde module in a separate change. The other concern is there are some TypeInfo classes which do some validation like VarCharTypeInfo, DecimalTypeInfo. The validating methods use the actual type implementation like HiveChar, HiveDecimal etc to ensure that the params are under the correct limits. This creates a problem since we cannot bring in the type implementations as well to serde-api. Currently, I have marked these as TODO and commented them out. commit 23aa899a90d17648e560d39602a3bd29bf53661e Author: Vihang Karajgaonkar Date:
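The "String value -> TypeInfos -> FieldSchemas, bypassing ObjectInspectors" path described in the PR can be illustrated with a deliberately tiny parser. This is a hypothetical sketch, not Hive's TypeInfoParser (which handles nesting, quoting, and parameterized types): it only splits a flat struct type string into ordered (name, type) field pairs, the shape a FieldSchema list needs.

```java
import java.util.LinkedHashMap;

// Hypothetical sketch of the type-string -> field-list idea: parse a flat
// "struct<name:type,...>" string directly into ordered field pairs without
// going through ObjectInspectors. Real parsing must also handle nested and
// parameterized types (map<...>, decimal(p,s), etc.).
public class FlatStructParser {
    public static LinkedHashMap<String, String> parse(String typeString) {
        if (!typeString.startsWith("struct<") || !typeString.endsWith(">")) {
            throw new IllegalArgumentException("expected struct<...>: " + typeString);
        }
        LinkedHashMap<String, String> fields = new LinkedHashMap<>();
        String body = typeString.substring("struct<".length(), typeString.length() - 1);
        for (String part : body.split(",")) {
            String[] nameAndType = part.split(":", 2); // field name, then its type
            fields.put(nameAndType[0].trim(), nameAndType[1].trim());
        }
        return fields;
    }
}
```

For example, `parse("struct<name:string,age:int>")` yields the ordered fields name→string, age→int.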
[jira] [Updated] (HIVE-17580) Remove dependency of get_fields_with_environment_context API to serde
[ https://issues.apache.org/jira/browse/HIVE-17580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-17580: -- Labels: pull-request-available (was: ) > Remove dependency of get_fields_with_environment_context API to serde > - > > Key: HIVE-17580 > URL: https://issues.apache.org/jira/browse/HIVE-17580 > Project: Hive > Issue Type: Sub-task > Components: Standalone Metastore >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar > Labels: pull-request-available > > The {{get_fields_with_environment_context}} metastore API uses the {{Deserializer}} > class to access the field metadata for the cases where it is stored along > with the data files (Avro tables). The problem is that the Deserializer class is > defined in the hive-serde module, and in order to make the metastore independent of > Hive we will have to remove this dependency (at least we should change it to a > runtime dependency instead of a compile-time one). > The other option is to investigate whether we can use SearchArgument to provide this > functionality. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-18338) [Client, JDBC] Asynchronous interface through hive JDBC.
[ https://issues.apache.org/jira/browse/HIVE-18338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-18338: -- Labels: pull-request-available (was: ) > [Client, JDBC] Asynchronous interface through hive JDBC. > > > Key: HIVE-18338 > URL: https://issues.apache.org/jira/browse/HIVE-18338 > Project: Hive > Issue Type: Improvement > Components: Clients, JDBC >Affects Versions: 2.3.2 >Reporter: Amruth S >Assignee: Amruth S >Priority: Minor > Labels: pull-request-available > > A lot of users are struggling and rewriting a lot of boilerplate over Thrift > to get pure asynchronous capability. > The idea is to expose the operation handle, so that clients can persist it and > later latch on to the same execution. > Let me know your ideas around this. We have already solved this at our org by > tweaking HiveStatement.java. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-18338) [Client, JDBC] Asynchronous interface through hive JDBC.
[ https://issues.apache.org/jira/browse/HIVE-18338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16303270#comment-16303270 ] ASF GitHub Bot commented on HIVE-18338: --- GitHub user amrk7s opened a pull request: https://github.com/apache/hive/pull/284 HIVE-18338 Exposing asynchronous execution through hive-jdbc client **Problem statement** Hive JDBC currently exposes 2 methods related to asynchronous execution **executeAsync()** - to trigger a query execution and return immediately. **waitForOperationToComplete()** - which waits till the current execution is complete **blocking the user thread**. This has one problem - If the client process goes down, there is no way to resume queries although hive server is completely asynchronous. **Proposal** If operation handle could be exposed, we can latch on to an active execution of a query. **Code changes** Operation handle is exposed. So client can keep a copy. latchSync() and latchAsync() methods take an operation handle and try to latch on to the current execution in hive server if present You can merge this pull request into a Git repository by running: $ git pull https://github.com/Flipkart/hive async_jdbc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/284.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #284 commit 9afa0ae9a9e2c38be3fbdadab230fdd399ab8e5b Author: amrk7sDate: 2017-12-25T13:20:22Z HIVE-18338 Exposing asynchronous execution through hive-jdbc client > [Client, JDBC] Asynchronous interface through hive JDBC. 
> > > Key: HIVE-18338 > URL: https://issues.apache.org/jira/browse/HIVE-18338 > Project: Hive > Issue Type: Improvement > Components: Clients, JDBC >Affects Versions: 2.3.2 >Reporter: Amruth S >Assignee: Amruth S >Priority: Minor > Labels: pull-request-available > > A lot of users are struggling and rewriting a lot of boilerplate over Thrift > to get pure asynchronous capability. > The idea is to expose the operation handle, so that clients can persist it and > later latch on to the same execution. > Let me know your ideas around this. We have already solved this at our org by > tweaking HiveStatement.java. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
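The handle-based resume model proposed in the PR can be sketched with a small in-process registry. This is a hypothetical illustration of the API shape (`executeAsync` returning an opaque handle, `latchSync` re-attaching to the running execution), not the actual HiveStatement change:

```java
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: executeAsync submits work and returns a handle the
// caller can persist; latchSync(handle) re-attaches to the in-flight execution
// and blocks until it completes, even from a different caller thread.
public class AsyncQueryRegistry {
    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final Map<String, Future<String>> running = new ConcurrentHashMap<>();

    /** Submit a query and return an opaque handle the caller can persist. */
    public String executeAsync(Callable<String> query) {
        String handle = UUID.randomUUID().toString();
        running.put(handle, pool.submit(query));
        return handle;
    }

    /** Latch on to an in-flight execution by handle and wait for its result. */
    public String latchSync(String handle) {
        Future<String> f = running.get(handle);
        if (f == null) {
            throw new NoSuchElementException("unknown operation handle: " + handle);
        }
        try {
            return f.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("execution failed for handle " + handle, e);
        }
    }

    public void shutdown() {
        pool.shutdown();
    }
}
```

In the real server-side case the registry would be keyed by the Thrift operation handle, which survives client restarts; the in-memory map here only illustrates the latch semantics.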
[jira] [Commented] (HIVE-18341) Add repl load support for adding "raw" namespace for TDE with same encryption keys
[ https://issues.apache.org/jira/browse/HIVE-18341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321674#comment-16321674 ] ASF GitHub Bot commented on HIVE-18341: --- GitHub user anishek opened a pull request: https://github.com/apache/hive/pull/289 HIVE-18341: Add repl load support for adding "raw" namespace for TDE with same encryption keys You can merge this pull request into a Git repository by running: $ git pull https://github.com/anishek/hive HIVE-18341 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/289.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #289 commit 14b92575fdc97434ec65ad0ce1c54c5f352a992c Author: Anishek AgarwalDate: 2017-12-26T14:11:39Z HIVE-18341: Add repl load support for adding "raw" namespace for TDE with same encryption keys > Add repl load support for adding "raw" namespace for TDE with same encryption > keys > -- > > Key: HIVE-18341 > URL: https://issues.apache.org/jira/browse/HIVE-18341 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Labels: pull-request-available > Fix For: 3.0.0 > > Attachments: HIVE-18341.0.patch, HIVE-18341.1.patch > > > https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html#Running_as_the_superuser > "a new virtual path prefix, /.reserved/raw/, that gives superusers direct > access to the underlying block data in the filesystem. This allows superusers > to distcp data without needing having access to encryption keys, and also > avoids the overhead of decrypting and re-encrypting data." > We need to introduce a new option in "Repl Load" command that will change the > files being copied in distcp to have this "/.reserved/raw/" namespace before > the file paths. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-18341) Add repl load support for adding "raw" namespace for TDE with same encryption keys
[ https://issues.apache.org/jira/browse/HIVE-18341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-18341: -- Labels: pull-request-available (was: ) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
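The change described in HIVE-18341 amounts to prefixing each file path handed to distcp with the HDFS "/.reserved/raw" namespace so the encrypted block data is copied as-is. A minimal, hypothetical sketch of such a path rewrite (not the actual Hive implementation, which works with org.apache.hadoop.fs.Path; this uses java.net.URI to stay self-contained):

```java
import java.net.URI;

public class RawPathUtil {
    private static final String RAW_PREFIX = "/.reserved/raw";

    // Prepend the HDFS raw-namespace prefix to a path so that superusers
    // can distcp the underlying encrypted block data without decryption.
    public static String toRawPath(String path) {
        if (path.startsWith(RAW_PREFIX)) {
            return path; // already in the raw namespace
        }
        URI uri = URI.create(path);
        if (uri.getScheme() == null) {
            return RAW_PREFIX + path; // plain absolute path
        }
        // hdfs://nn:8020/warehouse/t -> hdfs://nn:8020/.reserved/raw/warehouse/t
        String authority = uri.getAuthority() == null ? "" : "//" + uri.getAuthority();
        return uri.getScheme() + ":" + authority + RAW_PREFIX + uri.getPath();
    }
}
```

The prefix must be inserted after the scheme and authority, not blindly prepended, which is why the sketch goes through URI at all.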
[jira] [Commented] (HIVE-18352) introduce a METADATAONLY option while doing REPL DUMP to allow integrations of other tools
[ https://issues.apache.org/jira/browse/HIVE-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312651#comment-16312651 ] ASF GitHub Bot commented on HIVE-18352: --- GitHub user anishek opened a pull request: https://github.com/apache/hive/pull/286 HIVE-18352: introduce a METADATAONLY option while doing REPL DUMP to allow integrations of other tools You can merge this pull request into a Git repository by running: $ git pull https://github.com/anishek/hive HIVE-18352 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/286.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #286 commit c03814bd857cfd70a40aa7a5ec674e73cbfc63f9 Author: Anishek Agarwal Date: 2018-01-03T10:27:04Z HIVE-18352: introduce a METADATAONLY option while doing REPL DUMP to allow integrations of other tools > introduce a METADATAONLY option while doing REPL DUMP to allow integrations > of other tools > --- > > Key: HIVE-18352 > URL: https://issues.apache.org/jira/browse/HIVE-18352 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Labels: pull-request-available > Fix For: 3.0.0 > > Attachments: HIVE-18352.0.patch > > > * Introduce a METADATAONLY option as part of the REPL DUMP command which will > only dump events for DDL changes; this will be faster, as we won't > need to scan files on HDFS for DML changes. > * Additionally, since we are only going to dump metadata operations, it might > be useful to include ACID tables via an option as well. 
This option > can be removed when ACID support is complete via HIVE-18320. > It would be good to support the "WITH" clause as part of the REPL DUMP command as > well (REPL DUMP already supports it via HIVE-17757) to achieve the above, as > that will require fewer changes to the syntax of the statement and provide > more flexibility in the future to include additional options. > {code} > REPL DUMP [db_name] {FROM [event_id]} {TO [event_id]} {WITH > (['key'='value'],...)} > {code} > This will enable other tools like security / schema registry / metadata > discovery to use the replication subsystem for their needs as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-18352) introduce a METADATAONLY option while doing REPL DUMP to allow integrations of other tools
[ https://issues.apache.org/jira/browse/HIVE-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-18352: -- Labels: pull-request-available (was: ) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
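The WITH clause in the REPL DUMP syntax above carries a list of quoted 'key'='value' options. A small, hypothetical sketch of parsing that option list into a map (the regex and class name are illustrative; this is not Hive's actual parser, and any config key used with it is an assumption):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplDumpOptions {
    // Matches one 'key'='value' pair inside the WITH clause body.
    private static final Pattern PAIR = Pattern.compile("'([^']+)'\\s*=\\s*'([^']*)'");

    // Extract key/value options from a WITH clause body such as
    // "'key1'='true', 'key2'='x'". Insertion order is preserved.
    public static Map<String, String> parseWith(String withBody) {
        Map<String, String> opts = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(withBody);
        while (m.find()) {
            opts.put(m.group(1), m.group(2));
        }
        return opts;
    }
}
```

A key/value option list like this is what lets new options (such as METADATAONLY behavior) be added later without changing the statement grammar.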
[jira] [Commented] (HIVE-18423) Hive should support usage of external tables using jdbc
[ https://issues.apache.org/jira/browse/HIVE-18423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320480#comment-16320480 ] ASF GitHub Bot commented on HIVE-18423: --- GitHub user msydoron opened a pull request: https://github.com/apache/hive/pull/288 HIVE-18423 Added full support for jdbc external tables in hive. You can merge this pull request into a Git repository by running: $ git pull https://github.com/shmushkis/hive master_yoni Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/288.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #288 commit 4e5dbb01f8d509bc18a82943c27ec62691baa0f4 Author: msydoronDate: 2017-12-19T09:02:57Z Integrate jethro jdbc updates into master_yoni commit b18008e5bea82528bbd269da129831b3c333ff9c Author: msydoron Date: 2017-12-19T11:21:15Z jethro integration added missing file HiveRelColumnsAlignment.java commit 960f5bbe6bfcbfc9447d6b386a184827edb74a03 Author: msydoron Date: 2017-12-19T13:01:27Z Updated MyRules to use the jdbc convention from the converter. commit 0a290845600a19692f8de8bbb99f344a49633a38 Author: msydoron Date: 2017-12-20T14:21:59Z Added support for hive quering through jdbc for all jethro types. 
Removed dead code and refactor commit 52317dba6ec78f825db789ea976f91de864dbb1e Author: msydoron Date: 2017-12-21T11:57:54Z Fixed count(*) for HiveSqlCountAggFunction Fixed MySortRule Fixed addLimitToQuery() for sorted queries commit b4a6c87cfa3aba0513d11a21f4e00ed55d02b3be Author: msydoron Date: 2017-12-24T16:20:48Z Invoke the jdbc rules after calcite invokes its rules commit 3a2d1e5af73ba1147cf3f697d6e3c27f3f9ce262 Author: msydoron Date: 2017-12-31T15:04:34Z Added proto support for jethro 'show functions' commit cd59a56d03eb816a51057643654cba04036d9289 Author: msydoron Date: 2018-01-02T13:16:18Z Fixed some issues raised by 'show functions' support commit 4637c02f8708a468ac9b93c52e04b24bf478b590 Author: msydoron Date: 2018-01-02T16:20:40Z Initialize the dialect where it generated commit 23ab2538312f414838245130ae8a5d61fe35844a Author: msydoron Date: 2018-01-08T11:08:24Z Code refactor > Hive should support usage of external tables using jdbc > --- > > Key: HIVE-18423 > URL: https://issues.apache.org/jira/browse/HIVE-18423 > Project: Hive > Issue Type: Improvement >Reporter: Jonathan Doron >Assignee: Jonathan Doron > Labels: pull-request-available > Fix For: 3.0.0 > > > Hive should support the usage of external jdbc tables (and not only external > tables that hold queries), so a Hive user would be able to use the external > table as a Hive internal table. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-18423) Hive should support usage of external tables using jdbc
[ https://issues.apache.org/jira/browse/HIVE-18423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-18423: -- Labels: pull-request-available (was: ) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17982) Move metastore specific itests
[ https://issues.apache.org/jira/browse/HIVE-17982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327860#comment-16327860 ] ASF GitHub Bot commented on HIVE-17982: --- Github user asfgit closed the pull request at: https://github.com/apache/hive/pull/279 > Move metastore specific itests > -- > > Key: HIVE-17982 > URL: https://issues.apache.org/jira/browse/HIVE-17982 > Project: Hive > Issue Type: Sub-task > Components: Standalone Metastore >Reporter: Alan Gates >Assignee: Alan Gates >Priority: Major > Labels: pull-request-available > Fix For: 3.0.0 > > Attachments: HIVE-17982.2.patch, HIVE-17982.patch > > > There are a number of tests in itests/hive-unit/.../metastore that are > metastore specific. I suspect they were initially placed in itests only > because the metastore pulls in a few plugins from ql. > Given that we need to be able to release the metastore separately, we need to > be able to test it completely as a standalone entity. So I propose to move a > number of the itests over into standalone-metastore. I will only move tests > that are isolated to the metastore. Anything that tests wider functionality > I plan to leave in itests. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-17983) Make the standalone metastore generate tarballs etc.
[ https://issues.apache.org/jira/browse/HIVE-17983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327915#comment-16327915 ] ASF GitHub Bot commented on HIVE-17983: --- GitHub user alanfgates opened a pull request: https://github.com/apache/hive/pull/291 HIVE-17983 Make the standalone metastore generate tarballs etc. See JIRA for full comments. You can merge this pull request into a Git repository by running: $ git pull https://github.com/alanfgates/hive hive17983 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/291.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #291 commit 1ba9b62d9ef488355e1a97dbc7237c1472349a24 Author: Alan GatesDate: 2017-10-19T23:49:38Z HIVE-17983 Make the standalone metastore generate tarballs etc. > Make the standalone metastore generate tarballs etc. > > > Key: HIVE-17983 > URL: https://issues.apache.org/jira/browse/HIVE-17983 > Project: Hive > Issue Type: Sub-task > Components: Standalone Metastore >Reporter: Alan Gates >Assignee: Alan Gates >Priority: Major > Labels: pull-request-available > > In order to be separately installable the standalone metastore needs its own > tarballs, startup scripts, etc. All of the SQL installation and upgrade > scripts also need to move from metastore to standalone-metastore. > I also plan to create Dockerfiles for different database types so that > developers can test the SQL installation and upgrade scripts. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-17983) Make the standalone metastore generate tarballs etc.
[ https://issues.apache.org/jira/browse/HIVE-17983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-17983: -- Labels: pull-request-available (was: ) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-17331) Path must be used as key type of the pathToAliases
[ https://issues.apache.org/jira/browse/HIVE-17331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327999#comment-16327999 ] ASF GitHub Bot commented on HIVE-17331: --- GitHub user dosoft opened a pull request: https://github.com/apache/hive/pull/292 HIVE-17331: Use Path instead of String as key type of the pathToAliases You can merge this pull request into a Git repository by running: $ git pull https://github.com/dosoft/hive HIVE-17331 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/292.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #292 commit 661897e01fe48a0f80c3ee4e9168667b7d926ba9 Author: Oleg Danilov Date: 2017-08-16T10:34:39Z HIVE-17331: Use Path instead of String as key type of the pathToAliases > Path must be used as key type of the pathToAliases > - > > Key: HIVE-17331 > URL: https://issues.apache.org/jira/browse/HIVE-17331 > Project: Hive > Issue Type: Bug >Reporter: Oleg Danilov >Assignee: Oleg Danilov >Priority: Minor > Labels: pull-request-available > Attachments: HIVE-17331.patch > > > This code uses String instead of Path as the key type of the pathToAliases map, > so it seems get(String) always returns null. 
> +*GenMapRedUtils.java*+ > {code:java} > for (int pos = 0; pos < size; pos++) { > String taskTmpDir = taskTmpDirLst.get(pos); > TableDesc tt_desc = tt_descLst.get(pos); > MapWork mWork = plan.getMapWork(); > if (mWork.getPathToAliases().get(taskTmpDir) == null) { > taskTmpDir = taskTmpDir.intern(); > Path taskTmpDirPath = > StringInternUtils.internUriStringsInPath(new Path(taskTmpDir)); > mWork.removePathToAlias(taskTmpDirPath); > mWork.addPathToAlias(taskTmpDirPath, taskTmpDir); > mWork.addPathToPartitionInfo(taskTmpDirPath, new > PartitionDesc(tt_desc, null)); > mWork.getAliasToWork().put(taskTmpDir, topOperators.get(pos)); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
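The bug HIVE-17331 describes comes down to a heterogeneous map lookup: Map.get accepts any Object, so calling it with a String key on a map keyed by Path compiles fine but can never match, and silently returns null. A small illustration using java.nio.file.Path as a stand-in for Hadoop's org.apache.hadoop.fs.Path (class and method names here are hypothetical, not Hive's):

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PathKeyLookup {
    // pathToAliases keyed by Path, as in MapWork.
    private final Map<Path, List<String>> pathToAliases = new HashMap<>();

    public void addPathToAlias(Path p, String alias) {
        pathToAliases.computeIfAbsent(p, k -> new ArrayList<>()).add(alias);
    }

    // Compiles, because Map.get takes Object, but a String never
    // equals() a Path key, so this always returns null.
    public List<String> lookupWrong(String dir) {
        return pathToAliases.get(dir);
    }

    // Converting to the key type first makes the lookup work.
    public List<String> lookupRight(String dir) {
        return pathToAliases.get(Paths.get(dir));
    }
}
```

This is why the `mWork.getPathToAliases().get(taskTmpDir)` check in the quoted snippet is always null: `taskTmpDir` is a String while the map is keyed by Path.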
[jira] [Commented] (HIVE-18505) Added external hive configuration to prepDb in TxnDbUtil
[ https://issues.apache.org/jira/browse/HIVE-18505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333565#comment-16333565 ] ASF GitHub Bot commented on HIVE-18505: --- Github user chandulal closed the pull request at: https://github.com/apache/hive/pull/293 > Added external hive configuration to prepDb in TxnDbUtil > > > Key: HIVE-18505 > URL: https://issues.apache.org/jira/browse/HIVE-18505 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Chandu Kavar >Assignee: Chandu Kavar >Priority: Minor > Labels: pull-request-available > > In the Hive Metastore, we have TxnDbUtil.java, which contains a few utils required > for tests. > Its prepDb() method creates a connection and executes some system queries in order > to prepare the db. While creating the connection it creates a new > HiveConf object and does not take configs from outside. > TxnDbUtil.java should also contain a prepDb method that can accept external > hive configs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18505) Added external hive configuration to prepDb in TxnDbUtil
[ https://issues.apache.org/jira/browse/HIVE-18505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333411#comment-16333411 ] ASF GitHub Bot commented on HIVE-18505: --- GitHub user chandulal opened a pull request: https://github.com/apache/hive/pull/293 HIVE-18505 : Adding prepDb method that accept hive configs from outside You can merge this pull request into a Git repository by running: $ git pull https://github.com/chandulal/apache-hive master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/293.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #293 commit 08d8de29831c9bea975107d5264894812337803b Author: Chandu Kavar Date: 2018-01-21T06:20:15Z HIVE-18505 : Adding prepDb method that accept hive configs from outside > Added external hive configuration to prepDb in TxnDbUtil > > > Key: HIVE-18505 > URL: https://issues.apache.org/jira/browse/HIVE-18505 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Chandu Kavar >Assignee: Chandu Kavar >Priority: Minor > Labels: pull-request-available > > In the Hive Metastore, we have TxnDbUtil.java, which contains a few utils required > for tests. > Its prepDb() method creates a connection and executes some system queries in order > to prepare the db. While creating the connection it creates a new > HiveConf object and does not take configs from outside. > TxnDbUtil.java should also contain a prepDb method that can accept external > hive configs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18505) Added external hive configuration to prepDb in TxnDbUtil
[ https://issues.apache.org/jira/browse/HIVE-18505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-18505: -- Labels: pull-request-available (was: ) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
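The fix HIVE-18505 asks for is the classic overload pattern: keep the existing no-arg prepDb() for compatibility, but delegate it to a new variant that accepts a caller-supplied configuration. A rough sketch of the pattern, using java.util.Properties as a stand-in for HiveConf (the class name, method bodies, and property key below are illustrative, not Hive's actual code):

```java
import java.util.Properties;

public class TxnDbUtilSketch {
    // Existing style: the no-arg entry point builds its own configuration,
    // so tests cannot inject settings (the problem the ticket describes).
    public static Properties prepDb() {
        return prepDb(new Properties());
    }

    // Requested overload: accept a caller-supplied configuration instead
    // of always constructing a fresh one; defaults only fill real gaps.
    public static Properties prepDb(Properties conf) {
        conf.putIfAbsent("javax.jdo.option.ConnectionURL",
                "jdbc:derby:memory:metastore;create=true");
        return conf;
    }
}
```

Delegating the old signature to the new one keeps every existing caller working while letting tests pass in their own settings.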
[jira] [Commented] (HIVE-15631) Optimize hive client logs, so you can filter the log for each session.
[ https://issues.apache.org/jira/browse/HIVE-15631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16334045#comment-16334045 ] ASF GitHub Bot commented on HIVE-15631: --- GitHub user Tartarus0zm opened a pull request: https://github.com/apache/hive/pull/295 HIVE-15631 When the Hive client is started, the sessionid is printed from the console. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Tartarus0zm/hive console_sessionid Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/295.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #295 commit 1d0c2d606013498b246f5a30b1972ad8b10cb9a2 Author: Tartarus Date: 2018-01-22T08:47:56Z HIVE-15631 When the Hive client is started, the sessionid is printed from the console. > Optimize hive client logs, so you can filter the log for each > session. > --- > > Key: HIVE-15631 > URL: https://issues.apache.org/jira/browse/HIVE-15631 > Project: Hive > Issue Type: Improvement > Components: CLI, Clients, Hive >Reporter: tartarus >Assignee: tartarus >Priority: Major > Labels: pull-request-available > Attachments: HIVE_15631.patch, image-2018-01-22-16-37-26-065.png, > image-2018-01-22-16-38-20-502.png > > Original Estimate: 24h > Remaining Estimate: 24h > > We have several hadoop clusters, about 15 thousand nodes. Every day we use > hive to submit over 100 thousand jobs. > So we have a large hive log file on every client host every day, but I do > not know which lines belong to the session I submitted. > So I hope to print the hive.session.id on every log line, and then I > could use grep to find the logs of the session I submitted. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-17580) Remove dependency of get_fields_with_environment_context API to serde
[ https://issues.apache.org/jira/browse/HIVE-17580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333955#comment-16333955 ] ASF GitHub Bot commented on HIVE-17580: --- GitHub user vihangk1 opened a pull request: https://github.com/apache/hive/pull/294 HIVE-17580 Remove dependency of get_fields_with_environment_context API to serde This is an alternative approach to solving the serde dependencies of the get_fields HMS API. The earlier attempt for HIVE-17580 was very disruptive since it attempted to move TypeInfo and various Type implementations to storage-api and also created another module called serde-api. This patch is a lot cleaner and less disruptive. Instead of moving TypeInfo, it creates similar classes in standalone-metastore. The PR is broken into multiple commits with descriptive commit messages. You can merge this pull request into a Git repository by running: $ git pull https://github.com/vihangk1/hive vihangk1_HIVE-17580v2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/294.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #294 commit 708443af3f6356ab73133e271cf00e3418ced8ef Author: Vihang Karajgaonkar Date: 2018-01-21T23:54:04Z Added MetastoreTypeInfo similar to TypeInfo This patch adds classes similar to TypeInfo called MetastoreTypeInfo in standalone-metastore. Ideally, we should move TypeInfo to standalone-metastore since they store the information about types. However, moving TypeInfo to standalone-metastore is non-trivial effort primarily because of the below reasons: 1. TypeInfo is annotated as Public API. This means we can only alter/move these classes in a compatible way. 2. Directly moving these classes is not straightforward because TypeInfo uses the PrimitiveEntry class which internally maps the TypeInfo to Type implementations. 
Ideally metastore should not use Type implementation which makes it harder to move the TypeInfo directly. However, if we are ready to break compatibility, then TypeInfo broken such that it doesn't use PrimitiveEntry directly. In such a world TypeInfo will store just what it needs to store. Metadata of Types i.e the type category, its qualified name, whether its a parameterized type or not and if yes, how do we validate the parameters. I am assuming that breaking TypeInfo is a no-go and hence I am copying the relevant code from TypeInfo to Metastore and calling it MetastoreTypeInfo. MetastoreTypeInfo and its sub-classes are used by TypeInfoParser (also copied) to parse the column type strings into TypeInfos. commit 6ec0efa59408c355cfa9aec7fd9dd59d3545aff2 Author: Vihang Karajgaonkar Date: 2018-01-03T19:45:32Z Add avro storeage schema reader This commit adds a AvroStorageSchemaReader which reads the Avro schema files both for external schema and regular avro tables. Most of the util methods are in AvroSchemaUtils class which has methods copied from AvroSerDeUtils. Some of the needed classes like SchemaResolutionProblem, InstanceCache, SchemaToTypeInfo, TypeInfoToSchema are also copied from Hive. The constants defined in AvroSerde are copied in AvroSerdeConstants. The class AvroFieldSchemaGenerator converts the AvroSchema into List of FieldSchema which is returned by the AvroStorageSchemaReader Avro schema reader uses MetastoreTypeInfo and MetastoreTypeInfoParser introduced earlier commit b0f6d1df1ddb627e0f3c1cff3a164c9397337be0 Author: Vihang Karajgaonkar Date: 2018-01-04T01:02:40Z Introduce default storage schema reader This change introduces a default storage schema reader which copies the common code from serdes initialization method and uses it to parse the column name, type and comments from the table properties. 
For custom storage schema reades like Avro we will have to add more schema readers as and when required commit 5ae977a0bf3fd54389671bed86322d3d4652bc20 Author: Vihang Karajgaonkar Date: 2018-01-04T19:18:03Z Integrates the avro schema reader into the DefaultStorageaSchemaReader commit 2074b16e12c1bdc7ef3781f50e01ab4dd4c71890 Author: Vihang Karajgaonkar Date: 2018-01-05T02:38:28Z Added a test for getFields method in standalone-metastore commit 4159b5ee9852b41a64489274040e79dbddad54f1 Author: Vihang Karajgaonkar Date: 2018-01-22T07:16:13Z HIVE-18508 : Port schema changes from HIVE-14498 to standalone-metastore > Remove dependency of get_fields_with_environment_context API to serde > - > > Key:
[jira] [Commented] (HIVE-14660) ArrayIndexOutOfBoundsException on delete
[ https://issues.apache.org/jira/browse/HIVE-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340937#comment-16340937 ] ASF GitHub Bot commented on HIVE-14660: --- Github user bonnetb closed the pull request at: https://github.com/apache/hive/pull/100 > ArrayIndexOutOfBoundsException on delete > > > Key: HIVE-14660 > URL: https://issues.apache.org/jira/browse/HIVE-14660 > Project: Hive > Issue Type: Bug > Components: Query Processor, Transactions >Affects Versions: 1.2.1 >Reporter: Benjamin BONNET >Assignee: Benjamin BONNET >Priority: Major > Labels: pull-request-available > Attachments: HIVE-14660.1-banch-1.2.patch > > > Hi, > DELETE on an ACID table may fail with an ArrayIndexOutOfBoundsException. > That bug occurs in the Reduce phase when there are fewer reducers than > table buckets. > In order to reproduce, create a simple ACID table : > {code:sql} > CREATE TABLE test (`cle` bigint,`valeur` string) > PARTITIONED BY (`annee` string) > CLUSTERED BY (cle) INTO 5 BUCKETS > TBLPROPERTIES ('transactional'='true'); > {code} > Populate it with lines distributed among all buckets, with random values and > a few partitions. > Force the number of reducers to be less than the number of buckets : > {code:sql} > set mapred.reduce.tasks=1; > {code} > Then execute a delete that will remove many lines from all the buckets. 
> {code:sql} > DELETE FROM test WHERE valeur<'some_value'; > {code} > Then you will get an ArrayIndexOutOfBoundsException : > {code} > 2016-08-22 21:21:02,500 [FATAL] [TezChild] |tez.ReduceRecordSource|: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row (tag=0) > {"key":{"reducesinkkey0":{"transactionid":119,"bucketid":0,"rowid":0}},"value":{"_col0":"4"}} > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:352) > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:274) > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:252) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: 
java.lang.ArrayIndexOutOfBoundsException: 5 > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:769) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88) > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:343) > ... 17 more > {code} > Adding logs into FileSinkOperator, one sees the operator deals with buckets > 0, 1, 2, 3, 4, then 0 again, and it fails at line 769: each time you > switch buckets, you move forward in a 5-element array (one element per bucket). > So when you get bucket 0 for the second time, you run off the end of the array... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-14660) ArrayIndexOutOfBoundsException on delete
[ https://issues.apache.org/jira/browse/HIVE-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-14660: -- Labels: pull-request-available (was: ) > ArrayIndexOutOfBoundsException on delete > > > Key: HIVE-14660 > URL: https://issues.apache.org/jira/browse/HIVE-14660 > Project: Hive > Issue Type: Bug > Components: Query Processor, Transactions >Affects Versions: 1.2.1 >Reporter: Benjamin BONNET >Assignee: Benjamin BONNET >Priority: Major > Labels: pull-request-available > Attachments: HIVE-14660.1-banch-1.2.patch > > > Hi, > DELETE on an ACID table may fail on an ArrayIndexOutOfBoundsException. > That bug occurs at Reduce phase when there are less reducers than the number > of the table buckets. > In order to reproduce, create a simple ACID table : > {code:sql} > CREATE TABLE test (`cle` bigint,`valeur` string) > PARTITIONED BY (`annee` string) > CLUSTERED BY (cle) INTO 5 BUCKETS > TBLPROPERTIES ('transactional'='true'); > {code} > Populate it with lines distributed among all buckets, with random values and > a few partitions. > Force the Reducers to be less than the buckets : > {code:sql} > set mapred.reduce.tasks=1; > {code} > Then execute a delete that will remove many lines from all the buckets. 
> {code:sql} > DELETE FROM test WHERE valeur<'some_value'; > {code} > Then you will get an ArrayIndexOutOfBoundsException : > {code} > 2016-08-22 21:21:02,500 [FATAL] [TezChild] |tez.ReduceRecordSource|: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row (tag=0) > {"key":{"reducesinkkey0":{"transactionid":119,"bucketid":0,"rowid":0}},"value":{"_col0":"4"}} > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:352) > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:274) > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:252) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: 
java.lang.ArrayIndexOutOfBoundsException: 5 > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:769) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88) > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:343) > ... 17 more > {code} > Adding logs to FileSinkOperator, one sees the operator handle buckets > 0, 1, 2, 3, 4, then 0 again, and fail at line 769: each time the bucket changes, the operator > moves forward in an array of 5 elements (the number of buckets), so when bucket 0 arrives > a second time, the index runs past the end of the array. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
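The failure mode described above can be reduced to a small sketch (hypothetical, heavily simplified from FileSinkOperator's actual bucket handling): an array with one slot per bucket, indexed by a cursor that advances on every bucket switch instead of by the bucket id itself. With fewer reducers than buckets, a single reducer sees the bucket ids wrap around and the cursor overruns the array.

```java
// Simplified, hypothetical model of the bug: 5 writer slots (one per
// bucket), but the slot index is a cursor that advances on every bucket
// switch. When one reducer handles all buckets and then sees bucket 0
// again, the cursor reaches 5 and indexing throws.
public class BucketCursorDemo {
    static final int NUM_BUCKETS = 5;

    // Returns true if processing the given bucket-id sequence overruns
    // the per-bucket array, i.e. reproduces the AIOOBE.
    static boolean overruns(int[] bucketIds) {
        String[] writers = new String[NUM_BUCKETS];
        int cursor = -1, lastBucket = -1;
        try {
            for (int b : bucketIds) {
                if (b != lastBucket) { // bucket switch -> move to next slot
                    cursor++;
                    lastBucket = b;
                }
                writers[cursor] = "writer-" + b;
            }
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // All 5 buckets in order: fits exactly, no overrun.
        System.out.println(overruns(new int[] {0, 1, 2, 3, 4}));    // false
        // Bucket 0 comes around again: cursor hits index 5 -> overrun.
        System.out.println(overruns(new int[] {0, 1, 2, 3, 4, 0})); // true
    }
}
```

A fix along these lines would index the slots by bucket id (writers[b]) rather than by a running cursor; the actual patch attached to the issue may take a different approach.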
[jira] [Commented] (HIVE-14660) ArrayIndexOutOfBoundsException on delete
[ https://issues.apache.org/jira/browse/HIVE-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340945#comment-16340945 ] ASF GitHub Bot commented on HIVE-14660: --- GitHub user bonnetb opened a pull request: https://github.com/apache/hive/pull/299 HIVE-14660: ArrayIndexOutOfBounds on delete. See https://issues.apache.org/jira/browse/HIVE-14660 You can merge this pull request into a Git repository by running: $ git pull https://github.com/bonnetb/hive HIVE-14660 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/299.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #299 commit 323f4bfa92835921780c057082b440bf54a7f5c8 Author: Benjamin BONNET Date: 2016-08-27T20:20:15Z HIVE-14660: ArrayIndexOutOfBounds on delete -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-17331) Path must be used as key type of the pathToAlises
[ https://issues.apache.org/jira/browse/HIVE-17331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-17331: -- Labels: pull-request-available (was: ) > Path must be used as key type of the pathToAlises > - > > Key: HIVE-17331 > URL: https://issues.apache.org/jira/browse/HIVE-17331 > Project: Hive > Issue Type: Bug >Reporter: Oleg Danilov >Assignee: Oleg Danilov >Priority: Minor > Labels: pull-request-available > Attachments: HIVE-17331.patch > > > This code uses String instead of Path as the key type of the pathToAliases map, > so it seems that get(String) always returns null. > +*GenMapRedUtils.java*+ > {code:java} > for (int pos = 0; pos < size; pos++) { > String taskTmpDir = taskTmpDirLst.get(pos); > TableDesc tt_desc = tt_descLst.get(pos); > MapWork mWork = plan.getMapWork(); > if (mWork.getPathToAliases().get(taskTmpDir) == null) { > taskTmpDir = taskTmpDir.intern(); > Path taskTmpDirPath = > StringInternUtils.internUriStringsInPath(new Path(taskTmpDir)); > mWork.removePathToAlias(taskTmpDirPath); > mWork.addPathToAlias(taskTmpDirPath, taskTmpDir); > mWork.addPathToPartitionInfo(taskTmpDirPath, new > PartitionDesc(tt_desc, null)); > mWork.getAliasToWork().put(taskTmpDir, topOperators.get(pos)); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
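The mismatch is easy to reproduce outside Hive. The sketch below uses java.nio.file.Path in place of Hadoop's org.apache.hadoop.fs.Path; since Map.get accepts any Object, passing a String key to a Path-keyed map compiles silently but never matches.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Map.get takes Object, so looking up a Path-keyed map with a String
// compiles without warning but always returns null -- the same shape of
// bug as passing taskTmpDir (a String) to pathToAliases (keyed by Path).
public class PathKeyLookup {
    static Map<Path, String> pathToAliases = new HashMap<>();

    static String lookup(Object key) {
        return pathToAliases.get(key);
    }

    public static void main(String[] args) {
        String taskTmpDir = "/tmp/task_1";
        pathToAliases.put(Paths.get(taskTmpDir), "alias_1");

        System.out.println(lookup(taskTmpDir));            // null: String never equals Path
        System.out.println(lookup(Paths.get(taskTmpDir))); // alias_1
    }
}
```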
[jira] [Commented] (HIVE-17331) Path must be used as key type of the pathToAlises
[ https://issues.apache.org/jira/browse/HIVE-17331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327994#comment-16327994 ] ASF GitHub Bot commented on HIVE-17331: --- Github user dosoft closed the pull request at: https://github.com/apache/hive/pull/233 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18192) Introduce WriteID per table rather than using global transaction ID
[ https://issues.apache.org/jira/browse/HIVE-18192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325734#comment-16325734 ] ASF GitHub Bot commented on HIVE-18192: --- GitHub user sankarh opened a pull request: https://github.com/apache/hive/pull/290 HIVE-18192: Introduce WriteID per table rather than using global transaction ID You can merge this pull request into a Git repository by running: $ git pull https://github.com/sankarh/hive HIVE-18192 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/290.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #290 commit ced6d749a7e65c42ed31a8af9e38faaf5941251a Author: Sankar Hariappan Date: 2018-01-03T05:47:38Z HIVE-18192: Introduce WriteID per table rather than using global transaction ID > Introduce WriteID per table rather than using global transaction ID > --- > > Key: HIVE-18192 > URL: https://issues.apache.org/jira/browse/HIVE-18192 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, Transactions >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: Sankar Hariappan > Labels: ACID, DR, pull-request-available > Fix For: 3.0.0 > > > To support ACID replication, we will be introducing a per-table write id > which will replace the transaction id in the primary key for each row in an > ACID table. > The current primary key is determined via > > which will move to > > A persistable map of global txn id -> table -> write id for that table has > to be maintained to allow snapshot isolation. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
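The mapping the description calls for can be sketched roughly as follows (the class and method names are illustrative only, not Hive's actual metastore schema): a map from global txn id to, per table, that table's write id, where write ids increase per table and a txn that already touched a table reuses its allocated id.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative in-memory shape of the mapping: global txn id ->
// (fully qualified table name -> per-table write id). Write ids
// increase monotonically per table; a txn that already touched a
// table gets back the write id it was already allocated.
public class TxnToWriteId {
    private final Map<Long, Map<String, Long>> txnToTableWriteId = new HashMap<>();
    private final Map<String, Long> lastWriteId = new HashMap<>();

    public long allocate(long txnId, String table) {
        return txnToTableWriteId
            .computeIfAbsent(txnId, t -> new HashMap<>())
            .computeIfAbsent(table, tbl -> lastWriteId.merge(tbl, 1L, Long::sum));
    }
}
```

For example, two different transactions writing to the same table get write ids 1 and 2, while repeated writes within one transaction keep the same write id; in the real system this map would also have to be persisted in the metastore, as the issue notes.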
[jira] [Updated] (HIVE-18192) Introduce WriteID per table rather than using global transaction ID
[ https://issues.apache.org/jira/browse/HIVE-18192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-18192: -- Labels: ACID DR pull-request-available (was: ACID DR) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-18478) Drop of temp table creating recycle files at CM path
[ https://issues.apache.org/jira/browse/HIVE-18478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-18478: -- Labels: pull-request-available (was: ) > Drop of temp table creating recycle files at CM path > > > Key: HIVE-18478 > URL: https://issues.apache.org/jira/browse/HIVE-18478 > Project: Hive > Issue Type: Sub-task > Components: Hive, HiveServer2 >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Minor > Labels: pull-request-available > Fix For: 3.0.0 > > > The drop TEMP table operation invokes deleteDir, which moves the files to $CMROOT; > this is unnecessary, as temp tables need not be replicated. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18478) Drop of temp table creating recycle files at CM path
[ https://issues.apache.org/jira/browse/HIVE-18478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16339437#comment-16339437 ] ASF GitHub Bot commented on HIVE-18478: --- GitHub user maheshk114 opened a pull request: https://github.com/apache/hive/pull/298 HIVE-18478: Avoiding creation of CM recycle file in case of temp table In case of drop table, truncate table, load table, etc., the table's files are deleted, which can cause issues during replication. To solve this, the old files are stored in the CM directory for replication to use later. But temporary tables are not replicated, so creating these files is not required. Extra checks are therefore added to avoid creating recycle files for temp tables. You can merge this pull request into a Git repository by running: $ git pull https://github.com/maheshk114/hive HIVE-18478 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/298.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #298 commit 2a5dd87618c1a323a230eafe1906a7ad8b9e7af7 Author: Mahesh Kumar Behera Date: 2018-01-25T16:06:27Z HIVE-18478: Avoiding creation of CM recycle file in case of temp table -- This message was sent by Atlassian JIRA (v7.6.3#76005)
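The check described in this pull request amounts to a guard of roughly the following shape (the method and parameter names here are hypothetical, not the actual Hive code): before moving deleted files to the CM root, verify that change management applies and the table is not temporary.

```java
// Hypothetical guard: files are recycled to $CMROOT only when change
// management is enabled AND the table is not a temporary table, since
// temp tables are never replicated and need no CM copy.
public class CmRecycleGuard {
    public static boolean shouldRecycleToCm(boolean cmEnabled, boolean isTempTable) {
        return cmEnabled && !isTempTable;
    }

    public static void main(String[] args) {
        System.out.println(shouldRecycleToCm(true, false)); // true: normal table, recycle
        System.out.println(shouldRecycleToCm(true, true));  // false: temp table, skip
    }
}
```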
[jira] [Commented] (HIVE-15631) Optimize for hive client logs , you can filter the log for each session itself.
[ https://issues.apache.org/jira/browse/HIVE-15631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16339270#comment-16339270 ] ASF GitHub Bot commented on HIVE-15631: --- Github user Tartarus0zm closed the pull request at: https://github.com/apache/hive/pull/295 > Optimize for hive client logs , you can filter the log for each session > itself. > --- > > Key: HIVE-15631 > URL: https://issues.apache.org/jira/browse/HIVE-15631 > Project: Hive > Issue Type: Improvement > Components: CLI, Clients, Hive >Reporter: tartarus >Assignee: tartarus >Priority: Major > Labels: pull-request-available > Attachments: HIVE-15631.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > We have several Hadoop clusters, about 15 thousand nodes in total, and every day we > submit more than 100 thousand Hive jobs. > So every client host accumulates a large Hive log file each day, but I cannot tell > which lines were logged by the session I submitted. > I would like to print hive.session.id on every line of the logs, so that I > could use grep to find the logs of my session. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
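What the reporter asks for can be illustrated with a tiny sketch (hypothetical, not Hive's logging code): tag every log line with the session id, so that a single grep isolates one session's output in a shared client log file.

```java
// Illustrative only: prefixing each line with the session id makes
// `grep 'sess-42' hive.log` sufficient to pull out one session.
public class SessionTaggedLog {
    private final String sessionId;

    public SessionTaggedLog(String sessionId) {
        this.sessionId = sessionId;
    }

    public String line(String msg) {
        return "[" + sessionId + "] " + msg;
    }

    public static void main(String[] args) {
        SessionTaggedLog log = new SessionTaggedLog("sess-42");
        System.out.println(log.line("Compiling query")); // [sess-42] Compiling query
    }
}
```

In a real Log4j 2 setup this is usually done without touching call sites, by putting the session id into the ThreadContext map and adding %X{sessionId} (or a similar key) to the layout pattern.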
[jira] [Updated] (HIVE-18543) Add print sessionid in console
[ https://issues.apache.org/jira/browse/HIVE-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-18543: -- Labels: pull-request-available (was: ) > Add print sessionid in console > -- > > Key: HIVE-18543 > URL: https://issues.apache.org/jira/browse/HIVE-18543 > Project: Hive > Issue Type: Improvement > Components: CLI, Clients >Affects Versions: 2.3.2 > Environment: CentOS6.5 > Hive-1.2.1 > Hive-2.3.2 >Reporter: tartarus >Assignee: tartarus >Priority: Major > Labels: pull-request-available > Fix For: 3.0.0 > > Original Estimate: 24h > Remaining Estimate: 24h > > The Hive client log file already contains the session id, but the console does not, > so the user cannot easily correlate console output with the log. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18543) Add print sessionid in console
[ https://issues.apache.org/jira/browse/HIVE-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16339276#comment-16339276 ] ASF GitHub Bot commented on HIVE-18543: --- GitHub user Tartarus0zm opened a pull request: https://github.com/apache/hive/pull/297 HIVE-18543 Add print sessionid in console You can merge this pull request into a Git repository by running: $ git pull https://github.com/Tartarus0zm/hive patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/297.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #297 commit 2af02a96de32ede92921a086b04e68e209a27b39 Author: zhangmang Date: 2018-01-25T14:23:51Z Add print sessionid in console -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18467) support whole warehouse dump / load + create/drop database events
[ https://issues.apache.org/jira/browse/HIVE-18467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-18467: -- Labels: pull-request-available (was: ) > support whole warehouse dump / load + create/drop database events > - > > Key: HIVE-18467 > URL: https://issues.apache.org/jira/browse/HIVE-18467 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek >Priority: Major > Labels: pull-request-available > Fix For: 3.0.0 > > Attachments: HIVE-18467.0.patch > > > For certain use cases, a complete Hive warehouse might need to be replicated to a > DR site. Rather than allowing only a database name in the REPL > DUMP command, we should allow dumping all databases using the "*" option, > as in > _REPL DUMP *_ > On the repl load side there will be no option to specify the database > name when loading from a location used to dump multiple databases; hence only > _REPL LOAD FROM [location]_ would be supported when dumping via _REPL DUMP *_ > Additionally, incremental dumps will go through all events across databases > in a warehouse, so CREATE / DROP Database events have to be serialized > correctly to allow repl load to apply them correctly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18467) support whole warehouse dump / load + create/drop database events
[ https://issues.apache.org/jira/browse/HIVE-18467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344709#comment-16344709 ] ASF GitHub Bot commented on HIVE-18467: --- GitHub user anishek opened a pull request: https://github.com/apache/hive/pull/300 HIVE-18467: support whole warehouse dump / load + create/drop database events You can merge this pull request into a Git repository by running: $ git pull https://github.com/anishek/hive HIVE-18467 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/300.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #300 commit 59e3afde676cb7666bc202858c2bcdc6958b0861 Author: Anishek Agarwal Date: 2018-01-19T08:01:28Z HIVE-18467: support whole warehouse dump / load + create/drop database events 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18031) Support replication for Alter Database operation.
[ https://issues.apache.org/jira/browse/HIVE-18031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346321#comment-16346321 ] ASF GitHub Bot commented on HIVE-18031: --- Github user sankarh closed the pull request at: https://github.com/apache/hive/pull/280 > Support replication for Alter Database operation. > - > > Key: HIVE-18031 > URL: https://issues.apache.org/jira/browse/HIVE-18031 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan >Priority: Major > Labels: DR, pull-request-available, replication > Fix For: 3.0.0 > > Attachments: HIVE-18031.01.patch, HIVE-18031.02.patch > > > Currently, alter database operations that change the database properties or owner > info do not generate any events, so they are not replicated. > An event needs to be added for this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18581) Replication events should use lower case db object names
[ https://issues.apache.org/jira/browse/HIVE-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349878#comment-16349878 ] ASF GitHub Bot commented on HIVE-18581: --- Github user anishek closed the pull request at: https://github.com/apache/hive/pull/304 > Replication events should use lower case db object names > > > Key: HIVE-18581 > URL: https://issues.apache.org/jira/browse/HIVE-18581 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: anishek >Assignee: anishek >Priority: Minor > Labels: pull-request-available > Attachments: HIVE-18581.0.patch, HIVE-18581.1.patch > > > Events generated by replication should include database / table / > partition / function names in lower case. This will save other > applications from having to do explicit case-insensitive matching of objects by name. > In Hive, all db object names as specified above are explicitly converted to > lower case when comparing objects of the same type. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18478) Data files deleted from temp table should not be recycled to CM path
[ https://issues.apache.org/jira/browse/HIVE-18478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349949#comment-16349949 ] ASF GitHub Bot commented on HIVE-18478: --- Github user maheshk114 closed the pull request at: https://github.com/apache/hive/pull/298 > Data files deleted from temp table should not be recycled to CM path > > > Key: HIVE-18478 > URL: https://issues.apache.org/jira/browse/HIVE-18478 > Project: Hive > Issue Type: Sub-task > Components: Hive, HiveServer2 >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Minor > Labels: pull-request-available > Fix For: 3.0.0 > > Attachments: HIVE-18478.01.patch, HIVE-18478.02.patch, > HIVE-18478.03.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18543) Add print sessionid in console
[ https://issues.apache.org/jira/browse/HIVE-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346607#comment-16346607 ] ASF GitHub Bot commented on HIVE-18543: --- GitHub user Tartarus0zm opened a pull request: https://github.com/apache/hive/pull/303 HIVE-18543 Add print sessionid in console You can merge this pull request into a Git repository by running: $ git pull https://github.com/Tartarus0zm/hive console_sessionid Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/303.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #303 commit 9a89f7774a113ba11e42ed1183d60c4ea86bb67d Author: Tartarus Date: 2018-01-31T11:00:44Z HIVE-18543 Add print sessionid in console > Add print sessionid in console > -- > > Key: HIVE-18543 > URL: https://issues.apache.org/jira/browse/HIVE-18543 > Project: Hive > Issue Type: Improvement > Components: CLI, Clients >Affects Versions: 2.3.2 > Environment: CentOS6.5 > Hive-1.2.1 > Hive-2.3.2 >Reporter: tartarus >Assignee: tartarus >Priority: Major > Labels: pull-request-available > Fix For: 3.0.0 > > Attachments: HIVE_18543.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18581) Replication events should use lower case db object names
[ https://issues.apache.org/jira/browse/HIVE-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-18581: -- Labels: pull-request-available (was: ) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18581) Replication events should use lower case db object names
[ https://issues.apache.org/jira/browse/HIVE-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348164#comment-16348164 ] ASF GitHub Bot commented on HIVE-18581: --- GitHub user anishek opened a pull request: https://github.com/apache/hive/pull/304 HIVE-18581: Replication events should use lower case db object names You can merge this pull request into a Git repository by running: $ git pull https://github.com/anishek/hive HIVE-18581 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/304.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #304 commit 553f0de490d49b77294a70875264819b387e2a45 Author: Anishek Agarwal Date: 2018-01-31T10:12:31Z HIVE-18581: Replication events should use lower case db object names -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18679) create/replicate open transaction event
[ https://issues.apache.org/jira/browse/HIVE-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360819#comment-16360819 ] ASF GitHub Bot commented on HIVE-18679: --- GitHub user maheshk114 opened a pull request: https://github.com/apache/hive/pull/305 HIVE-18679: create/replicate open transaction event EVENT_OPEN_TXN: Source Warehouse: Create new event type EVENT_OPEN_TXN with related message format etc. When any transaction is opened either by auto-commit mode or multi-statement mode, need to capture this event. Repl dump should read this event from EventNotificationTable and dump the message. Target Warehouse: Repl load should read the event from the dump and get the message. Open a txn in the target warehouse. Create a map of source txn ID against target txn ID and persist the same in the metastore. There should be one map per replication policy (DBName.* in case of DB level replication, DBName.TableName in case of table level replication) You can merge this pull request into a Git repository by running: $ git pull https://github.com/maheshk114/hive BUG-95520 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/305.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #305 commit 4a855de860ccec6e37e4c16ebbddee575e9ae2f2 Author: Mahesh Kumar Behera Date: 2018-02-12T05:07:01Z HIVE-18679: create/replicate open transaction event commit c745a4066b31075004b96200da079b4dd4fd2743 Author: Mahesh Kumar Behera Date: 2018-02-12T14:29:54Z HIVE-18679: create/replicate open transaction event: rebased with Alan's change > create/replicate open transaction event > --- > > Key: HIVE-18679 > URL: https://issues.apache.org/jira/browse/HIVE-18679 > Project: Hive > Issue Type: Bug > Components: repl, Transactions >Affects Versions: 3.0.0 >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: 
pull-request-available > Fix For: 3.0.0 > > > *EVENT_OPEN_TXN:* > *Source Warehouse:* > - Create new event type EVENT_OPEN_TXN with related message format etc. > - When any transaction is opened either by auto-commit mode or > multi-statement mode, need to capture this event. > - Repl dump should read this event from EventNotificationTable and dump the > message. > *Target Warehouse:* > - Repl load should read the event from the dump and get the message. > - Open a txn in the target warehouse. > - Create a map of source txn ID against target txn ID and persist the same > in the metastore. There should be one map per replication policy (DBName.* in case > of DB level replication, DBName.TableName in case of table level replication) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
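The source-to-target txn mapping the event description calls for could look roughly like this (an illustrative in-memory shape only, not the metastore persistence the issue requires): one map per replication policy, keyed "DBName.*" for DB-level and "DBName.TableName" for table-level replication.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative: one (source txn id -> target txn id) map per
// replication policy, as the EVENT_OPEN_TXN description requires.
public class ReplTxnMapping {
    private final Map<String, Map<Long, Long>> perPolicy = new HashMap<>();

    // Record that a source-warehouse txn was replayed as a target txn
    // under the given replication policy.
    public void record(String policy, long sourceTxnId, long targetTxnId) {
        perPolicy.computeIfAbsent(policy, p -> new HashMap<>())
                 .put(sourceTxnId, targetTxnId);
    }

    // Returns the target txn id, or null if this policy never saw the txn.
    public Long targetTxn(String policy, long sourceTxnId) {
        Map<Long, Long> m = perPolicy.get(policy);
        return m == null ? null : m.get(sourceTxnId);
    }
}
```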
[jira] [Updated] (HIVE-18679) create/replicate open transaction event
[ https://issues.apache.org/jira/browse/HIVE-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-18679: -- Labels: pull-request-available (was: ) -- This message was sent by Atlassian JIRA (v7.6.3#76005)