[jira] [Created] (HIVE-21000) Upgrade thrift to at least 0.10.0
Zoltan Haindrich created HIVE-21000:
------------------------------------

             Summary: Upgrade thrift to at least 0.10.0
                 Key: HIVE-21000
                 URL: https://issues.apache.org/jira/browse/HIVE-21000
             Project: Hive
          Issue Type: Improvement
            Reporter: Zoltan Haindrich

I was looking into some compile profiles for tables with lots of columns, and it turned out that [thrift 0.9.3 allocates a List|https://github.com/apache/hive/blob/8e30b5e029570407d8a1db67d322a95db705750e/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/FieldSchema.java#L348] during every hashcode calculation. Luckily THRIFT-2877 improves on that, so I propose to upgrade to at least 0.10.0.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
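For illustration, the cost described above can be sketched like this; the class and fields below are a simplified stand-in for the generated FieldSchema code, not the actual thrift output:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for a thrift-generated struct, contrasting the
// 0.9.3-style hashCode (builds a temporary List per call) with an
// allocation-free version in the spirit of THRIFT-2877.
public class FieldSchemaSketch {
    private final String name;
    private final String type;
    private final String comment;

    public FieldSchemaSketch(String name, String type, String comment) {
        this.name = name;
        this.type = type;
        this.comment = comment;
    }

    // Old style: allocates an ArrayList and boxes booleans on every call.
    public int hashCodeAllocating() {
        List<Object> list = new ArrayList<Object>();
        boolean presentName = (name != null);
        list.add(presentName);
        if (presentName) list.add(name);
        boolean presentType = (type != null);
        list.add(presentType);
        if (presentType) list.add(type);
        boolean presentComment = (comment != null);
        list.add(presentComment);
        if (presentComment) list.add(comment);
        return list.hashCode();
    }

    // New style: plain arithmetic, no allocation per call.
    public int hashCodeFlat() {
        int h = 8191;
        h = h * 8191 + (name == null ? 0 : name.hashCode());
        h = h * 8191 + (type == null ? 0 : type.hashCode());
        h = h * 8191 + (comment == null ? 0 : comment.hashCode());
        return h;
    }
}
```

Equal objects still produce equal hash codes under either scheme; the second simply avoids per-call garbage, which matters when hashCode is invoked for every column of a wide table.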
[jira] [Created] (HIVE-20999) LLAP IO: MutableQuantiles is contended heavily
Gopal V created HIVE-20999:
---------------------------

             Summary: LLAP IO: MutableQuantiles is contended heavily
                 Key: HIVE-20999
                 URL: https://issues.apache.org/jira/browse/HIVE-20999
             Project: Hive
          Issue Type: Bug
          Components: llap
    Affects Versions: 3.1.1
            Reporter: Gopal V

MutableQuantiles::add() is synchronized across all threads.

{code}
IO-Elevator-Thread-0 [DAEMON]
State: BLOCKED
CPU usage on sample: 316ms
org.apache.hadoop.metrics2.lib.MutableQuantiles.add(long) MutableQuantiles.java:133
org.apache.hadoop.hive.llap.metrics.LlapDaemonIOMetrics.addDecodeBatchTime(long) LlapDaemonIOMetrics.java:98
org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedColumnBatch) EncodedDataConsumer.java:89
org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(Object) EncodedDataConsumer.java:34
org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(int, StripeInformation, OrcProto$RowIndex[], List, List, boolean[], boolean[], Consumer) EncodedReaderImpl.java:530
org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead() OrcEncodedDataReader.java:407
{code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
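One generic way to relieve this kind of contention (a sketch, not necessarily the eventual fix for this issue) is to batch samples per thread and drain them under the lock in one shot; the Sink interface below is a hypothetical stand-in for MutableQuantiles:

```java
// Sketch: reduce lock contention on a shared synchronized sink by
// buffering samples in thread-local storage and flushing a whole batch
// under a single lock acquisition.
public class BatchedRecorder {
    // Hypothetical stand-in for a synchronized metric like MutableQuantiles.
    public interface Sink { void add(long value); }

    private final Sink sink;
    private final int batchSize;
    private final ThreadLocal<long[]> buf;
    private final ThreadLocal<int[]> count = ThreadLocal.withInitial(() -> new int[1]);

    public BatchedRecorder(Sink sink, int batchSize) {
        this.sink = sink;
        this.batchSize = batchSize;
        this.buf = ThreadLocal.withInitial(() -> new long[batchSize]);
    }

    public void add(long value) {
        long[] b = buf.get();
        int[] c = count.get();
        b[c[0]++] = value;
        if (c[0] == batchSize) {
            // One lock acquisition per batch instead of per sample.
            synchronized (sink) {
                for (int i = 0; i < c[0]; i++) sink.add(b[i]);
            }
            c[0] = 0;
        }
    }
}
```

The trade-off is that up to batchSize-1 samples per thread are invisible to the metric until the next flush, which is usually acceptable for latency histograms.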
[GitHub] hive pull request #491: HIVE-20953: Fix testcase TestReplicationScenariosAcr...
Github user ashutosh-bapat closed the pull request at: https://github.com/apache/hive/pull/491 ---
[jira] [Created] (HIVE-20998) HiveStrictManagedMigration utility should update DB/Table location as last migration steps
Jason Dere created HIVE-20998:
------------------------------

             Summary: HiveStrictManagedMigration utility should update DB/Table location as last migration steps
                 Key: HIVE-20998
                 URL: https://issues.apache.org/jira/browse/HIVE-20998
             Project: Hive
          Issue Type: Sub-task
            Reporter: Jason Dere
            Assignee: Jason Dere

When processing a database or table, the HiveStrictManagedMigration utility currently changes the database/table location as the first step. Unfortunately, if an error occurs while processing the database or table, there may still be migration work left for that db/table, which requires running the migration again. However, the migration tool only processes dbs/tables that still have the old warehouse location, so it will skip over the db/table when the migration is re-run.

One fix here is to set the new location as the last step, after all of the migration work is done:
- The new table location will not be set until all of its partitions have been successfully migrated.
- The new database location will not be set until all of its tables have been successfully migrated.

For existing migrations that failed with an error, the following workaround allows the db/tables to be re-processed by the migration tool:
1) Use the migration tool logs to find which databases/tables failed during processing.
2) For each db/table, change the location of the database and table back to the old location:
ALTER DATABASE tpcds_bin_partitioned_orc_10 SET LOCATION 'hdfs://ns1/apps/hive/warehouse/tpcds_bin_partitioned_orc_10.db';
ALTER TABLE tpcds_bin_partitioned_orc_10.store_sales SET LOCATION 'hdfs://ns1/apps/hive/warehouse/tpcds_bin_partitioned_orc_10.db/store_sales';
3) Rerun the migration tool.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
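The proposed ordering can be sketched as follows; the class and callbacks here are hypothetical, illustrating only the "migrate children first, set the parent location last" rule, not the actual HiveStrictManagedMigration code:

```java
import java.util.List;
import java.util.function.Consumer;

// Sketch of the proposed fix: update the parent's location only after every
// child has migrated successfully. A failed run leaves the old location in
// place, so a re-run will still pick the parent up for processing.
public class MigrateLast {
    public static <T> boolean migrateChildrenThenParent(
            List<T> children, Consumer<T> migrateChild, Runnable setNewLocation) {
        for (T child : children) {
            try {
                migrateChild.accept(child);
            } catch (RuntimeException e) {
                // Location not yet updated: the tool can retry this parent.
                return false;
            }
        }
        setNewLocation.run(); // last step, only after all children migrated
        return true;
    }
}
```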
[jira] [Created] (HIVE-20997) Make Druid Cluster start on random ports.
slim bouguerra created HIVE-20997:
----------------------------------

             Summary: Make Druid Cluster start on random ports.
                 Key: HIVE-20997
                 URL: https://issues.apache.org/jira/browse/HIVE-20997
             Project: Hive
          Issue Type: Sub-task
            Reporter: slim bouguerra
            Assignee: slim bouguerra

As of now, the Druid tests run in a single batch. To avoid timeouts we need to support batching of the tests. As suggested by [~vihangk1], it will be better to start the Druid test setups on totally random ports.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (HIVE-20996) MV rewriting not triggering
Vineet Garg created HIVE-20996:
-------------------------------

             Summary: MV rewriting not triggering
                 Key: HIVE-20996
                 URL: https://issues.apache.org/jira/browse/HIVE-20996
             Project: Hive
          Issue Type: Improvement
            Reporter: Vineet Garg

{code:sql}
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.strict.checks.cartesian.product=false;
set hive.stats.fetch.column.stats=true;
set hive.materializedview.rewriting=true;

create table emps_n3 (
  empid int,
  deptno int,
  name varchar(256),
  salary float,
  commission int)
stored as orc TBLPROPERTIES ('transactional'='true');

insert into emps_n3 values
  (100, 10, 'Bill', 1, 1000), (200, 20, 'Eric', 8000, 500),
  (150, 10, 'Sebastian', 7000, null), (110, 10, 'Theodore', 1, 250),
  (120, 10, 'Bill', 1, 250);

analyze table emps_n3 compute statistics for columns;
alter table emps_n3 add constraint pk1 primary key (empid) disable novalidate rely;

create materialized view mv1_n2 as
  select empid, deptno from emps_n3 group by empid, deptno;

explain select empid, deptno from emps_n3 group by empid, deptno;
{code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (HIVE-20995) Add mini Druid to the list of tests
slim bouguerra created HIVE-20995:
----------------------------------

             Summary: Add mini Druid to the list of tests
                 Key: HIVE-20995
                 URL: https://issues.apache.org/jira/browse/HIVE-20995
             Project: Hive
          Issue Type: Test
            Reporter: slim bouguerra
            Assignee: slim bouguerra

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[GitHub] hive pull request #501: HIVE-20994: Upgrade arrow version to 0.10.0 in branc...
GitHub user pudidic opened a pull request:

    https://github.com/apache/hive/pull/501

    HIVE-20994: Upgrade arrow version to 0.10.0 in branch-3 (Teddy Choi)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pudidic/hive HIVE-20994

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hive/pull/501.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #501

commit c0c621f1e18b51c38173ae1036c2852dddee89e5
Author: Teddy Choi
Date:   2018-12-03T16:03:11Z

    HIVE-20994: Upgrade arrow version to 0.10.0 in branch-3 (Teddy Choi)

---
[jira] [Created] (HIVE-20994) Upgrade arrow version to 0.10.0 in branch-3
Teddy Choi created HIVE-20994:
------------------------------

             Summary: Upgrade arrow version to 0.10.0 in branch-3
                 Key: HIVE-20994
                 URL: https://issues.apache.org/jira/browse/HIVE-20994
             Project: Hive
          Issue Type: Improvement
            Reporter: Teddy Choi
            Assignee: Teddy Choi
             Fix For: 3.2.0

HIVE-20751 upgraded the arrow version in Hive 4, but its patch has conflicts with Hive 3. It needs to be rebased.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
Re: When do the deltas of a transaction become observable?
Thanks Gopal, that was very helpful.

Granville

On Mon, 26 Nov 2018 at 08:14, Gopal Vijayaraghavan wrote:
>
> > release of the locks) but I can't seem to find it. As it's a transactional
> > system I'd expect we observe both deltas or none at all, at the point of
> > successful commit.
>
> In Hive's internals, "observe" is slightly different from "use". The Hive
> ACID system can see a file on HDFS and then ignore it, because it is from
> the "future".
>
> You can sort of start from this line
>
> https://github.com/apache/hive/blob/master/storage-api/src/java/org/apache/hadoop/hive/common/ValidReaderWriteIdList.java#L70
>
> and work backwards.
>
> > I had done some basic tests to determine if the observation semantics were
> > tied to the metadata in the database product for the transactional system,
> > but I could only determine write IDs were influencing this, e.g. if write
> > ID = 7 for a given table, then the read would consist of all deltas with a
> > write ID < 7.
>
> Yes, you're on the right track. There's a mapping from txn_id -> write_id
> (per-table), maintained by the writers (i.e. if a txn commits, then the
> write_id is visible).
>
> For each table, in each query, there's a snapshot taken which has a
> min:max and a list of exceptions.
>
> When a query starts, it sees that all txns below 5 are committed or
> cleaned, therefore all <=5 is good.
>
> It knows that the highest known txn is 10, so all >10 is to be ignored.
>
> And between 5 & 10, it knows that 7 is aborted and 8 is still open (i.e.
> the exceptions).
>
> So if it sees a delta_11 dir, it ignores it. If it sees a delta_8, it
> ignores it.
>
> The "ACID" implementation hides future updates in plain sight and doesn't
> need HDFS to be able to rename multiple dirs together.
>
> Most of that smarts is in the split-generation, not in the commit
> (however, the commit does something else to detect write-conflicts, which
> is its own thing).
>
> > If someone could point me in the right direction, or correct my
> > understanding, then I would greatly appreciate it.
>
> This implementation is built with the txn -> write_id indirection to
> support cross-replication between say an east-coast cluster and a
> west-coast cluster, each owning primary data-sets on their own coasts.
>
> Cheers,
> Gopal
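The snapshot rule Gopal describes (a low-water mark, a high-water mark, and an exception list of open/aborted ids in between) can be sketched as follows; this is a simplified illustration in the spirit of ValidReaderWriteIdList, not the actual implementation:

```java
import java.util.Arrays;

// Sketch of snapshot-based visibility: ids at or below the low-water mark
// are visible, ids above the high-water mark are from the "future" and
// ignored, and ids inside the window are visible unless they appear in the
// exception list (open or aborted transactions).
public class SnapshotSketch {
    private final long lowWaterMark;   // all ids <= this are committed/cleaned
    private final long highWaterMark;  // all ids > this are ignored
    private final long[] exceptions;   // open/aborted ids in between

    public SnapshotSketch(long low, long high, long[] exceptions) {
        this.lowWaterMark = low;
        this.highWaterMark = high;
        this.exceptions = exceptions.clone();
        Arrays.sort(this.exceptions);
    }

    public boolean isVisible(long writeId) {
        if (writeId <= lowWaterMark) return true;
        if (writeId > highWaterMark) return false;
        return Arrays.binarySearch(exceptions, writeId) < 0;
    }
}
```

With the example from the mail (low = 5, high = 10, exceptions {7, 8}), a delta_8 or delta_11 directory is skipped while delta_6 and delta_9 are read.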
[GitHub] hive pull request #500: Hive 20966 : Support incremental / bootstrap replica...
GitHub user maheshk114 opened a pull request:

    https://github.com/apache/hive/pull/500

    HIVE-20966 : Support incremental / bootstrap replication to a target cluster with hive.strict.managed.tables enabled

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maheshk114/hive HIVE-20966

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hive/pull/500.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #500

commit c3a82a05592c0e61143750e615c2dd6c1feca6c5
Author: Mahesh Kumar Behera
Date:   2018-11-28T11:24:40Z

    HIVE-20966 : Support incremental replication to a target cluster with hive.strict.managed.tables enabled

commit 2ef813ac9b7c87119bcba656d7060fd12caaa6b1
Author: Mahesh Kumar Behera
Date:   2018-12-01T05:05:24Z

    HIVE-20966 : Support incremental replication to a target cluster with hive.strict.managed.tables enabled - fixed alter table issues

commit 4b2a6f4a17c84084dbfd4f6aaeb9faea12773233
Author: Mahesh Kumar Behera
Date:   2018-12-03T05:39:35Z

    HIVE-20884 : Bootstrap of tables to target with hive.strict.managed.tables enabled.

commit c561ea22b03badeee9f223179e9199cbcb456e63
Author: Mahesh Kumar Behera
Date:   2018-12-03T08:33:05Z

    HIVE-20966 : Support incremental replication to a target cluster with hive.strict.managed.tables enabled - review comment fix

---
Re: [ANNOUNCE] New committer: Bharathkrishna Guruvayoor Murali
Congratulations Bharath!

On Mon, Dec 3, 2018 at 8:45 AM Peter Vary wrote:
> Congratulations!
>
> > On Dec 3, 2018, at 05:32, Sankar Hariappan wrote:
> >
> > Congrats Bharath!
> >
> > Best regards
> > Sankar
> >
> > On 03/12/18, 7:38 AM, "Vihang Karajgaonkar" wrote:
> >
> >> Congratulations Bharath!
> >>
> >> On Sun, Dec 2, 2018 at 9:33 AM Sahil Takiar wrote:
> >>
> >>> Congrats Bharath!
> >>>
> >>> On Sun, Dec 2, 2018 at 11:14 AM Andrew Sherman wrote:
> >>>
> >>>> Congratulations Bharath!
> >>>>
> >>>> On Sat, Dec 1, 2018 at 10:26 AM Ashutosh Chauhan <hashut...@apache.org> wrote:
> >>>>
> >>>>> Apache Hive's Project Management Committee (PMC) has invited
> >>>>> Bharathkrishna Guruvayoor Murali to become a committer, and we are
> >>>>> pleased to announce that he has accepted.
> >>>>>
> >>>>> Bharath, welcome, thank you for your contributions, and we look
> >>>>> forward to your further interactions with the community!
> >>>>>
> >>>>> Ashutosh Chauhan (on behalf of the Apache Hive PMC)
> >>>
> >>> --
> >>> Sahil Takiar
> >>> Software Engineer
> >>> takiar.sa...@gmail.com | (510) 673-0309