[jira] [Created] (HIVE-21000) Upgrade thrift to at least 0.10.0

2018-12-03 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created HIVE-21000:
---

 Summary: Upgrade thrift to at least 0.10.0
 Key: HIVE-21000
 URL: https://issues.apache.org/jira/browse/HIVE-21000
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich


I was looking into some compile profiles for tables with lots of columns, and
it turned out that [thrift 0.9.3 is allocating a
List|https://github.com/apache/hive/blob/8e30b5e029570407d8a1db67d322a95db705750e/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/FieldSchema.java#L348]
during every hashCode calculation. Luckily, THRIFT-2877 improves on that, so
I propose upgrading to at least 0.10.0.
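For illustration, here is a minimal sketch of the allocation pattern described above. This is a hypothetical class, not the actual Thrift-generated FieldSchema code: the first variant builds a temporary List on every hashCode() call, while the second hashes the fields directly with no per-call allocation (in the style of the THRIFT-2877 change).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two hashCode styles discussed above.
// Not the actual Thrift-generated code.
public class FieldSchemaSketch {
    private final String name;
    private final String type;

    public FieldSchemaSketch(String name, String type) {
        this.name = name;
        this.type = type;
    }

    // thrift 0.9.3-style: allocates a fresh List on every call.
    public int hashCodeWithList() {
        List<Object> list = new ArrayList<>();
        list.add(true);   // "field is set" flag
        list.add(name);
        list.add(true);
        list.add(type);
        return list.hashCode();
    }

    // THRIFT-2877-style: accumulates directly, no allocation.
    public int hashCodeNoAlloc() {
        int hashCode = 1;
        hashCode = hashCode * 8191 + name.hashCode();
        hashCode = hashCode * 8191 + type.hashCode();
        return hashCode;
    }
}
```

For a table with thousands of columns, the first variant produces one garbage List per field-schema hash, which is what the compile profile surfaced.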



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20999) LLAP IO: MutableQuantiles is contended heavily

2018-12-03 Thread Gopal V (JIRA)
Gopal V created HIVE-20999:
--

 Summary: LLAP IO: MutableQuantiles is contended heavily
 Key: HIVE-20999
 URL: https://issues.apache.org/jira/browse/HIVE-20999
 Project: Hive
  Issue Type: Bug
  Components: llap
Affects Versions: 3.1.1
Reporter: Gopal V


MutableQuantiles::add() is synchronized across all threads.

{code}
IO-Elevator-Thread-0 [DAEMON] State: BLOCKED CPU usage on sample: 316ms
org.apache.hadoop.metrics2.lib.MutableQuantiles.add(long) MutableQuantiles.java:133
org.apache.hadoop.hive.llap.metrics.LlapDaemonIOMetrics.addDecodeBatchTime(long) LlapDaemonIOMetrics.java:98
org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedColumnBatch) EncodedDataConsumer.java:89
org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(Object) EncodedDataConsumer.java:34
org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(int, StripeInformation, OrcProto$RowIndex[], List, List, boolean[], boolean[], Consumer) EncodedReaderImpl.java:530
org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead() OrcEncodedDataReader.java:407
{code}
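One hypothetical way to cut this contention (a sketch, not the actual HIVE-20999 fix) is to buffer samples per thread and take the shared lock only once per batch, so the synchronized sink is entered once per N samples instead of once per sample:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

// Hypothetical sketch: batch samples in a ThreadLocal so a contended,
// synchronized consumer (such as MutableQuantiles::add) is only entered
// once per BATCH samples. Not the actual HIVE-20999 fix.
public class BatchedSampler {
    private static final int BATCH = 64;
    private final LongConsumer sink; // the contended consumer
    private final ThreadLocal<List<Long>> buffer =
            ThreadLocal.withInitial(ArrayList::new);

    public BatchedSampler(LongConsumer sink) {
        this.sink = sink;
    }

    public void add(long sample) {
        List<Long> buf = buffer.get();
        buf.add(sample);
        if (buf.size() >= BATCH) {
            flush(buf);
        }
    }

    // Drain this thread's remaining buffered samples.
    public void flush() {
        flush(buffer.get());
    }

    private void flush(List<Long> buf) {
        synchronized (sink) { // one lock acquisition per batch
            for (long v : buf) {
                sink.accept(v);
            }
        }
        buf.clear();
    }
}
```

The trade-off is that quantile estimates lag by up to one batch per thread, which is usually acceptable for latency metrics.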





[GitHub] hive pull request #491: HIVE-20953: Fix testcase TestReplicationScenariosAcr...

2018-12-03 Thread ashutosh-bapat
Github user ashutosh-bapat closed the pull request at:

https://github.com/apache/hive/pull/491


---


[jira] [Created] (HIVE-20998) HiveStrictManagedMigration utility should update DB/Table location as last migration steps

2018-12-03 Thread Jason Dere (JIRA)
Jason Dere created HIVE-20998:
-

 Summary: HiveStrictManagedMigration utility should update DB/Table 
location as last migration steps
 Key: HIVE-20998
 URL: https://issues.apache.org/jira/browse/HIVE-20998
 Project: Hive
  Issue Type: Sub-task
Reporter: Jason Dere
Assignee: Jason Dere


When processing a database or table, the HiveStrictManagedMigration utility
currently changes the database/table location as the first step. Unfortunately,
if an error occurs while processing that database or table, there may still be
migration work that needs to be finished for it by running the migration again.
However, the migration tool only processes dbs/tables that still have the old
warehouse location, so it will skip over that db/table when the migration is
run again.
 One fix here is to set the new location as the last step after all of the 
migration work is done:
 - The new table location will not be set until all of its partitions have been 
successfully migrated.
 - The new database location will not be set until all of its tables have been 
successfully migrated.
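The proposed ordering can be sketched as follows. This is a hypothetical illustration, not the actual HiveStrictManagedMigration code; the interfaces and names are invented:

```java
import java.util.List;

// Hypothetical sketch of the ordering proposed above: migrate all
// partitions first and set the table location only after every partition
// succeeded, so a failed run leaves the old location in place and the
// table is re-processed on the next run. Not the actual utility code.
public class MigrationOrderSketch {
    interface Partition { void migrate(); }
    interface Table {
        List<Partition> partitions();
        void setLocation(String newLocation);
    }

    public static void migrateTable(Table table, String newLocation) {
        for (Partition p : table.partitions()) {
            p.migrate(); // any failure aborts before the location changes
        }
        table.setLocation(newLocation); // last step, only on full success
    }
}
```

The same pattern applies one level up: a database's location is set only after all of its tables have migrated.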

For existing migrations that failed with an error, the following workaround can
be done so that the db/tables can be re-processed by the migration tool:
 1) Use the migration tool logs to find which databases/tables failed during
processing.
 2) For each db/table, change the location of the database and table back to
the old location:
 ALTER DATABASE tpcds_bin_partitioned_orc_10 SET LOCATION
'hdfs://ns1/apps/hive/warehouse/tpcds_bin_partitioned_orc_10.db';
 ALTER TABLE tpcds_bin_partitioned_orc_10.store_sales SET LOCATION
'hdfs://ns1/apps/hive/warehouse/tpcds_bin_partitioned_orc_10.db/store_sales';
 3) Rerun the migration tool





[jira] [Created] (HIVE-20997) Make Druid Cluster start on random ports.

2018-12-03 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20997:
-

 Summary: Make Druid Cluster start on random ports.
 Key: HIVE-20997
 URL: https://issues.apache.org/jira/browse/HIVE-20997
 Project: Hive
  Issue Type: Sub-task
Reporter: slim bouguerra
Assignee: slim bouguerra


As of now, Druid tests run in a single batch.

To avoid timeouts we need to support batching of tests.

As suggested by [~vihangk1], it would be better to start the Druid test setups
on totally random ports.

 





[jira] [Created] (HIVE-20996) MV rewriting not triggering

2018-12-03 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-20996:
--

 Summary: MV rewriting not triggering
 Key: HIVE-20996
 URL: https://issues.apache.org/jira/browse/HIVE-20996
 Project: Hive
  Issue Type: Improvement
Reporter: Vineet Garg


{code:sql}
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.strict.checks.cartesian.product=false;
set hive.stats.fetch.column.stats=true;
set hive.materializedview.rewriting=true;

create table emps_n3 (
  empid int,
  deptno int,
  name varchar(256),
  salary float,
  commission int)
stored as orc TBLPROPERTIES ('transactional'='true');

insert into emps_n3 values (100, 10, 'Bill', 1, 1000), (200, 20, 'Eric', 8000, 500),
  (150, 10, 'Sebastian', 7000, null), (110, 10, 'Theodore', 1, 250), (120, 10, 'Bill', 1, 250);

analyze table emps_n3 compute statistics for columns;

alter table emps_n3 add constraint pk1 primary key (empid) disable novalidate 
rely;

create materialized view mv1_n2 as
select empid, deptno from emps_n3 group by empid, deptno;


explain
select empid, deptno from emps_n3 group by empid, deptno;
{code}





[jira] [Created] (HIVE-20995) Add mini Druid to the list of tests

2018-12-03 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20995:
-

 Summary: Add mini Druid to the list of tests
 Key: HIVE-20995
 URL: https://issues.apache.org/jira/browse/HIVE-20995
 Project: Hive
  Issue Type: Test
Reporter: slim bouguerra
Assignee: slim bouguerra








[GitHub] hive pull request #501: HIVE-20994: Upgrade arrow version to 0.10.0 in branc...

2018-12-03 Thread pudidic
GitHub user pudidic opened a pull request:

https://github.com/apache/hive/pull/501

HIVE-20994: Upgrade arrow version to 0.10.0 in branch-3 (Teddy Choi)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pudidic/hive HIVE-20994

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/501.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #501


commit c0c621f1e18b51c38173ae1036c2852dddee89e5
Author: Teddy Choi 
Date:   2018-12-03T16:03:11Z

HIVE-20994: Upgrade arrow version to 0.10.0 in branch-3 (Teddy Choi)




---


[jira] [Created] (HIVE-20994) Upgrade arrow version to 0.10.0 in branch-3

2018-12-03 Thread Teddy Choi (JIRA)
Teddy Choi created HIVE-20994:
-

 Summary: Upgrade arrow version to 0.10.0 in branch-3
 Key: HIVE-20994
 URL: https://issues.apache.org/jira/browse/HIVE-20994
 Project: Hive
  Issue Type: Improvement
Reporter: Teddy Choi
Assignee: Teddy Choi
 Fix For: 3.2.0


HIVE-20751 upgraded the arrow version in Hive 4, but its patch conflicts with
Hive 3. It needs to be rebased.





Re: When do the deltas of a transaction become observable?

2018-12-03 Thread Granville Barnett
Thanks Gopal, that was very helpful.

Granville

On Mon, 26 Nov 2018 at 08:14, Gopal Vijayaraghavan 
wrote:

>
> >release of the locks) but I can't seem to find it. As it's a
> transactional
> >system I'd expect we observe both deltas or none at all, at the point
> of
> >successful commit.
>
> In Hive's internals, "observe" is slightly different from "use". Hive ACID
> system
> can see a file on HDFS and then ignore it, because it is from the
> "future".
>
> You can sort of start from this line
>
>
> https://github.com/apache/hive/blob/master/storage-api/src/java/org/apache/hadoop/hive/common/ValidReaderWriteIdList.java#L70
>
> and work backwards.
>
> >I had done some basic tests to determine if the observation semantics
> were
> >tied to the metadata in the database product for the transactional
> system
> >but I could only determine write IDs were influencing this, e.g. if
> write
> >ID = 7 for a given table, then the read would consist of all deltas
> with a
> >write ID < 7.
>
> Yes, you're on the right track. There's a mapping from txn_id -> write_id
> (per-table), maintained by the writers (i.e if a txn commits, then the
> write_id is visible).
>
> For each table, in each query, there's a snapshot taken which has a
> min:max and list of exceptions.
>
> When a query starts, it sees that all txns at or below 5 are committed or
> cleaned, therefore all <=5 is good.
>
> It knows that highest known txn is 10, so all >10 is to be ignored.
>
> And between 5 & 10, it knows that 7 is aborted and 8 is still open (i.e
> exceptions).
>
> So if it sees a delta_11 dir, it ignores it. If it sees a delta_8, it
> ignores it.
>
> The "ACID" implementation hides future updates in plain sight and doesn't
> need HDFS to be able to rename multiple dirs together.
>
> Most of that smarts is in the split-generation, not in the commit
> (however, the commit does something else to detect write-conflicts which is
> its own thing).
>
> >If someone could point me in the right direction, or correct my
> >understanding then I would greatly appreciate it.
>
> This implementation is built with the txn -> write_id indirection to
> support cross-replication between, say, an east-coast cluster and a
> west-coast cluster, each owning primary data-sets on its own coast.
>
> Cheers,
> Gopal
>
>
>
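The visibility rules Gopal describes (a low-water mark, a high-water mark, and an exception list) can be sketched as below. The class and field names are illustrative assumptions, not the actual ValidReaderWriteIdList API:

```java
import java.util.Set;

// Illustrative sketch of snapshot-based visibility per the thread above:
// ids at or below the low-water mark are committed/cleaned and visible,
// ids above the high-water mark are from the "future" and ignored, and
// ids in the exception list (aborted or still open) are also ignored.
// Not the actual ValidReaderWriteIdList code.
public class SnapshotSketch {
    private final long lowWaterMark;    // everything <= this is committed or cleaned
    private final long highWaterMark;   // everything > this is ignored
    private final Set<Long> exceptions; // aborted or open ids in between

    public SnapshotSketch(long low, long high, Set<Long> exceptions) {
        this.lowWaterMark = low;
        this.highWaterMark = high;
        this.exceptions = exceptions;
    }

    public boolean isVisible(long writeId) {
        if (writeId <= lowWaterMark) return true;
        if (writeId > highWaterMark) return false;
        return !exceptions.contains(writeId);
    }
}
```

With the snapshot from the example (low 5, high 10, exceptions {7 aborted, 8 open}), a delta_9 is read while delta_7, delta_8, and delta_11 are skipped.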


[GitHub] hive pull request #500: Hive 20966 : Support incremental / bootstrap replica...

2018-12-03 Thread maheshk114
GitHub user maheshk114 opened a pull request:

https://github.com/apache/hive/pull/500

Hive 20966 : Support incremental / bootstrap replication to a target 
cluster with hive.strict.managed.tables enabled



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maheshk114/hive HIVE-20966

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/500.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #500


commit c3a82a05592c0e61143750e615c2dd6c1feca6c5
Author: Mahesh Kumar Behera 
Date:   2018-11-28T11:24:40Z

HIVE-20966 : Support incremental replication to a target cluster with 
hive.strict.managed.tables enabled

commit 2ef813ac9b7c87119bcba656d7060fd12caaa6b1
Author: Mahesh Kumar Behera 
Date:   2018-12-01T05:05:24Z

HIVE-20966 : Support incremental replication to a target cluster with 
hive.strict.managed.tables enabled - fixed alter table issues

commit 4b2a6f4a17c84084dbfd4f6aaeb9faea12773233
Author: Mahesh Kumar Behera 
Date:   2018-12-03T05:39:35Z

HIVE-20884 : Bootstrap of tables to target with hive.strict.managed.tables 
enabled.

commit c561ea22b03badeee9f223179e9199cbcb456e63
Author: Mahesh Kumar Behera 
Date:   2018-12-03T08:33:05Z

HIVE-20966 : Support incremental replication to a target cluster with 
hive.strict.managed.tables enabled - review comment fix




---


Re: [ANNOUNCE] New committer: Bharathkrishna Guruvayoor Murali

2018-12-03 Thread Marta Kuczora
Congratulations Bharath!

On Mon, Dec 3, 2018 at 8:45 AM Peter Vary 
wrote:

> Congratulations!
>
> > On Dec 3, 2018, at 05:32, Sankar Hariappan 
> wrote:
> >
> > Congrats Bharath!
> >
> > Best regards
> > Sankar
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On 03/12/18, 7:38 AM, "Vihang Karajgaonkar" 
> wrote:
> >
> >> Congratulations Bharath!
> >>
> >> On Sun, Dec 2, 2018 at 9:33 AM Sahil Takiar 
> wrote:
> >>
> >>> Congrats Bharath!
> >>>
> >>> On Sun, Dec 2, 2018 at 11:14 AM Andrew Sherman
> >>>  wrote:
> >>>
>  Congratulations Bharath!
> 
>  On Sat, Dec 1, 2018 at 10:26 AM Ashutosh Chauhan <
> hashut...@apache.org>
>  wrote:
> 
> > Apache Hive's Project Management Committee (PMC) has invited
> > Bharathkrishna
> > Guruvayoor Murali to become a committer, and we are pleased to
> announce
> > that
> > he has accepted.
> >
> > Bharath, welcome, thank you for your contributions, and we look
> > forward to your further interactions with the community!
> >
> > Ashutosh Chauhan (on behalf of the Apache Hive PMC)
> >
> 
> >>>
> >>>
> >>> --
> >>> Sahil Takiar
> >>> Software Engineer
> >>> takiar.sa...@gmail.com | (510) 673-0309
> >>>
>
>