[jira] [Created] (HIVE-17700) Update committer list
Sushanth Sowmyan created HIVE-17700: --- Summary: Update committer list Key: HIVE-17700 URL: https://issues.apache.org/jira/browse/HIVE-17700 Project: Hive Issue Type: Bug Reporter: Aihua Xu Assignee: Aihua Xu Priority: Minor Please update committer list: Name: Aihua Xu Apache ID: aihuaxu Organization: Cloudera Name: Yongzhi Chen Apache ID: ychena Organization: Cloudera -- This message was sent by Atlassian JIRA (v6.4.14#64029)
Re: [Announce] New committer: Anishek Agarwal
Welcome aboard! :) On Sep 30, 2017 3:27 AM, "Barna Zsombor Klara" wrote: > Congratulations Anishek! > > Rajesh Balamohan (on Sat, Sept. 30, 2017, > 2:25) wrote: > > > Congrats Anishek!! > > > > ~Rajesh.B > > > > On Sat, Sep 30, 2017 at 4:30 AM, Vaibhav Gumashta < > > vgumas...@hortonworks.com > > > wrote: > > > > > Congratulations Anishek! > > > > > > > > > On 9/29/17, 3:57 PM, "Thejas Nair" wrote: > > > > > > >Congrats Anishek! > > > > > > > >On Fri, Sep 29, 2017 at 11:36 AM, Peter Vary > > wrote: > > > > > > > >> Congratulations Anishek! > > > >> > > > >> > On Sep 29, 2017, at 7:55 PM, Ashutosh Chauhan < > hashut...@apache.org > > > > > > >> wrote: > > > >> > > > > >> > The Project Management Committee (PMC) for Apache Hive has invited > > > >> Anishek > > > >> > Agarwal to become a committer and we are pleased to announce that > he > > > >>has > > > >> > accepted. > > > >> > > > > >> > Welcome, Anishek! > > > >> > > > > >> > Thanks, > > > >> > Ashutosh > > > >> > > > >> > > > > > > > > >
Re: [Announce] New committer: Sankar Hariappan
Welcome aboard! :) On Sep 30, 2017 3:28 AM, "Barna Zsombor Klara" wrote: Congrats Sankar! Rajesh Balamohan (on Sat, Sept. 30, 2017, 2:24) wrote: > Congrats Sankar!! > > ~Rajesh.B > > On Sat, Sep 30, 2017 at 4:30 AM, Vaibhav Gumashta < > vgumas...@hortonworks.com > > wrote: > > > Congratulations Sankar! > > > > On 9/29/17, 3:58 PM, "Thejas Nair" wrote: > > > > >Congrats Sankar! > > > > > >On Fri, Sep 29, 2017 at 11:36 AM, Peter Vary > wrote: > > > > > >> Congratulations Sankar! > > >> > > >> > On Sep 29, 2017, at 7:56 PM, Ashutosh Chauhan > > > >> wrote: > > >> > > > >> > The Project Management Committee (PMC) for Apache Hive has invited > > >>Sankar > > >> > Hariappan to become a committer and we are pleased to announce that > he > > >> has > > >> > accepted. > > >> > > > >> > Welcome, Sankar! > > >> > > > >> > Thanks, > > >> > Ashutosh > > >> > > >> > > > > >
[jira] [Created] (HIVE-17095) Long chain repl loads do not complete in a timely fashion
Sushanth Sowmyan created HIVE-17095: --- Summary: Long chain repl loads do not complete in a timely fashion Key: HIVE-17095 URL: https://issues.apache.org/jira/browse/HIVE-17095 Project: Hive Issue Type: Bug Components: Query Planning, repl Reporter: sapin amin Assignee: Sushanth Sowmyan Per performance testing done by [~sapinamin] (thus, I'm setting him as reporter), we were able to discover an important bug affecting replication. It has the potential to affect other large DAGs of Tasks that hive generates as well, if those DAGs have multiple paths to child Task nodes. Basically, we find that incremental REPL LOAD does not finish in a timely fashion. The test, in this case, was to add 400 partitions and replicate them. Associated with each partition, there was an ADD PTN and an ALTER PTN. For each of the ADD PTN tasks, we'd generate a DDLTask, a CopyTask and a MoveTask. For each ALTER PTN, there'd be a single DDLTask. And order of execution is important, so it would chain in dependency collection tasks between phases. Trying to root-cause this shows us that it seems to stall forever at Driver instantiation time, and it almost looks like the thread doesn't proceed past that point. Looking at logs, it seems that the way this is written, it looks for all tasks generated that are subtrees of all nodes, without looking for duplicates, and this is done simply to get the number of execution tasks! Thus, the task visitor will visit every subtree of every node, which is fine if you have graphs that look like open trees, but is horrible for us, since we have dependency collection tasks between each phase. Effectively, this is what's happening: we have a DAG, say, like this: 4 tasks in parallel -> DEP col -> 4 tasks in parallel -> DEP col -> ...
This means that for each of the 4 root tasks, we will do a full traversal of every path (not just every node) past the DEP col, and this happens recursively, leading to an exponential growth in the number of tasks visited as the length and breadth of the graph increase. In our case, we had about 800 tasks in the graph, with roughly a width of about 2-3 and 200 stages, with a dep collection before and after each, and this meant that leaf nodes of this DAG would have something like 2^200 - 3^200 ways in which they could be visited, and thus, we'd visit them in all those ways. And all this simply to count the number of tasks to schedule - we would also revisit this function multiple more times: once per hook, once for the MapReduceCompiler and once for the TaskCompiler. We have not been sending such large DAGs to the Driver, so it has not yet been a problem, and there are upcoming changes to reduce the number of tasks replication generates (as part of a memory addressing issue), but we should still fix the way we do Task traversal so that a large DAG cannot cripple us.
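The blow-up described above can be reproduced outside Hive with a toy DAG walker (class and method names here are illustrative, not Hive's actual Task graph API): counting tasks by walking every path revisits each DEP-col subtree once per path leading to it, so work grows exponentially with the number of stages, while a visited-set walk counts each node exactly once.

```java
import java.util.*;

// Toy model of the layered task DAG from the report:
// <width> parallel tasks -> DEP col -> <width> parallel tasks -> DEP col -> ...
public class TaskDagCount {
    static final Map<String, List<String>> DAG = new HashMap<>();

    static void edge(String from, String to) {
        DAG.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // Builds the shape described above with width 2 and the given number of
    // stages, each stage ending in a dependency-collection node. Returns root.
    static String buildDemo(int stages) {
        DAG.clear();
        String prev = "root";
        for (int s = 0; s < stages; s++) {
            String dep = "dep" + s;
            for (String t : List.of("t" + s + "a", "t" + s + "b")) {
                edge(prev, t);
                edge(t, dep);
            }
            prev = dep;
        }
        return "root";
    }

    // Naive count: revisits each subtree once per distinct path to it, so the
    // total visit count grows exponentially in the number of DEP-col stages.
    static long pathVisits(String node) {
        long visits = 1;
        for (String child : DAG.getOrDefault(node, List.of())) {
            visits += pathVisits(child);
        }
        return visits;
    }

    // Deduplicated count: a visited set makes the walk linear in nodes + edges.
    static int reachable(String root) {
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            String n = stack.pop();
            if (seen.add(n)) {
                DAG.getOrDefault(n, List.of()).forEach(stack::push);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        buildDemo(3);
        System.out.println("naive path visits: " + pathVisits("root"));  // 29
        System.out.println("distinct tasks:    " + reachable("root"));   // 10
    }
}
```

With only 3 stages the naive walk already makes 29 visits to count 10 tasks; the visit count roughly doubles per stage, which matches the 2^200-ish figure for the 200-stage replication DAG.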
Re: [DISCUSS] Separating out the metastore as its own TLP
+1 On Jun 30, 2017 17:05, "Owen O'Malley" wrote: > On Fri, Jun 30, 2017 at 3:26 PM, Chao Sun wrote: > > > and maybe a different project name? > > > > Yes, it certainly needs a new name. I'd like to suggest Riven. > > .. Owen >
[jira] [Created] (HIVE-17005) Ensure REPL DUMP and REPL LOAD are authorized properly
Sushanth Sowmyan created HIVE-17005: --- Summary: Ensure REPL DUMP and REPL LOAD are authorized properly Key: HIVE-17005 URL: https://issues.apache.org/jira/browse/HIVE-17005 Project: Hive Issue Type: Sub-task Components: repl Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Currently, we piggyback REPL DUMP and REPL LOAD on EXPORT and IMPORT auth privileges. However, work is underway to not populate all the relevant objects in inputObjs and outputObjs, which then requires that REPL DUMP and REPL LOAD be authorized at a higher level, simply requiring ADMIN_PRIV to run.
[jira] [Created] (HIVE-16918) Skip ReplCopyTask distcp for _metadata copying. Also enable -pb for distcp
Sushanth Sowmyan created HIVE-16918: --- Summary: Skip ReplCopyTask distcp for _metadata copying. Also enable -pb for distcp Key: HIVE-16918 URL: https://issues.apache.org/jira/browse/HIVE-16918 Project: Hive Issue Type: Bug Components: repl Affects Versions: 3.0.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan With HIVE-16686, we switched ReplCopyTask to always use a privileged DistCp. This, however, is incorrect for copying _metadata generated from a temporary scratch directory to hdfs. We need to change that so that it routes to a regular CopyTask. Also, in following up on HIVE-16686, we missed adding "-pb" as a default for invocations of distcp from hive. Adding that in. This would not be necessary if HADOOP-8143 had made it in, but until it does, we need it.
[jira] [Created] (HIVE-16860) HostUtil.getTaskLogUrl change between hadoop 2.3 and 2.4 breaks at runtime.
Sushanth Sowmyan created HIVE-16860: --- Summary: HostUtil.getTaskLogUrl change between hadoop 2.3 and 2.4 breaks at runtime. Key: HIVE-16860 URL: https://issues.apache.org/jira/browse/HIVE-16860 Project: Hive Issue Type: Bug Components: Shims Affects Versions: 0.13.0, 0.14.0 Reporter: Chris Drome Assignee: Jason Dere Fix For: 0.14.0 The signature of HostUtil.getTaskLogUrl has changed between Hadoop-2.3 and Hadoop-2.4. The code in shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java works with the Hadoop-2.3 method and causes a compilation failure with Hadoop-2.4.
[jira] [Created] (HIVE-16686) repli invocations of distcp needs additional handling
Sushanth Sowmyan created HIVE-16686: --- Summary: repli invocations of distcp needs additional handling Key: HIVE-16686 URL: https://issues.apache.org/jira/browse/HIVE-16686 Project: Hive Issue Type: Sub-task Components: repl Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan When REPL LOAD invokes distcp, there needs to be a way for the user invoking REPL LOAD to pass on arguments to distcp. In addition, there is sometimes a need for distcp to be invoked from within an impersonated context, such as running as user "hdfs", asking distcp to preserve ownerships of individual files.
[jira] [Created] (HIVE-16642) New Events created as part of replv2 potentially break replv1
Sushanth Sowmyan created HIVE-16642: --- Summary: New Events created as part of replv2 potentially break replv1 Key: HIVE-16642 URL: https://issues.apache.org/jira/browse/HIVE-16642 Project: Hive Issue Type: Sub-task Components: repl Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan We have a couple of new events, such as {CREATE,DROP}{INDEX,FUNCTION}, introduced since replv1, which do not have a replv1 ReplicationTask associated with them. Thus, for users like Falcon, we potentially wind up throwing an IllegalStateException if replv1-based HiveDR is running on a cluster with these updated events. We should be more graceful when encountering them, returning a NoopReplicationTask equivalent that consumers can make use of, or ignore, for such newer events. In addition, we should add test cases that track whether the creation of these events introduces any backward incompatibility. To this end, if any of the events should change in a way that introduces a backward incompatibility, these tests should fail and alert us to that possibility.
Re: pre-commit jenkins issues
Thanks! It looks like it's chugging away now. :) On May 5, 2017 08:22, "Sergio Pena" <sergio.p...@cloudera.com> wrote: > I restarted hiveptest and seems is working now. There was a hiccup on the > server while using the libraries to create the slave nodes. > > On Fri, May 5, 2017 at 12:05 AM, Sushanth Sowmyan <khorg...@gmail.com> > wrote: > > > Hi, > > > > It looks like the precommit queue is currently having issues : > > https://builds.apache.org/job/PreCommit-HIVE-Build/ > > > > See builds# 5041,5042,5043 - It looks like it takes about 8 hours > > waiting for the tests to finish running and to report back, and kills > > it as it exceeds a 500minute time out, and returns without results. Is > > anyone able to look into this to see what is going on? > > > > Thanks! > > -Sush > > >
pre-commit jenkins issues
Hi, It looks like the precommit queue is currently having issues : https://builds.apache.org/job/PreCommit-HIVE-Build/ See builds# 5041,5042,5043 - It looks like it takes about 8 hours waiting for the tests to finish running and to report back, and kills it as it exceeds a 500minute time out, and returns without results. Is anyone able to look into this to see what is going on? Thanks! -Sush
Re: [VOTE] Apache Hive 1.2.2 Release Candidate 0
+1 (binding) Verified md5 and asc KEYS obtained from hive match (from https://people.apache.org/keys/group/hive.asc), and they are publicly searchable and signed. RAT test succeeds. Source and binary tarballs look good. Compiling works, some base unit tests succeed. Testing local mode works. On Wed, Apr 5, 2017 at 11:16 PM, Thejas Nair wrote: > +1 (binding) > - Verified signature and checksum > - Build from source > - Ran simple queries in local mode with binary tar.gz > - Checked RELEASE_NOTES file. Traditionally this file has had the set of > patches fixed in previous releases as well (i.e., each new release was > adding entries to the top of the file). This time it has only the new patch > release patches. The old approach helps to quickly verify if a patch is in > the release. I think it would be good to fix that in branch. I think it is > OK for this release. > - README.txt has the old 1.2.1 version number in it. IMO, we should just remove > the mention of version in that file. Not a release blocker. > > > > > On Wed, Apr 5, 2017 at 3:52 PM, Sergio Pena > wrote: > >> +1 (no-binding) >> >> I unpacked the bin and src packages. >> Verified gpg and md5 signatures. >> Checked license and release notes files. >> Ran a few queries from hive-cli. >> >> - Sergio >> >> On Tue, Apr 4, 2017 at 11:12 AM, Ashutosh Chauhan >> wrote: >> >> > Verified md5 of src and binary tar balls. >> > Built from src. >> > Ran some simple queries like join, group by. >> > All looks good. >> > >> > +1 >> > >> > Thanks, >> > Ashutosh >> > >> > On Mon, Apr 3, 2017 at 4:47 PM, Vaibhav Gumashta < >> > vgumas...@hortonworks.com> >> > wrote: >> > >> > > Thanks for pointing out Ashutosh. Link to my PGP key: >> > > http://pgp.mit.edu/pks/lookup?search=gumashta=index. >> > > >> > > I think it will take a day or so for the KEYS file to be updated (it is >> > > auto generated), but if you want to test the release in the meantime, >> > > please use the above link to access the signing key. 
>> > > >> > > Thanks, >> > > -Vaibhav >> > > >> > > On 4/3/17, 2:53 PM, "Ashutosh Chauhan" wrote: >> > > >> > > >Hi Vaibhav, >> > > > >> > > >Can't locate your key at any of the standard locations. Can you point out >> which >> > > >key you used to sign the release? >> > > > >> > > >Thanks, >> > > >Ashutosh >> > > > >> > > >On Mon, Apr 3, 2017 at 12:51 AM, Vaibhav Gumashta >> > > > > > > >> wrote: >> > > >> Hi everyone, >> > > >> >> > > >> Apache Hive 1.2.2 Release Candidate 0 is available here: >> > > >> >> > > >> https://dist.apache.org/repos/dist/dev/hive/apache-hive-1.2.2-rc0/ >> > > >> >> > > >> Maven artifacts are available here: >> > > >> >> > > >> https://repository.apache.org/content/repositories/ >> > orgapachehive-1072/ >> > > >> >> > > >> Source tag for RC0 is at: >> > > >> https://github.com/apache/hive/releases/tag/release-1.2.2-rc0 >> > > >> >> > > >> Voting will conclude in 72 hours. >> > > >> >> > > >> Hive PMC Members: Please test and vote. >> > > >> >> > > >> Thanks, >> > > >> -Vaibhav >> > > >> >> > > >> >> > > >> > > >> > >>
Re: [ANNOUNCE] New committer: Zoltan Haindrich
Congrats, Zoltan! Welcome aboard. :) On Feb 21, 2017 15:42, "Rajesh Balamohan" wrote: > Congrats Zoltan. :) > > ~Rajesh.B > > On Wed, Feb 22, 2017 at 4:43 AM, Wei Zheng wrote: > > > Congrats Zoltan! > > > > Thanks, > > Wei > > > > On 2/21/17, 13:09, "Alan Gates" wrote: > > > > On behalf of the Hive PMC I am happy to announce Zoltan Haindrich is > > our newest committer. He has been contributing to Hive for several > months > > across a number of areas, including the parser, HiveServer2, and cleaning > > up unit tests and documentation. Please join me in welcoming Zoltan to > > Hive. > > > > Zoltan, feel free to say a few words introducing yourself if you > would > > like to. > > > > Alan. > > > > > > >
[jira] [Created] (HIVE-15668) change REPL DUMP syntax to use "LIMIT" instead of "BATCH" keyword
Sushanth Sowmyan created HIVE-15668: --- Summary: change REPL DUMP syntax to use "LIMIT" instead of "BATCH" keyword Key: HIVE-15668 URL: https://issues.apache.org/jira/browse/HIVE-15668 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Currently, REPL DUMP syntax goes: {noformat} REPL DUMP <dbname>[.<tablename>] [FROM <event-id> [BATCH <batch-size>]] {noformat} The BATCH directive says that an event dump should not dump out more than _batchSize_ events. However, there is a clearer keyword for the same effect, and that is LIMIT. Thus, rephrasing the syntax as follows makes it clearer: {noformat} REPL DUMP <dbname>[.<tablename>] [FROM <event-id> [LIMIT <batch-size>]] {noformat}
[jira] [Created] (HIVE-15652) Optimize(reduce) the number of alter calls made to fix repl.last.id
Sushanth Sowmyan created HIVE-15652: --- Summary: Optimize(reduce) the number of alter calls made to fix repl.last.id Key: HIVE-15652 URL: https://issues.apache.org/jira/browse/HIVE-15652 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Per code review from HIVE-15534, we might be doing alters on parent objects to set repl.last.id when it is not necessary, since some future event might make the alter redundant. There are 3 cases where this might happen: a) After a CREATE_TABLE event - any prior reference to that table does not need an ALTER, since CREATE_TABLE will have a repl.last.id come with it. b) After a DROP_TABLE event - any prior reference to that table is irrelevant, and thus no alter is needed. c) After an ALTER_TABLE event, since that dump will itself do a metadata update that picks up the latest repl.last.id along with the event. In each of these cases, we can skip the otherwise-needed alter call.
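The pruning logic in cases (a)-(c) above can be sketched as a single ordered pass over the event stream (a hypothetical standalone sketch; the event kinds and class names are illustrative, not Hive's metastore event API): a pending repl.last.id alter for a table is dropped as soon as a later CREATE_TABLE, DROP_TABLE, or ALTER_TABLE event covers the same table.

```java
import java.util.*;

// Illustrative planner: given the dump's event stream in order, compute which
// tables still need an explicit ALTER to bump repl.last.id at the end.
public class ReplLastIdPlanner {
    enum Kind { REFERENCE, CREATE_TABLE, DROP_TABLE, ALTER_TABLE }

    record Event(String table, Kind kind) {}

    static Set<String> altersNeeded(List<Event> events) {
        Set<String> needed = new LinkedHashSet<>();
        for (Event e : events) {
            switch (e.kind()) {
                // A plain reference would otherwise require an alter later.
                case REFERENCE -> needed.add(e.table());
                // Cases (a), (b), (c): a later event for the same table makes
                // any previously pending alter redundant.
                case CREATE_TABLE, DROP_TABLE, ALTER_TABLE -> needed.remove(e.table());
            }
        }
        return needed;
    }
}
```

Only tables whose last relevant event is a bare reference survive the pass, so the number of alter calls shrinks to the truly necessary ones.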
Re: Review Request 55392: HIVE-15469: Fix REPL DUMP/LOAD DROP_PTN so it works on non-string-ptn-key tables
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/55392/#review161290 --- itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/TestDbNotificationListener.java (line 550) <https://reviews.apache.org/r/55392/#comment232522> This has minor clashes with issues.apache.org/jira/browse/HIVE-15365, and it is easier to fix here after that goes in rather than there. Instead of this code segment, we can use the following:

```java
DropPartitionMessage dropPtnMsg = md.getDropPartitionMessage(event.getMessage());
Table tableObj = dropPtnMsg.getTableObj();
// .. and the asserts can remain as-is.
```

Note that the first line is likely spurious as well if HIVE-15365 goes in, since it will create the dropPtnMsg here, so the only line needing changing is the line instantiating tableObj. I can regenerate this patch post-HIVE-15365, not a problem. itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestReplicationScenarios.java (line 345) <https://reviews.apache.org/r/55392/#comment232523> One more post-HIVE-15365 comment. :) run(..) followed by verifyResults(..) is being replaced by two methods: verifyRun(.., ..) or verifySetup(.., ..). verifySetup is called in cases where you're still setting up the test and verifying that your setup happened correctly. In this case, for instance, the run followed by verifyResults would be replaced by verifySetup instead. verifyRun is called when running some command that we're interested in testing, where the results showcase the functionality we're testing. The idea is that in steady state, after we finish our initial development, we flip a switch, and all verifySetups skip the additional verification step, whereas verifyRun still does it. itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestReplicationScenarios.java (line 372) <https://reviews.apache.org/r/55392/#comment232524> still verifySetup case, as per prior comment. 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestReplicationScenarios.java (line 385) <https://reviews.apache.org/r/55392/#comment232525> still verifySetup, since we're testing that the source dropped the data correctly. itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestReplicationScenarios.java (line 415) <https://reviews.apache.org/r/55392/#comment232526> This is now a verifyRun, finally. :) - Sushanth Sowmyan On Jan. 10, 2017, 9:29 p.m., Vaibhav Gumashta wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/55392/ > --- > > (Updated Jan. 10, 2017, 9:29 p.m.) > > > Review request for hive, Daniel Dai, Sushanth Sowmyan, and Thejas Nair. > > > Bugs: HIVE-15469 > https://issues.apache.org/jira/browse/HIVE-15469 > > > Repository: hive-git > > > Description > --- > > https://issues.apache.org/jira/browse/HIVE-15469 > > > Diffs > - > > > itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/TestDbNotificationListener.java > 4eabb24 > > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestReplicationScenarios.java > 6b86080 > > metastore/src/java/org/apache/hadoop/hive/metastore/messaging/DropPartitionMessage.java > 26aecb3 > > metastore/src/java/org/apache/hadoop/hive/metastore/messaging/json/JSONDropPartitionMessage.java > b8ea224 > > metastore/src/java/org/apache/hadoop/hive/metastore/messaging/json/JSONMessageFactory.java > 2749371 > > ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java > 85f8c64 > > Diff: https://reviews.apache.org/r/55392/diff/ > > > Testing > --- > > > Thanks, > > Vaibhav Gumashta > >
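The verifySetup/verifyRun split described in the review above could look roughly like this (a hypothetical standalone sketch; the helper names mirror the review, but the implementation, including the canned-result stub standing in for driver execution, is made up for illustration and is not the actual TestReplicationScenarios code):

```java
import java.util.*;

public class ReplTestHarness {
    // The "switch" from the review: once development settles, flip this off so
    // setup steps run without the extra verification, while verifyRun still checks.
    static boolean VERIFY_SETUP = true;

    // Stand-in for real command execution; a real harness would invoke the Driver.
    private final Map<String, List<String>> cannedResults = new HashMap<>();

    void stub(String command, List<String> results) {
        cannedResults.put(command, results);
    }

    List<String> run(String command) {
        return cannedResults.getOrDefault(command, List.of());
    }

    // Always verifies: used for the command whose results are under test.
    void verifyRun(String command, List<String> expected) {
        List<String> actual = run(command);
        if (!actual.equals(expected)) {
            throw new AssertionError(command + ": expected " + expected + ", got " + actual);
        }
    }

    // Verifies only while the switch is on: used while still setting up the test.
    void verifySetup(String command, List<String> expected) {
        if (VERIFY_SETUP) {
            verifyRun(command, expected);
        } else {
            run(command);  // still execute, but skip the verification step
        }
    }
}
```

The payoff of the split is that steady-state test runs pay the verification cost only where the functionality under test is actually asserted.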
Re: Review Request 55154: HIVE-15366: REPL LOAD & DUMP support for incremental INSERT events
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/55154/#review160929 --- Fix it, then Ship it! Looks good to me. I have one potential issue marked, but that can be solved in a future patch. Thanks, Vaibhav! ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java (line 2352) <https://reviews.apache.org/r/55154/#comment232164> We probably don't have a problem here, in that all entries in the list of newFiles are probably in the same filesystem, but if that ever changes, we can have off-by-one issues here wherein we cannot line up a file with its checksum, if some files have checksums and others in the middle don't. Would it make sense to put in a "" or something like that to indicate that there was no checksum for this file? Note - this is not a blocker issue, and the patch can continue as-is. I mention it because this is something that might change in the future. - Sushanth Sowmyan On Jan. 6, 2017, 6:43 a.m., Vaibhav Gumashta wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/55154/ > --- > > (Updated Jan. 6, 2017, 6:43 a.m.) > > > Review request for hive, Daniel Dai, Sushanth Sowmyan, and Thejas Nair. 
> > > Bugs: HIVE-15366 > https://issues.apache.org/jira/browse/HIVE-15366 > > > Repository: hive-git > > > Description > --- > > https://issues.apache.org/jira/browse/HIVE-15366 > > > Diffs > - > > > itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/TestDbNotificationListener.java > 39356ae > > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestReplicationScenarios.java > e29aa22 > metastore/if/hive_metastore.thrift 79592ea > metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 1311b20 > > metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/InsertEventRequestData.java > 39a607d > metastore/src/gen/thrift/gen-rb/hive_metastore_types.rb ebed504 > metastore/src/java/org/apache/hadoop/hive/metastore/events/InsertEvent.java > d9a42a7 > > metastore/src/java/org/apache/hadoop/hive/metastore/messaging/InsertMessage.java > fe747df > > metastore/src/java/org/apache/hadoop/hive/metastore/messaging/MessageFactory.java > fdb8e80 > > metastore/src/java/org/apache/hadoop/hive/metastore/messaging/json/JSONInsertMessage.java > bd9f9ec > > metastore/src/java/org/apache/hadoop/hive/metastore/messaging/json/JSONMessageFactory.java > 9954902 > ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java 4c0f817 > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java be5a6a9 > ql/src/java/org/apache/hadoop/hive/ql/parse/EximUtil.java 6e9602f > ql/src/java/org/apache/hadoop/hive/ql/parse/ExportSemanticAnalyzer.java > f61274b > ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java > 5561e06 > > ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java > 9b83407 > > Diff: https://reviews.apache.org/r/55154/diff/ > > > Testing > --- > > > Thanks, > > Vaibhav Gumashta > >
Re: Review Request 55154: HIVE-15366: REPL LOAD & DUMP support for incremental INSERT events
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/55154/#review160533 --- metastore/src/java/org/apache/hadoop/hive/metastore/messaging/InsertMessage.java (line 53) <https://reviews.apache.org/r/55154/#comment231682> I'm not convinced that this is a good method to add, since it is repl-specific and adds complexity. Any presence of a checksum must be encoded into the uris, so that when we call getFiles(), it contains it. Also, the files have no explicit meaning without the checksum, since they will not be stable uris. The getFiles() returned by InsertMessage should already be a CM uri that encodes the checksum; for e.g., cm://hdfs%3A%2F%2Fblah%2Ffile1#abcdef1234567890 might imply the file hdfs://blah/file1 with checksum "abcdef1234567890". I'm not super picky about the actual encoding mechanism used, but we want the getFiles() results to be stable uris - ones which, even if we don't have a FileSystem object associated with them directly, we can extract the info we want from at the endpoint when we use them, and generate when we generate them, with all areas in between simply passing them on without doing anything additional. Thus, the places I see "generating" this are either DbNotificationListener or fireInsertEvent(), or ReplCopyTask during a bootstrap dump. The only place I see extracting/consuming this uri would be in ReplCopyTask on the destination. All other areas should not split this. metastore/src/java/org/apache/hadoop/hive/metastore/messaging/json/JSONMessageFactory.java (line 376) <https://reviews.apache.org/r/55154/#comment231687> We should not be adding more of these methods into JSONMessageFactory that add field names here. That knowledge should belong to the domain of the message itself. The existing methods that do this are currently slated for removal once we refactor DbNotificationListener to not depend on them. 
ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java (line 576) <https://reviews.apache.org/r/55154/#comment231678> The partspec can be obtained from insertMsg.getPartitionKeyValues() - we shouldn't make calls to JSONMessageFactory here. JSONInsertMessage, in its implementation of getPartitionKeyValues, can, in turn, call generic functions from JSONMessageFactory using knowledge it has about itself. There shouldn't be any explicit calls to JSONMessageFactory from any class which is not a JSON*Message. See the previous ALTER patch and how it changed the ADD_PTNS/CREATE_TABLE processing for a reference. ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java (line 595) <https://reviews.apache.org/r/55154/#comment231681> We should not be making calls to JSONMessageFactory, or getting fields with knowledge of names such as "fileChecksums" or "files". Knowledge of field names should be restricted to the inside of the message itself, which exposes api via its parent Message class. This should simply be a dump of what InsertMessage.getFiles() returns and no more. Any encoding of checksum/etc. that we do must happen in DbNotificationListener, or possibly even in fireInsertEvent, since the location is meaningless without the checksum. - Sushanth Sowmyan On Jan. 4, 2017, 12:59 p.m., Vaibhav Gumashta wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/55154/ > --- > > (Updated Jan. 4, 2017, 12:59 p.m.) > > > Review request for hive, Daniel Dai, Sushanth Sowmyan, and Thejas Nair. 
> > > Bugs: HIVE-15366 > https://issues.apache.org/jira/browse/HIVE-15366 > > > Repository: hive-git > > > Description > --- > > https://issues.apache.org/jira/browse/HIVE-15366 > > > Diffs > - > > > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestReplicationScenarios.java > e29aa22 > > metastore/src/java/org/apache/hadoop/hive/metastore/messaging/InsertMessage.java > fe747df > > metastore/src/java/org/apache/hadoop/hive/metastore/messaging/json/JSONInsertMessage.java > bd9f9ec > > metastore/src/java/org/apache/hadoop/hive/metastore/messaging/json/JSONMessageFactory.java > 9954902 > ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java 4c0f817 > ql/src/java/org/apache/hadoop/hive/ql/parse/EximUtil.java 6e9602f > ql/src/java/org/apache/hadoop/hive/ql/parse/ExportSemanticAnalyzer.java > f61274b > ql/src/java/org/apache/hadoop/hive/ql/parse/Impo
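The stable-URI idea from the review comments above can be sketched as a tiny encode/decode pair (the cm:// format follows the example given in the review; the helper class itself is illustrative and is not Hive's actual CM/ReplCopyTask code): the original path and its checksum are packed into one opaque string that every layer between producer and consumer can pass along unchanged.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class CmUri {
    // Percent-encode the path so it can safely sit inside the cm:// URI, and
    // carry the checksum in the fragment, e.g.
    // cm://hdfs%3A%2F%2Fblah%2Ffile1#abcdef1234567890
    static String encode(String path, String checksum) {
        return "cm://" + URLEncoder.encode(path, StandardCharsets.UTF_8) + "#" + checksum;
    }

    // Inverse of encode(): returns { originalPath, checksum }. Only the final
    // consumer (ReplCopyTask on the destination, in the review's framing)
    // should ever need to do this.
    static String[] decode(String cmUri) {
        String body = cmUri.substring("cm://".length());
        int hash = body.lastIndexOf('#');
        String path = URLDecoder.decode(body.substring(0, hash), StandardCharsets.UTF_8);
        return new String[] { path, body.substring(hash + 1) };
    }

    public static void main(String[] args) {
        String uri = encode("hdfs://blah/file1", "abcdef1234567890");
        String[] parts = decode(uri);
        System.out.println(uri);
        System.out.println(parts[0] + " / " + parts[1]);
    }
}
```

Because the checksum travels inside the URI, intermediate layers need no knowledge of field names like "fileChecksums", which is exactly the separation the review argues for.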
Re: Review Request 55154: HIVE-15366: REPL LOAD & DUMP support for incremental INSERT events
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/55154/#review160450 --- ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java (line 573) <https://reviews.apache.org/r/55154/#comment231594> Instead of using JSONMessageFactory.getTableName, please instantiate the InsertMessage (not JSONInsertMessage) and ask it for getTableName() - that way, we stick to portable MessageFactory based api. Also, if you look at the alter patch and how it changes add_ptns, you'll see how to get the partitions objects/etc generically. - Sushanth Sowmyan On Jan. 3, 2017, 10:27 p.m., Vaibhav Gumashta wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/55154/ > --- > > (Updated Jan. 3, 2017, 10:27 p.m.) > > > Review request for hive, Daniel Dai, Sushanth Sowmyan, and Thejas Nair. > > > Bugs: HIVE-15366 > https://issues.apache.org/jira/browse/HIVE-15366 > > > Repository: hive-git > > > Description > --- > > https://issues.apache.org/jira/browse/HIVE-15366 > > > Diffs > - > > > metastore/src/java/org/apache/hadoop/hive/metastore/messaging/InsertMessage.java > fe747df > > metastore/src/java/org/apache/hadoop/hive/metastore/messaging/json/JSONInsertMessage.java > bd9f9ec > > metastore/src/java/org/apache/hadoop/hive/metastore/messaging/json/JSONMessageFactory.java > 9954902 > ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java 4c0f817 > ql/src/java/org/apache/hadoop/hive/ql/parse/EximUtil.java 6e9602f > ql/src/java/org/apache/hadoop/hive/ql/parse/ExportSemanticAnalyzer.java > f61274b > ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java > 5561e06 > > ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java > 9b83407 > > Diff: https://reviews.apache.org/r/55154/diff/ > > > Testing > --- > > > Thanks, > > Vaibhav Gumashta > >
Re: Review Request 55154: HIVE-15366: REPL LOAD & DUMP support for incremental INSERT events
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/55154/#review160447 --- metastore/src/java/org/apache/hadoop/hive/metastore/messaging/InsertMessage.java (line 58) <https://reviews.apache.org/r/55154/#comment231590> Rather than a List<byte[]> getFileChecksums, I was really visualizing a List<String> getFiles(), where each file listed is a URI that has the checksum encoded into it. The reason is that a list of checksums is too tightly bound to our replication usecase only, and has nothing to do with a more generic "Message" that could be used for other purposes as well. Messages are currently used for things like audit as well, and not just replication. Having the checksums encoded in the URLs keeps the message interface consistent without requiring knowledge of how to actually read the url. metastore/src/java/org/apache/hadoop/hive/metastore/messaging/json/JSONInsertMessage.java (line 115) <https://reviews.apache.org/r/55154/#comment231591> Same comment as with InsertMessage - this should not be a list of checksums but a list of pathnames (urls). ql/src/java/org/apache/hadoop/hive/ql/parse/EximUtil.java <https://reviews.apache.org/r/55154/#comment231592> Removing this is incorrect and breaks current EXPORT in replv1 - this is used to basically noop-out things like non-storagehandler-based tables, views, etc. - Sushanth Sowmyan On Jan. 3, 2017, 10:27 p.m., Vaibhav Gumashta wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/55154/ > --- > > (Updated Jan. 3, 2017, 10:27 p.m.) > > > Review request for hive, Daniel Dai, Sushanth Sowmyan, and Thejas Nair. 
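The review suggestion above — carrying the checksum inside each file URI via a generic List getFiles(), rather than a replication-specific List<byte[]> of checksums — might be sketched roughly as below. This is an illustrative helper only (the class and method names are invented, not Hive's actual implementation); it stashes the checksum in the URI fragment so consumers that only need the path can ignore it.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch: encode a file checksum into the URI itself, so a
// generic "message" can expose a plain list of file URIs without a
// replication-specific checksum list alongside it.
class ChecksumUri {

    // Attach the checksum as the URI fragment (assumption: fragments are
    // otherwise unused in these file URIs).
    static String encode(String fileUri, String checksum) {
        URI base = URI.create(fileUri);
        try {
            return new URI(base.getScheme(), base.getAuthority(), base.getPath(),
                    base.getQuery(), checksum).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("bad file uri: " + fileUri, e);
        }
    }

    // Recover the checksum on the consuming side; null if none was encoded.
    static String checksumOf(String encodedUri) {
        return URI.create(encodedUri).getFragment();
    }
}
```

On the dump side, replication would encode each file it lists; on the load side, checksumOf recovers it, while audit-style consumers can treat the string as an ordinary URI.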
Re: Review Request 55154: HIVE-15366: REPL LOAD & DUMP support for incremental INSERT events
> On Jan. 3, 2017, 11:55 p.m., Sushanth Sowmyan wrote: > > Note - the following is not exhaustive, and I know this patch has already been updated, but wanted to mention a few things that I noticed. - Sushanth --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/55154/#review160447 ---
[jira] [Created] (HIVE-15536) Tests failing due to unexpected q.out outputs : udf_coalesce,case_sensitivity,input_testxpath,
Sushanth Sowmyan created HIVE-15536: --- Summary: Tests failing due to unexpected q.out outputs : udf_coalesce,case_sensitivity,input_testxpath, Key: HIVE-15536 URL: https://issues.apache.org/jira/browse/HIVE-15536 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan All of these tests seem to be failing based on a q.out diff: {noformat} Running: diff -a /home/hiveptest/162.222.183.40-hiveptest-0/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/input_testxpath.q.out /home/hiveptest/162.222.183.40-hiveptest-0/apache-github-source-source/ql/src/test/results/clientpositive/input_testxpath.q.out 32a33 > Pruned Column Paths: lintstring.mystring {noformat} {noformat} Running: diff -a /home/hiveptest/35.184.94.117-hiveptest-0/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/case_sensitivity.q.out /home/hiveptest/35.184.94.117-hiveptest-0/apache-github-source-source/ql/src/test/results/clientpositive/case_sensitivity.q.out 32a33 > Pruned Column Paths: lintstring.mystring {noformat} {noformat} Running: diff -a /home/hiveptest/104.197.172.185-hiveptest-0/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/udf_coalesce.q.out /home/hiveptest/104.197.172.185-hiveptest-0/apache-github-source-source/ql/src/test/results/clientpositive/udf_coalesce.q.out 142a143 > Pruned Column Paths: lintstring.mystring {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15535) Flaky test : TestHS2HttpServer.testContextRootUrlRewrite
Sushanth Sowmyan created HIVE-15535: --- Summary: Flaky test : TestHS2HttpServer.testContextRootUrlRewrite Key: HIVE-15535 URL: https://issues.apache.org/jira/browse/HIVE-15535 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Per recent test failure : https://builds.apache.org/job/PreCommit-HIVE-Build/2766/testReport/org.apache.hive.service.server/TestHS2HttpServer/testContextRootUrlRewrite/ {noformat} Stacktrace org.junit.ComparisonFailure: expected:<...d>Tue Jan 03 11:54:4[6] PST 2017 ...> but was:<...d>Tue Jan 03 11:54:4[7] PST 2017 ...> at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hive.service.server.TestHS2HttpServer.testContextRootUrlRewrite(TestHS2HttpServer.java:99) {noformat} The test looks overly strict: it does an exact string match on a field that contains a timestamp, which can differ by a second between the expected and actual values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
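One way to make such a test less brittle — a sketch, not the actual fix committed for this JIRA — is to mask wall-clock timestamps before comparing the two captured strings. The date pattern below is an assumption about the page format, based on the stack trace above.

```java
import java.util.regex.Pattern;

// Sketch: normalize away timestamps of the form "Tue Jan 03 11:54:46 PST 2017"
// so that a one-second skew between two page fetches cannot fail an
// exact-match assertion.
class PageNormalizer {
    private static final Pattern TIMESTAMP = Pattern.compile(
            "\\w{3} \\w{3} \\d{2} \\d{2}:\\d{2}:\\d{2} \\w{3} \\d{4}");

    static String maskTimestamps(String page) {
        return TIMESTAMP.matcher(page).replaceAll("<TIMESTAMP>");
    }
}
```

The test would then assert equality of maskTimestamps(expected) and maskTimestamps(actual) instead of comparing raw pages.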
[jira] [Created] (HIVE-15534) Update db/table repl.last.id at the end of REPL LOAD of a batch of events
Sushanth Sowmyan created HIVE-15534: --- Summary: Update db/table repl.last.id at the end of REPL LOAD of a batch of events Key: HIVE-15534 URL: https://issues.apache.org/jira/browse/HIVE-15534 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Tracking TODO task in ReplSemanticAnalyzer : {noformat} // TODO : Over here, we need to track a Map<dbName:String,evLast:Long> for every db updated // and update repl.last.id for each, if this is a wh-level load, and if it is a db-level load, // then a single repl.last.id update, and if this is a tbl-lvl load which does not alter the // table itself, we'll need to update repl.last.id for that as well. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
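The bookkeeping described in the TODO above can be sketched as below. All names are illustrative assumptions, not ReplicationSemanticAnalyzer code: track the highest applied event id per database while processing a batch, then write each repl.last.id once at the end of REPL LOAD.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of per-db replication state tracking: record the
// highest event id applied for each database during a batch of events,
// so repl.last.id can be updated once per db when the batch completes.
class ReplStateTracker {
    private final Map<String, Long> dbToLastEventId = new HashMap<>();

    void recordEvent(String dbName, long eventId) {
        // keep the max seen, in case events for a db arrive out of order
        dbToLastEventId.merge(dbName, eventId, Math::max);
    }

    // Snapshot consumed at the end of REPL LOAD to emit one
    // repl.last.id update per database touched.
    Map<String, Long> updatesToFlush() {
        return new HashMap<>(dbToLastEventId);
    }
}
```

For a table-level load that does not alter the table itself, the same map idea would be keyed by db.table instead of db.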
[jira] [Created] (HIVE-15533) Repl rename support adds unnecessary duplication for non-rename alters
Sushanth Sowmyan created HIVE-15533: --- Summary: Repl rename support adds unnecessary duplication for non-rename alters Key: HIVE-15533 URL: https://issues.apache.org/jira/browse/HIVE-15533 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Currently, the rename events contain a before & after object. For non-rename cases, we simply apply the "after" object, and thus have no need of the "before" object. Thus, we might want to minimize wastage by not materializing "before" if this is a non-rename case. Also worth considering: for a rename case, do we really need the full before object, or simply the before & after names? Having before & after objects is good in that it allows us flexibility, but we might not need that much info. From a perf viewpoint, we might want to trim things a bit here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15532) Refactor/cleanup TestReplicationScenario
Sushanth Sowmyan created HIVE-15532: --- Summary: Refactor/cleanup TestReplicationScenario Key: HIVE-15532 URL: https://issues.apache.org/jira/browse/HIVE-15532 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan TestReplicationScenarios could use a bit of cleanup, based on comments from reviews: a) Separate "setup" phase of each test, so that we don't run unnecessary verifications which aren't testing replication itself, but are verifying that the env is set up correctly to then test replication. This can be flag-gated so as to allow it to be turned on at test-dev time, and off during build/commit unit test time. b) Better comments inside the tests for what is being set up / tested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15522) REPL LOAD & DUMP support for incremental ALTER_TABLE/ALTER_PTN including renames
Sushanth Sowmyan created HIVE-15522: --- Summary: REPL LOAD & DUMP support for incremental ALTER_TABLE/ALTER_PTN including renames Key: HIVE-15522 URL: https://issues.apache.org/jira/browse/HIVE-15522 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15480) Failing test : TestMiniTezCliDriver.testCliDriver : explainanalyze_1
Sushanth Sowmyan created HIVE-15480: --- Summary: Failing test : TestMiniTezCliDriver.testCliDriver : explainanalyze_1 Key: HIVE-15480 URL: https://issues.apache.org/jira/browse/HIVE-15480 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan See recent ptest failure : https://builds.apache.org/job/PreCommit-HIVE-Build/2642/testReport/org.apache.hadoop.hive.cli/TestMiniTezCliDriver/testCliDriver_explainanalyze_1_/ {noformat} Standard Output Running: diff -a /home/hiveptest/104.154.92.121-hiveptest-0/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/explainanalyze_1.q.out /home/hiveptest/104.154.92.121-hiveptest-0/apache-github-source-source/ql/src/test/results/clientpositive/tez/explainanalyze_1.q.out 248c248 < Group By Operator [GBY_2] (rows=205/500 width=95) --- > Group By Operator [GBY_2] (rows=205/309 width=95) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15469) Fix REPL DUMP/LOAD DROP_PTN so it works on non-string-ptn-key tables
Sushanth Sowmyan created HIVE-15469: --- Summary: Fix REPL DUMP/LOAD DROP_PTN so it works on non-string-ptn-key tables Key: HIVE-15469 URL: https://issues.apache.org/jira/browse/HIVE-15469 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan The current implementation of REPL DROP/REPL LOAD for DROP_PTN is limited to dropping partitions whose key types are strings. This needs the tableObj to be available in the DropPartitionMessage before it can be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15466) REPL LOAD & DUMP support for incremental DROP_TABLE/DROP_PTN
Sushanth Sowmyan created HIVE-15466: --- Summary: REPL LOAD & DUMP support for incremental DROP_TABLE/DROP_PTN Key: HIVE-15466 URL: https://issues.apache.org/jira/browse/HIVE-15466 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15455) Flaky test : TestHS2HttpServer.testContextRootUrlRewrite
Sushanth Sowmyan created HIVE-15455: --- Summary: Flaky test : TestHS2HttpServer.testContextRootUrlRewrite Key: HIVE-15455 URL: https://issues.apache.org/jira/browse/HIVE-15455 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan This test failed in ptest when testing HIVE-15426 but seems to succeed locally. I'm not able to find another recent run which had this test fail as well, and the test logs for HIVE-15426 have been rotated out. Creating this jira anyway, to track it if it pops up again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15454) Failing test : TestMiniTezCliDriver.testCliDriver : explainanalyze_2
Sushanth Sowmyan created HIVE-15454: --- Summary: Failing test : TestMiniTezCliDriver.testCliDriver : explainanalyze_2 Key: HIVE-15454 URL: https://issues.apache.org/jira/browse/HIVE-15454 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan This test has failed on some recent ptest runs. Example : https://builds.apache.org/job/PreCommit-HIVE-Build/2611/testReport/junit/org.apache.hadoop.hive.cli/TestMiniTezCliDriver/testCliDriver_explainanalyze_2_/ {noformat} Standard Output Running: diff -a /home/hiveptest/104.197.114.29-hiveptest-0/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/explainanalyze_2.q.out /home/hiveptest/104.197.114.29-hiveptest-0/apache-github-source-source/ql/src/test/results/clientpositive/tez/explainanalyze_2.q.out 2095c2095 < Group By Operator [GBY_16] (rows=500/760 width=280) --- > Group By Operator [GBY_16] (rows=500/619 width=280) 2105c2105 < Group By Operator [GBY_22] (rows=1001/760 width=464) --- > Group By Operator [GBY_22] (rows=1001/619 width=464) 2111c2111 < Group By Operator [GBY_16] (rows=500/760 width=280) --- > Group By Operator [GBY_16] (rows=500/619 width=280) 2119c2119 < Group By Operator [GBY_22] (rows=1001/760 width=464) --- > Group By Operator [GBY_22] (rows=1001/619 width=464) 2125c2125 < Group By Operator [GBY_16] (rows=500/760 width=280) --- > Group By Operator [GBY_16] (rows=500/619 width=280) 2142c2142 < Group By Operator [GBY_22] (rows=1001/760 width=464) --- > Group By Operator [GBY_22] (rows=1001/619 width=464) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15453) Failing test : TestMiniLlapLocalCliDriver.testCliDriver : stats_based_fetch_decision
Sushanth Sowmyan created HIVE-15453: --- Summary: Failing test : TestMiniLlapLocalCliDriver.testCliDriver : stats_based_fetch_decision Key: HIVE-15453 URL: https://issues.apache.org/jira/browse/HIVE-15453 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan This test has been failing in a couple of ptests off late. A recent example is in https://builds.apache.org/job/PreCommit-HIVE-Build/2612/testReport/junit/org.apache.hadoop.hive.cli/TestMiniLlapLocalCliDriver/testCliDriver_stats_based_fetch_decision_/ {noformat} 2016-12-16 09:42:14 Running Dag: dag_1481909974530_0001_239 2016-12-16 09:42:14 Completed running task attempt: attempt_1481909974530_0001_239_00_00_0 2016-12-16 09:42:14 Completed Dag: dag_1481909974530_0001_239 2016-12-16 09:42:14 Running Dag: dag_1481909974530_0001_240 2016-12-16 09:42:14 Completed running task attempt: attempt_1481909974530_0001_240_00_00_0 2016-12-16 09:42:14 Completed Dag: dag_1481909974530_0001_240 Running: diff -a /home/hiveptest/104.154.196.58-hiveptest-1/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/stats_based_fetch_decision.q.out /home/hiveptest/104.154.196.58-hiveptest-1/apache-github-source-source/ql/src/test/results/clientpositive/llap/stats_based_fetch_decision.q.out 153c153 < Statistics: Num rows: 2000 Data size: 1092000 Basic stats: COMPLETE Column stats: PARTIAL --- > Statistics: Num rows: 2000 Data size: 1092000 Basic stats: > COMPLETE Column stats: COMPLETE 156c156 < Statistics: Num rows: 1 Data size: 546 Basic stats: COMPLETE Column stats: PARTIAL --- > Statistics: Num rows: 1 Data size: 546 Basic stats: > COMPLETE Column stats: COMPLETE 160c160 < Statistics: Num rows: 1 Data size: 543 Basic stats: COMPLETE Column stats: PARTIAL --- > Statistics: Num rows: 1 Data size: 543 Basic stats: > COMPLETE Column stats: COMPLETE 163c163 < Statistics: Num rows: 1 Data size: 543 Basic stats: COMPLETE Column stats: PARTIAL --- > Statistics: Num rows: 1 Data size: 543 Basic stats: > 
COMPLETE Column stats: COMPLETE {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15452) Failing test : TestMiniLlapLocalCliDriver.testCliDriver : metadataonly1
Sushanth Sowmyan created HIVE-15452: --- Summary: Failing test : TestMiniLlapLocalCliDriver.testCliDriver : metadataonly1 Key: HIVE-15452 URL: https://issues.apache.org/jira/browse/HIVE-15452 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Test seems to be failing on recent ptest runs. See https://builds.apache.org/job/PreCommit-HIVE-Build/2615/testReport/junit/org.apache.hadoop.hive.cli/TestMiniLlapLocalCliDriver/testCliDriver_metadataonly1_/ for recent example. {noformat} Running: diff -a /home/hiveptest/104.154.236.143-hiveptest-1/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/metadataonly1.q.out /home/hiveptest/104.154.236.143-hiveptest-1/apache-github-source-source/ql/src/test/results/clientpositive/llap/metadataonly1.q.out 148c148 < input format: org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat --- > input format: > org.apache.hadoop.hive.ql.io.OneNullRowInputFormat 240c240 < NULL --- > 1 287c287 < input format: org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat --- > input format: > org.apache.hadoop.hive.ql.io.OneNullRowInputFormat 379c379 < 0 --- > 1 971c971 < input format: org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat --- > input format: > org.apache.hadoop.hive.ql.io.OneNullRowInputFormat 1016c1016 < input format: org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat --- > input format: > org.apache.hadoop.hive.ql.io.OneNullRowInputFormat 1061c1061 < input format: org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat --- > input format: > org.apache.hadoop.hive.ql.io.OneNullRowInputFormat 1160a1161 > 1 3 1448c1449 < input format: org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat --- > input format: > org.apache.hadoop.hive.ql.io.OneNullRowInputFormat 1492c1493 < input format: org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat --- > input format: > org.apache.hadoop.hive.ql.io.OneNullRowInputFormat 1587c1588 < NULL --- > 2 1690c1691 < input format: org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat --- > input format: > 
org.apache.hadoop.hive.ql.io.OneNullRowInputFormat 1735c1736 < input format: org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat --- > input format: > org.apache.hadoop.hive.ql.io.OneNullRowInputFormat 1780c1781 < input format: org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat --- > input format: > org.apache.hadoop.hive.ql.io.OneNullRowInputFormat 1825c1826 < input format: org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat --- > input format: > org.apache.hadoop.hive.ql.io.OneNullRowInputFormat 1870c1871 < input format: org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat --- > input format: > org.apache.hadoop.hive.ql.io.OneNullRowInputFormat 1975a1977,1979 > 01:10:10 1 > 01:10:20 1 > 1 3 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15451) Failing test : TestMiniLlapCliDriver.testCliDriver : transform_ppr2
Sushanth Sowmyan created HIVE-15451: --- Summary: Failing test : TestMiniLlapCliDriver.testCliDriver : transform_ppr2 Key: HIVE-15451 URL: https://issues.apache.org/jira/browse/HIVE-15451 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan This test has been failing on ptest off late. See https://builds.apache.org/job/PreCommit-HIVE-Build/2615/testReport/junit/org.apache.hadoop.hive.cli/TestMiniLlapCliDriver/testCliDriver_transform_ppr2_/ for a recent example Fails on stdout diff: {noformat} 2016-12-16 12:20:11 Completed running task attempt: attempt_1481919437560_0001_177_01_00_0 Running: diff -a /home/hiveptest/35.184.94.117-hiveptest-0/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/transform_ppr2.q.out /home/hiveptest/35.184.94.117-hiveptest-0/apache-github-source-source/ql/src/test/results/clientpositive/llap/transform_ppr2.q.out 41c41 < Statistics: Num rows: 1000 Data size: 178000 Basic stats: COMPLETE Column stats: PARTIAL --- > Statistics: Num rows: 1000 Data size: 178000 Basic stats: > COMPLETE Column stats: COMPLETE 46c46 < Statistics: Num rows: 1000 Data size: 272000 Basic stats: COMPLETE Column stats: PARTIAL --- > Statistics: Num rows: 1000 Data size: 272000 Basic stats: > COMPLETE Column stats: COMPLETE 59c59 < Statistics: Num rows: 1000 Data size: 272000 Basic stats: COMPLETE Column stats: PARTIAL --- > Statistics: Num rows: 1000 Data size: 272000 Basic > stats: COMPLETE Column stats: COMPLETE 63c63 < Statistics: Num rows: 333 Data size: 2664 Basic stats: COMPLETE Column stats: PARTIAL --- > Statistics: Num rows: 333 Data size: 2664 Basic > stats: COMPLETE Column stats: COMPLETE 69c69 < Statistics: Num rows: 333 Data size: 2664 Basic stats: COMPLETE Column stats: PARTIAL --- > Statistics: Num rows: 333 Data size: 2664 Basic > stats: COMPLETE Column stats: COMPLETE 178c178 < Statistics: Num rows: 333 Data size: 2664 Basic stats: COMPLETE Column stats: PARTIAL --- > Statistics: Num rows: 333 Data size: 2664 Basic 
stats: > COMPLETE Column stats: COMPLETE 184c184 < Statistics: Num rows: 333 Data size: 2664 Basic stats: COMPLETE Column stats: PARTIAL --- > Statistics: Num rows: 333 Data size: 2664 Basic stats: > COMPLETE Column stats: COMPLETE {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15450) Flaky tests : testCliDriver.sample[24679]
Sushanth Sowmyan created HIVE-15450: --- Summary: Flaky tests : testCliDriver.sample[24679] Key: HIVE-15450 URL: https://issues.apache.org/jira/browse/HIVE-15450 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Noted during ptests: the .q.out comparison seems to be failing. A difference in the ordering of the output appears to be causing the failure. See https://builds.apache.org/job/PreCommit-HIVE-Build/2615/#showFailuresLink for a new-ish job with these failing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15449) Failing test : TestVectorizedColumnReaderBase (possibly slow)
Sushanth Sowmyan created HIVE-15449: --- Summary: Failing test : TestVectorizedColumnReaderBase (possibly slow) Key: HIVE-15449 URL: https://issues.apache.org/jira/browse/HIVE-15449 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Got the following error from a ptest run: TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely timed out) (batchId=251) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15426) Fix order guarantee of event executions for REPL LOAD
Sushanth Sowmyan created HIVE-15426: --- Summary: Fix order guarantee of event executions for REPL LOAD Key: HIVE-15426 URL: https://issues.apache.org/jira/browse/HIVE-15426 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15332) REPL LOAD & DUMP support for incremental CREATE_TABLE/ADD_PTN
Sushanth Sowmyan created HIVE-15332: --- Summary: REPL LOAD & DUMP support for incremental CREATE_TABLE/ADD_PTN Key: HIVE-15332 URL: https://issues.apache.org/jira/browse/HIVE-15332 Project: Hive Issue Type: Sub-task Components: repl Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan We need to add in support for REPL LOAD and REPL DUMP of incremental events, and we need to be able to replicate creates, for a start. This jira tracks the inclusion of CREATE_TABLE/ADD_PARTITION event support to REPL DUMP & LOAD. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15284) Add junit test to test replication scenarios
Sushanth Sowmyan created HIVE-15284: --- Summary: Add junit test to test replication scenarios Key: HIVE-15284 URL: https://issues.apache.org/jira/browse/HIVE-15284 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15151) Bootstrap support for replv2
Sushanth Sowmyan created HIVE-15151: --- Summary: Bootstrap support for replv2 Key: HIVE-15151 URL: https://issues.apache.org/jira/browse/HIVE-15151 Project: Hive Issue Type: Sub-task Components: repl Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan We need to support the ability to bootstrap an initial state, dumping out currently existing dbs/tables, etc, so that incremental replication can take over from that point. To this end, we should implement commands such as REPL DUMP, REPL LOAD, REPL STATUS, as described over at https://cwiki.apache.org/confluence/display/Hive/HiveReplicationv2Development -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: behavior or insert overwrite with dynamic partitions
I expect the following because it follows per-ptn if-write-then-overwrite semantics: 0,10 1,25 1,50 There can be a case to be made that it should overwrite the entire table, and that would make sense too (probably more sense than this one), but not one I'd think we should switch behavior to (backward compatibility). On Oct 17, 2016 18:10, "Sergey Shelukhin" wrote: > What do you think this SHOULD do? > > > select key from src; > 10 > 25 > 50 > > > create table t(val int) partitioned by (pk int); > > insert overwrite table t partition (pk) > select 0 as val, key from src where key < 30; > > insert overwrite table t partition (pk) > select 1 as val, key from src where key > 20; > > > > select val, pk from t; > ? > >
Re: Creating a new branch for repl dev
With no objections received, I have created a new branch called repl2, and have created a new umbrella jira ( HIVE-14841 ) and a jira component (repl) to track continued development. Thanks, -Sushanth On Thu, Sep 22, 2016 at 10:03 AM, Sushanth Sowmyan <khorg...@gmail.com> wrote: > Hi Folks, > > We had some work done with replication back at HIVE-7973 and this > implemented a primary mode of replication for hive which can integrate > with tools like Falcon. I intend to move forward on continuing to > improve this, to fix some of the major problems with the current > implementation, mostly the following: > > a) Replication follows a rubberbanding pattern, wherein different > tables/ptns can be in a different/mixed state on the destination, so > that unless all events are caught up on, we do not have an equivalent > warehouse. Thus, this only satisfies DR cases, not load balancing > usecases, and the secondary warehouse is really only seen as a backup, > rather than as a live warehouse that trails the primary. > b) The base implementation is a naive implementation, and has several > performance problems, including a large amount of duplication of data > for subsequent events, as mentioned in HIVE-13348, having to copy out > entire partitions/tables when just a delta of files might be > sufficient/etc. Also, using EXPORT/IMPORT allows us a simple > implementation, but at the cost of tons of temporary space, much of > which is not actually applied at the destination. > > To that end, I want to create a new branch, so that we can track > development on this end on public apache jira. The last time I worked > on this, having a private branch meant large uber patches as in > HIVE-10227, which I would like to avoid this time, and is also more > inkeeping with open-development. Also, developing in master itself is > not a good idea, since some of the ideas I'm trying out can be > experimental, and probably still a ways from maturity. 
> > So, unless anyone has any objection, I would like to create a new > branch off master, say "repl2" and create an uber jira to manage > individual components of the work. > > Thanks, > -Sushanth
[jira] [Created] (HIVE-14841) Replication - Phase 2
Sushanth Sowmyan created HIVE-14841: --- Summary: Replication - Phase 2 Key: HIVE-14841 URL: https://issues.apache.org/jira/browse/HIVE-14841 Project: Hive Issue Type: New Feature Components: repl Affects Versions: 2.1.0 Reporter: Sushanth Sowmyan Per email sent out to the dev list, the current implementation of replication in hive has certain drawbacks, for instance : * Replication follows a rubberbanding pattern, wherein different tables/ptns can be in a different/mixed state on the destination, so that unless all events are caught up on, we do not have an equivalent warehouse. Thus, this only satisfies DR cases, not load balancing usecases, and the secondary warehouse is really only seen as a backup, rather than as a live warehouse that trails the primary. * The base implementation is a naive implementation, and has several performance problems, including a large amount of duplication of data for subsequent events, as mentioned in HIVE-13348, having to copy out entire partitions/tables when just a delta of files might be sufficient/etc. Also, using EXPORT/IMPORT allows us a simple implementation, but at the cost of tons of temporary space, much of which is not actually applied at the destination. Thus, to track this, we now create a new branch (repl2) and a uber-jira(this one) to track experimental development towards improvement of this situation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Getting ready for a Hive 1.2.2 release
Hi Folks, I'm afraid I've been otherwise occupied and not been able to spend enough time on this. Thankfully, Vaibhav Gumashta has volunteered to take this on and be the RM for 1.2.2. He'll follow up on the process as it goes forward. Thanks, -Sushanth On Tue, May 3, 2016 at 4:32 PM, Sushanth Sowmyan <khorg...@gmail.com> wrote: > Hi All, > > It has been nearly a year now since 1.2.1 was released, and I said > then that I would keep the branch open for further bugfixes with a > view of making another 1.2.2 stability upgrade release. There have > been 64 such patches committed since then. > > I think it's time we revisited that and looked to making a 1.2.2 > release to reflect 1.2.1 + all these updates, and I will go ahead and > start rolling out release candidates after next weekend (May 16th) > unless anyone has any objections. > > If anyone wants to get in any other patches before then, please feel > free to do so. The original restrictions for commits to branch-1.2, > that of no breaking changes, no db changes and no large features still > apply, and I'll do a full test verification before pushing it out. > If anyone has any patches that they think will take longer than next > week, but are important fixes that need to be in this, please ping me. > I will edit the wiki ( > https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status > ) to reflect this. > > Thanks, > -Sushanth
Creating a new branch for repl dev
Hi Folks, We had some work done with replication back at HIVE-7973 and this implemented a primary mode of replication for hive which can integrate with tools like Falcon. I intend to move forward on continuing to improve this, to fix some of the major problems with the current implementation, mostly the following: a) Replication follows a rubberbanding pattern, wherein different tables/ptns can be in a different/mixed state on the destination, so that unless all events are caught up on, we do not have an equivalent warehouse. Thus, this only satisfies DR cases, not load balancing usecases, and the secondary warehouse is really only seen as a backup, rather than as a live warehouse that trails the primary. b) The base implementation is a naive implementation, and has several performance problems, including a large amount of duplication of data for subsequent events, as mentioned in HIVE-13348, having to copy out entire partitions/tables when just a delta of files might be sufficient/etc. Also, using EXPORT/IMPORT allows us a simple implementation, but at the cost of tons of temporary space, much of which is not actually applied at the destination. To that end, I want to create a new branch, so that we can track development on this end on public apache jira. The last time I worked on this, having a private branch meant large uber patches as in HIVE-10227, which I would like to avoid this time, and is also more in keeping with open development. Also, developing in master itself is not a good idea, since some of the ideas I'm trying out can be experimental, and probably still a ways from maturity. So, unless anyone has any objection, I would like to create a new branch off master, say "repl2" and create an uber jira to manage individual components of the work. Thanks, -Sushanth
[jira] [Created] (HIVE-14766) ObjectStore.initialize() needs retry mechanisms in case of connection failures
Sushanth Sowmyan created HIVE-14766: --- Summary: ObjectStore.initialize() needs retry mechanisms in case of connection failures Key: HIVE-14766 URL: https://issues.apache.org/jira/browse/HIVE-14766 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan RetryingHMSHandler handles retries to most HMSHandler calls. However, one area where we do not have retries is in the very instantiation of ObjectStore. The lack of retries here sometimes means that a flaky db connect around the time the metastore is started yields an unresponsive metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
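A minimal retry loop of the kind this issue asks for might look like the sketch below. It is not the actual ObjectStore change — the attempt count, backoff policy, and names are all assumptions — but it shows the shape: retry the connection-sensitive initialization a bounded number of times before giving up.

```java
// Illustrative retry wrapper (names and policy are assumptions, not the
// real ObjectStore.initialize() fix): retry a flaky initialization a
// bounded number of times with a fixed sleep between attempts.
class RetryingInitializer {
    static void initWithRetries(Runnable init, int maxAttempts, long sleepMs) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                init.run();
                return;                        // initialization succeeded
            } catch (RuntimeException e) {
                last = e;                      // e.g. a transient db connect failure
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(sleepMs);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw e;               // stop retrying once interrupted
                    }
                }
            }
        }
        throw last;                            // exhausted all attempts
    }
}
```

A transient connect failure at metastore startup would then cost a few retries rather than leaving an unresponsive metastore behind.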
Re: [DISCUSS] Making storage-api a separately released artifact
+1 for having a separate storage-api project to define common interfaces for people to develop against; it'll make things much easier to develop against generically. I'm okay (+0) with the sub-project idea, as opposed to enthusiastic about it, mostly because I have reservations that it'll encourage laziness: in practice it may wind up being tied to hive releases and dev, and over time assumptions of how hive works and what is available will bleed in. But, still, having a notion of separation will definitely help. On Aug 17, 2016 11:39, "Prasanth Jayachandran" < pjayachand...@hortonworks.com> wrote: > +1 for making it a subproject with separate (preferably shorter) release > cycle. The module in itself is too small for a separate project. Also > having a faster release cycle will resolve circular dependency and will > help other projects make use of vectorization, sarg, bloom filter etc. > > For version management, how about adding another version after patch > version i.e sub-project version? > Example: 2.2.0.[0] will be storage api’s release version. Hive will always > depend on 2.2.0-SNAPSHOT. I think maven will let us release modules with > different versions. https://dev.c-ware.de/confluence/display/PUBLIC/Releasing+modules+of+a+multi-module+project+with+independent+version+numbers > > Thanks > Prasanth > > > On Aug 17, 2016, at 10:46 AM, Alan Gates wrote: > > > > +1 for making the API clean and easy for other projects to work with. A > few questions: > > > > 1) Would this also make it easier for Parquet and others to implement > Hive’s ACID interfaces? > > > > 2) Would we make any attempt to coordinate version numbers between Hive > and the storage module, or would a given version of Hive just depend on a > given version of the storage module? > > > > Alan. 
> > > >> On Aug 15, 2016, at 17:01, Owen O'Malley wrote: > >> > >> All, > >> > >> As part of moving ORC out of Hive, we pulled all of the vectorization > >> storage and sarg classes into a separate module, which is named > >> storage-api. Although it is currently only used by ORC, it could be > used > >> by Parquet or Avro if they wanted to make a fast vectorized reader that > >> read directly in to Hive's VectorizedRowBatch without needing a shim or > >> data copy. Note that this is in many ways similar to pulling the Arrow > >> project out of Drill. > >> > >> This unfortunately still leaves us with a circular dependency between > Hive > >> and ORC. I'd hoped that storage-api wouldn't change that much, but that > >> doesn't seem to be happening. As a result, ORC ends up shipping its own > >> fork of storage-api. > >> > >> Although we could make a new project for just the storage-api, I think > it > >> would be better to make it a subproject of Hive that is released > >> independently. > >> > >> What do others think? > >> > >> Owen > > > > > >
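Prasanth's versioning suggestion above (a fourth, sub-project digit, with Hive itself depending on the SNAPSHOT line) could look roughly like the following module POM. This is a hypothetical sketch, not Hive's actual build files; the artifact id and version numbers are illustrative:

```xml
<!-- Hypothetical storage-api/pom.xml fragment: the module declares its own
     independently released version while inheriting from the Hive parent,
     which stays on the normal Hive release train. -->
<project>
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive</artifactId>
    <version>2.2.0-SNAPSHOT</version>
  </parent>
  <artifactId>hive-storage-api</artifactId>
  <!-- sub-project version: Hive release 2.2.0, storage-api release 0 -->
  <version>2.2.0.0</version>
</project>
```

Maven does permit releasing a module with a version different from its parent, which is what the linked c-ware page describes.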
[jira] [Created] (HIVE-14449) Expand HiveReplication doc as an admin/user-facing doc
Sushanth Sowmyan created HIVE-14449: --- Summary: Expand HiveReplication doc as an admin/user-facing doc Key: HIVE-14449 URL: https://issues.apache.org/jira/browse/HIVE-14449 Project: Hive Issue Type: Bug Components: Documentation Reporter: Sushanth Sowmyan https://cwiki.apache.org/confluence/display/Hive/Replication is a good user-facing/admin-facing doc for replication, in contrast to https://cwiki.apache.org/confluence/display/Hive/HiveReplicationDevelopment, which was intended to talk more about the design. We should expand the former further with all the knobs that exist, what APIs exist for other programs to take advantage of replication, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14394) Reduce excessive INFO level logging
Sushanth Sowmyan created HIVE-14394: --- Summary: Reduce excessive INFO level logging Key: HIVE-14394 URL: https://issues.apache.org/jira/browse/HIVE-14394 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan We need to cull the log messages we generate in HMS and HS2 that are not needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14365) Simplify logic for check introduced in HIVE-10022
Sushanth Sowmyan created HIVE-14365: --- Summary: Simplify logic for check introduced in HIVE-10022 Key: HIVE-14365 URL: https://issues.apache.org/jira/browse/HIVE-14365 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan We introduced a parent-check/glob-check/file-check in SQLAuthorizationUtils in HIVE-10022, but the logic for that is more convoluted than it needs to be. Taking a cue from RANGER-1126, we should simplify this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [ANNOUNCE] New PMC Member : Jesus
Good to have you onboard, Jesus! :) On Jul 17, 2016 12:00, "Lefty Leverenz" wrote: > Congratulations Jesus! > > -- Lefty > > On Sun, Jul 17, 2016 at 1:01 PM, Ashutosh Chauhan > wrote: > >> Hello Hive community, >> >> I'm pleased to announce that Jesus Camacho Rodriguez has accepted the >> Apache Hive PMC's >> invitation, and is now our newest PMC member. Many thanks to Jesus for >> all of >> his hard work. >> >> Please join me congratulating Jesus! >> >> Best, >> Ashutosh >> (On behalf of the Apache Hive PMC) >> > >
Re: [ANNOUNCE] New PMC Member : Pengcheng
Welcome aboard Pengcheng! :) On Jul 17, 2016 12:01, "Lefty Leverenz" wrote: > Congratulations Pengcheng! > > -- Lefty > > On Sun, Jul 17, 2016 at 1:03 PM, Ashutosh Chauhan > wrote: > >> > >> > Hello Hive community, >> > >> > I'm pleased to announce that Pengcheng Xiong has accepted the Apache >> Hive >> > PMC's >> > invitation, and is now our newest PMC member. Many thanks to Pengcheng >> for >> > all of his hard work. >> > >> > Please join me congratulating Pengcheng! >> > >> > Best, >> > Ashutosh >> > (On behalf of the Apache Hive PMC) >> > >> > >
[jira] [Created] (HIVE-14207) Strip HiveConf hidden params in webui conf
Sushanth Sowmyan created HIVE-14207: --- Summary: Strip HiveConf hidden params in webui conf Key: HIVE-14207 URL: https://issues.apache.org/jira/browse/HIVE-14207 Project: Hive Issue Type: Bug Components: Web UI Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan HIVE-12338 introduced a new web ui, which has a page that displays the current HiveConf being used by HS2. However, before it displays that config, it does not strip entries from it which are considered "hidden" conf parameters, thus exposing those values from a web-ui for HS2. We need to add stripping to this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [VOTE] Apache Hive 2.1.0 Release Candidate 3
Actually, to be more explicit, per Thejas' case of the top level license taking precedence, this RC has my +1. On Fri, Jun 17, 2016 at 3:28 PM, Sushanth Sowmyan <khorg...@gmail.com> wrote: > I will happily rescind my -1 and even convert it to a +1 if the top > level license does hold. I thought that the RAT check was a necessary > blocker. > > (Although, if the top level license does cover across the board, we > may want to open a new discussion on whether having a license > requirement for every source file is necessary in the first place, and > tweak the definition of the rat check so it does not fail it in this > case.) > > On Fri, Jun 17, 2016 at 3:20 PM, Thejas Nair <thejas.n...@gmail.com> wrote: >> I don't think the missing headers for 2 files mandates a respin of >> this RC . It is not really a case of 'incompatible' license or code >> that shouldn't be shipped. >> We have a top level license file that covers the entire project, >> including these files. >> IMO, We should fix it if there is a new RC for some other reason. But >> this alone doesn't seem to make new RC necessary. >> >> Sushanth, Can you please reconsider your -1 ? >> >> >> On Fri, Jun 17, 2016 at 3:06 PM, Sushanth Sowmyan <khorg...@gmail.com> wrote: >>> -1, terribly sorry I didn't check for this earlier, but the RAT check >>> fails for this. >>> >>> If you run mvn apache-rat:check , then you see the following issue: >>> >>> Unapproved licenses: >>> >>> >>> /Users/sush/t/rel/apache-hive-2.1.0-src/common/src/java/org/apache/hive/common/util/DateParser.java >>> >>> /Users/sush/t/rel/apache-hive-2.1.0-src/common/src/test/org/apache/hive/common/util/TestDateParser.java >>> >>> Basically, these two files are missing the apache license header. We >>> need to add them in. >>> >>> All other things are good, though. It has the oracle fix I asked for >>> in RC2, md5s and signatures check out, compilation works on source >>> package, and I'm able to run the hive binary from the binary package. 
>>> I also tried a number of tests, and I've run a rat test on the release >>> >>> On Thu, Jun 16, 2016 at 6:02 PM, Jesus Camacho Rodriguez >>> <jcamachorodrig...@hortonworks.com> wrote: >>>> Apache Hive 2.1.0 Release Candidate 3 is available here: >>>> >>>> http://people.apache.org/~jcamacho/hive-2.1.0-rc3 >>>> >>>> Maven artifacts are available here: >>>> >>>> https://repository.apache.org/content/repositories/orgapachehive-1057/ >>>> >>>> Source tag for RC3 is at: >>>> https://github.com/apache/hive/releases/tag/release-2.1.0-rc3 >>>> >>>> >>>> Voting will conclude in 72 hours. >>>> >>>> Hive PMC Members: Please test and vote. >>>> >>>> Thanks. >>>> >>>> >>>> >>>>
Re: [VOTE] Apache Hive 2.1.0 Release Candidate 3
I will happily rescind my -1 and even convert it to a +1 if the top level license does hold. I thought that the RAT check was a necessary blocker. (Although, if the top level license does cover across the board, we may want to open a new discussion on whether having a license requirement for every source file is necessary in the first place, and tweak the definition of the rat check so it does not fail it in this case.) On Fri, Jun 17, 2016 at 3:20 PM, Thejas Nair <thejas.n...@gmail.com> wrote: > I don't think the missing headers for 2 files mandates a respin of > this RC . It is not really a case of 'incompatible' license or code > that shouldn't be shipped. > We have a top level license file that covers the entire project, > including these files. > IMO, We should fix it if there is a new RC for some other reason. But > this alone doesn't seem to make new RC necessary. > > Sushanth, Can you please reconsider your -1 ? > > > On Fri, Jun 17, 2016 at 3:06 PM, Sushanth Sowmyan <khorg...@gmail.com> wrote: >> -1, terribly sorry I didn't check for this earlier, but the RAT check >> fails for this. >> >> If you run mvn apache-rat:check , then you see the following issue: >> >> Unapproved licenses: >> >> >> /Users/sush/t/rel/apache-hive-2.1.0-src/common/src/java/org/apache/hive/common/util/DateParser.java >> >> /Users/sush/t/rel/apache-hive-2.1.0-src/common/src/test/org/apache/hive/common/util/TestDateParser.java >> >> Basically, these two files are missing the apache license header. We >> need to add them in. >> >> All other things are good, though. It has the oracle fix I asked for >> in RC2, md5s and signatures check out, compilation works on source >> package, and I'm able to run the hive binary from the binary package. 
>> I also tried a number of tests, and I've run a rat test on the release >> >> On Thu, Jun 16, 2016 at 6:02 PM, Jesus Camacho Rodriguez >> <jcamachorodrig...@hortonworks.com> wrote: >>> Apache Hive 2.1.0 Release Candidate 3 is available here: >>> >>> http://people.apache.org/~jcamacho/hive-2.1.0-rc3 >>> >>> Maven artifacts are available here: >>> >>> https://repository.apache.org/content/repositories/orgapachehive-1057/ >>> >>> Source tag for RC3 is at: >>> https://github.com/apache/hive/releases/tag/release-2.1.0-rc3 >>> >>> >>> Voting will conclude in 72 hours. >>> >>> Hive PMC Members: Please test and vote. >>> >>> Thanks. >>> >>> >>> >>>
Re: [VOTE] Apache Hive 2.1.0 Release Candidate 3
-1, terribly sorry I didn't check for this earlier, but the RAT check fails for this. If you run mvn apache-rat:check, then you see the following issue: Unapproved licenses: /Users/sush/t/rel/apache-hive-2.1.0-src/common/src/java/org/apache/hive/common/util/DateParser.java /Users/sush/t/rel/apache-hive-2.1.0-src/common/src/test/org/apache/hive/common/util/TestDateParser.java Basically, these two files are missing the apache license header. We need to add them in. All other things are good, though. It has the oracle fix I asked for in RC2, md5s and signatures check out, compilation works on source package, and I'm able to run the hive binary from the binary package. I also tried a number of tests, and I've run a rat test on the release. On Thu, Jun 16, 2016 at 6:02 PM, Jesus Camacho Rodriguez wrote: > Apache Hive 2.1.0 Release Candidate 3 is available here: > > http://people.apache.org/~jcamacho/hive-2.1.0-rc3 > > Maven artifacts are available here: > > https://repository.apache.org/content/repositories/orgapachehive-1057/ > > Source tag for RC3 is at: > https://github.com/apache/hive/releases/tag/release-2.1.0-rc3 > > > Voting will conclude in 72 hours. > > Hive PMC Members: Please test and vote. > > Thanks. > > > >
Re: [VOTE] Apache Hive 2.1.0 Release Candidate 2
Without HIVE-14020, I'm afraid people will not be able to upgrade the hive metastore from an earlier version of hive to 2.1 if they use Oracle as a backing db. There are workarounds, in that the sql script is easily fixed, but since we're still in the process of voting on an RC, I think this is a big enough problem that we should roll out a new RC. I think I'm a -0 on this. On Thu, Jun 16, 2016 at 2:58 PM, Jesus Camacho Rodriguez wrote: > Yes, exactly... I am taking care of that once again, do not worry. > If you want a precise list of which issues were actually fixed in > this release, you can check the release notes in RC2 :) > > > > > On 6/16/16, 10:32 PM, "Sergey Shelukhin" wrote: > >>Hmm… would this mean that all those issues changed from 2.1.1 to 2.1.0 >>would need to be changed back to 2.1.1 now? ;) >> >>On 16/6/16, 13:12, "Jesus Camacho Rodriguez" >> wrote: >> >>>I have been talking to Matt and HIVE-13974 will not make it to the >>>release as it needs some >>>additional time to be fixed. I will add info about this issue to the >>>release note. >>> >>>This means RC2 is still alive. >>> >>>We already got a +1 from Alan. Please, Hive PMC Members, test and vote so >>>we can move forward >>>with the release! >>> >>>Thanks! >>> >>> >>> >>>On 6/16/16, 11:02 AM, "Jesus Camacho Rodriguez" >>> wrote: Sure, I am taking care of this each time we roll out a new RC. On 6/15/16, 10:43 PM, "Sergey Shelukhin" wrote: >Should all the 2.1.1-fixed JIRAs be converted to 2.1.0? > >On 16/6/15, 14:03, "Jesus Camacho Rodriguez" > wrote: > >>OK, vote for RC2 is cancelled. >> >>Matt, please push HIVE-13974 as soon as possible and I will restart the >>vote. >> >>Thanks, >>Jesús >> >> >> >> >> >>On 6/15/16, 9:47 PM, "Matthew McCline" >>wrote: >> >>> >>>-1 for HIVE-13974 ORC Schema Evolution doesn't support add columns to >>>non-last STRUCT columns >>> >>>This bug will prevent people with ORC tables that have added columns >>>to >>>inner STRUCT columns from being able to read their tables. 
>>> >>> >>>From: Jesus Camacho Rodriguez >>>Sent: Wednesday, June 15, 2016 3:20 AM >>>To: dev@hive.apache.org >>>Subject: Re: [VOTE] Apache Hive 2.1.0 Release Candidate 2 >>> >>>Hive PMC members, >>> >>>Just a quick reminder that the vote for RC2 is still open and it needs >>>two additional votes to pass. >>> >>>Please test and cast your vote! >>> >>>Thanks, >>>Jesús >>> >>> >>> >>>On 6/10/16, 6:29 PM, "Alan Gates" wrote: >>> +1, checked signatures, did a build and ran a few simple unit tests. Alan. > On Jun 10, 2016, at 05:44, Jesus Camacho Rodriguez > wrote: > > Apache Hive 2.1.0 Release Candidate 2 is available here: > > http://people.apache.org/~jcamacho/hive-2.1.0-rc2 > > Maven artifacts are available here: > > >https://repository.apache.org/content/repositories/orgapachehive-105 >5/ > > Source tag for RC2 is at: > https://github.com/apache/hive/releases/tag/release-2.1.0-rc2 > > > Voting will conclude in 72 hours. > > Hive PMC Members: Please test and vote. > > Thanks. > > >>> > >>
[jira] [Created] (HIVE-13949) Investigate why Filter mechanism does not work for XSRF filtering from HS2
Sushanth Sowmyan created HIVE-13949: --- Summary: Investigate why Filter mechanism does not work for XSRF filtering from HS2 Key: HIVE-13949 URL: https://issues.apache.org/jira/browse/HIVE-13949 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Sushanth Sowmyan While working on HIVE-13853, it was found that simply using the constructed Filter as-is from ThriftHttpCLIService was not working; explicit calling of the filtering method from ThriftHttpServlet was needed instead. We should investigate why the Filter approach did not work, and bring it in line with standard filter usage, so as to not need to call functions inside the filter. Also, this is a prerequisite for eventually getting rid of our shim if we later update to always expect hadoop versions that contain the filter. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13941) Improve errors returned from SchemaTool
Sushanth Sowmyan created HIVE-13941: --- Summary: Improve errors returned from SchemaTool Key: HIVE-13941 URL: https://issues.apache.org/jira/browse/HIVE-13941 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan We've had feedback from Ambari folks that SchemaTool usage is opaque on errors. While the underlying error is hidden in the stacktrace if you run with --verbose, that output is often unwieldy and unusable; without --verbose, there is no indication of what actually went wrong. Thus, we need to fix this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13931) Add support for HikariCP and replace BoneCP usage with HikariCP
Sushanth Sowmyan created HIVE-13931: --- Summary: Add support for HikariCP and replace BoneCP usage with HikariCP Key: HIVE-13931 URL: https://issues.apache.org/jira/browse/HIVE-13931 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Currently, we use BoneCP as our primary connection pooling mechanism (overridable by users). However, BoneCP is no longer being actively developed, and is considered deprecated, replaced by HikariCP. Thus, we should add support for HikariCP, and try to replace our primary usage of BoneCP with it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13853) Add X-XSRF-Header filter to HS2 HTTP mode and WebHCat
Sushanth Sowmyan created HIVE-13853: --- Summary: Add X-XSRF-Header filter to HS2 HTTP mode and WebHCat Key: HIVE-13853 URL: https://issues.apache.org/jira/browse/HIVE-13853 Project: Hive Issue Type: Bug Components: HiveServer2, WebHCat Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan There is a possibility that there may be a CSRF-based attack on various hadoop components, and thus, there is an effort to add a block for all incoming http requests if they do not contain a X-XSRF-Header header. (See HADOOP-12691 for motivation) This has potential to affect HS2 when running on thrift-over-http mode(if cookie-based-auth is used), and webhcat. We introduce new flags to determine whether or not we're using the filter, and if we are, we will automatically reject any http requests which do not contain this header. To allow this to work, we also need to make changes to our JDBC driver to automatically inject this header into any requests it makes. Also, any client-side programs/api not using the JDBC driver directly will need to make changes to add a X-XSRF-Header header to the request to make calls to HS2/WebHCat if this filter is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
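In outline, the filter described above rejects any request that lacks the custom header, and the client (e.g. the JDBC driver) injects it. The sketch below is a framework-free simplification: the header name is taken from the JIRA title, but the class, method names, and header value are illustrative assumptions, not Hive's or Hadoop's actual implementation:

```java
import java.util.Map;

public class XsrfHeaderCheck {
    // Header name per the JIRA title; real servlet containers treat
    // header names case-insensitively, which this sketch does not model.
    static final String XSRF_HEADER = "X-XSRF-Header";

    // Server side: accept the request only if the filter is disabled
    // or the custom header is present.
    public static boolean isAllowed(Map<String, String> requestHeaders, boolean filterEnabled) {
        if (!filterEnabled) {
            return true; // filter switched off via a config flag
        }
        return requestHeaders.containsKey(XSRF_HEADER);
    }

    // Client side: what the JDBC driver change amounts to, i.e. injecting
    // the header into every outgoing request. The value is arbitrary here.
    public static void injectHeader(Map<String, String> requestHeaders) {
        requestHeaders.put(XSRF_HEADER, "true");
    }
}
```

The point of the scheme is that a browser making a cross-site request cannot attach this custom header, so its absence is a cheap CSRF signal.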
[jira] [Created] (HIVE-13738) Bump up httpcomponent.*.version deps in branch-1.2 to 4.4
Sushanth Sowmyan created HIVE-13738: --- Summary: Bump up httpcomponent.*.version deps in branch-1.2 to 4.4 Key: HIVE-13738 URL: https://issues.apache.org/jira/browse/HIVE-13738 Project: Hive Issue Type: Bug Affects Versions: 1.2.1, 1.2.2 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan apache-httpcomponents has had certain security issues (see HADOOP-12767) due to which upgrading to a newer dep version is recommended. We've already upped the dep. version to 4.4 in other branches of hive, we should do so here as well if we are going to do a new update of 1.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Getting ready for a Hive 1.2.2 release
Hi All, It has been nearly a year now since 1.2.1 was released, and I said then that I would keep the branch open for further bugfixes with a view to making another 1.2.2 stability upgrade release. There have been 64 such patches committed since then. I think it's time we revisited that and looked to making a 1.2.2 release to reflect 1.2.1 + all these updates, and I will go ahead and start rolling out release candidates after next weekend (May 16th) unless anyone has any objections. If anyone wants to get in any other patches before then, please feel free to do so. The original restrictions for commits to branch-1.2 (no breaking changes, no db changes, and no large features) still apply, and I'll do a full test verification before pushing it out. If anyone has any patches that they think will take longer than next week, but are important fixes that need to be in this, please ping me. I will edit the wiki ( https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status ) to reflect this. Thanks, -Sushanth
[jira] [Created] (HIVE-13670) Improve Beeline reconnect semantics
Sushanth Sowmyan created HIVE-13670: --- Summary: Improve Beeline reconnect semantics Key: HIVE-13670 URL: https://issues.apache.org/jira/browse/HIVE-13670 Project: Hive Issue Type: Improvement Affects Versions: 2.1.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan For most users of beeline, chances are that they will be using it with a single HS2 instance most of the time. In this scenario, having them type out a jdbc uri for HS2 every single time to !connect can get tiresome. Thus, we should improve semantics so that if a user does a successful !connect, then we must store the last-connected-to-url, so that if they do a !close, and then a !reconnect, then !reconnect should attempt to connect to the last successfully used url. Also, if they then do a !save, then that last-successfully-used url must be saved, so that in subsequent sessions, they can simply do !reconnect rather than specifying a url for !connect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13645) Beeline needs null-guard around hiveVars and hiveConfVars read
Sushanth Sowmyan created HIVE-13645: --- Summary: Beeline needs null-guard around hiveVars and hiveConfVars read Key: HIVE-13645 URL: https://issues.apache.org/jira/browse/HIVE-13645 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 2.1.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Beeline has a bug wherein if a user ever does a !save, then on the next load, if beeline.hiveVariables or beeline.hiveconfvariables are empty, i.e. {} or unspecified, they are loaded as null; on the next connect there is no null-check on these variables, leading to an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
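The fix described above amounts to treating a null map as an empty one before use. A minimal sketch, assuming hypothetical names (`BeelineVarsGuard`, `nullGuard`) rather than Beeline's actual code:

```java
import java.util.HashMap;
import java.util.Map;

public class BeelineVarsGuard {
    // If the saved properties yielded null for hiveVariables or
    // hiveConfVariables, fall back to an empty map so the next
    // !connect does not NPE when iterating the variables.
    public static Map<String, String> nullGuard(Map<String, String> vars) {
        return (vars == null) ? new HashMap<>() : vars;
    }
}
```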
[jira] [Created] (HIVE-13480) Add hadoop2 metrics reporter for Codahale metrics
Sushanth Sowmyan created HIVE-13480: --- Summary: Add hadoop2 metrics reporter for Codahale metrics Key: HIVE-13480 URL: https://issues.apache.org/jira/browse/HIVE-13480 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Multiple other Apache components can send their metrics to Hadoop2 metrics, which allows monitoring solutions like Ambari Metrics Server to show metrics for all components in one place. Our Codahale metrics implementation works very well, so ideally we would like to bridge the two: add a Hadoop2 reporter to Codahale that lets us continue to use Codahale metrics (i.e. not write another custom metrics impl) but report through Hadoop2. Apache Phoenix recently had a similar use case and was in the process of adding a stub piece that allows this forwarding. We should use the same reporter to minimize redundancy while pushing metrics to a centralized solution like Hadoop2 Metrics/AMS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13370) Add test for HIVE-11470
Sushanth Sowmyan created HIVE-13370: --- Summary: Add test for HIVE-11470 Key: HIVE-13370 URL: https://issues.apache.org/jira/browse/HIVE-13370 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13348) Add Event Nullification support for Replication
Sushanth Sowmyan created HIVE-13348: --- Summary: Add Event Nullification support for Replication Key: HIVE-13348 URL: https://issues.apache.org/jira/browse/HIVE-13348 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Replication, as implemented by HIVE-7973, works as follows: a) For every single modification to the hive metastore, an event gets triggered that logs a notification object. b) Replication tools such as falcon can consume these notification objects as an HCatReplicationTaskIterator from HCatClient.getReplicationTasks(lastEventId, maxEvents, dbName, tableName). c) For each event, we generate statements and distcp requirements for falcon to export, distcp and import to do the replication (along with requisite changes to export and import that would allow state management). The big thing missing from this picture is that while it works, it is naive about how it works: it exhaustively processes every single event generated, and tries to do the export-distcp-import cycle for all modifications, irrespective of whether or not the result will actually get used at import time. We need to build some sort of filtering logic which can process a batch of events to identify events that will result in effective no-ops, and to nullify those events from the stream before passing them on. The goal is to minimize the number of events that tools like Falcon would actually have to process. Examples of cases where event nullification would take place: a) CREATE-DROP cases: If an object is being created in event#34 that will eventually get dropped in event#47, then there is no point in replicating this along. We simply null out both these events, and also any other event that references this object between event#34 and event#47. b) APPEND-APPEND: Some objects are replicated wholesale, which means every APPEND that occurs would cause a full export of the object in question. 
At this point, the prior APPENDs would all be supplanted by the last APPEND. Thus, we could nullify all the prior such events. Additional such cases can be inferred by analysis of the Export-Import replay protocol definition at https://issues.apache.org/jira/secure/attachment/12725999/EXIMReplicationReplayProtocol.pdf or by reasoning out the various event processing orders possible. Replication, as implemented by HIVE-7973, is merely a first step for functional support. This work is needed for replication to be efficient at all, and thus usable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
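The two nullification cases above (CREATE..DROP spans and superseded APPENDs) can be sketched as a batch filter. Everything here is illustrative: the `Event` model and `nullify` method are assumptions about shape, not Hive's actual notification classes, and real logic would also have to handle renames, partitions, and spans crossing batch boundaries:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EventNullifier {
    // Minimal illustrative event model: an event type plus the object it touches.
    public static final class Event {
        public final String type;   // e.g. "CREATE", "APPEND", "DROP"
        public final String object; // e.g. "db.table"
        public Event(String type, String object) { this.type = type; this.object = object; }
    }

    // Nullify (a) CREATE..DROP spans: if an object created in this batch is
    // later dropped in the same batch, drop every event touching it in that
    // span, endpoints included; (b) APPEND runs: keep only the last APPEND
    // per object, since each APPEND re-exports the object wholesale.
    public static List<Event> nullify(List<Event> batch) {
        // Pass 1: record CREATE positions; on a matching DROP, mark the span.
        Map<String, Integer> createIdx = new LinkedHashMap<>();
        Map<String, int[]> doomedSpans = new LinkedHashMap<>();
        for (int i = 0; i < batch.size(); i++) {
            Event e = batch.get(i);
            if (e.type.equals("CREATE")) {
                createIdx.put(e.object, i);
            } else if (e.type.equals("DROP") && createIdx.containsKey(e.object)) {
                doomedSpans.put(e.object, new int[]{createIdx.get(e.object), i});
            }
        }
        // Pass 2: index of the last APPEND per object.
        Map<String, Integer> lastAppend = new LinkedHashMap<>();
        for (int i = 0; i < batch.size(); i++) {
            Event e = batch.get(i);
            if (e.type.equals("APPEND")) lastAppend.put(e.object, i);
        }
        // Pass 3: emit only the surviving events.
        List<Event> out = new ArrayList<>();
        for (int i = 0; i < batch.size(); i++) {
            Event e = batch.get(i);
            int[] span = doomedSpans.get(e.object);
            if (span != null && i >= span[0] && i <= span[1]) continue; // in a CREATE..DROP span
            if (e.type.equals("APPEND") && lastAppend.get(e.object) != i) continue; // superseded
            out.add(e);
        }
        return out;
    }
}
```

For a batch CREATE(t1), APPEND(t1), DROP(t1), APPEND(t2), APPEND(t2), only the final APPEND(t2) survives, which is exactly the reduction the JIRA is after.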
Re: [Discuss] MariaDB support
+1 to introduction of mariadb support - I think it's important that we support MariaDB - there is an increasing interest in the broader open source community of migrating from mysql to either postgres or mariadb. While they're compatible now, it's important that we be aware of gotchas that come up, which we'll be aware of only after there is active usage. +1 to not duplicating mysql scripts unless we find a need to diverge, and having schematool consider it an alias for now. On Wed, Mar 16, 2016 at 12:09 PM, Szehon Ho wrote: > Yea, +1 to point 2. > > For point one, I also agree that it is compatible with mysql and not be a > ton of work unless you want to optimize, on our observations we have seen > existing mysql scripts work fine against mariadb. > > On Wed, Mar 16, 2016 at 12:04 PM, Dmitry Tolpeko > wrote: >> >> +1 great idea >> >> On Wed, Mar 16, 2016 at 10:00 PM, Thejas Nair >> wrote: >>> >>> + Sergio, Szehon, Ashutosh, Sushanth, Sergey, >>> >>> Any thoughts on this ? >>> >>> >>> On Tue, Mar 15, 2016 at 7:08 PM, Thejas Nair >>> wrote: >>> > There seems to be increasing interest in supporting MariaDB as an >>> > option for storing metastore metadata. Supporting it as a database >>> > option is also easy as it is compatible with mysql. I thought it would >>> > be useful to discuss supporting it in the dev list before creating any >>> > jiras. >>> > >>> > There are two aspects I would like to discuss - >>> > >>> > 1. Changes in hive to support MariaDB >>> > >>> > The existing mysql schema creation/upgrade scripts in hive should just >>> > work for mariadb as well. >>> > However, MariaDB has some additional optimizations that we might want >>> > to use in future to optimize queries for it. That would mean creating >>> > specific scripts for mariadb. >>> > >>> > However, until we introduce such MariaDB specific tuning, I think it >>> > is better to avoid duplicating the mysql scripts. 
>>> > >>> > To make the transition to possibly using MariaDB optimized scripts >>> > easier, one option is to have schematool consider it as an alias for >>> > mysql until that happens. >>> > >>> > >>> > 2. Testing with MariaDB >>> > It would be useful to have tests for mariadb as well on the lines of >>> > what is available for mysql in >>> > https://issues.apache.org/jira/browse/HIVE-9800, to ensure that >>> > mariadb support is not broken. >>> > >>> > Thanks, >>> > Thejas >> >> >
CVE-2015-7521: Apache Hive authorization bug disclosure (update)
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 CVE-2015-7521: Apache Hive authorization bug disclosure Severity: Important Vendor: The Apache Software Foundation Versions Affected: Apache Hive 0.13.x Apache Hive 0.14.x Apache Hive 1.0.0 - 1.0.1 Apache Hive 1.1.0 - 1.1.1 Apache Hive 1.2.0 - 1.2.1 Description: Some partition-level operations exist that do not explicitly also authorize privileges of the parent table. This can lead to issues when the parent table would have denied the operation, but no denial occurs because the partition-level privilege is not checked by the authorization framework, which defines authorization entities only from the table level upwards. This issue is known to affect Hive clusters protected by both Ranger as well as SqlStdHiveAuthorization. Mitigation: For Hive 0.13.x, 0.14.x, 1.0, 1.1 and 1.2, a separate jar is being made available, which users can put in their ${HIVE_HOME}/lib/, and this provides a hook for administrators to add to their hive-site.xml, by setting hive.semantic.analyzer.hook=org.apache.hadoop.hive.ql.parse.ParentTableAuthorizationHook . This parameter is a comma-separated-list and this hook can be appended to an existing list if one already exists in the setup. You will then want to make sure that you protect the hive.semantic.analyzer.hook parameter from being changed at runtime by adding it to hive.conf.restricted.list. This jar and associated source tarball are available for download over at : https://hive.apache.org/downloads.html along with their gpg-signed .asc signatures, as well as the md5sums for verification in the hive-parent-auth-hook/ directory. This issue has already been patched in all Hive branches that are affected, and is fixed in the recently released Hive 2.0.0. Hive 2.0.0 and any future release will not need these mitigation steps. Credit: This issue was discovered by Olaf Flebbe of science+computing ag. 
-BEGIN PGP SIGNATURE- Version: GnuPG v1.4.5 (GNU/Linux) iQIVAwUBVsTAYh6tt4FFMLreAQJwiw/+JqSYNXefO6dAckvDke57Hv+TYqB36K06 pQt6JiRBQ1Ov084TkfrDESj9ftIIdxnL4MD8o2wmunSJSL6an6aFFR3uxMjmYDrW 6cTr1noxl3t1WQHVf0oE4aAKCjmYBp+6qtlymt4y//PKNxaVq+8bQ53jArMt78YA UZHV3ET+9vxQM2uoseh1QbdonFMsNMVFY2SfDiZ9OKk8o5eQuF9XhjJWpNKyboYR hxQhjCfZxkCcqA6ulG/lhpxjRvaqEN8JwePQfpNxEToTm6Y68PrQbR01ry+MENS2 Q2KQ9H8sr9LQMXM1U+pvf1NUDnEA5m6sWTC7JcLoz/4KP5aLy1yxSAoVKhDF5ewI 7d8ECRFsCtJo64yQzy1k7W6vdkg8wuciVKv86KVYaM926wFK0Lj9VFjxFO2G1AY5 nBDMxgEnGk0AiNb9qa8fnVSsiDTwrvfBglvQlmTawdCeBUBWFaNONvxP+9lohe04 NYZz3FKSUTFaqluijfw+2x+abP+0qbwy3JfnUgTdttXJ8R5Xxlf2vGmlj2mAJYI/ +hwfBgBkVeITQ5YK/wNaI2tr8FSFOitX4np/FtJA860ygGxi9C4P/Sl1Xj97cCJC HSfZjIOsJ6j11W+DFmI85FE5Pqp042EHq8yqIPrlcKAlmrNT3mtXyrWqdBXjESxs BXyP9rHZJxo= =5PjL -END PGP SIGNATURE-
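The mitigation steps in the advisory above translate to a hive-site.xml fragment along these lines. The property names and the hook class are taken from the advisory itself; note that both properties are comma-separated lists, so append to any existing values rather than replacing them:

```xml
<!-- hive-site.xml fragment for the CVE-2015-7521 mitigation described
     above; values are taken from the advisory text. -->
<property>
  <name>hive.semantic.analyzer.hook</name>
  <!-- append to any existing comma-separated hook list -->
  <value>org.apache.hadoop.hive.ql.parse.ParentTableAuthorizationHook</value>
</property>
<property>
  <name>hive.conf.restricted.list</name>
  <!-- prevents the hook setting from being changed at runtime;
       append to the existing restricted list if one is configured -->
  <value>hive.semantic.analyzer.hook</value>
</property>
```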
CVE-2015-7521: Apache Hive authorization bug disclosure
CVE-2015-7521: Apache Hive authorization bug disclosure

Severity: Important

Vendor: The Apache Software Foundation

Versions Affected:
Apache Hive 1.0.0 - 1.0.1
Apache Hive 1.1.0 - 1.1.1
Apache Hive 1.2.0 - 1.2.1

Description: Some partition-level operations exist that do not explicitly also authorize privileges of the parent table. This can lead to issues when the parent table would have denied the operation, but no denial occurs because the partition-level privilege is not checked by the authorization framework, which defines authorization entities only from the table level upwards. This issue is known to affect Hive clusters protected by both Ranger and SqlStdHiveAuthorization.

Mitigation: For Hive 1.0, 1.1 and 1.2, a separate jar is being made available, which users can put in their ${HIVE_HOME}/lib/; this provides a hook that administrators can add to their hive-site.xml by setting hive.semantic.analyzer.hook=org.apache.hadoop.hive.ql.parse.ParentTableAuthorizationHook. This parameter is a comma-separated list, and the hook can be appended to an existing list if one already exists in the setup. You will then want to protect the hive.semantic.analyzer.hook parameter from being changed at runtime by adding it to hive.conf.restricted.list. The jar and associated source tarball are available for download at https://hive.apache.org/downloads.html, along with their gpg-signed .asc signatures and md5sums for verification, in the hive-parent-auth-hook/ directory. This issue has already been patched in all affected Hive branches, and any future release will not need these mitigation steps.

Credit: This issue was discovered by Olaf Flebbe of science+computing ag.
[jira] [Created] (HIVE-12937) DbNotificationListener unable to clean up old notification events
Sushanth Sowmyan created HIVE-12937: --- Summary: DbNotificationListener unable to clean up old notification events Key: HIVE-12937 URL: https://issues.apache.org/jira/browse/HIVE-12937 Project: Hive Issue Type: Bug Affects Versions: 1.2.1, 1.3.0, 2.0.0, 2.1.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan There is a bug in ObjectStore, where we use pm.deletePersistent instead of pm.deletePersistentAll, which causes the persistenceManager to try to drop a org.datanucleus.store.rdbms.query.ForwardQueryResult instead of the appropriate associated org.apache.hadoop.hive.metastore.model.MNotificationLog. This results in an error that looks like this: {noformat} Exception in thread "CleanerThread" org.datanucleus.api.jdo.exceptions.ClassNotPersistenceCapableException: The class "org.datanucleus.store.rdbms.query.ForwardQueryResult" is not persistable. This means that it either hasnt been enhanced, or that the enhanced version of the file is not in the CLASSPATH (or is hidden by an unenhanced version), or the Meta-Data/annotations for the class are not found. 
at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:380) at org.datanucleus.api.jdo.JDOPersistenceManager.jdoDeletePersistent(JDOPersistenceManager.java:807) at org.datanucleus.api.jdo.JDOPersistenceManager.deletePersistent(JDOPersistenceManager.java:820) at org.apache.hadoop.hive.metastore.ObjectStore.cleanNotificationEvents(ObjectStore.java:7149) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:114) at com.sun.proxy.$Proxy0.cleanNotificationEvents(Unknown Source) at org.apache.hive.hcatalog.listener.DbNotificationListener$CleanerThread.run(DbNotificationListener.java:277) NestedThrowablesStackTrace: The class "org.datanucleus.store.rdbms.query.ForwardQueryResult" is not persistable. This means that it either hasnt been enhanced, or that the enhanced version of the file is not in the CLASSPATH (or is hidden by an unenhanced version), or the Meta-Data/annotations for the class are not found. org.datanucleus.exceptions.ClassNotPersistableException: The class "org.datanucleus.store.rdbms.query.ForwardQueryResult" is not persistable. This means that it either hasnt been enhanced, or that the enhanced version of the file is not in the CLASSPATH (or is hidden by an unenhanced version), or the Meta-Data/annotations for the class are not found. 
at org.datanucleus.ExecutionContextImpl.assertClassPersistable(ExecutionContextImpl.java:5698) at org.datanucleus.ExecutionContextImpl.deleteObjectInternal(ExecutionContextImpl.java:2495) at org.datanucleus.ExecutionContextImpl.deleteObjectWork(ExecutionContextImpl.java:2466) at org.datanucleus.ExecutionContextImpl.deleteObject(ExecutionContextImpl.java:2417) at org.datanucleus.ExecutionContextThreadedImpl.deleteObject(ExecutionContextThreadedImpl.java:245) at org.datanucleus.api.jdo.JDOPersistenceManager.jdoDeletePersistent(JDOPersistenceManager.java:802) at org.datanucleus.api.jdo.JDOPersistenceManager.deletePersistent(JDOPersistenceManager.java:820) at org.apache.hadoop.hive.metastore.ObjectStore.cleanNotificationEvents(ObjectStore.java:7149) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:114) at com.sun.proxy.$Proxy0.cleanNotificationEvents(Unknown Source) at org.apache.hive.hcatalog.listener.DbNotificationListener$CleanerThread.run(DbNotificationListener.java:277) {noformat} The end result of this bug is that users of DbNotificationListener will have an evergrowing number of notification events that are not cleaned up as they age. This is an easy enough fix, but also shows that we have a lack of code coverage here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
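The fix is small, but the failure generalizes: a delete-one-object API handed a query-result collection tries to delete the collection itself. The sketch below uses a hypothetical FakePersistenceManager (a stand-in, not the real javax.jdo API) to show why the call must iterate the result's elements, as deletePersistentAll does:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the JDO PersistenceManager (NOT the real javax.jdo API),
// illustrating the bug: deletePersistent(Object) treats its argument as one persistent
// object, so passing a query-result List makes it try to delete the List itself --
// the ForwardQueryResult "is not persistable" error in the stack trace above.
class FakePersistenceManager {
    final List<Object> deleted = new ArrayList<>();

    void deletePersistent(Object o) {
        if (o instanceof List) {
            // Mirrors ClassNotPersistableException: a query result is not a persistent entity.
            throw new IllegalArgumentException(
                "The class \"" + o.getClass().getName() + "\" is not persistable.");
        }
        deleted.add(o);
    }

    // The fix shape: delete each element of the result, not the result object.
    void deletePersistentAll(List<?> objs) {
        for (Object o : objs) {
            deletePersistent(o);
        }
    }
}
```

Under that reading, cleanNotificationEvents should hand the MNotificationLog query results to deletePersistentAll, which deletes each element, rather than to deletePersistent, which trips over the result wrapper.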
[jira] [Created] (HIVE-12875) Verify sem.getInputs() and sem.getOutputs()
Sushanth Sowmyan created HIVE-12875: --- Summary: Verify sem.getInputs() and sem.getOutputs() Key: HIVE-12875 URL: https://issues.apache.org/jira/browse/HIVE-12875 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan For every partition entity object present in sem.getInputs() and sem.getOutputs(), we must ensure that the appropriate Table is also added to the list of entities. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12630) Import should create a new WriteEntity for the new table it's creating to mimic CREATETABLE behaviour
Sushanth Sowmyan created HIVE-12630: --- Summary: Import should create a new WriteEntity for the new table it's creating to mimic CREATETABLE behaviour Key: HIVE-12630 URL: https://issues.apache.org/jira/browse/HIVE-12630 Project: Hive Issue Type: Bug Components: Authorization, Import/Export Affects Versions: 1.2.0, 1.3.0, 2.0.0, 2.1.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan CREATE-TABLE creates a new WriteEntity for the new table being created, whereas IMPORT does not mimic that behaviour. While SQLStandardAuth itself does not care about this difference, external authorizers such as Ranger can and do make a distinction on this, and can have policies set up on patterns for objects that do not yet exist. Thus, we must emit a WriteEntity for the yet-to-be-created table as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12345) Followup for HIVE-9013 : Hidden commands still visible through beeline
Sushanth Sowmyan created HIVE-12345: --- Summary: Followup for HIVE-9013 : Hidden commands still visible through beeline Key: HIVE-12345 URL: https://issues.apache.org/jira/browse/HIVE-12345 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan HIVE-9013 introduced the ability to hide certain conf variables when output through the "set" command. However, one further bug remains that causes these variables to still be visible through beeline connecting to HS2: HS2 exposes hidden variables such as its metastore password when "set" is run. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12083) HIVE-10965 introduces thrift error if partNames or colNames are empty
Sushanth Sowmyan created HIVE-12083: --- Summary: HIVE-10965 introduces thrift error if partNames or colNames are empty Key: HIVE-12083 URL: https://issues.apache.org/jira/browse/HIVE-12083 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan In the fix for HIVE-10965, there is a short-circuit path that causes an empty AggrStats object to be returned if partNames is empty or colNames is empty: {code} diff --git metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java index 0a56bac..ed810d2 100644 --- metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java +++ metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java @@ -1100,6 +1100,7 @@ public ColumnStatistics getTableStats( public AggrStats aggrColStatsForPartitions(String dbName, String tableName, List<String> partNames, List<String> colNames, boolean useDensityFunctionForNDVEstimation) throws MetaException { +if (colNames.isEmpty() || partNames.isEmpty()) return new AggrStats(); // Nothing to aggregate. long partsFound = partsFoundForPartitions(dbName, tableName, partNames, colNames); List<ColumnStatisticsObj> colStatsList; // Try to read from the cache first {code} This runs afoul of the thrift requirement that AggrStats' required fields be set: {code} struct AggrStats { 1: required list<ColumnStatisticsObj> colStats, 2: required i64 partsFound // number of partitions for which stats were found } {code} Thus, we get errors as follows: {noformat} 2015-10-08 00:00:25,413 ERROR server.TThreadPoolServer (TThreadPoolServer.java:run(213)) - Thrift error occurred during processing of message. org.apache.thrift.protocol.TProtocolException: Required field 'colStats' is unset! 
Struct:AggrStats(colStats:null, partsFound:0) at org.apache.hadoop.hive.metastore.api.AggrStats.validate(AggrStats.java:389) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.validate(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.write(ThriftHiveMetastore.java) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:536) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {noformat} Normally, this would not occur, since HIVE-10965 also includes a client-side guard for colNames.isEmpty() that skips the metastore call entirely; but there is no guard for partNames being empty, which would still cause an error on the metastore side if the thrift call were invoked directly, as would happen if the client is from an older version before this was patched. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
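The failure above is generic to Thrift structs with required fields: a default-constructed result leaves colStats null, and the generated validate() rejects it at write time. A minimal, self-contained Java sketch of the bug and the safe short-circuit (the AggrStats class below is a hand-written stand-in for illustration, not the real Thrift-generated Hive class):

```java
import java.util.Collections;
import java.util.List;

// Hand-written stand-in for the Thrift-generated AggrStats (NOT the real Hive class),
// showing why `return new AggrStats();` trips the required-field check at write time.
class AggrStats {
    List<String> colStats; // `required` in the IDL, but null after the default constructor
    long partsFound;       // `required` in the IDL

    // Mirrors the generated validate(): required fields must be set before serialization.
    void validate() {
        if (colStats == null) {
            throw new IllegalStateException("Required field 'colStats' is unset!");
        }
    }

    // The safe short-circuit: explicitly initialize every required field.
    static AggrStats emptyResult() {
        AggrStats s = new AggrStats();
        s.colStats = Collections.emptyList();
        s.partsFound = 0;
        return s;
    }
}
```

The real fix has the same shape: have the short-circuit return an AggrStats whose required fields are explicitly initialized (an empty colStats list and partsFound = 0), and guard partNames on the client side just as colNames already is.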
[jira] [Created] (HIVE-11936) Support SQLAnywhere as a backing DB for the hive metastore
Sushanth Sowmyan created HIVE-11936: --- Summary: Support SQLAnywhere as a backing DB for the hive metastore Key: HIVE-11936 URL: https://issues.apache.org/jira/browse/HIVE-11936 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan I've had pings from people interested in enabling the metastore to work on top of SQLAnywhere (17+), so I am opening this jira to track the changes needed in Hive to make SQLAnywhere work as a backing db for the metastore. I have it working and passing all tests in my setup, and will upload patches as I'm able to. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: unit tests in patches
+1 to Siddharth's suggestion - it makes it easier on people used to dealing with other conventions. On Tue, Sep 22, 2015 at 3:21 PM, Siddharth Seth wrote: > Can a 'Target Version' field be added to jiras? That would help to get > rid of the confusion caused by Fix Version being used to represent branches > a jira goes into. > > On Mon, Sep 21, 2015 at 12:55 PM, Ashutosh Chauhan > wrote: >> Hi everyone, >> >> Generally, it's a good idea to add unit tests in patches, especially when the issue is >> easy to repro (e.g., an NPE). This may not always be possible, but we should >> aim to add tests wherever we can. In addition to regression testing, tests >> also prove the existence of the bug. I would especially like to call out the >> attention of committers to make sure the patches they are committing have >> a test case. In case it's not possible to write a repro test, there should be an >> explanation on the jira. >> >> Related to this are affects versions and fix versions. Reporters should update >> these fields while creating jiras. There is some confusion around exactly >> what a fix version is. Fix version indicates the earliest version in which the >> fix is available. So, it should be updated after the patch is committed to >> reflect which upcoming version it will be available in. Please don't use it >> as a 'target version', i.e. a version in which you would like to see it >> fixed. >> >> Examples of commits where I didn't follow what I am preaching :) but plan >> to improve on: >> >> https://issues.apache.org/jira/browse/HIVE-9377 >> >> https://issues.apache.org/jira/browse/HIVE-9507 >> >> https://issues.apache.org/jira/browse/HIVE-11285 >> >> https://issues.apache.org/jira/browse/HIVE-9386 >> >> https://issues.apache.org/jira/browse/HIVE-10808 >>
[jira] [Created] (HIVE-11852) numRows and rawDataSize table properties are not replicated
Sushanth Sowmyan created HIVE-11852: --- Summary: numRows and rawDataSize table properties are not replicated Key: HIVE-11852 URL: https://issues.apache.org/jira/browse/HIVE-11852 Project: Hive Issue Type: Bug Components: Import/Export Affects Versions: 1.2.1 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan numRows and rawDataSize table properties are not replicated when exported for replication and re-imported. {code} Table drdbnonreplicatabletable.vanillatable has different TblProps from drdbnonreplicatabletable.vanillatable expected [{numFiles=1, numRows=2, totalSize=560, rawDataSize=440}] but found [{numFiles=1, totalSize=560}] java.lang.AssertionError: Table drdbnonreplicatabletable.vanillatable has different TblProps from drdbnonreplicatabletable.vanillatable expected [{numFiles=1, numRows=2, totalSize=560, rawDataSize=440}] but found [{numFiles=1, totalSize=560}] {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11697) Add Unit Test to test serializability/deserializability of HCatSplits
Sushanth Sowmyan created HIVE-11697: --- Summary: Add Unit Test to test serializability/deserializability of HCatSplits Key: HIVE-11697 URL: https://issues.apache.org/jira/browse/HIVE-11697 Project: Hive Issue Type: Test Reporter: Sushanth Sowmyan As HIVE-11344 found, we should have unit tests for this scenario, and we need to add one in. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11585) Explicitly set pmf.setDetachAllOnCommit on metastore unless configured otherwise
Sushanth Sowmyan created HIVE-11585: --- Summary: Explicitly set pmf.setDetachAllOnCommit on metastore unless configured otherwise Key: HIVE-11585 URL: https://issues.apache.org/jira/browse/HIVE-11585 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan datanucleus.detachAllOnCommit has a default value of false. However, we've observed a number of objects (especially FieldSchema objects) being retained, which causes OOM issues on the metastore. Hive should default datanucleus.detachAllOnCommit to true unless explicitly overridden by users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
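Until such a default changes, the proposed behavior can be opted into per site. A sketch of the hive-site.xml fragment, assuming the usual mechanism of metastore-side datanucleus.* properties being passed through to DataNucleus:

```xml
<!-- Sketch: opt into detach-on-commit so committed objects (e.g. FieldSchema)
     are detached rather than retained by the PersistenceManager cache. -->
<property>
  <name>datanucleus.detachAllOnCommit</name>
  <value>true</value>
</property>
```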
Re: [ANNOUNCE] New Hive Committer - Dmitry Tolpeko
Congrats, Dmitry! On Mon, Aug 3, 2015 at 2:02 PM, Jimmy Xiang jxi...@cloudera.com wrote: Congrats! On Mon, Aug 3, 2015 at 1:57 PM, Prasanth Jayachandran pjayachand...@hortonworks.com wrote: Congrats Dmitry! On Aug 3, 2015, at 1:54 PM, Sergey Shelukhin ser...@hortonworks.com wrote: Congrats! On 15/8/3, 12:57, Vaibhav Gumashta vgumas...@hortonworks.com wrote: Congrats Dmitry! -Vaibhav On 8/3/15, 12:31 PM, Lefty Leverenz leftylever...@gmail.com wrote: Congratulations Dmitry! -- Lefty On Mon, Aug 3, 2015 at 2:33 PM, Carl Steinbach c...@apache.org wrote: The Apache Hive PMC has voted to make Dmitry Tolpeko a committer on the Apache Hive Project. Please join me in congratulating Dmitry! Thanks. - Carl
[jira] [Created] (HIVE-11344) HIVE-9845 makes HCatSplit.write modify the split so that PartitionInfo objects are unusable after it
Sushanth Sowmyan created HIVE-11344: --- Summary: HIVE-9845 makes HCatSplit.write modify the split so that PartitionInfo objects are unusable after it Key: HIVE-11344 URL: https://issues.apache.org/jira/browse/HIVE-11344 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan HIVE-9845 introduced a notion of compression for HCatSplits: when serializing, it finds commonalities between PartInfo and TableInfo objects, and if the two are identical, it nulls out that field in PartInfo, so that the info is not repeated when PartInfo is then serialized. This, however, has the side effect of making the PartInfo object unusable once HCatSplit.write has been called. This does not affect M/R directly, since M/R does not know about the PartInfo objects, and once serialized, the HCatSplit object is recreated by deserializing on the backend, which restores the split and its PartInfo objects. It does, however, affect framework users of HCat that try to mimic M/R and then use the PartInfo objects to instantiate distinct readers. Thus, we need to make it so that PartInfo is still usable after HCatSplit.write is called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
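The dedup-and-restore idea can be sketched with stdlib-only stand-ins (TableInfo, PartInfo, and Split below are hypothetical miniatures, not the real HCatalog classes): null out the duplicated field only for the duration of serialization, then put it back so the in-memory object remains usable.

```java
// Hypothetical stand-ins for HCatalog's TableInfo/PartInfo/HCatSplit (not the real
// classes), sketching the HIVE-9845 idea -- null out a PartInfo field that duplicates
// the TableInfo so it is not serialized twice -- and the HIVE-11344 fix: restore the
// field after writing so the in-memory object stays usable.
class TableInfo {
    final String schema;
    TableInfo(String schema) { this.schema = schema; }
}

class PartInfo {
    String schema;            // may duplicate the table-level schema
    final TableInfo table;
    PartInfo(String schema, TableInfo table) { this.schema = schema; this.table = table; }
}

class Split {
    final PartInfo part;
    Split(PartInfo part) { this.part = part; }

    // Serialize with dedup, then restore the field instead of leaving it null.
    String write() {
        String saved = part.schema;
        if (saved != null && saved.equals(part.table.schema)) {
            part.schema = null;                  // compress: skip the duplicate
        }
        String wire = "schema=" + part.schema;   // stand-in for the real encoding
        part.schema = saved;                     // the fix: undo the mutation
        return wire;
    }
}
```

The pre-fix behavior is the same code without the restore line: the split still serializes compactly, but the caller's PartInfo is left with a nulled field after write().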
Re: [ANNOUNCE] New Hive PMC Member - Sushanth Sowmyan
Thanks, all! :)
Re: [ANNOUNCE] New Hive Committer - Pengcheng Xiong
Congrats, Pengcheng! On Jul 16, 2015 11:17, Prasanth Jayachandran pjayachand...@hortonworks.com wrote: Congrats Pengcheng! On Jul 16, 2015, at 11:11 AM, Vikram Dixit K vikram.di...@gmail.com wrote: Congratulations Pengcheng! On Thu, Jul 16, 2015 at 10:10 AM, Hari Subramaniyan hsubramani...@hortonworks.com wrote: Congrats Pengcheng! From: Chao Sun c...@cloudera.com Sent: Thursday, July 16, 2015 10:06 AM To: dev@hive.apache.org Subject: Re: [ANNOUNCE] New Hive Committer - Pengcheng Xiong Congrats Pengcheng! On Thu, Jul 16, 2015 at 10:03 AM, Szehon Ho sze...@cloudera.com wrote: Congrats! On Thu, Jul 16, 2015 at 6:47 AM, Vaibhav Gumashta vgumas...@hortonworks.com wrote: Congrats Pengcheng! -Vaibhav On 7/16/15, 7:12 PM, Chaoyu Tang ctang...@gmail.com wrote: Congratulations to Pengcheng! On Thu, Jul 16, 2015 at 9:10 AM, Xuefu Zhang xzh...@cloudera.com wrote: Congratulations, Pengcheng! On Thu, Jul 16, 2015 at 4:50 AM, Carl Steinbach c...@apache.org wrote: The Apache Hive PMC has voted to make Pengcheng Xiong a committer on the Apache Hive Project. Please join me in congratulating Pengcheng! Thanks. - Carl -- Nothing better than when appreciated for hard work. -Mark
[ANNOUNCE] Apache Hive 1.2.1 Released
The Apache Hive team is proud to announce the release of Apache Hive version 1.2.1. The Apache Hive (TM) data warehouse software facilitates querying and managing large datasets residing in distributed storage. Built on top of Apache Hadoop (TM), it provides: * Tools to enable easy data extract/transform/load (ETL) * A mechanism to impose structure on a variety of data formats * Access to files stored either directly in Apache HDFS (TM) or in other data storage systems such as Apache HBase (TM) * Query execution via Apache Hadoop MapReduce, Apache Tez, or Apache Spark frameworks. For Hive release details and downloads, please visit: https://hive.apache.org/downloads.html Hive 1.2.1 is an incremental release on top of Hive 1.2.0 and release notes are available here: https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12332384&styleName=Text&projectId=12310843 We would like to thank the many contributors who made this release possible. Regards, The Apache Hive Team
Re: [VOTE] Apache Hive 1.2.1 Release Candidate 0
I just sent out the announce mail for 1.2.1. I have now updated the wiki page to reflect rules for further commits to branch-1.2: a) The commit must not introduce any schema or interface changes. b) The commit must fix a bug that causes an outage/breakage (such as an NPE) in regular hive operation, or it must fix a data corruption issue, or it must be a security fix with an appropriate CVE. If it meets those bars, you do not need to cc me or ask for my permission, you may go ahead and commit to branch-1.2, and I will keep watch on this branch. If it does not meet those bars, you are directed to target branch-1 or master instead. The goal for an eventual 1.2.2 is to ensure that this branch stays live for breaking fixes, but not so that we may keep landing new patches in released branches. Thanks all! -Sushanth On Tue, Jun 23, 2015 at 11:45 AM, Sushanth Sowmyan khorg...@gmail.com wrote: Thanks for testing and verifying, folks! With 4 PMC votes and 105 hours (> 72 hours) now having passed, the vote for releasing 1.2.1 RC0 as Hive 1.2.1 passes. I will go ahead and publish artifacts for the 1.2.1 release and send out mail about general availability. With this release, please note that commits to branch-1.2 are now restricted to a higher bar of necessity, and will require it to be fixing a product outage (such as an NPE when you run a query). I will update the wiki to that effect to indicate the process for further commits to the branch. For the most part, please restrict commits to branch-1 and master from now on. I am amenable to doing a 1.2.2 release eventually if we have enough such issues, maybe about 3+ months out. Thanks all! -Sushanth On Sun, Jun 21, 2015 at 6:13 PM, Vikram Dixit K vikram.di...@gmail.com wrote: +1 built on both profiles and ran a simple query on the rc. Thanks Vikram. On Sat, Jun 20, 2015 at 7:47 AM, Thejas Nair thejas.n...@gmail.com wrote: +1 Checked signatures, checksums Checked release notes Reviewed changes in pom files. 
Built with hadoop2 and hadoop1. Ran some simple queries in local mode. On Fri, Jun 19, 2015 at 5:00 PM, Gunther Hagleitner ghagleit...@hortonworks.com wrote: +1 Checked signatures, compiled, ran some tests. Thanks, Gunther. -- *From:* Alan Gates alanfga...@gmail.com *Sent:* Friday, June 19, 2015 11:44 AM *To:* dev@hive.apache.org *Subject:* Re: [VOTE] Apache Hive 1.2.1 Release Candidate 0 +1. Checked signatures, looked for binary files, compiled the code, and ran a rat check. Alan. Sushanth Sowmyan khorg...@gmail.com June 19, 2015 at 2:44 Hi Folks, It's been a month since 1.2.0, and I promised to do a stabilization 1.2.1 release, and this is it. A large number of patches have been applied since 1.2.0, and major known issues have been cleared/fixed. A few jiras were deferred out to 1.3/2.0 as not being ready to commit into 1.2.1 at this time. More details are available here : https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status Apache Hive 1.2.1 Release Candidate 0 is available here: https://people.apache.org/~khorgath/releases/1.2.1_RC0/artifacts/ My public key used for signing is as available from the hive committers key list : http://www.apache.org/dist/hive/KEYS Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-1040/ Source tag for RC0 is up on the apache git repo as tag release-1.2.1-rc0 (Browseable view over at https://git-wip-us.apache.org/repos/asf?p=hive.git;a=tag;h=0f6ee99efc911cbc1566f9bbbc63a51600302703 ) Voting will conclude in 72 hours. Hive PMC Members: Please test and vote. Thanks, -Sushanth -- Nothing better than when appreciated for hard work. -Mark
[jira] [Created] (HIVE-11059) hcatalog-server-extensions tests scope should depend on hive-exec
Sushanth Sowmyan created HIVE-11059: --- Summary: hcatalog-server-extensions tests scope should depend on hive-exec Key: HIVE-11059 URL: https://issues.apache.org/jira/browse/HIVE-11059 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 1.2.1 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[VOTE] Apache Hive 1.2.1 Release Candidate 0
Hi Folks, It's been a month since 1.2.0, and I promised to do a stabilization 1.2.1 release, and this is it. A large number of patches have been applied since 1.2.0, and major known issues have been cleared/fixed. A few jiras were deferred out to 1.3/2.0 as not being ready to commit into 1.2.1 at this time. More details are available here: https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status Apache Hive 1.2.1 Release Candidate 0 is available here: https://people.apache.org/~khorgath/releases/1.2.1_RC0/artifacts/ My public key used for signing is available from the hive committers key list: http://www.apache.org/dist/hive/KEYS Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-1040/ Source tag for RC0 is up on the apache git repo as tag release-1.2.1-rc0 (Browseable view over at https://git-wip-us.apache.org/repos/asf?p=hive.git;a=tag;h=0f6ee99efc911cbc1566f9bbbc63a51600302703 ) Voting will conclude in 72 hours. Hive PMC Members: Please test and vote. Thanks, -Sushanth
Re: Getting ready for 1.2.1
Hi All, Please consider branch-1.2 frozen for now, till after the release process. Thanks, -Sushanth On Tue, Jun 16, 2015 at 12:28 AM, Sushanth Sowmyan khorg...@gmail.com wrote: Hi folks, It's been nearly a month since 1.2.0, and when I did that release, I said I'd keep the branch open for any further non-db-changing, non-breaking patches, and from the sheer number of patches registered on the status page, that's been a good idea. Now, I think it's time to start drawing that to a close for a stabilization update, and I would like to begin the process of rolling out release candidates for 1.2.1. I would like to start rolling out an RC0 by Wednesday night if no one objects. For now, the rules on committing to branch-1.2 remain the same: a) commit to branch-1 and master first b) add me as a watcher on that jira c) add the bug to the release status wiki. Once I start the release process, I will once again increase the bar for commits as we did the last time. That said, this time, once we finish the release for 1.2.1, the bar on further commits to branch-1.2 is intended to remain at a higher level, so as to make sure we don't have too much of a back-porting hassle - we will soon try to limit our commits to branch-1 and master only. Cheers, -Sushanth
[jira] [Created] (HIVE-11047) Update versions of branch-1.2 to 1.2.1
Sushanth Sowmyan created HIVE-11047: --- Summary: Update versions of branch-1.2 to 1.2.1 Key: HIVE-11047 URL: https://issues.apache.org/jira/browse/HIVE-11047 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Getting ready for 1.2.1
Hi All,

I've gotten requests from a couple of folks to keep the branch unfrozen for some important patches that should be part of 1.2.1 if we do have additional RCs, so I'm unfreezing it again for the time being. The same rules hold as before: to make any commits to branch-1.2, a patch must already have been committed to master and branch-1, and must be added to the wiki over at https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status . If you do not have permission to edit the wiki, please ping me and I'll add it for you.

Thanks,
-Sushanth
[jira] [Created] (HIVE-11039) Write a tool to allow people with datanucleus.identifierFactory=datanucleus2 to migrate their metastore to datanucleus1 naming
Sushanth Sowmyan created HIVE-11039:
---
Summary: Write a tool to allow people with datanucleus.identifierFactory=datanucleus2 to migrate their metastore to datanucleus1 naming
Key: HIVE-11039
URL: https://issues.apache.org/jira/browse/HIVE-11039
Project: Hive
Issue Type: Bug
Components: Metastore
Affects Versions: 1.3.0, 1.2.1, 2.0.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
Priority: Critical

We hit an interesting bug in a case where datanucleus.identifierFactory = datanucleus2. The problem is that directSql hand-generates SQL strings assuming the datanucleus1 naming scheme. If a user has their metastore JDO managed by datanucleus.identifierFactory = datanucleus2, the SQL strings we generate are incorrect.

One simple example of what this results in is the following: whenever DN persists a field which is held as a List<T>, it winds up storing each T as a separate row in the appropriate mapping table, with a column called INTEGER_IDX that holds the position in the list. Then, upon reading, it automatically reads all relevant rows with an ORDER BY INTEGER_IDX, which results in the list retaining its order. In the DN2 naming scheme, the column is called IDX instead of INTEGER_IDX. If the user has run the appropriate metatool upgrade scripts, it is highly likely that they have both columns, INTEGER_IDX and IDX. Whenever they use JDO, such as with all writes, it will then use the IDX field, and whenever they do any sort of optimized read, such as through directSQL, it will ORDER BY INTEGER_IDX.

An immediate danger is seen when we consider that the schema of a table is stored as a List<FieldSchema>: while IDX holds 0,1,2,3,..., INTEGER_IDX will contain 0,0,0,0,..., and thus any attempt to describe the table or fetch its schema can come back in the table's native hashing order rather than sorted by the index. This can then result in the schema ordering being different from the actual table.

For example, if a user has a table (a:int, b:string, c:string), a describe on it may return (c:string, a:int, b:string), and thus queries which insert after selecting from another table can hit ClassCastExceptions when trying to insert data in the wrong order - this is how we discovered this bug. The problem, however, can be far worse if there are no type mismatches: it is possible, for example, that if a, b and c were all strings, that insert query would succeed but mix up the order, which then results in user table data being mixed up. This has the potential to be very bad.

We should write a tool to help convert metastores that use datanucleus2 to datanucleus1 (more difficult, needs more one-time testing), or change directSql to support both (easier to code, but increases the test-coverage matrix significantly, and we should really then be testing against both schemes). But in the short term, we should disable directSql if we see that the identifierFactory is datanucleus2.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
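The ordering failure described above can be illustrated with a small sketch (plain Python, not Hive metastore code; the row tuples are an assumption made purely for illustration): when every row carries the same INTEGER_IDX value, sorting by that column just preserves whatever arbitrary order the rows came back in, while sorting by the correctly populated IDX column recovers the real schema order.

```python
# Illustrative sketch only: simulates the mapping-table rows described in
# the issue above, not actual Hive metastore code.

# Each row: (column_name, type, idx, integer_idx). Under the datanucleus2
# naming scheme, IDX is populated correctly (0,1,2,...) while INTEGER_IDX
# stays 0 for every row.
rows = [
    ("c", "string", 2, 0),
    ("a", "int",    0, 0),
    ("b", "string", 1, 0),
]

# JDO-style read, ORDER BY IDX: recovers the true schema order.
by_idx = [name for name, _, idx, _ in sorted(rows, key=lambda r: r[2])]

# directSQL-style read, ORDER BY INTEGER_IDX: all sort keys are equal, so
# the rows keep their arbitrary storage order and the schema comes back
# scrambled.
by_integer_idx = [name for name, _, _, iidx in sorted(rows, key=lambda r: r[3])]

print(by_idx)          # ['a', 'b', 'c']
print(by_integer_idx)  # ['c', 'a', 'b'] -- storage order, not schema order
```

Sorting on a constant key is a no-op for a stable sort, which is exactly why the bug surfaces as rows returned in "native hashing order" rather than as an outright error.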
[jira] [Created] (HIVE-11023) Disable directSQL if datanucleus.identifierFactory = datanucleus2
Sushanth Sowmyan created HIVE-11023:
---
Summary: Disable directSQL if datanucleus.identifierFactory = datanucleus2
Key: HIVE-11023
URL: https://issues.apache.org/jira/browse/HIVE-11023
Project: Hive
Issue Type: Bug
Components: Metastore
Affects Versions: 1.3.0, 1.2.1, 2.0.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan

We hit an interesting bug in a case where datanucleus.identifierFactory = datanucleus2. The problem is that directSql hand-generates SQL strings assuming the datanucleus1 naming scheme. If a user has their metastore JDO managed by datanucleus.identifierFactory = datanucleus2, the SQL strings we generate are incorrect.

One simple example of what this results in is the following: whenever DN persists a field which is held as a List<T>, it winds up storing each T as a separate row in the appropriate mapping table, with a column called INTEGER_IDX that holds the position in the list. Then, upon reading, it automatically reads all relevant rows with an ORDER BY INTEGER_IDX, which results in the list retaining its order. In the DN2 naming scheme, the column is called IDX instead of INTEGER_IDX. If the user has run the appropriate metatool upgrade scripts, it is highly likely that they have both columns, INTEGER_IDX and IDX. Whenever they use JDO, such as with all writes, it will then use the IDX field, and whenever they do any sort of optimized read, such as through directSQL, it will ORDER BY INTEGER_IDX.

An immediate danger is seen when we consider that the schema of a table is stored as a List<FieldSchema>: while IDX holds 0,1,2,3,..., INTEGER_IDX will contain 0,0,0,0,..., and thus any attempt to describe the table or fetch its schema can come back in the table's native hashing order rather than sorted by the index. This can then result in the schema ordering being different from the actual table.

For example, if a user has a table (a:int, b:string, c:string), a describe on it may return (c:string, a:int, b:string), and thus queries which insert after selecting from another table can hit ClassCastExceptions when trying to insert data in the wrong order - this is how we discovered this bug. The problem, however, can be far worse if there are no type mismatches: it is possible, for example, that if a, b and c were all strings, that insert query would succeed but mix up the order, which then results in user table data being mixed up. This has the potential to be very bad.

We should write a tool to help convert metastores that use datanucleus2 to datanucleus1 (more difficult, needs more one-time testing), or change directSql to support both (easier to code, but increases the test-coverage matrix significantly, and we should really then be testing against both schemes). But in the short term, we should disable directSql if we see that the identifierFactory is datanucleus2.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
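The short-term mitigation this issue proposes - disabling directSQL - can be expressed with the metastore's hive.metastore.try.direct.sql configuration property; a minimal hive-site.xml fragment might look like the following (the property name is Hive's existing directSQL toggle; applying it site-wide like this is a sketch of one way an affected deployment could do it, not part of the issue itself):

```xml
<!-- hive-site.xml: disable directSQL so the metastore falls back to JDO,
     avoiding the hand-generated SQL that assumes datanucleus1 naming. -->
<property>
  <name>hive.metastore.try.direct.sql</name>
  <value>false</value>
</property>
```

The trade-off is that JDO-only metadata reads are slower than the directSQL fast path, which is why the issue frames this as a stopgap rather than a fix.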
Getting ready for 1.2.1
Hi folks,

It's been nearly a month since 1.2.0, and when I did that release, I said I'd keep the branch open for any further non-db-changing, non-breaking patches. From the sheer number of patches registered on the status page, that's been a good idea. Now, I think it's time to start drawing that to a close for a stabilization update, and I would like to begin the process of rolling out release candidates for 1.2.1. I would like to start rolling out an RC0 by Wednesday night if no one objects.

For now, the rules on committing to branch-1.2 remain the same:
a) commit to branch-1 and master first
b) add me as a watcher on that jira
c) add the bug to the release status wiki

Once I start the release process, I will once again raise the bar for commits as we did last time. That said, this time, once we finish the 1.2.1 release, the bar on further commits to branch-1.2 is intended to remain higher, so as to make sure we don't have too much of a backporting hassle - we will soon try to limit our commits to branch-1 and master only.

Cheers,
-Sushanth
[jira] [Created] (HIVE-10892) TestHCatClient should not accept external metastore param from -Dhive.metastore.uris
Sushanth Sowmyan created HIVE-10892:
---
Summary: TestHCatClient should not accept external metastore param from -Dhive.metastore.uris
Key: HIVE-10892
URL: https://issues.apache.org/jira/browse/HIVE-10892
Project: Hive
Issue Type: Bug
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan

HIVE-10074 added the ability to specify -Dhive.metastore.uris on the command line, so as to run the test against a deployed metastore. However, because of the way HiveConf is written, this parameter always overrides any value specified in the conf passed in for instantiation, since HiveConf accepts system variable overrides. This causes some tests, notably those that attempt to connect between two metastores (such as TestHCatClient#testPartitionRegistrationWithCustomSchema), to fail.

Fixing this in HiveConf is not a good idea, since that behaviour is desired for HiveConf. Fixing it in HCatUtil.getHiveConf doesn't really work either, since that is a utility wrapper on HiveConf and is supposed to behave similarly. The fix then becomes something to apply in all our test cases wherever we instantiate Configuration objects. It seems more appropriate to change the parameter we use to specify test parameters than to change each config object. Thus, we should change the semantics for running this test against an external metastore by specifying the override under a different parameter name, say test.hive.metastore.uris, instead of hive.metastore.uris, which has a specific meaning.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
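The override problem the issue describes can be sketched in a few lines (plain Python standing in for HiveConf's behaviour; the function names and the second-metastore URI are invented for illustration): a system property registered under the real config key silently clobbers any value a test sets explicitly, whereas reading the external-metastore override from a separate test.* key leaves explicitly-set values alone.

```python
# Sketch of the behaviour described in HIVE-10892, not HiveConf itself.

# Simulates "-Dhive.metastore.uris=..." passed on the command line.
system_props = {"hive.metastore.uris": "thrift://external:9083"}

def lookup_current(key, conf_value):
    """HiveConf-style lookup: a system property for the same key always
    wins over the value the caller placed in the conf object."""
    return system_props.get(key, conf_value)

def lookup_proposed(key, conf_value):
    """Proposed scheme: the external-metastore override lives under a
    separate 'test.'-prefixed name, so it only applies when the test did
    not set the real key itself."""
    if conf_value is None:
        return system_props.get("test." + key)
    return conf_value

# A test that deliberately points its conf at a second metastore:
wanted = "thrift://second-metastore:9083"  # hypothetical URI

clobbered = lookup_current("hive.metastore.uris", wanted)   # -D wins
kept = lookup_proposed("hive.metastore.uris", wanted)       # test value survives

print(clobbered)  # thrift://external:9083
print(kept)       # thrift://second-metastore:9083
```

This is why renaming the command-line parameter (rather than changing HiveConf or HCatUtil.getHiveConf) fixes the two-metastore tests without altering the override semantics everyone else relies on.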
Re: [ANNOUNCE] New Hive Committer - Chaoyu Tang
Congrats Chaoyu, welcome aboard! :)

On May 20, 2015 3:45 PM, Vaibhav Gumashta vgumas...@hortonworks.com wrote:
> Congratulations!
> -Vaibhav
>
> On 5/20/15, 3:40 PM, Jimmy Xiang jxi...@cloudera.com wrote:
> > Congrats!!
> >
> > On Wed, May 20, 2015 at 3:29 PM, Carl Steinbach c...@apache.org wrote:
> > > The Apache Hive PMC has voted to make Chaoyu Tang a committer on the
> > > Apache Hive Project. Please join me in congratulating Chaoyu!
> > >
> > > Thanks. - Carl