[jira] [Created] (HIVE-17700) Update committer list
Sushanth Sowmyan created HIVE-17700: --- Summary: Update committer list Key: HIVE-17700 URL: https://issues.apache.org/jira/browse/HIVE-17700 Project: Hive Issue Type: Bug Reporter: Aihua Xu Assignee: Aihua Xu Priority: Minor Please update committer list: Name: Aihua Xu Apache ID: aihuaxu Organization: Cloudera Name: Yongzhi Chen Apache ID: ychena Organization: Cloudera -- This message was sent by Atlassian JIRA (v6.4.14#64029)
Re: [Announce] New committer: Anishek Agarwal
Welcome aboard! :) On Sep 30, 2017 3:27 AM, "Barna Zsombor Klara" wrote: > Congratulations Anishek! > > Rajesh Balamohan (on Sat, Sept. 30, 2017, > 2:25) wrote: > > > Congrats Anishek!! > > > > ~Rajesh.B > > > > On Sat, Sep 30, 2017 at 4:30 AM, Vaibhav Gumashta < > > vgumas...@hortonworks.com > > > wrote: > > > > > Congratulations Anishek! > > > > > > > > > On 9/29/17, 3:57 PM, "Thejas Nair" wrote: > > > > > > >Congrats Anishek! > > > > > > > >On Fri, Sep 29, 2017 at 11:36 AM, Peter Vary > > wrote: > > > > > > > >> Congratulations Anishek! > > > >> > > > >> > On Sep 29, 2017, at 7:55 PM, Ashutosh Chauhan < > hashut...@apache.org > > > > > > >> wrote: > > > >> > > > > >> > The Project Management Committee (PMC) for Apache Hive has invited > > > >> Anishek > > > >> > Agarwal to become a committer and we are pleased to announce that > he > > > >>has > > > >> > accepted. > > > >> > > > > >> > Welcome, Anishek! > > > >> > > > > >> > Thanks, > > > >> > Ashutosh > > > >> > > > >> > > > > > > > > >
Re: [Announce] New committer: Sankar Hariappan
Welcome aboard! :) On Sep 30, 2017 3:28 AM, "Barna Zsombor Klara" wrote: Congrats Sankar! Rajesh Balamohan (on Sat, Sept. 30, 2017, 2:24) wrote: > Congrats Sankar!! > > ~Rajesh.B > > On Sat, Sep 30, 2017 at 4:30 AM, Vaibhav Gumashta < > vgumas...@hortonworks.com > > wrote: > > > Congratulations Sankar! > > > > On 9/29/17, 3:58 PM, "Thejas Nair" wrote: > > > > >Congrats Sankar! > > > > > >On Fri, Sep 29, 2017 at 11:36 AM, Peter Vary > wrote: > > > > > >> Congratulations Sankar! > > >> > > >> > On Sep 29, 2017, at 7:56 PM, Ashutosh Chauhan > > > >> wrote: > > >> > > > >> > The Project Management Committee (PMC) for Apache Hive has invited > > >>Sankar > > >> > Hariappan to become a committer and we are pleased to announce that > he > > >> has > > >> > accepted. > > >> > > > >> > Welcome, Sankar! > > >> > > > >> > Thanks, > > >> > Ashutosh > > >> > > >> > > > > >
[jira] [Created] (HIVE-17095) Long chain repl loads do not complete in a timely fashion
Sushanth Sowmyan created HIVE-17095: --- Summary: Long chain repl loads do not complete in a timely fashion Key: HIVE-17095 URL: https://issues.apache.org/jira/browse/HIVE-17095 Project: Hive Issue Type: Bug Components: Query Planning, repl Reporter: sapin amin Assignee: Sushanth Sowmyan Per performance testing done by [~sapinamin] (thus, I'm setting him as reporter), we were able to discover an important bug affecting replication. It has the potential to affect other large DAGs of Tasks that hive generates as well, if those DAGs have multiple paths to child Task nodes. Basically, we find that incremental REPL LOAD does not finish in a timely fashion. The test, in this case, was to add 400 partitions and replicate them. Associated with each partition, there was an ADD PTN and an ALTER PTN. For each of the ADD PTN tasks, we'd generate a DDLTask, a CopyTask and a MoveTask. For each ALTER PTN, there'd be a single DDLTask. And order of execution is important, so it would chain in dependency collection tasks between phases. Trying to root-cause this shows us that it seems to stall forever at Driver instantiation time, and it almost looks like the thread doesn't proceed past that point. Looking at logs, it seems that the way this is written, it looks for all tasks generated that are subtrees of all nodes, without looking for duplicates, and this is done simply to get the number of execution tasks! Thus, the task visitor will visit every subtree of every node, which is fine if you have graphs that look like open trees, but is horrible for us, since we have dependency collection tasks between each phase. Effectively, this is what's happening: we have a DAG, say, like this: 4 tasks in parallel -> DEP col -> 4 tasks in parallel -> DEP col -> ...
This means that for each of the 4 root tasks, we will do a full traversal of every path (not just every node) past the DEP col, and this happens recursively, leading to an exponential growth in the number of tasks visited as the length and breadth of the graph increase. In our case, we had about 800 tasks in the graph, with roughly a width of about 2-3 and 200 stages, with a dep collection before and after each, and this meant that leaf nodes of this DAG would have something like 2^200 - 3^200 ways in which they could be visited, and thus, we'd visit them in all those ways. And all this simply to count the number of tasks to schedule - we would also revisit this function multiple more times: once per hook, once for the MapReduceCompiler and once for the TaskCompiler. We have not been sending such large DAGs to the Driver, so it has not yet been a problem, and there are upcoming changes to reduce the number of tasks replication generates (as part of a memory addressing issue), but we should still fix the way we do Task traversal so that a large DAG cannot cripple us.
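The blow-up described above can be reproduced outside Hive with a toy DAG walker (class and method names here are illustrative, not Hive's actual Task graph API): counting tasks by walking every path revisits each DEP-col subtree once per path leading to it, so work grows exponentially with the number of stages, while a visited-set walk counts each node exactly once.

```java
import java.util.*;

// Toy model of the layered task DAG from the report:
// <width> parallel tasks -> DEP col -> <width> parallel tasks -> DEP col -> ...
public class TaskDagCount {
    static final Map<String, List<String>> DAG = new HashMap<>();

    static void edge(String from, String to) {
        DAG.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // Builds the shape described above with width 2 and the given number of
    // stages, each stage ending in a dependency-collection node. Returns root.
    static String buildDemo(int stages) {
        DAG.clear();
        String prev = "root";
        for (int s = 0; s < stages; s++) {
            String dep = "dep" + s;
            for (String t : List.of("t" + s + "a", "t" + s + "b")) {
                edge(prev, t);
                edge(t, dep);
            }
            prev = dep;
        }
        return "root";
    }

    // Naive count: revisits each subtree once per distinct path to it, so the
    // total visit count grows exponentially in the number of DEP-col stages.
    static long pathVisits(String node) {
        long visits = 1;
        for (String child : DAG.getOrDefault(node, List.of())) {
            visits += pathVisits(child);
        }
        return visits;
    }

    // Deduplicated count: a visited set makes the walk linear in nodes + edges.
    static int reachable(String root) {
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            String n = stack.pop();
            if (seen.add(n)) {
                DAG.getOrDefault(n, List.of()).forEach(stack::push);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        buildDemo(3);
        System.out.println("naive path visits: " + pathVisits("root"));  // 29
        System.out.println("distinct tasks:    " + reachable("root"));   // 10
    }
}
```

With only 3 stages the naive walk already makes 29 visits to count 10 tasks; the visit count roughly doubles per stage, which matches the 2^200-ish figure for the 200-stage replication DAG.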
Re: [DISCUSS] Separating out the metastore as its own TLP
+1 On Jun 30, 2017 17:05, "Owen O'Malley" wrote: > On Fri, Jun 30, 2017 at 3:26 PM, Chao Sun wrote: > > > and maybe a different project name? > > > > Yes, it certainly needs a new name. I'd like to suggest Riven. > > .. Owen >
[jira] [Created] (HIVE-17005) Ensure REPL DUMP and REPL LOAD are authorized properly
Sushanth Sowmyan created HIVE-17005: --- Summary: Ensure REPL DUMP and REPL LOAD are authorized properly Key: HIVE-17005 URL: https://issues.apache.org/jira/browse/HIVE-17005 Project: Hive Issue Type: Sub-task Components: repl Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Currently, we piggyback REPL DUMP and REPL LOAD on EXPORT and IMPORT auth privileges. However, work is underway to not populate all the relevant objects in inputObjs and outputObjs, which then requires that REPL DUMP and REPL LOAD be authorized at a higher level, simply requiring ADMIN_PRIV to run.
[jira] [Created] (HIVE-16918) Skip ReplCopyTask distcp for _metadata copying. Also enable -pb for distcp
Sushanth Sowmyan created HIVE-16918: --- Summary: Skip ReplCopyTask distcp for _metadata copying. Also enable -pb for distcp Key: HIVE-16918 URL: https://issues.apache.org/jira/browse/HIVE-16918 Project: Hive Issue Type: Bug Components: repl Affects Versions: 3.0.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan With HIVE-16686, we switched ReplCopyTask to always use a privileged DistCp. This, however, is incorrect for copying _metadata generated from a temporary scratch directory to hdfs. We need to change that so that it routes to a regular CopyTask. Also, in following up on HIVE-16686, we missed adding "-pb" as a default for invocations of distcp from hive. Adding that in. This would not be necessary if HADOOP-8143 had made it in, but until it does, we need it.
[jira] [Created] (HIVE-16860) HostUtil.getTaskLogUrl change between hadoop 2.3 and 2.4 breaks at runtime.
Sushanth Sowmyan created HIVE-16860: --- Summary: HostUtil.getTaskLogUrl change between hadoop 2.3 and 2.4 breaks at runtime. Key: HIVE-16860 URL: https://issues.apache.org/jira/browse/HIVE-16860 Project: Hive Issue Type: Bug Components: Shims Affects Versions: 0.13.0, 0.14.0 Reporter: Chris Drome Assignee: Jason Dere Fix For: 0.14.0 The signature of HostUtil.getTaskLogUrl has changed between Hadoop-2.3 and Hadoop-2.4. The code in shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java works with the Hadoop-2.3 method and causes a compilation failure with Hadoop-2.4.
[jira] [Created] (HIVE-16686) repli invocations of distcp needs additional handling
Sushanth Sowmyan created HIVE-16686: --- Summary: repli invocations of distcp needs additional handling Key: HIVE-16686 URL: https://issues.apache.org/jira/browse/HIVE-16686 Project: Hive Issue Type: Sub-task Components: repl Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan When REPL LOAD invokes distcp, there needs to be a way for the user invoking REPL LOAD to pass on arguments to distcp. In addition, there is sometimes a need for distcp to be invoked from within an impersonated context, such as running as user "hdfs", asking distcp to preserve ownerships of individual files.
[jira] [Created] (HIVE-16642) New Events created as part of replv2 potentially break replv1
Sushanth Sowmyan created HIVE-16642: --- Summary: New Events created as part of replv2 potentially break replv1 Key: HIVE-16642 URL: https://issues.apache.org/jira/browse/HIVE-16642 Project: Hive Issue Type: Sub-task Components: repl Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan We have a couple of new events, such as {CREATE,DROP}{INDEX,FUNCTION}, introduced since replv1, which do not have a replv1 ReplicationTask associated with them. Thus, for users like Falcon, we potentially wind up throwing an IllegalStateException if replv1-based HiveDR is running on a cluster with these updated events. We should be more graceful when encountering them, returning a NoopReplicationTask equivalent that consumers can make use of, or ignore, for such newer events. In addition, we should add test cases that track whether the creation of these events introduces any backward incompatibility. To this end, if any of the events should change in a way that introduces a backward incompatibility, these tests should fail and alert us to that possibility.
Re: pre-commit jenkins issues
Thanks! It looks like it's chugging away now. :) On May 5, 2017 08:22, "Sergio Pena" <sergio.p...@cloudera.com> wrote: > I restarted hiveptest and seems is working now. There was a hiccup on the > server while using the libraries to create the slave nodes. > > On Fri, May 5, 2017 at 12:05 AM, Sushanth Sowmyan <khorg...@gmail.com> > wrote: > > > Hi, > > > > It looks like the precommit queue is currently having issues : > > https://builds.apache.org/job/PreCommit-HIVE-Build/ > > > > See builds# 5041,5042,5043 - It looks like it takes about 8 hours > > waiting for the tests to finish running and to report back, and kills > > it as it exceeds a 500minute time out, and returns without results. Is > > anyone able to look into this to see what is going on? > > > > Thanks! > > -Sush > > >
pre-commit jenkins issues
Hi, It looks like the precommit queue is currently having issues : https://builds.apache.org/job/PreCommit-HIVE-Build/ See builds# 5041,5042,5043 - It looks like it takes about 8 hours waiting for the tests to finish running and to report back, and kills it as it exceeds a 500minute time out, and returns without results. Is anyone able to look into this to see what is going on? Thanks! -Sush
Re: [VOTE] Apache Hive 1.2.2 Release Candidate 0
+1 (binding) Verified md5 and asc KEYS obtained from hive match (from https://people.apache.org/keys/group/hive.asc), and they are publicly searchable and signed. RAT test succeeds. Source and binary tarballs look good. Compiling works, some base unit tests succeed. Testing local mode works. On Wed, Apr 5, 2017 at 11:16 PM, Thejas Nair wrote: > +1 (binding) > - Verified signature and checksum > - Build from source > - Ran simple queries in local mode with binary tar.gz > - Checked RELEASE_NOTES file. Traditionally this file has had the set of > patches fixed in previous releases as well (i.e., each new release was > adding entries to the top of the file). This time it has only the new patch > release patches. The old approach helps to quickly verify if a patch is in > the release. I think it would be good to fix that in branch. I think it is > OK for this release. > - README.txt has the old 1.2.1 version number in it. IMO, we should just remove > the mention of version in that file. Not a release blocker. > > > > > On Wed, Apr 5, 2017 at 3:52 PM, Sergio Pena > wrote: > >> +1 (no-binding) >> >> I unpacked the bin and src packages. >> Verified gpg and md5 signatures. >> Checked license and release notes files. >> Ran a few queries from hive-cli. >> >> - Sergio >> >> On Tue, Apr 4, 2017 at 11:12 AM, Ashutosh Chauhan >> wrote: >> >> > Verified md5 of src and binary tar balls. >> > Built from src. >> > Ran some simple queries like join, group by. >> > All looks good. >> > >> > +1 >> > >> > Thanks, >> > Ashutosh >> > >> > On Mon, Apr 3, 2017 at 4:47 PM, Vaibhav Gumashta < >> > vgumas...@hortonworks.com> >> > wrote: >> > >> > > Thanks for pointing out Ashutosh. Link to my PGP key: >> > > http://pgp.mit.edu/pks/lookup?search=gumashta=index. >> > > >> > > I think it will take a day or so for the KEYS file to be updated (it is >> > > auto generated), but if you want to test the release in the meantime, >> > > please use the above link to access the signing key. 
>> > > >> > > Thanks, >> > > -Vaibhav >> > > >> > > On 4/3/17, 2:53 PM, "Ashutosh Chauhan" wrote: >> > > >> > > >Hi Vaibhav, >> > > > >> > > >Can't locate your key at any of the standard locations. Can you point out >> which >> > > >key you used to sign the release? >> > > > >> > > >Thanks, >> > > >Ashutosh >> > > > >> > > >On Mon, Apr 3, 2017 at 12:51 AM, Vaibhav Gumashta >> > > > > > > >> wrote: >> > > >> Hi everyone, >> > > >> >> > > >> Apache Hive 1.2.2 Release Candidate 0 is available here: >> > > >> >> > > >> https://dist.apache.org/repos/dist/dev/hive/apache-hive-1.2.2-rc0/ >> > > >> >> > > >> Maven artifacts are available here: >> > > >> >> > > >> https://repository.apache.org/content/repositories/ >> > orgapachehive-1072/ >> > > >> >> > > >> Source tag for RC0 is at: >> > > >> https://github.com/apache/hive/releases/tag/release-1.2.2-rc0 >> > > >> >> > > >> Voting will conclude in 72 hours. >> > > >> >> > > >> Hive PMC Members: Please test and vote. >> > > >> >> > > >> Thanks, >> > > >> -Vaibhav >> > > >> >> > > >> >> > > >> > > >> > >>
Re: [ANNOUNCE] New committer: Zoltan Haindrich
Congrats, Zoltan! Welcome aboard. :) On Feb 21, 2017 15:42, "Rajesh Balamohan" wrote: > Congrats Zoltan. :) > > ~Rajesh.B > > On Wed, Feb 22, 2017 at 4:43 AM, Wei Zheng wrote: > > > Congrats Zoltan! > > > > Thanks, > > Wei > > > > On 2/21/17, 13:09, "Alan Gates" wrote: > > > > On behalf of the Hive PMC I am happy to announce Zoltan Haindrich is > > our newest committer. He has been contributing to Hive for several > months > > across a number of areas, including the parser, HiveServer2, and cleaning > > up unit tests and documentation. Please join me in welcoming Zoltan to > > Hive. > > > > Zoltan, feel free to say a few words introducing yourself if you > would > > like to. > > > > Alan. > > > > > > >
[jira] [Created] (HIVE-15668) change REPL DUMP syntax to use "LIMIT" instead of "BATCH" keyword
Sushanth Sowmyan created HIVE-15668: --- Summary: change REPL DUMP syntax to use "LIMIT" instead of "BATCH" keyword Key: HIVE-15668 URL: https://issues.apache.org/jira/browse/HIVE-15668 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Currently, REPL DUMP syntax goes: {noformat} REPL DUMP <dbname>[.<tablename>] [FROM <event-id> [BATCH <batch-size>]] {noformat} The BATCH directive says that an event dump should not dump out more than _batchSize_ events. However, there is a clearer keyword for the same effect, and that is LIMIT. Thus, rephrasing the syntax as follows makes it clearer: {noformat} REPL DUMP <dbname>[.<tablename>] [FROM <event-id> [LIMIT <batch-size>]] {noformat}
[jira] [Created] (HIVE-15652) Optimize(reduce) the number of alter calls made to fix repl.last.id
Sushanth Sowmyan created HIVE-15652: --- Summary: Optimize(reduce) the number of alter calls made to fix repl.last.id Key: HIVE-15652 URL: https://issues.apache.org/jira/browse/HIVE-15652 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Per code review from HIVE-15534, we might be doing alters on parent objects to set repl.last.id when it is not necessary, since some future event might make the alter redundant. There are 3 cases where this might happen: a) After a CREATE_TABLE event - any prior reference to that table does not need an ALTER, since CREATE_TABLE will have a repl.last.id come with it. b) After a DROP_TABLE event - any prior reference to that table is irrelevant, and thus no alter is needed. c) After an ALTER_TABLE event, since that dump will itself do a metadata update that picks up the latest repl.last.id along with the event. In each of these cases, we can skip the otherwise-needed alter call.
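The pruning logic in cases (a)-(c) above can be sketched as a single ordered pass over the event stream (a hypothetical standalone sketch; the event kinds and class names are illustrative, not Hive's metastore event API): a pending repl.last.id alter for a table is dropped as soon as a later CREATE_TABLE, DROP_TABLE, or ALTER_TABLE event covers the same table.

```java
import java.util.*;

// Illustrative planner: given the dump's event stream in order, compute which
// tables still need an explicit ALTER to bump repl.last.id at the end.
public class ReplLastIdPlanner {
    enum Kind { REFERENCE, CREATE_TABLE, DROP_TABLE, ALTER_TABLE }

    record Event(String table, Kind kind) {}

    static Set<String> altersNeeded(List<Event> events) {
        Set<String> needed = new LinkedHashSet<>();
        for (Event e : events) {
            switch (e.kind()) {
                // A plain reference would otherwise require an alter later.
                case REFERENCE -> needed.add(e.table());
                // Cases (a), (b), (c): a later event for the same table makes
                // any previously pending alter redundant.
                case CREATE_TABLE, DROP_TABLE, ALTER_TABLE -> needed.remove(e.table());
            }
        }
        return needed;
    }
}
```

Only tables whose last relevant event is a bare reference survive the pass, so the number of alter calls shrinks to the truly necessary ones.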
Re: Review Request 55392: HIVE-15469: Fix REPL DUMP/LOAD DROP_PTN so it works on non-string-ptn-key tables
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/55392/#review161290 --- itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/TestDbNotificationListener.java (line 550) <https://reviews.apache.org/r/55392/#comment232522> This has minor clashes with issues.apache.org/jira/browse/HIVE-15365, and it is easier to fix here after that goes in rather than there. Instead of this code segment, we can use the following:

```java
DropPartitionMessage dropPtnMsg = md.getDropPartitionMessage(event.getMessage());
Table tableObj = dropPtnMsg.getTableObj();
// .. and the asserts can remain as-is.
```

Note that the first line is likely spurious as well if HIVE-15365 goes in, since it will create the dropPtnMsg here, so the only line needing changing is the line instantiating tableObj. I can regenerate this patch post-HIVE-15365, not a problem. itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestReplicationScenarios.java (line 345) <https://reviews.apache.org/r/55392/#comment232523> One more post-HIVE-15365 comment. :) run(..) followed by verifyResults(..) is being replaced by two methods: verifyRun(.., ..) or verifySetup(.., ..). verifySetup is called in cases where you're still setting up the test and verifying that your setup happened correctly. In this case, for instance, the run followed by verifyResults would be replaced by verifySetup instead. verifyRun is called when running some command that we're interested in testing, where the results showcase the functionality we're testing. The idea is that in steady state, after we finish our initial development, we flip a switch, and all verifySetups skip the additional verification step, whereas verifyRun still does it. itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestReplicationScenarios.java (line 372) <https://reviews.apache.org/r/55392/#comment232524> still verifySetup case, as per prior comment. 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestReplicationScenarios.java (line 385) <https://reviews.apache.org/r/55392/#comment232525> still verifySetup, since we're testing that the source dropped the data correctly. itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestReplicationScenarios.java (line 415) <https://reviews.apache.org/r/55392/#comment232526> This is now a verifyRun, finally. :) - Sushanth Sowmyan On Jan. 10, 2017, 9:29 p.m., Vaibhav Gumashta wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/55392/ > --- > > (Updated Jan. 10, 2017, 9:29 p.m.) > > > Review request for hive, Daniel Dai, Sushanth Sowmyan, and Thejas Nair. > > > Bugs: HIVE-15469 > https://issues.apache.org/jira/browse/HIVE-15469 > > > Repository: hive-git > > > Description > --- > > https://issues.apache.org/jira/browse/HIVE-15469 > > > Diffs > - > > > itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/TestDbNotificationListener.java > 4eabb24 > > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestReplicationScenarios.java > 6b86080 > > metastore/src/java/org/apache/hadoop/hive/metastore/messaging/DropPartitionMessage.java > 26aecb3 > > metastore/src/java/org/apache/hadoop/hive/metastore/messaging/json/JSONDropPartitionMessage.java > b8ea224 > > metastore/src/java/org/apache/hadoop/hive/metastore/messaging/json/JSONMessageFactory.java > 2749371 > > ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java > 85f8c64 > > Diff: https://reviews.apache.org/r/55392/diff/ > > > Testing > --- > > > Thanks, > > Vaibhav Gumashta > >
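The verifySetup/verifyRun split described in the review above could look roughly like this (a hypothetical standalone sketch; the helper names mirror the review, but the implementation, including the canned-result stub standing in for driver execution, is made up for illustration and is not the actual TestReplicationScenarios code):

```java
import java.util.*;

public class ReplTestHarness {
    // The "switch" from the review: once development settles, flip this off so
    // setup steps run without the extra verification, while verifyRun still checks.
    static boolean VERIFY_SETUP = true;

    // Stand-in for real command execution; a real harness would invoke the Driver.
    private final Map<String, List<String>> cannedResults = new HashMap<>();

    void stub(String command, List<String> results) {
        cannedResults.put(command, results);
    }

    List<String> run(String command) {
        return cannedResults.getOrDefault(command, List.of());
    }

    // Always verifies: used for the command whose results are under test.
    void verifyRun(String command, List<String> expected) {
        List<String> actual = run(command);
        if (!actual.equals(expected)) {
            throw new AssertionError(command + ": expected " + expected + ", got " + actual);
        }
    }

    // Verifies only while the switch is on: used while still setting up the test.
    void verifySetup(String command, List<String> expected) {
        if (VERIFY_SETUP) {
            verifyRun(command, expected);
        } else {
            run(command);  // still execute, but skip the verification step
        }
    }
}
```

The payoff of the split is that steady-state test runs pay the verification cost only where the functionality under test is actually asserted.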
Re: Review Request 55154: HIVE-15366: REPL LOAD & DUMP support for incremental INSERT events
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/55154/#review160929 --- Fix it, then Ship it! Looks good to me. I have one potential issue marked, but that can be solved in a future patch. Thanks, Vaibhav! ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java (line 2352) <https://reviews.apache.org/r/55154/#comment232164> We probably don't have a problem here, in that all entries in the list of newFiles are probably in the same filesystem, but if that ever changes, we can have off-by-one issues here wherein we cannot line up a file with its checksum, if some files have checksums and others in the middle don't. Would it make sense to put in a "" or something like that to indicate that there was no checksum for this file? Note - this is not a blocker issue, and the patch can continue as-is. I mention it because this is something that might change in the future. - Sushanth Sowmyan On Jan. 6, 2017, 6:43 a.m., Vaibhav Gumashta wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/55154/ > --- > > (Updated Jan. 6, 2017, 6:43 a.m.) > > > Review request for hive, Daniel Dai, Sushanth Sowmyan, and Thejas Nair. 
> > > Bugs: HIVE-15366 > https://issues.apache.org/jira/browse/HIVE-15366 > > > Repository: hive-git > > > Description > --- > > https://issues.apache.org/jira/browse/HIVE-15366 > > > Diffs > - > > > itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/TestDbNotificationListener.java > 39356ae > > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestReplicationScenarios.java > e29aa22 > metastore/if/hive_metastore.thrift 79592ea > metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 1311b20 > > metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/InsertEventRequestData.java > 39a607d > metastore/src/gen/thrift/gen-rb/hive_metastore_types.rb ebed504 > metastore/src/java/org/apache/hadoop/hive/metastore/events/InsertEvent.java > d9a42a7 > > metastore/src/java/org/apache/hadoop/hive/metastore/messaging/InsertMessage.java > fe747df > > metastore/src/java/org/apache/hadoop/hive/metastore/messaging/MessageFactory.java > fdb8e80 > > metastore/src/java/org/apache/hadoop/hive/metastore/messaging/json/JSONInsertMessage.java > bd9f9ec > > metastore/src/java/org/apache/hadoop/hive/metastore/messaging/json/JSONMessageFactory.java > 9954902 > ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java 4c0f817 > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java be5a6a9 > ql/src/java/org/apache/hadoop/hive/ql/parse/EximUtil.java 6e9602f > ql/src/java/org/apache/hadoop/hive/ql/parse/ExportSemanticAnalyzer.java > f61274b > ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java > 5561e06 > > ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java > 9b83407 > > Diff: https://reviews.apache.org/r/55154/diff/ > > > Testing > --- > > > Thanks, > > Vaibhav Gumashta > >
Re: Review Request 55154: HIVE-15366: REPL LOAD & DUMP support for incremental INSERT events
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/55154/#review160533 --- metastore/src/java/org/apache/hadoop/hive/metastore/messaging/InsertMessage.java (line 53) <https://reviews.apache.org/r/55154/#comment231682> I'm not convinced that this is a good method to add, since it is repl-specific and adds complexity. Any presence of a checksum must be encoded into the uris, so that when we call getFiles(), it contains it. Also, the files have no explicit meaning without the checksum, since they will not be stable uris. The getFiles() returned by InsertMessage should already be a CM uri that encodes the checksum; for e.g., cm://hdfs%3A%2F%2Fblah%2Ffile1#abcdef1234567890 might imply the file hdfs://blah/file1 with checksum "abcdef1234567890". I'm not super picky about the actual encoding mechanism used, but we want the getFiles() results to be stable uris - ones which, even if we don't have a FileSystem object associated with them directly, we can extract the info we want from at the endpoint when we use them, and generate when we generate them, with all areas in between simply passing them on without doing anything additional. Thus, the places I see "generating" this are either DbNotificationListener or fireInsertEvent(), or ReplCopyTask during a bootstrap dump. The only place I see extracting/consuming this uri would be in ReplCopyTask on the destination. All other areas should not split this. metastore/src/java/org/apache/hadoop/hive/metastore/messaging/json/JSONMessageFactory.java (line 376) <https://reviews.apache.org/r/55154/#comment231687> We should not be adding more of these methods into JSONMessageFactory that add field names here. That knowledge should belong to the domain of the message itself. The existing methods that do this are currently slated for removal once we refactor DbNotificationListener to not depend on them. 
ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java (line 576) <https://reviews.apache.org/r/55154/#comment231678> The partspec can be obtained from insertMsg.getPartitionKeyValues() - we shouldn't make calls to JSONMessageFactory here. JSONInsertMessage, in its implementation of getPartitionKeyValues, can, in turn, call generic functions from JSONMessageFactory using knowledge it has about itself. There shouldn't be any explicit calls to JSONMessageFactory from any class which is not a JSON*Message. See the previous ALTER patch and how it changed the ADD_PTNS/CREATE_TABLE processing for a reference. ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java (line 595) <https://reviews.apache.org/r/55154/#comment231681> We should not be making calls to JSONMessageFactory, or getting fields with knowledge of names such as "fileChecksums" or "files". Knowledge of field names should be restricted to the inside of the message itself, which exposes api via its parent Message class. This should simply be a dump of what InsertMessage.getFiles() returns and no more. Any encoding of checksum/etc. that we do must happen in DbNotificationListener, or possibly even in fireInsertEvent, since the location is meaningless without the checksum. - Sushanth Sowmyan On Jan. 4, 2017, 12:59 p.m., Vaibhav Gumashta wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/55154/ > --- > > (Updated Jan. 4, 2017, 12:59 p.m.) > > > Review request for hive, Daniel Dai, Sushanth Sowmyan, and Thejas Nair. 
> > > Bugs: HIVE-15366 > https://issues.apache.org/jira/browse/HIVE-15366 > > > Repository: hive-git > > > Description > --- > > https://issues.apache.org/jira/browse/HIVE-15366 > > > Diffs > - > > > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestReplicationScenarios.java > e29aa22 > > metastore/src/java/org/apache/hadoop/hive/metastore/messaging/InsertMessage.java > fe747df > > metastore/src/java/org/apache/hadoop/hive/metastore/messaging/json/JSONInsertMessage.java > bd9f9ec > > metastore/src/java/org/apache/hadoop/hive/metastore/messaging/json/JSONMessageFactory.java > 9954902 > ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java 4c0f817 > ql/src/java/org/apache/hadoop/hive/ql/parse/EximUtil.java 6e9602f > ql/src/java/org/apache/hadoop/hive/ql/parse/ExportSemanticAnalyzer.java > f61274b > ql/src/java/org/apache/hadoop/hive/ql/parse/Impo
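The stable-URI idea from the review comments above can be sketched as a tiny encode/decode pair (the cm:// format follows the example given in the review; the helper class itself is illustrative and is not Hive's actual CM/ReplCopyTask code): the original path and its checksum are packed into one opaque string that every layer between producer and consumer can pass along unchanged.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class CmUri {
    // Percent-encode the path so it can safely sit inside the cm:// URI, and
    // carry the checksum in the fragment, e.g.
    // cm://hdfs%3A%2F%2Fblah%2Ffile1#abcdef1234567890
    static String encode(String path, String checksum) {
        return "cm://" + URLEncoder.encode(path, StandardCharsets.UTF_8) + "#" + checksum;
    }

    // Inverse of encode(): returns { originalPath, checksum }. Only the final
    // consumer (ReplCopyTask on the destination, in the review's framing)
    // should ever need to do this.
    static String[] decode(String cmUri) {
        String body = cmUri.substring("cm://".length());
        int hash = body.lastIndexOf('#');
        String path = URLDecoder.decode(body.substring(0, hash), StandardCharsets.UTF_8);
        return new String[] { path, body.substring(hash + 1) };
    }

    public static void main(String[] args) {
        String uri = encode("hdfs://blah/file1", "abcdef1234567890");
        String[] parts = decode(uri);
        System.out.println(uri);
        System.out.println(parts[0] + " / " + parts[1]);
    }
}
```

Because the checksum travels inside the URI, intermediate layers need no knowledge of field names like "fileChecksums", which is exactly the separation the review argues for.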
Re: Review Request 55154: HIVE-15366: REPL LOAD & DUMP support for incremental INSERT events
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/55154/#review160450 --- ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java (line 573) <https://reviews.apache.org/r/55154/#comment231594> Instead of using JSONMessageFactory.getTableName, please instantiate the InsertMessage (not JSONInsertMessage) and ask it for getTableName() - that way, we stick to portable MessageFactory based api. Also, if you look at the alter patch and how it changes add_ptns, you'll see how to get the partitions objects/etc generically. - Sushanth Sowmyan On Jan. 3, 2017, 10:27 p.m., Vaibhav Gumashta wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/55154/ > --- > > (Updated Jan. 3, 2017, 10:27 p.m.) > > > Review request for hive, Daniel Dai, Sushanth Sowmyan, and Thejas Nair. > > > Bugs: HIVE-15366 > https://issues.apache.org/jira/browse/HIVE-15366 > > > Repository: hive-git > > > Description > --- > > https://issues.apache.org/jira/browse/HIVE-15366 > > > Diffs > - > > > metastore/src/java/org/apache/hadoop/hive/metastore/messaging/InsertMessage.java > fe747df > > metastore/src/java/org/apache/hadoop/hive/metastore/messaging/json/JSONInsertMessage.java > bd9f9ec > > metastore/src/java/org/apache/hadoop/hive/metastore/messaging/json/JSONMessageFactory.java > 9954902 > ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java 4c0f817 > ql/src/java/org/apache/hadoop/hive/ql/parse/EximUtil.java 6e9602f > ql/src/java/org/apache/hadoop/hive/ql/parse/ExportSemanticAnalyzer.java > f61274b > ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java > 5561e06 > > ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java > 9b83407 > > Diff: https://reviews.apache.org/r/55154/diff/ > > > Testing > --- > > > Thanks, > > Vaibhav Gumashta > >
Re: Review Request 55154: HIVE-15366: REPL LOAD & DUMP support for incremental INSERT events
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/55154/#review160447 --- metastore/src/java/org/apache/hadoop/hive/metastore/messaging/InsertMessage.java (line 58) <https://reviews.apache.org/r/55154/#comment231590> Rather than a List<byte[]> getFileChecksums, I was really visualizing a List<String> getFiles(), where each file listed is a URI that has the checksum encoded into it. The reason is that a list of checksums is too tightly bound to our replication usecase only, and has nothing to do with a more generic "Message" that could be used for other purposes as well. Messages are currently used for things like audit as well, and not just replication. Having the checksums encoded in the URLs keeps the message interface consistent without requiring knowledge of how to actually read the url. metastore/src/java/org/apache/hadoop/hive/metastore/messaging/json/JSONInsertMessage.java (line 115) <https://reviews.apache.org/r/55154/#comment231591> Same comment as with InsertMessage - this should not be a list of checksums but a list of pathnames (urls). ql/src/java/org/apache/hadoop/hive/ql/parse/EximUtil.java <https://reviews.apache.org/r/55154/#comment231592> Removing this is incorrect and breaks current EXPORT in replv1 - this is used to basically noop-out things like non-storagehandler-based tables, views, etc. - Sushanth Sowmyan On Jan. 3, 2017, 10:27 p.m., Vaibhav Gumashta wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/55154/ > --- > > (Updated Jan. 3, 2017, 10:27 p.m.) > > > Review request for hive, Daniel Dai, Sushanth Sowmyan, and Thejas Nair. 
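The review suggestion above — carrying the checksum inside each file URI via a generic List getFiles(), rather than a replication-specific List<byte[]> of checksums — might be sketched roughly as below. This is an illustrative helper only (the class and method names are invented, not Hive's actual implementation); it stashes the checksum in the URI fragment so consumers that only need the path can ignore it.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch: encode a file checksum into the URI itself, so a
// generic "message" can expose a plain list of file URIs without a
// replication-specific checksum list alongside it.
class ChecksumUri {

    // Attach the checksum as the URI fragment (assumption: fragments are
    // otherwise unused in these file URIs).
    static String encode(String fileUri, String checksum) {
        URI base = URI.create(fileUri);
        try {
            return new URI(base.getScheme(), base.getAuthority(), base.getPath(),
                    base.getQuery(), checksum).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("bad file uri: " + fileUri, e);
        }
    }

    // Recover the checksum on the consuming side; null if none was encoded.
    static String checksumOf(String encodedUri) {
        return URI.create(encodedUri).getFragment();
    }
}
```

On the dump side, replication would encode each file it lists; on the load side, checksumOf recovers it, while audit-style consumers can treat the string as an ordinary URI.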
Re: Review Request 55154: HIVE-15366: REPL LOAD & DUMP support for incremental INSERT events
> On Jan. 3, 2017, 11:55 p.m., Sushanth Sowmyan wrote: > > Note - the following is not exhaustive, and I know this patch has already been updated, but wanted to mention a few things that I noticed. - Sushanth --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/55154/#review160447 ---
[jira] [Created] (HIVE-15536) Tests failing due to unexpected q.out outputs : udf_coalesce,case_sensitivity,input_testxpath,
Sushanth Sowmyan created HIVE-15536: --- Summary: Tests failing due to unexpected q.out outputs : udf_coalesce,case_sensitivity,input_testxpath, Key: HIVE-15536 URL: https://issues.apache.org/jira/browse/HIVE-15536 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan All of these tests seem to be failing based on a q.out diff: {noformat} Running: diff -a /home/hiveptest/162.222.183.40-hiveptest-0/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/input_testxpath.q.out /home/hiveptest/162.222.183.40-hiveptest-0/apache-github-source-source/ql/src/test/results/clientpositive/input_testxpath.q.out 32a33 > Pruned Column Paths: lintstring.mystring {noformat} {noformat} Running: diff -a /home/hiveptest/35.184.94.117-hiveptest-0/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/case_sensitivity.q.out /home/hiveptest/35.184.94.117-hiveptest-0/apache-github-source-source/ql/src/test/results/clientpositive/case_sensitivity.q.out 32a33 > Pruned Column Paths: lintstring.mystring {noformat} {noformat} Running: diff -a /home/hiveptest/104.197.172.185-hiveptest-0/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/udf_coalesce.q.out /home/hiveptest/104.197.172.185-hiveptest-0/apache-github-source-source/ql/src/test/results/clientpositive/udf_coalesce.q.out 142a143 > Pruned Column Paths: lintstring.mystring {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15535) Flaky test : TestHS2HttpServer.testContextRootUrlRewrite
Sushanth Sowmyan created HIVE-15535: --- Summary: Flaky test : TestHS2HttpServer.testContextRootUrlRewrite Key: HIVE-15535 URL: https://issues.apache.org/jira/browse/HIVE-15535 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Per recent test failure : https://builds.apache.org/job/PreCommit-HIVE-Build/2766/testReport/org.apache.hive.service.server/TestHS2HttpServer/testContextRootUrlRewrite/ {noformat} Stacktrace org.junit.ComparisonFailure: expected:<...d>Tue Jan 03 11:54:4[6] PST 2017 ...> but was:<...d>Tue Jan 03 11:54:4[7] PST 2017 ...> at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hive.service.server.TestHS2HttpServer.testContextRootUrlRewrite(TestHS2HttpServer.java:99) {noformat} The test looks overly strict: it does an exact string match on a field that contains a timestamp, which can differ by a second between the expected and actual values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
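One way to make such a test less brittle — a sketch, not the actual fix committed for this JIRA — is to mask wall-clock timestamps before comparing the two captured strings. The date pattern below is an assumption about the page format, based on the stack trace above.

```java
import java.util.regex.Pattern;

// Sketch: normalize away timestamps of the form "Tue Jan 03 11:54:46 PST 2017"
// so that a one-second skew between two page fetches cannot fail an
// exact-match assertion.
class PageNormalizer {
    private static final Pattern TIMESTAMP = Pattern.compile(
            "\\w{3} \\w{3} \\d{2} \\d{2}:\\d{2}:\\d{2} \\w{3} \\d{4}");

    static String maskTimestamps(String page) {
        return TIMESTAMP.matcher(page).replaceAll("<TIMESTAMP>");
    }
}
```

The test would then assert equality of maskTimestamps(expected) and maskTimestamps(actual) instead of comparing raw pages.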
[jira] [Created] (HIVE-15534) Update db/table repl.last.id at the end of REPL LOAD of a batch of events
Sushanth Sowmyan created HIVE-15534: --- Summary: Update db/table repl.last.id at the end of REPL LOAD of a batch of events Key: HIVE-15534 URL: https://issues.apache.org/jira/browse/HIVE-15534 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Tracking TODO task in ReplSemanticAnalyzer : {noformat} // TODO : Over here, we need to track a Map<dbName:String,evLast:Long> for every db updated // and update repl.last.id for each, if this is a wh-level load, and if it is a db-level load, // then a single repl.last.id update, and if this is a tbl-lvl load which does not alter the // table itself, we'll need to update repl.last.id for that as well. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
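The bookkeeping described in the TODO above can be sketched as below. All names are illustrative assumptions, not ReplicationSemanticAnalyzer code: track the highest applied event id per database while processing a batch, then write each repl.last.id once at the end of REPL LOAD.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of per-db replication state tracking: record the
// highest event id applied for each database during a batch of events,
// so repl.last.id can be updated once per db when the batch completes.
class ReplStateTracker {
    private final Map<String, Long> dbToLastEventId = new HashMap<>();

    void recordEvent(String dbName, long eventId) {
        // keep the max seen, in case events for a db arrive out of order
        dbToLastEventId.merge(dbName, eventId, Math::max);
    }

    // Snapshot consumed at the end of REPL LOAD to emit one
    // repl.last.id update per database touched.
    Map<String, Long> updatesToFlush() {
        return new HashMap<>(dbToLastEventId);
    }
}
```

For a table-level load that does not alter the table itself, the same map idea would be keyed by db.table instead of db.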
[jira] [Created] (HIVE-15533) Repl rename support adds unnecessary duplication for non-rename alters
Sushanth Sowmyan created HIVE-15533: --- Summary: Repl rename support adds unnecessary duplication for non-rename alters Key: HIVE-15533 URL: https://issues.apache.org/jira/browse/HIVE-15533 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Currently, the rename events contain a before & after object. For non-rename cases, we simply apply the "after" object, and thus have no need of the "before" object. Thus, we might want to minimize wastage by not materializing "before" if this is a non-rename case. Also worth considering: for a rename case, do we really need the full before object, or simply the before & after names? Having before & after objects is good in that it allows us flexibility, but we might not need that much info. From a perf viewpoint, we might want to trim things a bit here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15532) Refactor/cleanup TestReplicationScenario
Sushanth Sowmyan created HIVE-15532: --- Summary: Refactor/cleanup TestReplicationScenario Key: HIVE-15532 URL: https://issues.apache.org/jira/browse/HIVE-15532 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan TestReplicationScenarios could use a bit of cleanup, based on comments from reviews: a) Separate "setup" phase of each test, so that we don't run unnecessary verifications which aren't testing replication itself, but are verifying that the env is set up correctly to then test replication. This can be flag-gated so as to allow it to be turned on at test-dev time, and off during build/commit unit test time. b) Better comments inside the tests for what is being set up / tested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15522) REPL LOAD & DUMP support for incremental ALTER_TABLE/ALTER_PTN including renames
Sushanth Sowmyan created HIVE-15522: --- Summary: REPL LOAD & DUMP support for incremental ALTER_TABLE/ALTER_PTN including renames Key: HIVE-15522 URL: https://issues.apache.org/jira/browse/HIVE-15522 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15480) Failing test : TestMiniTezCliDriver.testCliDriver : explainanalyze_1
Sushanth Sowmyan created HIVE-15480: --- Summary: Failing test : TestMiniTezCliDriver.testCliDriver : explainanalyze_1 Key: HIVE-15480 URL: https://issues.apache.org/jira/browse/HIVE-15480 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan See recent ptest failure : https://builds.apache.org/job/PreCommit-HIVE-Build/2642/testReport/org.apache.hadoop.hive.cli/TestMiniTezCliDriver/testCliDriver_explainanalyze_1_/ {noformat} Standard Output Running: diff -a /home/hiveptest/104.154.92.121-hiveptest-0/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/explainanalyze_1.q.out /home/hiveptest/104.154.92.121-hiveptest-0/apache-github-source-source/ql/src/test/results/clientpositive/tez/explainanalyze_1.q.out 248c248 < Group By Operator [GBY_2] (rows=205/500 width=95) --- > Group By Operator [GBY_2] (rows=205/309 width=95) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15469) Fix REPL DUMP/LOAD DROP_PTN so it works on non-string-ptn-key tables
Sushanth Sowmyan created HIVE-15469: --- Summary: Fix REPL DUMP/LOAD DROP_PTN so it works on non-string-ptn-key tables Key: HIVE-15469 URL: https://issues.apache.org/jira/browse/HIVE-15469 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan The current implementation of REPL DROP/REPL LOAD for DROP_PTN is limited to dropping partitions whose key types are strings. This needs the tableObj to be available in the DropPartitionMessage before it can be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15466) REPL LOAD & DUMP support for incremental DROP_TABLE/DROP_PTN
Sushanth Sowmyan created HIVE-15466: --- Summary: REPL LOAD & DUMP support for incremental DROP_TABLE/DROP_PTN Key: HIVE-15466 URL: https://issues.apache.org/jira/browse/HIVE-15466 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15455) Flaky test : TestHS2HttpServer.testContextRootUrlRewrite
Sushanth Sowmyan created HIVE-15455: --- Summary: Flaky test : TestHS2HttpServer.testContextRootUrlRewrite Key: HIVE-15455 URL: https://issues.apache.org/jira/browse/HIVE-15455 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan This test failed in ptest when testing HIVE-15426 but seems to succeed locally. I'm not able to find another recent run which had this test fail as well, and the test logs for HIVE-15426 have been rotated out. Creating this jira anyway, to track it if it pops up again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15454) Failing test : TestMiniTezCliDriver.testCliDriver : explainanalyze_2
Sushanth Sowmyan created HIVE-15454: --- Summary: Failing test : TestMiniTezCliDriver.testCliDriver : explainanalyze_2 Key: HIVE-15454 URL: https://issues.apache.org/jira/browse/HIVE-15454 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan This test has failed on some recent ptest runs. Example : https://builds.apache.org/job/PreCommit-HIVE-Build/2611/testReport/junit/org.apache.hadoop.hive.cli/TestMiniTezCliDriver/testCliDriver_explainanalyze_2_/ {noformat} Standard Output Running: diff -a /home/hiveptest/104.197.114.29-hiveptest-0/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/explainanalyze_2.q.out /home/hiveptest/104.197.114.29-hiveptest-0/apache-github-source-source/ql/src/test/results/clientpositive/tez/explainanalyze_2.q.out 2095c2095 < Group By Operator [GBY_16] (rows=500/760 width=280) --- > Group By Operator [GBY_16] (rows=500/619 width=280) 2105c2105 < Group By Operator [GBY_22] (rows=1001/760 width=464) --- > Group By Operator [GBY_22] (rows=1001/619 width=464) 2111c2111 < Group By Operator [GBY_16] (rows=500/760 width=280) --- > Group By Operator [GBY_16] (rows=500/619 width=280) 2119c2119 < Group By Operator [GBY_22] (rows=1001/760 width=464) --- > Group By Operator [GBY_22] (rows=1001/619 width=464) 2125c2125 < Group By Operator [GBY_16] (rows=500/760 width=280) --- > Group By Operator [GBY_16] (rows=500/619 width=280) 2142c2142 < Group By Operator [GBY_22] (rows=1001/760 width=464) --- > Group By Operator [GBY_22] (rows=1001/619 width=464) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15453) Failing test : TestMiniLlapLocalCliDriver.testCliDriver : stats_based_fetch_decision
Sushanth Sowmyan created HIVE-15453: --- Summary: Failing test : TestMiniLlapLocalCliDriver.testCliDriver : stats_based_fetch_decision Key: HIVE-15453 URL: https://issues.apache.org/jira/browse/HIVE-15453 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan This test has been failing in a couple of ptests off late. A recent example is in https://builds.apache.org/job/PreCommit-HIVE-Build/2612/testReport/junit/org.apache.hadoop.hive.cli/TestMiniLlapLocalCliDriver/testCliDriver_stats_based_fetch_decision_/ {noformat} 2016-12-16 09:42:14 Running Dag: dag_1481909974530_0001_239 2016-12-16 09:42:14 Completed running task attempt: attempt_1481909974530_0001_239_00_00_0 2016-12-16 09:42:14 Completed Dag: dag_1481909974530_0001_239 2016-12-16 09:42:14 Running Dag: dag_1481909974530_0001_240 2016-12-16 09:42:14 Completed running task attempt: attempt_1481909974530_0001_240_00_00_0 2016-12-16 09:42:14 Completed Dag: dag_1481909974530_0001_240 Running: diff -a /home/hiveptest/104.154.196.58-hiveptest-1/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/stats_based_fetch_decision.q.out /home/hiveptest/104.154.196.58-hiveptest-1/apache-github-source-source/ql/src/test/results/clientpositive/llap/stats_based_fetch_decision.q.out 153c153 < Statistics: Num rows: 2000 Data size: 1092000 Basic stats: COMPLETE Column stats: PARTIAL --- > Statistics: Num rows: 2000 Data size: 1092000 Basic stats: > COMPLETE Column stats: COMPLETE 156c156 < Statistics: Num rows: 1 Data size: 546 Basic stats: COMPLETE Column stats: PARTIAL --- > Statistics: Num rows: 1 Data size: 546 Basic stats: > COMPLETE Column stats: COMPLETE 160c160 < Statistics: Num rows: 1 Data size: 543 Basic stats: COMPLETE Column stats: PARTIAL --- > Statistics: Num rows: 1 Data size: 543 Basic stats: > COMPLETE Column stats: COMPLETE 163c163 < Statistics: Num rows: 1 Data size: 543 Basic stats: COMPLETE Column stats: PARTIAL --- > Statistics: Num rows: 1 Data size: 543 Basic stats: > 
COMPLETE Column stats: COMPLETE {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15452) Failing test : TestMiniLlapLocalCliDriver.testCliDriver : metadataonly1
Sushanth Sowmyan created HIVE-15452: --- Summary: Failing test : TestMiniLlapLocalCliDriver.testCliDriver : metadataonly1 Key: HIVE-15452 URL: https://issues.apache.org/jira/browse/HIVE-15452 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Test seems to be failing on recent ptest runs. See https://builds.apache.org/job/PreCommit-HIVE-Build/2615/testReport/junit/org.apache.hadoop.hive.cli/TestMiniLlapLocalCliDriver/testCliDriver_metadataonly1_/ for recent example. {noformat} Running: diff -a /home/hiveptest/104.154.236.143-hiveptest-1/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/metadataonly1.q.out /home/hiveptest/104.154.236.143-hiveptest-1/apache-github-source-source/ql/src/test/results/clientpositive/llap/metadataonly1.q.out 148c148 < input format: org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat --- > input format: > org.apache.hadoop.hive.ql.io.OneNullRowInputFormat 240c240 < NULL --- > 1 287c287 < input format: org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat --- > input format: > org.apache.hadoop.hive.ql.io.OneNullRowInputFormat 379c379 < 0 --- > 1 971c971 < input format: org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat --- > input format: > org.apache.hadoop.hive.ql.io.OneNullRowInputFormat 1016c1016 < input format: org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat --- > input format: > org.apache.hadoop.hive.ql.io.OneNullRowInputFormat 1061c1061 < input format: org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat --- > input format: > org.apache.hadoop.hive.ql.io.OneNullRowInputFormat 1160a1161 > 1 3 1448c1449 < input format: org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat --- > input format: > org.apache.hadoop.hive.ql.io.OneNullRowInputFormat 1492c1493 < input format: org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat --- > input format: > org.apache.hadoop.hive.ql.io.OneNullRowInputFormat 1587c1588 < NULL --- > 2 1690c1691 < input format: org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat --- > input format: > 
org.apache.hadoop.hive.ql.io.OneNullRowInputFormat 1735c1736 < input format: org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat --- > input format: > org.apache.hadoop.hive.ql.io.OneNullRowInputFormat 1780c1781 < input format: org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat --- > input format: > org.apache.hadoop.hive.ql.io.OneNullRowInputFormat 1825c1826 < input format: org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat --- > input format: > org.apache.hadoop.hive.ql.io.OneNullRowInputFormat 1870c1871 < input format: org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat --- > input format: > org.apache.hadoop.hive.ql.io.OneNullRowInputFormat 1975a1977,1979 > 01:10:10 1 > 01:10:20 1 > 1 3 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15451) Failing test : TestMiniLlapCliDriver.testCliDriver : transform_ppr2
Sushanth Sowmyan created HIVE-15451: --- Summary: Failing test : TestMiniLlapCliDriver.testCliDriver : transform_ppr2 Key: HIVE-15451 URL: https://issues.apache.org/jira/browse/HIVE-15451 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan This test has been failing on ptest off late. See https://builds.apache.org/job/PreCommit-HIVE-Build/2615/testReport/junit/org.apache.hadoop.hive.cli/TestMiniLlapCliDriver/testCliDriver_transform_ppr2_/ for a recent example Fails on stdout diff: {noformat} 2016-12-16 12:20:11 Completed running task attempt: attempt_1481919437560_0001_177_01_00_0 Running: diff -a /home/hiveptest/35.184.94.117-hiveptest-0/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/transform_ppr2.q.out /home/hiveptest/35.184.94.117-hiveptest-0/apache-github-source-source/ql/src/test/results/clientpositive/llap/transform_ppr2.q.out 41c41 < Statistics: Num rows: 1000 Data size: 178000 Basic stats: COMPLETE Column stats: PARTIAL --- > Statistics: Num rows: 1000 Data size: 178000 Basic stats: > COMPLETE Column stats: COMPLETE 46c46 < Statistics: Num rows: 1000 Data size: 272000 Basic stats: COMPLETE Column stats: PARTIAL --- > Statistics: Num rows: 1000 Data size: 272000 Basic stats: > COMPLETE Column stats: COMPLETE 59c59 < Statistics: Num rows: 1000 Data size: 272000 Basic stats: COMPLETE Column stats: PARTIAL --- > Statistics: Num rows: 1000 Data size: 272000 Basic > stats: COMPLETE Column stats: COMPLETE 63c63 < Statistics: Num rows: 333 Data size: 2664 Basic stats: COMPLETE Column stats: PARTIAL --- > Statistics: Num rows: 333 Data size: 2664 Basic > stats: COMPLETE Column stats: COMPLETE 69c69 < Statistics: Num rows: 333 Data size: 2664 Basic stats: COMPLETE Column stats: PARTIAL --- > Statistics: Num rows: 333 Data size: 2664 Basic > stats: COMPLETE Column stats: COMPLETE 178c178 < Statistics: Num rows: 333 Data size: 2664 Basic stats: COMPLETE Column stats: PARTIAL --- > Statistics: Num rows: 333 Data size: 2664 Basic 
stats: > COMPLETE Column stats: COMPLETE 184c184 < Statistics: Num rows: 333 Data size: 2664 Basic stats: COMPLETE Column stats: PARTIAL --- > Statistics: Num rows: 333 Data size: 2664 Basic stats: > COMPLETE Column stats: COMPLETE {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15450) Flaky tests : testCliDriver.sample[24679]
Sushanth Sowmyan created HIVE-15450: --- Summary: Flaky tests : testCliDriver.sample[24679] Key: HIVE-15450 URL: https://issues.apache.org/jira/browse/HIVE-15450 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Noted during ptests: the .q.out comparison seems to be failing. A difference in the ordering of the output appears to be causing the failure. See https://builds.apache.org/job/PreCommit-HIVE-Build/2615/#showFailuresLink for a new-ish job with these failing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15449) Failing test : TestVectorizedColumnReaderBase (possibly slow)
Sushanth Sowmyan created HIVE-15449: --- Summary: Failing test : TestVectorizedColumnReaderBase (possibly slow) Key: HIVE-15449 URL: https://issues.apache.org/jira/browse/HIVE-15449 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Got the following error from a ptest run: TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely timed out) (batchId=251) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15426) Fix order guarantee of event executions for REPL LOAD
Sushanth Sowmyan created HIVE-15426: --- Summary: Fix order guarantee of event executions for REPL LOAD Key: HIVE-15426 URL: https://issues.apache.org/jira/browse/HIVE-15426 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15332) REPL LOAD & DUMP support for incremental CREATE_TABLE/ADD_PTN
Sushanth Sowmyan created HIVE-15332: --- Summary: REPL LOAD & DUMP support for incremental CREATE_TABLE/ADD_PTN Key: HIVE-15332 URL: https://issues.apache.org/jira/browse/HIVE-15332 Project: Hive Issue Type: Sub-task Components: repl Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan We need to add in support for REPL LOAD and REPL DUMP of incremental events, and we need to be able to replicate creates, for a start. This jira tracks the inclusion of CREATE_TABLE/ADD_PARTITION event support to REPL DUMP & LOAD. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15284) Add junit test to test replication scenarios
Sushanth Sowmyan created HIVE-15284: --- Summary: Add junit test to test replication scenarios Key: HIVE-15284 URL: https://issues.apache.org/jira/browse/HIVE-15284 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15151) Bootstrap support for replv2
Sushanth Sowmyan created HIVE-15151: --- Summary: Bootstrap support for replv2 Key: HIVE-15151 URL: https://issues.apache.org/jira/browse/HIVE-15151 Project: Hive Issue Type: Sub-task Components: repl Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan We need to support the ability to bootstrap an initial state, dumping out currently existing dbs/tables, etc, so that incremental replication can take over from that point. To this end, we should implement commands such as REPL DUMP, REPL LOAD, REPL STATUS, as described over at https://cwiki.apache.org/confluence/display/Hive/HiveReplicationv2Development -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: behavior or insert overwrite with dynamic partitions
I expect the following because it follows per-ptn if-write-then-overwrite semantics: 0,10 1,25 1,50 There can be a case to be made that it should overwrite the entire table, and that would make sense too (probably more sense than this one), but not one I'd think we should switch behavior to (backward compatibility). On Oct 17, 2016 18:10, "Sergey Shelukhin" wrote: > What do you think this SHOULD do? > > > select key from src; > 10 > 25 > 50 > > > create table t(val int) partitioned by (pk int); > > insert overwrite table t partition (pk) > select 0 as val, key from src where key < 30; > > insert overwrite table t partition (pk) > select 1 as val, key from src where key > 20; > > > > select val, pk from t; > ? > >
Re: Creating a new branch for repl dev
With no objections received, I have created a new branch called repl2, and have created a new umbrella jira ( HIVE-14841 ) and a jira component (repl) to track continued development. Thanks, -Sushanth On Thu, Sep 22, 2016 at 10:03 AM, Sushanth Sowmyan <khorg...@gmail.com> wrote: > Hi Folks, > > We had some work done with replication back at HIVE-7973 and this > implemented a primary mode of replication for hive which can integrate > with tools like Falcon. I intend to move forward on continuing to > improve this, to fix some of the major problems with the current > implementation, mostly the following: > > a) Replication follows a rubberbanding pattern, wherein different > tables/ptns can be in a different/mixed state on the destination, so > that unless all events are caught up on, we do not have an equivalent > warehouse. Thus, this only satisfies DR cases, not load balancing > usecases, and the secondary warehouse is really only seen as a backup, > rather than as a live warehouse that trails the primary. > b) The base implementation is a naive implementation, and has several > performance problems, including a large amount of duplication of data > for subsequent events, as mentioned in HIVE-13348, having to copy out > entire partitions/tables when just a delta of files might be > sufficient/etc. Also, using EXPORT/IMPORT allows us a simple > implementation, but at the cost of tons of temporary space, much of > which is not actually applied at the destination. > > To that end, I want to create a new branch, so that we can track > development on this end on public apache jira. The last time I worked > on this, having a private branch meant large uber patches as in > HIVE-10227, which I would like to avoid this time, and is also more > inkeeping with open-development. Also, developing in master itself is > not a good idea, since some of the ideas I'm trying out can be > experimental, and probably still a ways from maturity. 
> > So, unless anyone has any objection, I would like to create a new > branch off master, say "repl2" and create an uber jira to manage > individual components of the work. > > Thanks, > -Sushanth
[jira] [Created] (HIVE-14841) Replication - Phase 2
Sushanth Sowmyan created HIVE-14841: --- Summary: Replication - Phase 2 Key: HIVE-14841 URL: https://issues.apache.org/jira/browse/HIVE-14841 Project: Hive Issue Type: New Feature Components: repl Affects Versions: 2.1.0 Reporter: Sushanth Sowmyan Per email sent out to the dev list, the current implementation of replication in hive has certain drawbacks, for instance : * Replication follows a rubberbanding pattern, wherein different tables/ptns can be in a different/mixed state on the destination, so that unless all events are caught up on, we do not have an equivalent warehouse. Thus, this only satisfies DR cases, not load balancing usecases, and the secondary warehouse is really only seen as a backup, rather than as a live warehouse that trails the primary. * The base implementation is a naive implementation, and has several performance problems, including a large amount of duplication of data for subsequent events, as mentioned in HIVE-13348, having to copy out entire partitions/tables when just a delta of files might be sufficient/etc. Also, using EXPORT/IMPORT allows us a simple implementation, but at the cost of tons of temporary space, much of which is not actually applied at the destination. Thus, to track this, we now create a new branch (repl2) and a uber-jira(this one) to track experimental development towards improvement of this situation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Getting ready for a Hive 1.2.2 release
Hi Folks, I'm afraid I've been otherwise occupied and not been able to spend enough time on this. Thankfully, Vaibhav Gumashta has volunteered to take this on and be the RM for 1.2.2. He'll follow up on the process as it goes forward. Thanks, -Sushanth On Tue, May 3, 2016 at 4:32 PM, Sushanth Sowmyan <khorg...@gmail.com> wrote: > Hi All, > > It has been nearly a year now since 1.2.1 was released, and I said > then that I would keep the branch open for further bugfixes with a > view of making another 1.2.2 stability upgrade release. There have > been 64 such patches committed since then. > > I think it's time we revisited that and looked to making a 1.2.2 > release to reflect 1.2.1 + all these updates, and I will go ahead and > start rolling out release candidates after next weekend (May 16th) > unless anyone has any objections. > > If anyone wants to get in any other patches before then, please feel > free to do so. The original restrictions for commits to branch-1.2, > that of no breaking changes, no db changes and no large features still > apply, and I'll do a full test verification before pushing it out. > If anyone has any patches that they think will take longer than next > week, but are important fixes that need to be in this, please ping me. > I will edit the wiki ( > https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status > ) to reflect this. > > Thanks, > -Sushanth
Creating a new branch for repl dev
Hi Folks, We had some work done with replication back at HIVE-7973 and this implemented a primary mode of replication for hive which can integrate with tools like Falcon. I intend to move forward on continuing to improve this, to fix some of the major problems with the current implementation, mostly the following: a) Replication follows a rubberbanding pattern, wherein different tables/ptns can be in a different/mixed state on the destination, so that unless all events are caught up on, we do not have an equivalent warehouse. Thus, this only satisfies DR cases, not load balancing usecases, and the secondary warehouse is really only seen as a backup, rather than as a live warehouse that trails the primary. b) The base implementation is a naive implementation, and has several performance problems, including a large amount of duplication of data for subsequent events, as mentioned in HIVE-13348, having to copy out entire partitions/tables when just a delta of files might be sufficient/etc. Also, using EXPORT/IMPORT allows us a simple implementation, but at the cost of tons of temporary space, much of which is not actually applied at the destination. To that end, I want to create a new branch, so that we can track development on this end on public apache jira. The last time I worked on this, having a private branch meant large uber patches as in HIVE-10227, which I would like to avoid this time, and is also more in keeping with open development. Also, developing in master itself is not a good idea, since some of the ideas I'm trying out can be experimental, and probably still a ways from maturity. So, unless anyone has any objection, I would like to create a new branch off master, say "repl2" and create an uber jira to manage individual components of the work. Thanks, -Sushanth
[jira] [Created] (HIVE-14766) ObjectStore.initialize() needs retry mechanisms in case of connection failures
Sushanth Sowmyan created HIVE-14766: --- Summary: ObjectStore.initialize() needs retry mechanisms in case of connection failures Key: HIVE-14766 URL: https://issues.apache.org/jira/browse/HIVE-14766 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan RetryingHMSHandler handles retries to most HMSHandler calls. However, one area where we do not have retries is in the very instantiation of ObjectStore. The lack of retries here sometimes means that a flaky db connect around the time the metastore is started yields an unresponsive metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
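A minimal retry loop of the kind this issue asks for might look like the sketch below. It is not the actual ObjectStore change — the attempt count, backoff policy, and names are all assumptions — but it shows the shape: retry the connection-sensitive initialization a bounded number of times before giving up.

```java
// Illustrative retry wrapper (names and policy are assumptions, not the
// real ObjectStore.initialize() fix): retry a flaky initialization a
// bounded number of times with a fixed sleep between attempts.
class RetryingInitializer {
    static void initWithRetries(Runnable init, int maxAttempts, long sleepMs) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                init.run();
                return;                        // initialization succeeded
            } catch (RuntimeException e) {
                last = e;                      // e.g. a transient db connect failure
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(sleepMs);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw e;               // stop retrying once interrupted
                    }
                }
            }
        }
        throw last;                            // exhausted all attempts
    }
}
```

A transient connect failure at metastore startup would then cost a few retries rather than leaving an unresponsive metastore behind.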
Re: [DISCUSS] Making storage-api a separately released artifact
+1 for having a separate storage-api project to define common interfaces for people to develop against; it'll make things much easier to develop against generically. I'm okay (+0) with the sub-project idea, as opposed to enthusiastic about it, mostly because I have reservations that it'll encourage laziness: in practice it may wind up being tied to hive releases and dev, and over time assumptions of how hive works and what is available will bleed in. But, still, having a notion of separation will definitely help. On Aug 17, 2016 11:39, "Prasanth Jayachandran" < pjayachand...@hortonworks.com> wrote: > +1 for making it a subproject with separate (preferably shorter) release > cycle. The module in itself is too small for a separate project. Also > having a faster release cycle will resolve circular dependency and will > help other projects make use of vectorization, sarg, bloom filter etc. > > For version management, how about adding another version after patch > version i.e sub-project version? > Example: 2.2.0.[0] will be storage api’s release version. Hive will always > depend on 2.2.0-SNAPSHOT. I think maven will let us release modules with > different versions. https://dev.c-ware.de/confluence/display/PUBLIC/Releasing+modules+of+a+multi-module+project+with+independent+version+numbers > > Thanks > Prasanth > > > On Aug 17, 2016, at 10:46 AM, Alan Gates wrote: > > > > +1 for making the API clean and easy for other projects to work with. A > few questions: > > > > 1) Would this also make it easier for Parquet and others to implement > Hive’s ACID interfaces? > > > > 2) Would we make any attempt to coordinate version numbers between Hive > and the storage module, or would a given version of Hive just depend on a > given version of the storage module? > > > > Alan. 
> > > >> On Aug 15, 2016, at 17:01, Owen O'Malley wrote: > >> > >> All, > >> > >> As part of moving ORC out of Hive, we pulled all of the vectorization > >> storage and sarg classes into a separate module, which is named > >> storage-api. Although it is currently only used by ORC, it could be > used > >> by Parquet or Avro if they wanted to make a fast vectorized reader that > >> read directly in to Hive's VectorizedRowBatch without needing a shim or > >> data copy. Note that this is in many ways similar to pulling the Arrow > >> project out of Drill. > >> > >> This unfortunately still leaves us with a circular dependency between > Hive > >> and ORC. I'd hoped that storage-api wouldn't change that much, but that > >> doesn't seem to be happening. As a result, ORC ends up shipping its own > >> fork of storage-api. > >> > >> Although we could make a new project for just the storage-api, I think > it > >> would be better to make it a subproject of Hive that is released > >> independently. > >> > >> What do others think? > >> > >> Owen > > > > > >
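Prasanth's versioning suggestion above (a fourth, sub-project digit, with Hive itself depending on the SNAPSHOT line) could look roughly like the following module POM. This is a hypothetical sketch, not Hive's actual build files; the artifact id and version numbers are illustrative:

```xml
<!-- Hypothetical storage-api/pom.xml fragment: the module declares its own
     independently released version while inheriting from the Hive parent,
     which stays on the normal Hive release train. -->
<project>
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive</artifactId>
    <version>2.2.0-SNAPSHOT</version>
  </parent>
  <artifactId>hive-storage-api</artifactId>
  <!-- sub-project version: Hive release 2.2.0, storage-api release 0 -->
  <version>2.2.0.0</version>
</project>
```

Maven does permit releasing a module with a version different from its parent, which is what the linked c-ware page describes.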
[jira] [Created] (HIVE-14449) Expand HiveReplication doc as an admin/user-facing doc
Sushanth Sowmyan created HIVE-14449: --- Summary: Expand HiveReplication doc as an admin/user-facing doc Key: HIVE-14449 URL: https://issues.apache.org/jira/browse/HIVE-14449 Project: Hive Issue Type: Bug Components: Documentation Reporter: Sushanth Sowmyan https://cwiki.apache.org/confluence/display/Hive/Replication is a good user-facing/admin-facing doc for replication, in contrast to https://cwiki.apache.org/confluence/display/Hive/HiveReplicationDevelopment, which was intended to talk more about the design. We should expand the former further with all the knobs that exist, what APIs exist for other programs to take advantage of replication, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14394) Reduce excessive INFO level logging
Sushanth Sowmyan created HIVE-14394: --- Summary: Reduce excessive INFO level logging Key: HIVE-14394 URL: https://issues.apache.org/jira/browse/HIVE-14394 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan We need to cull the log messages we generate in HMS and HS2 that are not needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14365) Simplify logic for check introduced in HIVE-10022
Sushanth Sowmyan created HIVE-14365: --- Summary: Simplify logic for check introduced in HIVE-10022 Key: HIVE-14365 URL: https://issues.apache.org/jira/browse/HIVE-14365 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan We introduced a parent-check/glob-check/file-check in SQLAuthorizationUtils in HIVE-10022, but the logic for that is more convoluted than it needs to be. Taking a cue from RANGER-1126, we should simplify this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [ANNOUNCE] New PMC Member : Jesus
Good to have you onboard, Jesus! :) On Jul 17, 2016 12:00, "Lefty Leverenz" wrote: > Congratulations Jesus! > > -- Lefty > > On Sun, Jul 17, 2016 at 1:01 PM, Ashutosh Chauhan > wrote: > >> Hello Hive community, >> >> I'm pleased to announce that Jesus Camacho Rodriguez has accepted the >> Apache Hive PMC's >> invitation, and is now our newest PMC member. Many thanks to Jesus for >> all of >> his hard work. >> >> Please join me congratulating Jesus! >> >> Best, >> Ashutosh >> (On behalf of the Apache Hive PMC) >> > >
Re: [ANNOUNCE] New PMC Member : Pengcheng
Welcome aboard Pengcheng! :) On Jul 17, 2016 12:01, "Lefty Leverenz" wrote: > Congratulations Pengcheng! > > -- Lefty > > On Sun, Jul 17, 2016 at 1:03 PM, Ashutosh Chauhan > wrote: > >> > >> > Hello Hive community, >> > >> > I'm pleased to announce that Pengcheng Xiong has accepted the Apache >> Hive >> > PMC's >> > invitation, and is now our newest PMC member. Many thanks to Pengcheng >> for >> > all of his hard work. >> > >> > Please join me congratulating Pengcheng! >> > >> > Best, >> > Ashutosh >> > (On behalf of the Apache Hive PMC) >> > >> > >
[jira] [Created] (HIVE-14207) Strip HiveConf hidden params in webui conf
Sushanth Sowmyan created HIVE-14207: --- Summary: Strip HiveConf hidden params in webui conf Key: HIVE-14207 URL: https://issues.apache.org/jira/browse/HIVE-14207 Project: Hive Issue Type: Bug Components: Web UI Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan HIVE-12338 introduced a new web ui, which has a page that displays the current HiveConf being used by HS2. However, before it displays that config, it does not strip entries from it which are considered "hidden" conf parameters, thus exposing those values from a web-ui for HS2. We need to add stripping to this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [VOTE] Apache Hive 2.1.0 Release Candidate 3
Actually, to be more explicit, per Thejas' case of the top level license taking precedence, this RC has my +1. On Fri, Jun 17, 2016 at 3:28 PM, Sushanth Sowmyan <khorg...@gmail.com> wrote: > I will happily rescind my -1 and even convert it to a +1 if the top > level license does hold. I thought that the RAT check was a necessary > blocker. > > (Although, if the top level license does cover across the board, we > may want to open a new discussion on whether having a license > requirement for every source file is necessary in the first place, and > tweak the definition of the rat check so it does not fail it in this > case.) > > On Fri, Jun 17, 2016 at 3:20 PM, Thejas Nair <thejas.n...@gmail.com> wrote: >> I don't think the missing headers for 2 files mandates a respin of >> this RC . It is not really a case of 'incompatible' license or code >> that shouldn't be shipped. >> We have a top level license file that covers the entire project, >> including these files. >> IMO, We should fix it if there is a new RC for some other reason. But >> this alone doesn't seem to make new RC necessary. >> >> Sushanth, Can you please reconsider your -1 ? >> >> >> On Fri, Jun 17, 2016 at 3:06 PM, Sushanth Sowmyan <khorg...@gmail.com> wrote: >>> -1, terribly sorry I didn't check for this earlier, but the RAT check >>> fails for this. >>> >>> If you run mvn apache-rat:check , then you see the following issue: >>> >>> Unapproved licenses: >>> >>> >>> /Users/sush/t/rel/apache-hive-2.1.0-src/common/src/java/org/apache/hive/common/util/DateParser.java >>> >>> /Users/sush/t/rel/apache-hive-2.1.0-src/common/src/test/org/apache/hive/common/util/TestDateParser.java >>> >>> Basically, these two files are missing the apache license header. We >>> need to add them in. >>> >>> All other things are good, though. It has the oracle fix I asked for >>> in RC2, md5s and signatures check out, compilation works on source >>> package, and I'm able to run the hive binary from the binary package. 
>>> I also tried a number of tests, and I've run a rat test on the release >>> >>> On Thu, Jun 16, 2016 at 6:02 PM, Jesus Camacho Rodriguez >>> <jcamachorodrig...@hortonworks.com> wrote: >>>> Apache Hive 2.1.0 Release Candidate 3 is available here: >>>> >>>> http://people.apache.org/~jcamacho/hive-2.1.0-rc3 >>>> >>>> Maven artifacts are available here: >>>> >>>> https://repository.apache.org/content/repositories/orgapachehive-1057/ >>>> >>>> Source tag for RC3 is at: >>>> https://github.com/apache/hive/releases/tag/release-2.1.0-rc3 >>>> >>>> >>>> Voting will conclude in 72 hours. >>>> >>>> Hive PMC Members: Please test and vote. >>>> >>>> Thanks. >>>> >>>> >>>> >>>>
Re: [VOTE] Apache Hive 2.1.0 Release Candidate 3
I will happily rescind my -1 and even convert it to a +1 if the top level license does hold. I thought that the RAT check was a necessary blocker. (Although, if the top level license does cover across the board, we may want to open a new discussion on whether having a license requirement for every source file is necessary in the first place, and tweak the definition of the rat check so it does not fail it in this case.) On Fri, Jun 17, 2016 at 3:20 PM, Thejas Nair <thejas.n...@gmail.com> wrote: > I don't think the missing headers for 2 files mandates a respin of > this RC . It is not really a case of 'incompatible' license or code > that shouldn't be shipped. > We have a top level license file that covers the entire project, > including these files. > IMO, We should fix it if there is a new RC for some other reason. But > this alone doesn't seem to make new RC necessary. > > Sushanth, Can you please reconsider your -1 ? > > > On Fri, Jun 17, 2016 at 3:06 PM, Sushanth Sowmyan <khorg...@gmail.com> wrote: >> -1, terribly sorry I didn't check for this earlier, but the RAT check >> fails for this. >> >> If you run mvn apache-rat:check , then you see the following issue: >> >> Unapproved licenses: >> >> >> /Users/sush/t/rel/apache-hive-2.1.0-src/common/src/java/org/apache/hive/common/util/DateParser.java >> >> /Users/sush/t/rel/apache-hive-2.1.0-src/common/src/test/org/apache/hive/common/util/TestDateParser.java >> >> Basically, these two files are missing the apache license header. We >> need to add them in. >> >> All other things are good, though. It has the oracle fix I asked for >> in RC2, md5s and signatures check out, compilation works on source >> package, and I'm able to run the hive binary from the binary package. 
>> I also tried a number of tests, and I've run a rat test on the release >> >> On Thu, Jun 16, 2016 at 6:02 PM, Jesus Camacho Rodriguez >> <jcamachorodrig...@hortonworks.com> wrote: >>> Apache Hive 2.1.0 Release Candidate 3 is available here: >>> >>> http://people.apache.org/~jcamacho/hive-2.1.0-rc3 >>> >>> Maven artifacts are available here: >>> >>> https://repository.apache.org/content/repositories/orgapachehive-1057/ >>> >>> Source tag for RC3 is at: >>> https://github.com/apache/hive/releases/tag/release-2.1.0-rc3 >>> >>> >>> Voting will conclude in 72 hours. >>> >>> Hive PMC Members: Please test and vote. >>> >>> Thanks. >>> >>> >>> >>>
Re: [VOTE] Apache Hive 2.1.0 Release Candidate 3
-1, terribly sorry I didn't check for this earlier, but the RAT check fails for this. If you run mvn apache-rat:check, then you see the following issue: Unapproved licenses: /Users/sush/t/rel/apache-hive-2.1.0-src/common/src/java/org/apache/hive/common/util/DateParser.java /Users/sush/t/rel/apache-hive-2.1.0-src/common/src/test/org/apache/hive/common/util/TestDateParser.java Basically, these two files are missing the apache license header. We need to add them in. All other things are good, though. It has the oracle fix I asked for in RC2, md5s and signatures check out, compilation works on source package, and I'm able to run the hive binary from the binary package. I also tried a number of tests, and I've run a rat test on the release. On Thu, Jun 16, 2016 at 6:02 PM, Jesus Camacho Rodriguez wrote: > Apache Hive 2.1.0 Release Candidate 3 is available here: > > http://people.apache.org/~jcamacho/hive-2.1.0-rc3 > > Maven artifacts are available here: > > https://repository.apache.org/content/repositories/orgapachehive-1057/ > > Source tag for RC3 is at: > https://github.com/apache/hive/releases/tag/release-2.1.0-rc3 > > > Voting will conclude in 72 hours. > > Hive PMC Members: Please test and vote. > > Thanks. > > > >
Re: [VOTE] Apache Hive 2.1.0 Release Candidate 2
Without HIVE-14020, I'm afraid people will not be able to upgrade the hive metastore from an earlier version of hive to 2.1 if they use Oracle as a backing db. There are workarounds, in that the sql script is easily fixed, but since we're still in the process of voting on an RC, I think this is a big enough problem that we should roll out a new RC. I think I'm a -0 on this. On Thu, Jun 16, 2016 at 2:58 PM, Jesus Camacho Rodriguez wrote: > Yes, exactly... I am taking care of that once again, do not worry. > If you want a precise list of which issues were actually fixed in > this release, you can check the release notes in RC2 :) > > > > > On 6/16/16, 10:32 PM, "Sergey Shelukhin" wrote: > >>Hmm… would this mean that all those issues changed from 2.1.1 to 2.1.0 >>would need to be changed back to 2.1.1 now? ;) >> >>On 16/6/16, 13:12, "Jesus Camacho Rodriguez" >> wrote: >> >>>I have been talking to Matt and HIVE-13974 will not make it to the >>>release as it needs some >>>additional time to be fixed. I will add info about this issue to the >>>release note. >>> >>>This means RC2 is still alive. >>> >>>We already got a +1 from Alan. Please, Hive PMC Members, test and vote so >>>we can move forward >>>with the release! >>> >>>Thanks! >>> >>> >>> >>>On 6/16/16, 11:02 AM, "Jesus Camacho Rodriguez" >>> wrote: Sure, I am taking care of this each time we roll out a new RC. On 6/15/16, 10:43 PM, "Sergey Shelukhin" wrote: >Should all the 2.1.1-fixed JIRAs be converted to 2.1.0? > >On 16/6/15, 14:03, "Jesus Camacho Rodriguez" > wrote: > >>OK, vote for RC2 is cancelled. >> >>Matt, please push HIVE-13974 as soon as possible and I will restart the >>vote. >> >>Thanks, >>Jesús >> >> >> >> >> >>On 6/15/16, 9:47 PM, "Matthew McCline" >>wrote: >> >>> >>>-1 for HIVE-13974 ORC Schema Evolution doesn't support add columns to >>>non-last STRUCT columns >>> >>>This bug will prevent people with ORC tables that have added columns >>>to >>>inner STRUCT columns from being able to read their tables. 
>>> >>> >>>From: Jesus Camacho Rodriguez >>>Sent: Wednesday, June 15, 2016 3:20 AM >>>To: dev@hive.apache.org >>>Subject: Re: [VOTE] Apache Hive 2.1.0 Release Candidate 2 >>> >>>Hive PMC members, >>> >>>Just a quick reminder that the vote for RC2 is still open and it needs >>>two additional votes to pass. >>> >>>Please test and cast your vote! >>> >>>Thanks, >>>Jesús >>> >>> >>> >>>On 6/10/16, 6:29 PM, "Alan Gates" wrote: >>> +1, checked signatures, did a build and ran a few simple unit tests. Alan. > On Jun 10, 2016, at 05:44, Jesus Camacho Rodriguez > wrote: > > Apache Hive 2.1.0 Release Candidate 2 is available here: > > http://people.apache.org/~jcamacho/hive-2.1.0-rc2 > > Maven artifacts are available here: > > >https://repository.apache.org/content/repositories/orgapachehive-105 >5/ > > Source tag for RC2 is at: > https://github.com/apache/hive/releases/tag/release-2.1.0-rc2 > > > Voting will conclude in 72 hours. > > Hive PMC Members: Please test and vote. > > Thanks. > > >>> > >>
[jira] [Created] (HIVE-13949) Investigate why Filter mechanism does not work for XSRF filtering from HS2
Sushanth Sowmyan created HIVE-13949: --- Summary: Investigate why Filter mechanism does not work for XSRF filtering from HS2 Key: HIVE-13949 URL: https://issues.apache.org/jira/browse/HIVE-13949 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Sushanth Sowmyan While working on HIVE-13853, it was found that simply using the constructed Filter as-is from ThriftHttpCLIService was not working; explicit calling of the filtering method from ThriftHttpServlet was needed instead. We should investigate why the Filter approach did not work, and bring it in line with standard filter usage, so as to not need to call functions inside the filter. Also, this is a prerequisite for eventually getting rid of our shim if we later update to always expect hadoop versions that contain the filter. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13941) Improve errors returned from SchemaTool
Sushanth Sowmyan created HIVE-13941: --- Summary: Improve errors returned from SchemaTool Key: HIVE-13941 URL: https://issues.apache.org/jira/browse/HIVE-13941 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan We've had feedback from Ambari folks that SchemaTool usage is opaque on errors. While the underlying error is hidden in the stacktrace if you run with --verbose, that output is often unwieldy and unusable; without --verbose, there is no indication of what actually went wrong. Thus, we need to fix this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13931) Add support for HikariCP and replace BoneCP usage with HikariCP
Sushanth Sowmyan created HIVE-13931: --- Summary: Add support for HikariCP and replace BoneCP usage with HikariCP Key: HIVE-13931 URL: https://issues.apache.org/jira/browse/HIVE-13931 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Currently, we use BoneCP as our primary connection pooling mechanism (overridable by users). However, BoneCP is no longer being actively developed, and is considered deprecated, replaced by HikariCP. Thus, we should add support for HikariCP, and try to replace our primary usage of BoneCP with it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13853) Add X-XSRF-Header filter to HS2 HTTP mode and WebHCat
Sushanth Sowmyan created HIVE-13853: --- Summary: Add X-XSRF-Header filter to HS2 HTTP mode and WebHCat Key: HIVE-13853 URL: https://issues.apache.org/jira/browse/HIVE-13853 Project: Hive Issue Type: Bug Components: HiveServer2, WebHCat Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan There is a possibility that there may be a CSRF-based attack on various hadoop components, and thus, there is an effort to add a block for all incoming http requests if they do not contain a X-XSRF-Header header. (See HADOOP-12691 for motivation) This has potential to affect HS2 when running on thrift-over-http mode(if cookie-based-auth is used), and webhcat. We introduce new flags to determine whether or not we're using the filter, and if we are, we will automatically reject any http requests which do not contain this header. To allow this to work, we also need to make changes to our JDBC driver to automatically inject this header into any requests it makes. Also, any client-side programs/api not using the JDBC driver directly will need to make changes to add a X-XSRF-Header header to the request to make calls to HS2/WebHCat if this filter is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
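In outline, the filter described above rejects any request that lacks the custom header, and the client (e.g. the JDBC driver) injects it. The sketch below is a framework-free simplification: the header name is taken from the JIRA title, but the class, method names, and header value are illustrative assumptions, not Hive's or Hadoop's actual implementation:

```java
import java.util.Map;

public class XsrfHeaderCheck {
    // Header name per the JIRA title; real servlet containers treat
    // header names case-insensitively, which this sketch does not model.
    static final String XSRF_HEADER = "X-XSRF-Header";

    // Server side: accept the request only if the filter is disabled
    // or the custom header is present.
    public static boolean isAllowed(Map<String, String> requestHeaders, boolean filterEnabled) {
        if (!filterEnabled) {
            return true; // filter switched off via a config flag
        }
        return requestHeaders.containsKey(XSRF_HEADER);
    }

    // Client side: what the JDBC driver change amounts to, i.e. injecting
    // the header into every outgoing request. The value is arbitrary here.
    public static void injectHeader(Map<String, String> requestHeaders) {
        requestHeaders.put(XSRF_HEADER, "true");
    }
}
```

The point of the scheme is that a browser making a cross-site request cannot attach this custom header, so its absence is a cheap CSRF signal.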
[jira] [Created] (HIVE-13738) Bump up httpcomponent.*.version deps in branch-1.2 to 4.4
Sushanth Sowmyan created HIVE-13738: --- Summary: Bump up httpcomponent.*.version deps in branch-1.2 to 4.4 Key: HIVE-13738 URL: https://issues.apache.org/jira/browse/HIVE-13738 Project: Hive Issue Type: Bug Affects Versions: 1.2.1, 1.2.2 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan apache-httpcomponents has had certain security issues (see HADOOP-12767) due to which upgrading to a newer dep version is recommended. We've already upped the dep. version to 4.4 in other branches of hive, we should do so here as well if we are going to do a new update of 1.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Getting ready for a Hive 1.2.2 release
Hi All, It has been nearly a year now since 1.2.1 was released, and I said then that I would keep the branch open for further bugfixes with a view to making another 1.2.2 stability upgrade release. There have been 64 such patches committed since then. I think it's time we revisited that and looked to making a 1.2.2 release to reflect 1.2.1 + all these updates, and I will go ahead and start rolling out release candidates after next weekend (May 16th) unless anyone has any objections. If anyone wants to get in any other patches before then, please feel free to do so. The original restrictions for commits to branch-1.2 (no breaking changes, no db changes, and no large features) still apply, and I'll do a full test verification before pushing it out. If anyone has any patches that they think will take longer than next week, but are important fixes that need to be in this, please ping me. I will edit the wiki ( https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status ) to reflect this. Thanks, -Sushanth
[jira] [Created] (HIVE-13670) Improve Beeline reconnect semantics
Sushanth Sowmyan created HIVE-13670: --- Summary: Improve Beeline reconnect semantics Key: HIVE-13670 URL: https://issues.apache.org/jira/browse/HIVE-13670 Project: Hive Issue Type: Improvement Affects Versions: 2.1.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan For most users of beeline, chances are that they will be using it with a single HS2 instance most of the time. In this scenario, having them type out a jdbc uri for HS2 every single time to !connect can get tiresome. Thus, we should improve semantics so that if a user does a successful !connect, then we must store the last-connected-to-url, so that if they do a !close, and then a !reconnect, then !reconnect should attempt to connect to the last successfully used url. Also, if they then do a !save, then that last-successfully-used url must be saved, so that in subsequent sessions, they can simply do !reconnect rather than specifying a url for !connect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13645) Beeline needs null-guard around hiveVars and hiveConfVars read
Sushanth Sowmyan created HIVE-13645: --- Summary: Beeline needs null-guard around hiveVars and hiveConfVars read Key: HIVE-13645 URL: https://issues.apache.org/jira/browse/HIVE-13645 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 2.1.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Beeline has a bug wherein if a user ever does a !save, then on the next load, if beeline.hiveVariables or beeline.hiveconfvariables are empty, i.e. {} or unspecified, they are loaded as null; on the next connect there is no null-check on these variables, leading to an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
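The fix described above amounts to treating a null map as an empty one before use. A minimal sketch, assuming hypothetical names (`BeelineVarsGuard`, `nullGuard`) rather than Beeline's actual code:

```java
import java.util.HashMap;
import java.util.Map;

public class BeelineVarsGuard {
    // If the saved properties yielded null for hiveVariables or
    // hiveConfVariables, fall back to an empty map so the next
    // !connect does not NPE when iterating the variables.
    public static Map<String, String> nullGuard(Map<String, String> vars) {
        return (vars == null) ? new HashMap<>() : vars;
    }
}
```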
[jira] [Created] (HIVE-13480) Add hadoop2 metrics reporter for Codahale metrics
Sushanth Sowmyan created HIVE-13480: --- Summary: Add hadoop2 metrics reporter for Codahale metrics Key: HIVE-13480 URL: https://issues.apache.org/jira/browse/HIVE-13480 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Multiple other Apache components can send their metrics to Hadoop2 metrics, which allows monitoring solutions like Ambari Metrics Server to show metrics for all components in one place. Our Codahale metrics implementation works very well, so ideally we would like to bridge the two: add a Hadoop2 reporter to Codahale that lets us continue to use Codahale metrics (i.e. not write another custom metrics impl) but report through Hadoop2. Apache Phoenix recently had a similar use case and was in the process of adding a stub piece that allows this forwarding. We should use the same reporter to minimize redundancy while pushing metrics to a centralized solution like Hadoop2 Metrics/AMS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13370) Add test for HIVE-11470
Sushanth Sowmyan created HIVE-13370: --- Summary: Add test for HIVE-11470 Key: HIVE-13370 URL: https://issues.apache.org/jira/browse/HIVE-13370 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13348) Add Event Nullification support for Replication
Sushanth Sowmyan created HIVE-13348: --- Summary: Add Event Nullification support for Replication Key: HIVE-13348 URL: https://issues.apache.org/jira/browse/HIVE-13348 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Replication, as implemented by HIVE-7973, works as follows: a) For every single modification to the hive metastore, an event gets triggered that logs a notification object. b) Replication tools such as falcon can consume these notification objects as an HCatReplicationTaskIterator from HCatClient.getReplicationTasks(lastEventId, maxEvents, dbName, tableName). c) For each event, we generate statements and distcp requirements for falcon to export, distcp and import to do the replication (along with requisite changes to export and import that would allow state management). The big thing missing from this picture is that while it works, it is naive about how it works: it exhaustively processes every single event generated, and tries to do the export-distcp-import cycle for all modifications, irrespective of whether or not the result will actually get used at import time. We need to build some sort of filtering logic which can process a batch of events to identify events that will result in effective no-ops, and to nullify those events from the stream before passing them on. The goal is to minimize the number of events that tools like Falcon would actually have to process. Examples of cases where event nullification would take place: a) CREATE-DROP cases: If an object is being created in event#34 that will eventually get dropped in event#47, then there is no point in replicating this along. We simply null out both these events, and also any other event that references this object between event#34 and event#47. b) APPEND-APPEND: Some objects are replicated wholesale, which means every APPEND that occurs would cause a full export of the object in question. 
At this point, the prior APPENDs would all be supplanted by the last APPEND. Thus, we could nullify all the prior such events. Additional such cases can be inferred by analysis of the Export-Import replay protocol definition at https://issues.apache.org/jira/secure/attachment/12725999/EXIMReplicationReplayProtocol.pdf or by reasoning out the various event processing orders possible. Replication, as implemented by HIVE-7973, is merely a first step for functional support. This work is needed for replication to be efficient at all, and thus usable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
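The two nullification cases above (CREATE..DROP spans and superseded APPENDs) can be sketched as a batch filter. Everything here is illustrative: the `Event` model and `nullify` method are assumptions about shape, not Hive's actual notification classes, and real logic would also have to handle renames, partitions, and spans crossing batch boundaries:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EventNullifier {
    // Minimal illustrative event model: an event type plus the object it touches.
    public static final class Event {
        public final String type;   // e.g. "CREATE", "APPEND", "DROP"
        public final String object; // e.g. "db.table"
        public Event(String type, String object) { this.type = type; this.object = object; }
    }

    // Nullify (a) CREATE..DROP spans: if an object created in this batch is
    // later dropped in the same batch, drop every event touching it in that
    // span, endpoints included; (b) APPEND runs: keep only the last APPEND
    // per object, since each APPEND re-exports the object wholesale.
    public static List<Event> nullify(List<Event> batch) {
        // Pass 1: record CREATE positions; on a matching DROP, mark the span.
        Map<String, Integer> createIdx = new LinkedHashMap<>();
        Map<String, int[]> doomedSpans = new LinkedHashMap<>();
        for (int i = 0; i < batch.size(); i++) {
            Event e = batch.get(i);
            if (e.type.equals("CREATE")) {
                createIdx.put(e.object, i);
            } else if (e.type.equals("DROP") && createIdx.containsKey(e.object)) {
                doomedSpans.put(e.object, new int[]{createIdx.get(e.object), i});
            }
        }
        // Pass 2: index of the last APPEND per object.
        Map<String, Integer> lastAppend = new LinkedHashMap<>();
        for (int i = 0; i < batch.size(); i++) {
            Event e = batch.get(i);
            if (e.type.equals("APPEND")) lastAppend.put(e.object, i);
        }
        // Pass 3: emit only the surviving events.
        List<Event> out = new ArrayList<>();
        for (int i = 0; i < batch.size(); i++) {
            Event e = batch.get(i);
            int[] span = doomedSpans.get(e.object);
            if (span != null && i >= span[0] && i <= span[1]) continue; // in a CREATE..DROP span
            if (e.type.equals("APPEND") && lastAppend.get(e.object) != i) continue; // superseded
            out.add(e);
        }
        return out;
    }
}
```

For a batch CREATE(t1), APPEND(t1), DROP(t1), APPEND(t2), APPEND(t2), only the final APPEND(t2) survives, which is exactly the reduction the JIRA is after.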
Re: [Discuss] MariaDB support
+1 to introduction of mariadb support - I think it's important that we support MariaDB - there is an increasing interest in the broader open source community of migrating from mysql to either postgres or mariadb. While they're compatible now, it's important that we be aware of gotchas that come up, which we'll be aware of only after there is active usage. +1 to not duplicating mysql scripts unless we find a need to diverge, and having schematool consider it an alias for now. On Wed, Mar 16, 2016 at 12:09 PM, Szehon Ho wrote: > Yea, +1 to point 2. > > For point one, I also agree that it is compatible with mysql and not be a > ton of work unless you want to optimize, on our observations we have seen > existing mysql scripts work fine against mariadb. > > On Wed, Mar 16, 2016 at 12:04 PM, Dmitry Tolpeko > wrote: >> >> +1 great idea >> >> On Wed, Mar 16, 2016 at 10:00 PM, Thejas Nair >> wrote: >>> >>> + Sergio, Szehon, Ashutosh, Sushanth, Sergey, >>> >>> Any thoughts on this ? >>> >>> >>> On Tue, Mar 15, 2016 at 7:08 PM, Thejas Nair >>> wrote: >>> > There seems to be increasing interest in supporting MariaDB as an >>> > option for storing metastore metadata. Supporting it as a database >>> > option is also easy as it is compatible with mysql. I thought it would >>> > be useful to discuss supporting it in the dev list before creating any >>> > jiras. >>> > >>> > There are two aspects I would like to discuss - >>> > >>> > 1. Changes in hive to support MariaDB >>> > >>> > The existing mysql schema creation/upgrade scripts in hive should just >>> > work for mariadb as well. >>> > However, MariaDB has some additional optimizations that we might want >>> > to use in future to optimize queries for it. That would mean creating >>> > specific scripts for mariadb. >>> > >>> > However, until we introduce such MariaDB specific tuning, I think it >>> > is better to avoid duplicating the mysql scripts. 
>>> > >>> > To make the transition to possibly using MariaDB optimized scripts >>> > easier, one option is to have schematool consider it as an alias for >>> > mysql until that happens. >>> > >>> > >>> > 2. Testing with MariaDB >>> > It would be useful to have tests for mariadb as well on the lines of >>> > what is available for mysql in >>> > https://issues.apache.org/jira/browse/HIVE-9800, to ensure that >>> > mariadb support is not broken. >>> > >>> > Thanks, >>> > Thejas >> >> >
CVE-2015-7521: Apache Hive authorization bug disclosure (update)
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 CVE-2015-7521: Apache Hive authorization bug disclosure Severity: Important Vendor: The Apache Software Foundation Versions Affected: Apache Hive 0.13.x Apache Hive 0.14.x Apache Hive 1.0.0 - 1.0.1 Apache Hive 1.1.0 - 1.1.1 Apache Hive 1.2.0 - 1.2.1 Description: Some partition-level operations exist that do not explicitly also authorize privileges of the parent table. This can lead to issues when the parent table would have denied the operation, but no denial occurs because the partition-level privilege is not checked by the authorization framework, which defines authorization entities only from the table level upwards. This issue is known to affect Hive clusters protected by both Ranger as well as SqlStdHiveAuthorization. Mitigation: For Hive 0.13.x, 0.14.x, 1.0, 1.1 and 1.2, a separate jar is being made available, which users can put in their ${HIVE_HOME}/lib/, and this provides a hook for administrators to add to their hive-site.xml, by setting hive.semantic.analyzer.hook=org.apache.hadoop.hive.ql.parse.ParentTableAuthorizationHook . This parameter is a comma-separated-list and this hook can be appended to an existing list if one already exists in the setup. You will then want to make sure that you protect the hive.semantic.analyzer.hook parameter from being changed at runtime by adding it to hive.conf.restricted.list. This jar and associated source tarball are available for download over at : https://hive.apache.org/downloads.html along with their gpg-signed .asc signatures, as well as the md5sums for verification in the hive-parent-auth-hook/ directory. This issue has already been patched in all Hive branches that are affected, and is fixed in the recently released Hive 2.0.0. Hive 2.0.0 and any future release will not need these mitigation steps. Credit: This issue was discovered by Olaf Flebbe of science+computing ag. 
-BEGIN PGP SIGNATURE- Version: GnuPG v1.4.5 (GNU/Linux) iQIVAwUBVsTAYh6tt4FFMLreAQJwiw/+JqSYNXefO6dAckvDke57Hv+TYqB36K06 pQt6JiRBQ1Ov084TkfrDESj9ftIIdxnL4MD8o2wmunSJSL6an6aFFR3uxMjmYDrW 6cTr1noxl3t1WQHVf0oE4aAKCjmYBp+6qtlymt4y//PKNxaVq+8bQ53jArMt78YA UZHV3ET+9vxQM2uoseh1QbdonFMsNMVFY2SfDiZ9OKk8o5eQuF9XhjJWpNKyboYR hxQhjCfZxkCcqA6ulG/lhpxjRvaqEN8JwePQfpNxEToTm6Y68PrQbR01ry+MENS2 Q2KQ9H8sr9LQMXM1U+pvf1NUDnEA5m6sWTC7JcLoz/4KP5aLy1yxSAoVKhDF5ewI 7d8ECRFsCtJo64yQzy1k7W6vdkg8wuciVKv86KVYaM926wFK0Lj9VFjxFO2G1AY5 nBDMxgEnGk0AiNb9qa8fnVSsiDTwrvfBglvQlmTawdCeBUBWFaNONvxP+9lohe04 NYZz3FKSUTFaqluijfw+2x+abP+0qbwy3JfnUgTdttXJ8R5Xxlf2vGmlj2mAJYI/ +hwfBgBkVeITQ5YK/wNaI2tr8FSFOitX4np/FtJA860ygGxi9C4P/Sl1Xj97cCJC HSfZjIOsJ6j11W+DFmI85FE5Pqp042EHq8yqIPrlcKAlmrNT3mtXyrWqdBXjESxs BXyP9rHZJxo= =5PjL -END PGP SIGNATURE-
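The mitigation steps in the advisory above translate to a hive-site.xml fragment along these lines. The property names and the hook class are taken from the advisory itself; note that both properties are comma-separated lists, so append to any existing values rather than replacing them:

```xml
<!-- hive-site.xml fragment for the CVE-2015-7521 mitigation described
     above; values are taken from the advisory text. -->
<property>
  <name>hive.semantic.analyzer.hook</name>
  <!-- append to any existing comma-separated hook list -->
  <value>org.apache.hadoop.hive.ql.parse.ParentTableAuthorizationHook</value>
</property>
<property>
  <name>hive.conf.restricted.list</name>
  <!-- prevents the hook setting from being changed at runtime;
       append to the existing restricted list if one is configured -->
  <value>hive.semantic.analyzer.hook</value>
</property>
```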
CVE-2015-7521: Apache Hive authorization bug disclosure
CVE-2015-7521: Apache Hive authorization bug disclosure

Severity: Important

Vendor: The Apache Software Foundation

Versions Affected:
Apache Hive 1.0.0 - 1.0.1
Apache Hive 1.1.0 - 1.1.1
Apache Hive 1.2.0 - 1.2.1

Description: Some partition-level operations exist that do not explicitly also authorize privileges of the parent table. This can lead to issues when the parent table would have denied the operation, but no denial occurs because the partition-level privilege is not checked by the authorization framework, which defines authorization entities only from the table level upwards. This issue is known to affect Hive clusters protected by both Ranger and SqlStdHiveAuthorization.

Mitigation: For Hive 1.0, 1.1 and 1.2, a separate jar is being made available, which users can put in their ${HIVE_HOME}/lib/; this provides a hook that administrators can add to their hive-site.xml by setting hive.semantic.analyzer.hook=org.apache.hadoop.hive.ql.parse.ParentTableAuthorizationHook. This parameter is a comma-separated list, and the hook can be appended to an existing list if one already exists in the setup. You will then want to protect the hive.semantic.analyzer.hook parameter from being changed at runtime by adding it to hive.conf.restricted.list. The jar and associated source tarball are available for download at https://hive.apache.org/downloads.html, along with their gpg-signed .asc signatures and md5sums for verification, in the hive-parent-auth-hook/ directory. This issue has already been patched in all affected Hive branches, and any future release will not need these mitigation steps.

Credit: This issue was discovered by Olaf Flebbe of science+computing ag.
[jira] [Created] (HIVE-12937) DbNotificationListener unable to clean up old notification events
Sushanth Sowmyan created HIVE-12937: --- Summary: DbNotificationListener unable to clean up old notification events Key: HIVE-12937 URL: https://issues.apache.org/jira/browse/HIVE-12937 Project: Hive Issue Type: Bug Affects Versions: 1.2.1, 1.3.0, 2.0.0, 2.1.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan There is a bug in ObjectStore, where we use pm.deletePersistent instead of pm.deletePersistentAll, which causes the persistenceManager to try to drop a org.datanucleus.store.rdbms.query.ForwardQueryResult instead of the appropriate associated org.apache.hadoop.hive.metastore.model.MNotificationLog. This results in an error that looks like this: {noformat} Exception in thread "CleanerThread" org.datanucleus.api.jdo.exceptions.ClassNotPersistenceCapableException: The class "org.datanucleus.store.rdbms.query.ForwardQueryResult" is not persistable. This means that it either hasnt been enhanced, or that the enhanced version of the file is not in the CLASSPATH (or is hidden by an unenhanced version), or the Meta-Data/annotations for the class are not found. 
at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:380) at org.datanucleus.api.jdo.JDOPersistenceManager.jdoDeletePersistent(JDOPersistenceManager.java:807) at org.datanucleus.api.jdo.JDOPersistenceManager.deletePersistent(JDOPersistenceManager.java:820) at org.apache.hadoop.hive.metastore.ObjectStore.cleanNotificationEvents(ObjectStore.java:7149) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:114) at com.sun.proxy.$Proxy0.cleanNotificationEvents(Unknown Source) at org.apache.hive.hcatalog.listener.DbNotificationListener$CleanerThread.run(DbNotificationListener.java:277) NestedThrowablesStackTrace: The class "org.datanucleus.store.rdbms.query.ForwardQueryResult" is not persistable. This means that it either hasnt been enhanced, or that the enhanced version of the file is not in the CLASSPATH (or is hidden by an unenhanced version), or the Meta-Data/annotations for the class are not found. org.datanucleus.exceptions.ClassNotPersistableException: The class "org.datanucleus.store.rdbms.query.ForwardQueryResult" is not persistable. This means that it either hasnt been enhanced, or that the enhanced version of the file is not in the CLASSPATH (or is hidden by an unenhanced version), or the Meta-Data/annotations for the class are not found. 
at org.datanucleus.ExecutionContextImpl.assertClassPersistable(ExecutionContextImpl.java:5698) at org.datanucleus.ExecutionContextImpl.deleteObjectInternal(ExecutionContextImpl.java:2495) at org.datanucleus.ExecutionContextImpl.deleteObjectWork(ExecutionContextImpl.java:2466) at org.datanucleus.ExecutionContextImpl.deleteObject(ExecutionContextImpl.java:2417) at org.datanucleus.ExecutionContextThreadedImpl.deleteObject(ExecutionContextThreadedImpl.java:245) at org.datanucleus.api.jdo.JDOPersistenceManager.jdoDeletePersistent(JDOPersistenceManager.java:802) at org.datanucleus.api.jdo.JDOPersistenceManager.deletePersistent(JDOPersistenceManager.java:820) at org.apache.hadoop.hive.metastore.ObjectStore.cleanNotificationEvents(ObjectStore.java:7149) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:114) at com.sun.proxy.$Proxy0.cleanNotificationEvents(Unknown Source) at org.apache.hive.hcatalog.listener.DbNotificationListener$CleanerThread.run(DbNotificationListener.java:277) {noformat} The end result of this bug is that users of DbNotificationListener will have an evergrowing number of notification events that are not cleaned up as they age. This is an easy enough fix, but also shows that we have a lack of code coverage here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
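The fix is small, but the failure generalizes: a delete-one-object API handed a query-result collection tries to delete the collection itself. The sketch below uses a hypothetical FakePersistenceManager (a stand-in, not the real javax.jdo API) to show why the call must iterate the result's elements, as deletePersistentAll does:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the JDO PersistenceManager (NOT the real javax.jdo API),
// illustrating the bug: deletePersistent(Object) treats its argument as one persistent
// object, so passing a query-result List makes it try to delete the List itself --
// the ForwardQueryResult "is not persistable" error in the stack trace above.
class FakePersistenceManager {
    final List<Object> deleted = new ArrayList<>();

    void deletePersistent(Object o) {
        if (o instanceof List) {
            // Mirrors ClassNotPersistableException: a query result is not a persistent entity.
            throw new IllegalArgumentException(
                "The class \"" + o.getClass().getName() + "\" is not persistable.");
        }
        deleted.add(o);
    }

    // The fix shape: delete each element of the result, not the result object.
    void deletePersistentAll(List<?> objs) {
        for (Object o : objs) {
            deletePersistent(o);
        }
    }
}
```

Under that reading, cleanNotificationEvents should hand the MNotificationLog query results to deletePersistentAll, which deletes each element, rather than to deletePersistent, which trips over the result wrapper.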
[jira] [Created] (HIVE-12875) Verify sem.getInputs() and sem.getOutputs()
Sushanth Sowmyan created HIVE-12875: --- Summary: Verify sem.getInputs() and sem.getOutputs() Key: HIVE-12875 URL: https://issues.apache.org/jira/browse/HIVE-12875 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan For every partition entity object present in sem.getInputs() and sem.getOutputs(), we must ensure that the appropriate Table is also added to the list of entities. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12630) Import should create a new WriteEntity for the new table it's creating to mimic CREATETABLE behaviour
Sushanth Sowmyan created HIVE-12630: --- Summary: Import should create a new WriteEntity for the new table it's creating to mimic CREATETABLE behaviour Key: HIVE-12630 URL: https://issues.apache.org/jira/browse/HIVE-12630 Project: Hive Issue Type: Bug Components: Authorization, Import/Export Affects Versions: 1.2.0, 1.3.0, 2.0.0, 2.1.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan CREATE-TABLE creates a new WriteEntity for the new table being created, whereas IMPORT does not mimic that behaviour. While SQLStandardAuth itself does not care about this difference, external authorizers such as Ranger can and do make a distinction on this, and can have policies set up on patterns for objects that do not yet exist. Thus, we must emit a WriteEntity for the yet-to-be-created table as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12345) Followup for HIVE-9013 : Hidden commands still visible through beeline
Sushanth Sowmyan created HIVE-12345: --- Summary: Followup for HIVE-9013 : Hidden commands still visible through beeline Key: HIVE-12345 URL: https://issues.apache.org/jira/browse/HIVE-12345 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan HIVE-9013 introduced the ability to hide certain conf variables when output through the "set" command. However, one further bug remains that causes these variables to still be visible through beeline connecting to HS2: HS2 exposes hidden variables such as its metastore password when "set" is run. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12083) HIVE-10965 introduces thrift error if partNames or colNames are empty
Sushanth Sowmyan created HIVE-12083: --- Summary: HIVE-10965 introduces thrift error if partNames or colNames are empty Key: HIVE-12083 URL: https://issues.apache.org/jira/browse/HIVE-12083 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan In the fix for HIVE-10965, there is a short-circuit path that causes an empty AggrStats object to be returned if partNames is empty or colNames is empty: {code} diff --git metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java index 0a56bac..ed810d2 100644 --- metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java +++ metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java @@ -1100,6 +1100,7 @@ public ColumnStatistics getTableStats( public AggrStats aggrColStatsForPartitions(String dbName, String tableName, List<String> partNames, List<String> colNames, boolean useDensityFunctionForNDVEstimation) throws MetaException { +if (colNames.isEmpty() || partNames.isEmpty()) return new AggrStats(); // Nothing to aggregate. long partsFound = partsFoundForPartitions(dbName, tableName, partNames, colNames); List<ColumnStatisticsObj> colStatsList; // Try to read from the cache first {code} This runs afoul of the thrift requirement that AggrStats' required fields be set: {code} struct AggrStats { 1: required list<ColumnStatisticsObj> colStats, 2: required i64 partsFound // number of partitions for which stats were found } {code} Thus, we get errors as follows: {noformat} 2015-10-08 00:00:25,413 ERROR server.TThreadPoolServer (TThreadPoolServer.java:run(213)) - Thrift error occurred during processing of message. org.apache.thrift.protocol.TProtocolException: Required field 'colStats' is unset! 
Struct:AggrStats(colStats:null, partsFound:0) at org.apache.hadoop.hive.metastore.api.AggrStats.validate(AggrStats.java:389) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.validate(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.write(ThriftHiveMetastore.java) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:536) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {noformat} Normally, this would not occur, since HIVE-10965 also includes a client-side guard for colNames.isEmpty() that skips the metastore call entirely; but there is no guard for partNames being empty, which would still cause an error on the metastore side if the thrift call were invoked directly, as would happen if the client is from an older version before this was patched. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
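The failure above is generic to Thrift structs with required fields: a default-constructed result leaves colStats null, and the generated validate() rejects it at write time. A minimal, self-contained Java sketch of the bug and the safe short-circuit (the AggrStats class below is a hand-written stand-in for illustration, not the real Thrift-generated Hive class):

```java
import java.util.Collections;
import java.util.List;

// Hand-written stand-in for the Thrift-generated AggrStats (NOT the real Hive class),
// showing why `return new AggrStats();` trips the required-field check at write time.
class AggrStats {
    List<String> colStats; // `required` in the IDL, but null after the default constructor
    long partsFound;       // `required` in the IDL

    // Mirrors the generated validate(): required fields must be set before serialization.
    void validate() {
        if (colStats == null) {
            throw new IllegalStateException("Required field 'colStats' is unset!");
        }
    }

    // The safe short-circuit: explicitly initialize every required field.
    static AggrStats emptyResult() {
        AggrStats s = new AggrStats();
        s.colStats = Collections.emptyList();
        s.partsFound = 0;
        return s;
    }
}
```

The real fix has the same shape: have the short-circuit return an AggrStats whose required fields are explicitly initialized (an empty colStats list and partsFound = 0), and guard partNames on the client side just as colNames already is.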
[jira] [Created] (HIVE-11936) Support SQLAnywhere as a backing DB for the hive metastore
Sushanth Sowmyan created HIVE-11936: --- Summary: Support SQLAnywhere as a backing DB for the hive metastore Key: HIVE-11936 URL: https://issues.apache.org/jira/browse/HIVE-11936 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan I've had pings from people interested in enabling the metastore to work on top of SQLAnywhere (17+), so I am opening this jira to track the changes needed in Hive to make SQLAnywhere work as a backing db for the metastore. I have it working and passing all tests in my setup, and will upload patches as I'm able to. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: unit tests in patches
+1 to Siddharth's suggestion - it makes it easier on people used to dealing with other conventions. On Tue, Sep 22, 2015 at 3:21 PM, Siddharth Seth wrote: > Can a 'Target Version' field be added to jiras? That would help to get > rid of the confusion caused by Fix Version being used to represent branches > a jira goes into. > > On Mon, Sep 21, 2015 at 12:55 PM, Ashutosh Chauhan > wrote: >> Hi everyone, >> >> Generally, it's a good idea to add unit tests in patches, especially when the issue is >> easy to repro (e.g., an NPE). This may not always be possible, but we should >> aim to add tests wherever we can. In addition to regression testing, tests >> also prove the existence of the bug. I would especially like to call out the >> attention of committers to make sure the patches they are committing have >> a test case. In case it's not possible to write a repro test, there should be an >> explanation on the jira. >> >> Related to this are affects versions and fix versions. Reporters should update >> these fields while creating jiras. There is some confusion around exactly >> what a fix version is. Fix version indicates the earliest version in which the >> fix is available. So, it should be updated after the patch is committed to >> reflect which upcoming version it will be available in. Please don't use it >> as a 'target version', i.e. a version in which you would like to see it >> fixed. >> >> Examples of commits where I didn't follow what I am preaching :) but plan >> to improve on: >> >> https://issues.apache.org/jira/browse/HIVE-9377 >> >> https://issues.apache.org/jira/browse/HIVE-9507 >> >> https://issues.apache.org/jira/browse/HIVE-11285 >> >> https://issues.apache.org/jira/browse/HIVE-9386 >> >> https://issues.apache.org/jira/browse/HIVE-10808 >>
[jira] [Created] (HIVE-11852) numRows and rawDataSize table properties are not replicated
Sushanth Sowmyan created HIVE-11852: --- Summary: numRows and rawDataSize table properties are not replicated Key: HIVE-11852 URL: https://issues.apache.org/jira/browse/HIVE-11852 Project: Hive Issue Type: Bug Components: Import/Export Affects Versions: 1.2.1 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan numRows and rawDataSize table properties are not replicated when exported for replication and re-imported. {code} Table drdbnonreplicatabletable.vanillatable has different TblProps from drdbnonreplicatabletable.vanillatable expected [{numFiles=1, numRows=2, totalSize=560, rawDataSize=440}] but found [{numFiles=1, totalSize=560}] java.lang.AssertionError: Table drdbnonreplicatabletable.vanillatable has different TblProps from drdbnonreplicatabletable.vanillatable expected [{numFiles=1, numRows=2, totalSize=560, rawDataSize=440}] but found [{numFiles=1, totalSize=560}] {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11697) Add Unit Test to test serializability/deserializability of HCatSplits
Sushanth Sowmyan created HIVE-11697: --- Summary: Add Unit Test to test serializability/deserializability of HCatSplits Key: HIVE-11697 URL: https://issues.apache.org/jira/browse/HIVE-11697 Project: Hive Issue Type: Test Reporter: Sushanth Sowmyan As HIVE-11344 found, we should have unit tests for this scenario, and we need to add one in. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11585) Explicitly set pmf.setDetachAllOnCommit on metastore unless configured otherwise
Sushanth Sowmyan created HIVE-11585: --- Summary: Explicitly set pmf.setDetachAllOnCommit on metastore unless configured otherwise Key: HIVE-11585 URL: https://issues.apache.org/jira/browse/HIVE-11585 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan datanucleus.detachAllOnCommit has a default value of false. However, we've observed a number of objects (especially FieldSchema objects) being retained, which causes OOM issues on the metastore. Hive should default datanucleus.detachAllOnCommit to true unless explicitly overridden by users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
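Until such a default changes, the proposed behavior can be opted into per site. A sketch of the hive-site.xml fragment, assuming the usual mechanism of metastore-side datanucleus.* properties being passed through to DataNucleus:

```xml
<!-- Sketch: opt into detach-on-commit so committed objects (e.g. FieldSchema)
     are detached rather than retained by the PersistenceManager cache. -->
<property>
  <name>datanucleus.detachAllOnCommit</name>
  <value>true</value>
</property>
```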
Re: [ANNOUNCE] New Hive Committer - Dmitry Tolpeko
Congrats, Dmitry! On Mon, Aug 3, 2015 at 2:02 PM, Jimmy Xiang jxi...@cloudera.com wrote: Congrats! On Mon, Aug 3, 2015 at 1:57 PM, Prasanth Jayachandran pjayachand...@hortonworks.com wrote: Congrats Dmitry! On Aug 3, 2015, at 1:54 PM, Sergey Shelukhin ser...@hortonworks.com wrote: Congrats! On 15/8/3, 12:57, Vaibhav Gumashta vgumas...@hortonworks.com wrote: Congrats Dmitry! -Vaibhav On 8/3/15, 12:31 PM, Lefty Leverenz leftylever...@gmail.com wrote: Congratulations Dmitry! -- Lefty On Mon, Aug 3, 2015 at 2:33 PM, Carl Steinbach c...@apache.org wrote: The Apache Hive PMC has voted to make Dmitry Tolpeko a committer on the Apache Hive Project. Please join me in congratulating Dmitry! Thanks. - Carl
[jira] [Created] (HIVE-11344) HIVE-9845 makes HCatSplit.write modify the split so that PartitionInfo objects are unusable after it
Sushanth Sowmyan created HIVE-11344: --- Summary: HIVE-9845 makes HCatSplit.write modify the split so that PartitionInfo objects are unusable after it Key: HIVE-11344 URL: https://issues.apache.org/jira/browse/HIVE-11344 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan HIVE-9845 introduced a notion of compression for HCatSplits: when serializing, it finds commonalities between PartInfo and TableInfo objects, and if the two are identical, it nulls out that field in PartInfo, so that the info is not repeated when PartInfo is then serialized. This, however, has the side effect of making the PartInfo object unusable once HCatSplit.write has been called. This does not affect M/R directly, since M/R does not know about the PartInfo objects, and once serialized, the HCatSplit object is recreated by deserializing on the backend, which restores the split and its PartInfo objects. It does, however, affect framework users of HCat that try to mimic M/R and then use the PartInfo objects to instantiate distinct readers. Thus, we need to make it so that PartInfo is still usable after HCatSplit.write is called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
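The dedup-and-restore idea can be sketched with stdlib-only stand-ins (TableInfo, PartInfo, and Split below are hypothetical miniatures, not the real HCatalog classes): null out the duplicated field only for the duration of serialization, then put it back so the in-memory object remains usable.

```java
// Hypothetical stand-ins for HCatalog's TableInfo/PartInfo/HCatSplit (not the real
// classes), sketching the HIVE-9845 idea -- null out a PartInfo field that duplicates
// the TableInfo so it is not serialized twice -- and the HIVE-11344 fix: restore the
// field after writing so the in-memory object stays usable.
class TableInfo {
    final String schema;
    TableInfo(String schema) { this.schema = schema; }
}

class PartInfo {
    String schema;            // may duplicate the table-level schema
    final TableInfo table;
    PartInfo(String schema, TableInfo table) { this.schema = schema; this.table = table; }
}

class Split {
    final PartInfo part;
    Split(PartInfo part) { this.part = part; }

    // Serialize with dedup, then restore the field instead of leaving it null.
    String write() {
        String saved = part.schema;
        if (saved != null && saved.equals(part.table.schema)) {
            part.schema = null;                  // compress: skip the duplicate
        }
        String wire = "schema=" + part.schema;   // stand-in for the real encoding
        part.schema = saved;                     // the fix: undo the mutation
        return wire;
    }
}
```

The pre-fix behavior is the same code without the restore line: the split still serializes compactly, but the caller's PartInfo is left with a nulled field after write().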
Re: [ANNOUNCE] New Hive PMC Member - Sushanth Sowmyan
Thanks, all! :)
Re: [ANNOUNCE] New Hive Committer - Pengcheng Xiong
Congrats, Pengcheng! On Jul 16, 2015 11:17, Prasanth Jayachandran pjayachand...@hortonworks.com wrote: Congrats Pengcheng! On Jul 16, 2015, at 11:11 AM, Vikram Dixit K vikram.di...@gmail.com wrote: Congratulations Pengcheng! On Thu, Jul 16, 2015 at 10:10 AM, Hari Subramaniyan hsubramani...@hortonworks.com wrote: Congrats Pengcheng! From: Chao Sun c...@cloudera.com Sent: Thursday, July 16, 2015 10:06 AM To: dev@hive.apache.org Subject: Re: [ANNOUNCE] New Hive Committer - Pengcheng Xiong Congrats Pengcheng! On Thu, Jul 16, 2015 at 10:03 AM, Szehon Ho sze...@cloudera.com wrote: Congrats! On Thu, Jul 16, 2015 at 6:47 AM, Vaibhav Gumashta vgumas...@hortonworks.com wrote: Congrats Pengcheng! -Vaibhav On 7/16/15, 7:12 PM, Chaoyu Tang ctang...@gmail.com wrote: Congratulations to Pengcheng! On Thu, Jul 16, 2015 at 9:10 AM, Xuefu Zhang xzh...@cloudera.com wrote: Congratulations, Pengcheng! On Thu, Jul 16, 2015 at 4:50 AM, Carl Steinbach c...@apache.org wrote: The Apache Hive PMC has voted to make Pengcheng Xiong a committer on the Apache Hive Project. Please join me in congratulating Pengcheng! Thanks. - Carl -- Nothing better than when appreciated for hard work. -Mark
[ANNOUNCE] Apache Hive 1.2.1 Released
The Apache Hive team is proud to announce the release of Apache Hive version 1.2.1. The Apache Hive (TM) data warehouse software facilitates querying and managing large datasets residing in distributed storage. Built on top of Apache Hadoop (TM), it provides: * Tools to enable easy data extract/transform/load (ETL) * A mechanism to impose structure on a variety of data formats * Access to files stored either directly in Apache HDFS (TM) or in other data storage systems such as Apache HBase (TM) * Query execution via Apache Hadoop MapReduce, Apache Tez, or Apache Spark frameworks. For Hive release details and downloads, please visit: https://hive.apache.org/downloads.html Hive 1.2.1 is an incremental release on top of Hive 1.2.0 and release notes are available here: https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12332384&styleName=Text&projectId=12310843 We would like to thank the many contributors who made this release possible. Regards, The Apache Hive Team
Re: [VOTE] Apache Hive 1.2.1 Release Candidate 0
I just sent out the announce mail for 1.2.1. I have now updated the wiki page to reflect rules for further commits to branch-1.2: a) The commit must not introduce any schema or interface changes. b) The commit must fix a bug that causes an outage/breakage (such as an NPE) in regular hive operation, or it must fix a data corruption issue, or it must be a security fix with an appropriate CVE. If it meets those bars, you do not need to cc me or ask for my permission, you may go ahead and commit to branch-1.2, and I will keep watch on this branch. If it does not meet those bars, you are directed to target branch-1 or master instead. The goal for an eventual 1.2.2 is to ensure that this branch stays live for breaking fixes, but not so that we may keep landing new patches in released branches. Thanks all! -Sushanth On Tue, Jun 23, 2015 at 11:45 AM, Sushanth Sowmyan khorg...@gmail.com wrote: Thanks for testing and verifying, folks! With 4 PMC votes and 105 hours (> 72 hours) now having passed, the vote for releasing 1.2.1 RC0 as Hive 1.2.1 passes. I will go ahead and publish artifacts for the 1.2.1 release and send out mail about general availability. With this release, please note that commits to branch-1.2 are now restricted to a higher bar of necessity, and will require it to be fixing a product outage (such as an NPE when you run a query). I will update the wiki to that effect to indicate the process for further commits to the branch. For the most part, please restrict commits to branch-1 and master from now on. I am amenable to doing a 1.2.2 release eventually if we have enough such issues, maybe about 3+ months out. Thanks all! -Sushanth On Sun, Jun 21, 2015 at 6:13 PM, Vikram Dixit K vikram.di...@gmail.com wrote: +1 built on both profiles and ran a simple query on the rc. Thanks Vikram. On Sat, Jun 20, 2015 at 7:47 AM, Thejas Nair thejas.n...@gmail.com wrote: +1 Checked signatures, checksums Checked release notes Reviewed changes in pom files. 
Built with hadoop2 and hadoop1. Ran some simple queries in local mode. On Fri, Jun 19, 2015 at 5:00 PM, Gunther Hagleitner ghagleit...@hortonworks.com wrote: +1 Checked signatures, compiled, ran some tests. Thanks, Gunther. -- *From:* Alan Gates alanfga...@gmail.com *Sent:* Friday, June 19, 2015 11:44 AM *To:* dev@hive.apache.org *Subject:* Re: [VOTE] Apache Hive 1.2.1 Release Candidate 0 +1. Checked signatures, looked for binary files, compiled the code, and ran a rat check. Alan. Sushanth Sowmyan khorg...@gmail.com June 19, 2015 at 2:44 Hi Folks, It's been a month since 1.2.0, and I promised to do a stabilization 1.2.1 release, and this is it. A large number of patches have been applied since 1.2.0, and major known issues have been cleared/fixed. A few jiras were deferred out to 1.3/2.0 as not being ready to commit into 1.2.1 at this time. More details are available here : https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status Apache Hive 1.2.1 Release Candidate 0 is available here: https://people.apache.org/~khorgath/releases/1.2.1_RC0/artifacts/ My public key used for signing is as available from the hive committers key list : http://www.apache.org/dist/hive/KEYS Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-1040/ Source tag for RC0 is up on the apache git repo as tag release-1.2.1-rc0 (Browseable view over at https://git-wip-us.apache.org/repos/asf?p=hive.git;a=tag;h=0f6ee99efc911cbc1566f9bbbc63a51600302703 ) Voting will conclude in 72 hours. Hive PMC Members: Please test and vote. Thanks, -Sushanth -- Nothing better than when appreciated for hard work. -Mark
[jira] [Created] (HIVE-11059) hcatalog-server-extensions tests scope should depend on hive-exec
Sushanth Sowmyan created HIVE-11059: --- Summary: hcatalog-server-extensions tests scope should depend on hive-exec Key: HIVE-11059 URL: https://issues.apache.org/jira/browse/HIVE-11059 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 1.2.1 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[VOTE] Apache Hive 1.2.1 Release Candidate 0
Hi Folks, It's been a month since 1.2.0, and I promised to do a stabilization 1.2.1 release, and this is it. A large number of patches have been applied since 1.2.0, and major known issues have been cleared/fixed. A few jiras were deferred out to 1.3/2.0 as not being ready to commit into 1.2.1 at this time. More details are available here: https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status Apache Hive 1.2.1 Release Candidate 0 is available here: https://people.apache.org/~khorgath/releases/1.2.1_RC0/artifacts/ My public key used for signing is available from the hive committers key list: http://www.apache.org/dist/hive/KEYS Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-1040/ Source tag for RC0 is up on the apache git repo as tag release-1.2.1-rc0 (Browseable view over at https://git-wip-us.apache.org/repos/asf?p=hive.git;a=tag;h=0f6ee99efc911cbc1566f9bbbc63a51600302703 ) Voting will conclude in 72 hours. Hive PMC Members: Please test and vote. Thanks, -Sushanth
Re: Getting ready for 1.2.1
Hi All, Please consider branch-1.2 frozen for now, till after the release process. Thanks, -Sushanth On Tue, Jun 16, 2015 at 12:28 AM, Sushanth Sowmyan khorg...@gmail.com wrote: Hi folks, It's been nearly a month since 1.2.0, and when I did that release, I said I'd keep the branch open for any further non-db-changing, non-breaking patches, and from the sheer number of patches registered on the status page, that's been a good idea. Now, I think it's time to start drawing that to a close for a stabilization update, and I would like to begin the process of rolling out release candidates for 1.2.1. I would like to start rolling out an RC0 by Wednesday night if no one objects. For now, the rules on committing to branch-1.2 remain the same: a) commit to branch-1 and master first b) add me as a watcher on that jira c) add the bug to the release status wiki. Once I start the release process, I will once again increase the bar for commits as we did the last time. That said, this time, once we finish the release for 1.2.1, the bar on further commits to branch-1.2 is intended to remain at a higher level, so as to make sure we don't have too much of a back-porting hassle - we will soon try to limit our commits to branch-1 and master only. Cheers, -Sushanth
[jira] [Created] (HIVE-11047) Update versions of branch-1.2 to 1.2.1
Sushanth Sowmyan created HIVE-11047: --- Summary: Update versions of branch-1.2 to 1.2.1 Key: HIVE-11047 URL: https://issues.apache.org/jira/browse/HIVE-11047 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Getting ready for 1.2.1
Hi All,

I've gotten requests from a couple of folks to keep the branch unfrozen for some important patches that should be part of 1.2.1 if we do have additional RCs, so I'm unfreezing it again for the time being. The same rules hold as before: to make any commits to branch-1.2, a patch must already have been committed to master and branch-1, and must be added to the wiki over at https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status . If you do not have permission to edit the wiki, please ping me and I'll add it for you.

Thanks,
-Sushanth
[jira] [Created] (HIVE-11039) Write a tool to allow people with datanucleus.identifierFactory=datanucleus2 to migrate their metastore to datanucleus1 naming
Sushanth Sowmyan created HIVE-11039:
---
Summary: Write a tool to allow people with datanucleus.identifierFactory=datanucleus2 to migrate their metastore to datanucleus1 naming
Key: HIVE-11039
URL: https://issues.apache.org/jira/browse/HIVE-11039
Project: Hive
Issue Type: Bug
Components: Metastore
Affects Versions: 1.3.0, 1.2.1, 2.0.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
Priority: Critical

We hit an interesting bug in a case where datanucleus.identifierFactory = datanucleus2. The problem is that directSql hand-generates SQL strings assuming the datanucleus1 naming scheme. If a user has their metastore JDO managed by datanucleus.identifierFactory = datanucleus2, the SQL strings we generate are incorrect.

One simple example of what this results in is the following: whenever DN persists a field which is held as a List<T>, it winds up storing each T as a separate row in the appropriate mapping table, with a column called INTEGER_IDX that holds the position in the list. Then, upon reading, it automatically reads all relevant rows with an ORDER BY INTEGER_IDX, which results in the list retaining its order. In the DN2 naming scheme, the column is called IDX instead of INTEGER_IDX. If the user has run the appropriate metatool upgrade scripts, it is highly likely that they have both columns, INTEGER_IDX and IDX. Whenever they use JDO, such as with all writes, it will then use the IDX field, and whenever they do any sort of optimized read, such as through directSQL, it will ORDER BY INTEGER_IDX.

An immediate danger is seen when we consider that the schema of a table is stored as a List<FieldSchema>: while IDX holds 0,1,2,3,..., INTEGER_IDX will contain 0,0,0,0,..., and thus any attempt to describe the table or fetch its schema can come back in the table's native hashing order rather than sorted by the index. This can then result in the schema ordering being different from the actual table.

For example, if a user has a table (a:int, b:string, c:string), a describe on it may return (c:string, a:int, b:string), and thus queries which insert after selecting from another table can hit ClassCastExceptions when trying to insert data in the wrong order - this is how we discovered this bug. The problem, however, can be far worse if there are no type mismatches: it is possible, for example, that if a, b and c were all strings, that insert query would succeed but mix up the order, which then results in user table data being mixed up. This has the potential to be very bad.

We should write a tool to help convert metastores that use datanucleus2 to datanucleus1 (more difficult, needs more one-time testing), or change directSql to support both (easier to code, but increases the test-coverage matrix significantly, and we should really then be testing against both schemes). But in the short term, we should disable directSql if we see that the identifierFactory is datanucleus2.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
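The ordering failure described above can be illustrated with a small sketch (plain Python, not Hive metastore code; the row tuples are an assumption made purely for illustration): when every row carries the same INTEGER_IDX value, sorting by that column just preserves whatever arbitrary order the rows came back in, while sorting by the correctly populated IDX column recovers the real schema order.

```python
# Illustrative sketch only: simulates the mapping-table rows described in
# the issue above, not actual Hive metastore code.

# Each row: (column_name, type, idx, integer_idx). Under the datanucleus2
# naming scheme, IDX is populated correctly (0,1,2,...) while INTEGER_IDX
# stays 0 for every row.
rows = [
    ("c", "string", 2, 0),
    ("a", "int",    0, 0),
    ("b", "string", 1, 0),
]

# JDO-style read, ORDER BY IDX: recovers the true schema order.
by_idx = [name for name, _, idx, _ in sorted(rows, key=lambda r: r[2])]

# directSQL-style read, ORDER BY INTEGER_IDX: all sort keys are equal, so
# the rows keep their arbitrary storage order and the schema comes back
# scrambled.
by_integer_idx = [name for name, _, _, iidx in sorted(rows, key=lambda r: r[3])]

print(by_idx)          # ['a', 'b', 'c']
print(by_integer_idx)  # ['c', 'a', 'b'] -- storage order, not schema order
```

Sorting on a constant key is a no-op for a stable sort, which is exactly why the bug surfaces as rows returned in "native hashing order" rather than as an outright error.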
[jira] [Created] (HIVE-11023) Disable directSQL if datanucleus.identifierFactory = datanucleus2
Sushanth Sowmyan created HIVE-11023:
---
Summary: Disable directSQL if datanucleus.identifierFactory = datanucleus2
Key: HIVE-11023
URL: https://issues.apache.org/jira/browse/HIVE-11023
Project: Hive
Issue Type: Bug
Components: Metastore
Affects Versions: 1.3.0, 1.2.1, 2.0.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan

We hit an interesting bug in a case where datanucleus.identifierFactory = datanucleus2. The problem is that directSql hand-generates SQL strings assuming the datanucleus1 naming scheme. If a user has their metastore JDO managed by datanucleus.identifierFactory = datanucleus2, the SQL strings we generate are incorrect.

One simple example of what this results in is the following: whenever DN persists a field which is held as a List<T>, it winds up storing each T as a separate row in the appropriate mapping table, with a column called INTEGER_IDX that holds the position in the list. Then, upon reading, it automatically reads all relevant rows with an ORDER BY INTEGER_IDX, which results in the list retaining its order. In the DN2 naming scheme, the column is called IDX instead of INTEGER_IDX. If the user has run the appropriate metatool upgrade scripts, it is highly likely that they have both columns, INTEGER_IDX and IDX. Whenever they use JDO, such as with all writes, it will then use the IDX field, and whenever they do any sort of optimized read, such as through directSQL, it will ORDER BY INTEGER_IDX.

An immediate danger is seen when we consider that the schema of a table is stored as a List<FieldSchema>: while IDX holds 0,1,2,3,..., INTEGER_IDX will contain 0,0,0,0,..., and thus any attempt to describe the table or fetch its schema can come back in the table's native hashing order rather than sorted by the index. This can then result in the schema ordering being different from the actual table.

For example, if a user has a table (a:int, b:string, c:string), a describe on it may return (c:string, a:int, b:string), and thus queries which insert after selecting from another table can hit ClassCastExceptions when trying to insert data in the wrong order - this is how we discovered this bug. The problem, however, can be far worse if there are no type mismatches: it is possible, for example, that if a, b and c were all strings, that insert query would succeed but mix up the order, which then results in user table data being mixed up. This has the potential to be very bad.

We should write a tool to help convert metastores that use datanucleus2 to datanucleus1 (more difficult, needs more one-time testing), or change directSql to support both (easier to code, but increases the test-coverage matrix significantly, and we should really then be testing against both schemes). But in the short term, we should disable directSql if we see that the identifierFactory is datanucleus2.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
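The short-term mitigation this issue proposes - disabling directSQL - can be expressed with the metastore's hive.metastore.try.direct.sql configuration property; a minimal hive-site.xml fragment might look like the following (the property name is Hive's existing directSQL toggle; applying it site-wide like this is a sketch of one way an affected deployment could do it, not part of the issue itself):

```xml
<!-- hive-site.xml: disable directSQL so the metastore falls back to JDO,
     avoiding the hand-generated SQL that assumes datanucleus1 naming. -->
<property>
  <name>hive.metastore.try.direct.sql</name>
  <value>false</value>
</property>
```

The trade-off is that JDO-only metadata reads are slower than the directSQL fast path, which is why the issue frames this as a stopgap rather than a fix.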
Getting ready for 1.2.1
Hi folks,

It's been nearly a month since 1.2.0, and when I did that release, I said I'd keep the branch open for any further non-db-changing, non-breaking patches. From the sheer number of patches registered on the status page, that's been a good idea. Now, I think it's time to start drawing that to a close for a stabilization update, and I would like to begin the process of rolling out release candidates for 1.2.1. I would like to start rolling out an RC0 by Wednesday night if no one objects.

For now, the rules on committing to branch-1.2 remain the same:
a) commit to branch-1 and master first
b) add me as a watcher on that jira
c) add the bug to the release status wiki

Once I start the release process, I will once again raise the bar for commits as we did last time. That said, this time, once we finish the 1.2.1 release, the bar on further commits to branch-1.2 is intended to remain higher, so as to make sure we don't have too much of a backporting hassle - we will soon try to limit our commits to branch-1 and master only.

Cheers,
-Sushanth
[jira] [Created] (HIVE-10892) TestHCatClient should not accept external metastore param from -Dhive.metastore.uris
Sushanth Sowmyan created HIVE-10892:
---
Summary: TestHCatClient should not accept external metastore param from -Dhive.metastore.uris
Key: HIVE-10892
URL: https://issues.apache.org/jira/browse/HIVE-10892
Project: Hive
Issue Type: Bug
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan

HIVE-10074 added the ability to specify -Dhive.metastore.uris on the command line, so as to run the test against a deployed metastore. However, because of the way HiveConf is written, this parameter always overrides any value specified in the conf passed in for instantiation, since HiveConf accepts system variable overrides. This causes some tests, notably those that attempt to connect between two metastores (such as TestHCatClient#testPartitionRegistrationWithCustomSchema), to fail.

Fixing this in HiveConf is not a good idea, since that behaviour is desired for HiveConf. Fixing it in HCatUtil.getHiveConf doesn't really work either, since that is a utility wrapper on HiveConf and is supposed to behave similarly. The fix then becomes something to apply in all our test cases wherever we instantiate Configuration objects. It seems more appropriate to change the parameter we use to specify test parameters than to change each config object. Thus, we should change the semantics for running this test against an external metastore by specifying the override under a different parameter name, say test.hive.metastore.uris, instead of hive.metastore.uris, which has a specific meaning.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
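The override problem the issue describes can be sketched in a few lines (plain Python standing in for HiveConf's behaviour; the function names and the second-metastore URI are invented for illustration): a system property registered under the real config key silently clobbers any value a test sets explicitly, whereas reading the external-metastore override from a separate test.* key leaves explicitly-set values alone.

```python
# Sketch of the behaviour described in HIVE-10892, not HiveConf itself.

# Simulates "-Dhive.metastore.uris=..." passed on the command line.
system_props = {"hive.metastore.uris": "thrift://external:9083"}

def lookup_current(key, conf_value):
    """HiveConf-style lookup: a system property for the same key always
    wins over the value the caller placed in the conf object."""
    return system_props.get(key, conf_value)

def lookup_proposed(key, conf_value):
    """Proposed scheme: the external-metastore override lives under a
    separate 'test.'-prefixed name, so it only applies when the test did
    not set the real key itself."""
    if conf_value is None:
        return system_props.get("test." + key)
    return conf_value

# A test that deliberately points its conf at a second metastore:
wanted = "thrift://second-metastore:9083"  # hypothetical URI

clobbered = lookup_current("hive.metastore.uris", wanted)   # -D wins
kept = lookup_proposed("hive.metastore.uris", wanted)       # test value survives

print(clobbered)  # thrift://external:9083
print(kept)       # thrift://second-metastore:9083
```

This is why renaming the command-line parameter (rather than changing HiveConf or HCatUtil.getHiveConf) fixes the two-metastore tests without altering the override semantics everyone else relies on.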
Re: [ANNOUNCE] New Hive Committer - Chaoyu Tang
Congrats Chaoyu, welcome aboard! :)

On May 20, 2015 3:45 PM, Vaibhav Gumashta vgumas...@hortonworks.com wrote:
> Congratulations!
> -Vaibhav
>
> On 5/20/15, 3:40 PM, Jimmy Xiang jxi...@cloudera.com wrote:
> > Congrats!!
> >
> > On Wed, May 20, 2015 at 3:29 PM, Carl Steinbach c...@apache.org wrote:
> > > The Apache Hive PMC has voted to make Chaoyu Tang a committer on the
> > > Apache Hive Project. Please join me in congratulating Chaoyu!
> > >
> > > Thanks. - Carl