A question on how to mark Hive for HCatalog releases
All, I have a question about how we should manage marking Hive source code for HCatalog releases. For HCatalog 0.1 we used the 0.7 version of Hive. But we will get to HCatalog 0.2 before the Hive community gets to 0.8. We have added features to Hive since 0.7 that we need in HCatalog 0.2. In general, it is not reasonable to assume that Hive and HCatalog releases will line up such that HCatalog can always depend on a released version of Hive. So how should we mark the proper version of Hive code for an HCatalog release? The only thing that comes to mind is tagging a particular revision, with the option to branch at that revision if necessary. We would only need a branch if something got checked in post-tag that HCatalog needed or wanted, but there were other intervening check-ins we did not want. In that case we would need to ask you to branch and port the needed change(s). Are you okay with this approach? Are there other approaches you would suggest or prefer? Alan.
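Alan's tag-then-branch-if-needed scheme can be sketched with Subversion commands. The repository below is a throwaway local stand-in for Apache's, and the tag/branch names and revision numbers are made up for illustration:

```shell
# Illustrative only: a local scratch repo standing in for the Apache Hive
# repo; names and revisions are hypothetical.
REPO=/tmp/hive-tagging-demo
rm -rf $REPO && svnadmin create $REPO
svn mkdir -q -m "layout" file://$REPO/trunk file://$REPO/tags file://$REPO/branches

# Tag the exact Hive revision HCatalog 0.2 builds against (cheap copy in svn).
svn copy -q -m "Hive rev used by HCatalog 0.2" \
    file://$REPO/trunk@1 file://$REPO/tags/hcatalog-0.2-base

# Only if a later fix must be ported without the intervening commits:
# branch from the tag, then merge the single needed change onto the branch.
svn copy -q -m "branch for HCatalog 0.2 backports" \
    file://$REPO/tags/hcatalog-0.2-base file://$REPO/branches/hcatalog-0.2

svn ls file://$REPO/tags file://$REPO/branches
```

The key point is that a tag pins an exact revision essentially for free, and the branch only comes into existence later, if a fix has to be ported without picking up intervening check-ins.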
Re: A question on how to mark Hive for HCatalog releases
Alright, qui tacet consentit (except for John, who explicitly agreed). We'll go with this plan. Alan. On Jun 22, 2011, at 12:46 PM, John Sichi wrote: Sounds good to me. JVS On Jun 21, 2011, at 4:21 PM, Alan Gates wrote: [original message quoted in full; see above]
Re: [VOTE] Apache Hive 0.9.0 Release Candidate 2
+1. Ran through our end-to-end test framework (see https://issues.apache.org/jira/browse/HIVE-2670), results look good. Alan. On Apr 24, 2012, at 2:25 PM, Ashutosh Chauhan wrote: Downloaded the bits. Installed on 5 node cluster. Did create table. Ran basic queries. Ran unit tests. All looks good. +1 Thanks, Ashutosh On Tue, Apr 24, 2012 at 12:29, Ashutosh Chauhan hashut...@apache.org wrote: Hey all, Apache Hive 0.9.0 Release Candidate 2 is available here: http://people.apache.org/~hashutosh/hive-0.9.0-rc2/ Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-094/ Change List is available here: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310843&version=12317742 Voting will conclude in 72 hours. Hive PMC Members: Please test and vote. Thanks, Ashutosh
Re: Turn around on patches that do not need full unit testing
One approach I've seen other projects take is to have an ant test-commit target that users are responsible for running before committing. This is a short (15 min or less) target that runs all true unit tests (tests that exercise just a class or two in isolation) and a couple of functional tests that exercise major functionality but not every last thing. The full test suite can then be run nightly and any issues addressed. Alan. On Jun 11, 2012, at 6:17 AM, Edward Capriolo wrote: I agree. Having a short-test and long-test might make more sense. IE long-test includes funky serdes and UDFs. As for "In the meanwhile, checking in without tests may introduce a bug which can break a production cluster. Costly." - the solution is not to run trunk. Run only releases. All the tests are run by jenkins post commit so we know when trunk is broken, and we should not cut a release if all the tests are not passing. Also we should not knowingly break the build or leave it broken. IE we should strive to have all tests passing on trunk at all times, but not committing a typo patch for fear that the build might break does not make much sense. We can easily revert things in such a case. Edward On Sun, Jun 10, 2012 at 11:14 PM, Gang Liu g...@fb.com wrote: Yeah, it is frustrating that a tiny change takes a long time to turn around. That is understood. In the meanwhile, checking in without tests may introduce a bug which can break a production cluster. Costly. I think the problem is not whether we should run tests but that running tests takes a long time. If it took a reasonable time, like 30 minutes, we would have less pain. In summary, let us keep quality high by running tests for every commit, and aim to make the unit tests fast. Btw, we can run tests in parallel; the Hive wiki has details. Thanks Sent from my iPhone On Jun 10, 2012, at 7:29 PM, Edward Capriolo edlinuxg...@gmail.com wrote: Hive's unit tests take a long time.
There are many simple patches we could get into hive earlier if we dropped the notion of running the full test suite to QA every patch. For example: https://issues.apache.org/jira/browse/HIVE-3081 -- spelling mistakes (typos) https://issues.apache.org/jira/browse/HIVE-3061 -- patches with code cleanup https://issues.apache.org/jira/browse/HIVE-3048 -- patches that are one or two lines of code https://issues.apache.org/jira/browse/HIVE-2288 -- patches that are only additive Also, I do not believe we should kick a patch back to someone for every tiny change. For example, suppose someone commits 9000 lines of code with one typo. I have seen similar situations where the status gets reverted back to OPEN. It takes the person working on it a day to get back into the patch again, and by the time someone comes back around to reviewing, another 3 days might go by. This is similar to a supermarket where you can only use one coupon, so people walk in and out of the store 6 times to buy 6 items. Procedure and rules are followed, the end result is the same, but it is 6 times the work. In this case the committer should just make the change, re-upload the patch, say 'committed with typo fixed', and commit. Please comment, Edward
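The test-commit idea discussed above might look something like the wrapper below. The suite names and the split between fast and nightly runs are purely illustrative; they are not actual Hive build targets:

```shell
#!/bin/sh
# Hypothetical pre-commit gate: run only the fast, true unit-test suites
# before commit, and leave the full (multi-hour) suite to the nightly build.
# Suite names are made up for illustration, not real Hive ant targets.
FAST_SUITES="common serde shims"

for suite in $FAST_SUITES; do
  echo "pre-commit: running fast suite: $suite"
done
echo "pre-commit: full suite deferred to nightly CI"
```

The design choice here is the one Alan describes: keep the pre-commit loop under 15 minutes by running only isolated unit tests plus a couple of smoke tests, and catch the rest nightly.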
Fwd: DesignLounge @ HadoopSummit
Begin forwarded message: From: Eric Baldeschwieler eri...@hortonworks.com Date: June 11, 2013 10:46:25 AM PDT To: common-...@hadoop.apache.org Subject: DesignLounge @ HadoopSummit Reply-To: common-...@hadoop.apache.org Hi Folks, We thought we'd try something new at Hadoop Summit this year to build upon two pieces of feedback I've heard a lot this year: • Apache project developers would like to take advantage of the Hadoop summit to meet with their peers to work on specific technical details of their projects • They want to do this during the summit, not before it starts or at night. I've been told BoFs and other such traditional formats have not historically worked for them, because they end up being about educating users about their projects, not actually working with their peers on how to make their projects better. So we are creating a space in the summit - marked in the event guide as DesignLounge - concurrent with the presentation tracks where Apache Project contributors can meet with their peers to plan the future of their project or work through various technical issues near and dear to their hearts. We're going to provide white boards and message boards and let folks take it from there in an unconference style. We think there will be room for about 4 groups to meet at once. Interested? Let me know what you think. Send me any ideas for how we can make this work best for you. The room will be 231A and B at the Hadoop Summit and will run from 10:30am to 5:00pm on Day 1 (26th June), and we can also run from 10:30am to 5:00pm on Day 2 (27th June) if we have a lot of topics that folks want to cover. Some of the early topics some folks told me they hope can be covered: • Hadoop Core security proposals. There are a couple of detailed proposals circulating. Let's get together and hash out the differences. • Accumulo 1.6 features • The Hive vectorization project.
Discussion of the design and how to phase it in incrementally with minimum complexity. Finishing Yarn - what things need to get done NOW to make Yarn more effective If you are a project lead for one of the Apache projects, look at the schedule below and suggest a few slots when you think it would be best for your project to meet. I'll try to work out a schedule where no more than 2 projects are using the lounge at once. Day 1, 26th June: 10:30am - 12:30pm, 1:45pm - 3:30pm, 3:45pm - 5:00pm Day 2, 27th June: 10:30am - 12:30pm, 1:45pm - 3:30pm, 3:45pm - 5:00pm It will be up to you, the hadoop contributors, from there. Look forward to seeing you all at the summit, E14 PS Please forward to the other -dev lists. This event is for folks on the -dev lists.
Fwd: DesignLounge @ HadoopSummit
Begin forwarded message: From: Eric Baldeschwieler eri...@hortonworks.com Date: June 23, 2013 9:32:12 PM PDT To: common-...@hadoop.apache.org, mapreduce-...@hadoop.apache.org, hdfs-...@hadoop.apache.org Subject: DesignLounge @ HadoopSummit Reply-To: common-...@hadoop.apache.org Hi Folks, I've integrated the feedback I've gotten on the design lounge. A couple of clarifications: 1) The space will be open both days of the formal summit. Apache committers / contributors are invited to stop by and use the space to meet / network any time during the show. 2) Below I've listed the times that various project members have suggested they will be present to talk with other contributors about their project. If we get a big showing for any of these slots we'll encourage folks to do the unconference thing: select a set of topics they want to talk about and break up into groups to do so. 3) This is an experiment. Our goal is to make the summit as useful as possible to the folks who build the Apache projects in the Apache Hadoop stack. Please let me know how it works for you and ideas for making this even more effective. Committed times so far, with topic champion (Note - I've adjusted suggested times to fit with the program a bit more smoothly): Wednesday 11-1 - Hive - Ashutosh - The Stinger initiative and other Hive activities 2 - 4 - Security breakout - Kevin Minder - HSSO, Knox, Rhino 3 - 4 - Frameworks to run services like HBase on Yarn - Weave, Hoya … - Devaraj Das 4 - 5 - Accumulo - Billie Rinaldi Thursday 11-1 - Finishing Yarn - Arun Murthy - Near term improvements needed 2 - 4 - HDFS - Suresh Sanjay 4 - 5 - Getting involved in Apache - Billie Rinaldi See you all soon! E14 PS Please forward to other Apache -dev lists and CC me. Thanks!
On Jun 11, 2013, at 10:42 AM, Eric Baldeschwieler eri...@hortonworks.com wrote: [original DesignLounge announcement, quoted in full; see the first forward above]
Re: call it Hive-SQL instead of HiveQL ?
I'm +1 for calling it Hive SQL. No one knows what HQL is when they see the initials. Hive Query Language? Hadoop Query Language? Harold's Query Language? I agree with Ed that we should be up front about what Hive is and isn't and about where it's going and where it isn't. Whenever people ask me if being fully SQL-92 or SQL-2003 compliant or whatever is a goal, I always say no. There's stuff in those specs Hive probably will never do. But to me that doesn't mean it isn't SQL. Apache Derby calls its access language SQL, yet it doesn't support outer joins or tinyint or a number of other things Hive does. SQLite calls its access language SQL and it has similar restrictions. People understand that every data store has a different dialect of SQL. Hive's dialect is particularly crude in some respects (lacking some standard features and datatypes) and doing anything real requires concepts not known in other SQL dialects (like what SerDe you want your table to use). Some of these we can address and some are a part of being on Hadoop. One final analogy. When a child is learning a language they 1) don't know as many words as an adult does and often don't understand adult usage even when they know all the words; and 2) use made up/nonsense words. Yet no one says the child doesn't speak the language or speaks a different language. You just recognize that the child is growing and learning the language. How is Hive different? It is growing and adding more parts of SQL all the time. Alan. On Jul 3, 2013, at 12:26 AM, Thejas Nair wrote: On Tue, Jul 2, 2013 at 8:39 PM, Edward Capriolo edlinuxg...@gmail.com wrote: What is in a name? :) Which SQL feature are you talking about here that forces a single reducer and hence should not be supported?
Joining on anything besides = comes to mind. Pretty sure the query mentioned here will not work (without being re-written) http://en.wikipedia.org/wiki/SQL SELECT isbn, title, price FROM Book WHERE price < (SELECT AVG(price) FROM Book) ORDER BY title; Don't you think hive should be supporting this? Don't you think our users would want this? You can do theta joins without using a single reducer (a cartesian product can be done in parallel). But that is beside the point. I don't expect hive to be 100% sql compliant. I don't see 100% sql compliance as a goal, but I see more SQL compliance as desirable. That is why I prefer the term Hive-SQL. Hive-SQL looks like it is trying to convey the idea that hive supports extensions like T-SQL http://en.wikipedia.org/wiki/Transact-SQL or PL/SQL http://www.oracle.com/technetwork/database/features/plsql/index.html. If I refer to something as Oracle-SQL or DB2-SQL, I think people understand that it is an Oracle or DB2 dialect of SQL that I refer to. Lessons from my mother: You can't be half a saint. Considering how much other databases deviate from the standard - http://troels.arvin.dk/db/rdbms/ - see how much deviation there is, for example, in the 'limit clause' or in the data types supported (and the details of data type support). If all your friends jumped off a bridge, would you do it? My friends are very smart; if they jump off the bridge, there is probably a very good reason to do so, and I would seriously consider it. I think hive has many smart friends like DB2, Oracle, teradata, vertica, impala, and even phoenix (https://github.com/forcedotcom/phoenix). As you can see, there is a wide range in SQL compliance across products. I don't see anything wrong in saying that hive is SQL on hadoop. I think I have conveyed everything I wanted to say on this topic. I will stop and listen to what others think before we go from half saints and jumping off the bridge to Hitler :) (http://en.wikipedia.org/wiki/Godwin's_law) (there I said it!!)
I am looking forward to hearing if anybody else thinks calling it Hive-SQL will make them confuse it with something like PL/SQL. I also want to know if others think calling it HiveQL gives more clarity about it aiming to be SQL on hadoop. Thanks, Thejas
Re: Tez branch and tez based patches
On Jul 13, 2013, at 9:48 AM, Edward Capriolo wrote: I have started to see several refactoring patches around tez. https://issues.apache.org/jira/browse/HIVE-4843 This is the only mention on the hive list I can find with tez: "Makes sense. I will create the branch soon. Thanks, Ashutosh" On Tue, Jun 11, 2013 at 7:44 PM, Gunther Hagleitner ghagleit...@hortonworks.com wrote: Hi, I am starting to work on integrating Tez into Hive (see HIVE-4660; the design doc has already been uploaded - any feedback will be much appreciated). This will be a fair amount of work that will take time to stabilize/test. I'd like to propose creating a branch in order to be able to do this incrementally and collaboratively. In order to progress rapidly with this, I would also like to go commit-then-review. Thanks, Gunther. These refactorings are largely destructive to a number of bug fixes and language improvements in hive. The language improvements and bug fixes have been sitting in Jira for quite some time now, marked patch-available, and are waiting for review. There are a few things I want to point out: 1) Normally we create design docs in our wiki (which this is not) 2) Normally when the change is significantly complex we get multiple committers to comment on it (which we did not) On point 2 no one -1'd the branch, but this is really something that should have required a +1 from 3 committers. The Hive bylaws, https://cwiki.apache.org/confluence/display/Hive/Bylaws , lay out what votes are needed for what. I don't see anything there about needing 3 +1s for a branch. Branching would seem to fall under code change, which requires one vote and a minimum length of 1 day. I for one am not completely sold on Tez. http://incubator.apache.org/projects/tez.html. "directed-acyclic-graph of tasks for processing data" - this description sounds like many things which have never become popular. One to think of is oozie: "Oozie Workflow jobs are Directed Acyclical Graphs (DAGs) of actions."
I am sure I can find a number of libraries/frameworks that make this same claim. In general I do not feel like we have done our homework and pre-requisites to justify all this work. If we have done the homework, I am sure that it has not been communicated to and accepted by hive developers at large. A request for better documentation on Tez and a project road map seems totally reasonable. If we have a branch, why are we also committing on trunk? Scanning through the tez doc, the only language I keep finding is language like "minimal changes to the planner", yet there are ALREADY lots of large changes going on! Really, none of the above would bother me except for the fact that these minimal changes are causing many patch-available, ready-for-review bugs and core hive features to need to be rebased. I am sure I have mentioned this before, but I have to spend 12+ hours to test a single patch on my laptop. A few days ago I was testing a new core hive feature. After all the tests passed and before I was able to commit, someone unleashed a tez patch on trunk which caused the thing I was testing for 12 hours to need to be rebased. I'm not cool with this. Next time that happens to me I will seriously consider reverting the patch. Bug fixes and new hive features are more important to me than integrating with incubator projects. (With my Apache member hat on) Reverting patches that aren't breaking the build is considered very bad form in Apache. It does make sense to request that when people are going to commit a patch that will break many other patches, they first give a few hours of notice so people can say something if they're about to commit another patch and avoid your fate of needing to rerun the tests. The other thing is we need to get the automated build of patches working on Hive so committers are not forced to run all of the tests themselves. We are working on it, but we're not there yet. Alan.
Re: Tez branch and tez based patches
Ed, I'm not sure I understand your argument, so I'm going to try to restate it. Please tell me if I understand it correctly. I think you're saying we should not embark on big projects in Hive because: 1) There were big projects in the past that were abandoned or are not currently making progress (such as Oracle integration, Hive StorageHandler) 2) There are other big projects going on (ORC, Vectorization) 3) There are lots of outstanding patches that need to be dealt with. I would respond with two points to this. First, I agree that the large outstanding patch count is very bad. It keeps people from getting involved in Hive. It deprives Hive of fixes and improvements it would otherwise have. Several of the committers are working to address this by checking in peoples' patches, but they are unable to keep up. The best solution is to encourage other committers to check in patches as well and to find willing and able contributors and mentor them to committership as quickly as possible. Second, the way Apache works is that contributors scratch the itch that bothers them. So to argue "We shouldn't do X because we never finished Y" or "We shouldn't do X because we're doing Y" (where X and Y are independent) is not valid in Apache projects. It's fine to argue that Tez hasn't been adequately explained (I think you hinted at this in previous emails) and ask for clarifications on what it is and what the planned changes are. If after a full explanation you think it's a bad idea, it's fine to argue that Tez is the wrong direction for Hive and try to convince the rest of the community. But assuming the community accepts that Tez is a reasonable direction and there are volunteers who want to do the work, then you can't argue they should work on something else instead. Alan. On Jul 15, 2013, at 6:51 PM, Edward Capriolo wrote: The Hive bylaws, https://cwiki.apache.org/confluence/display/Hive/Bylaws, lay out what votes are needed for what.
I don't see anything there about needing 3 +1s for a branch. Branching would seem to fall under code change, which requires one vote and a minimum length of 1 day. You could argue that all you need is one +1 to create a branch, but this is more than a branch. If you are talking about something that is: 1) going to cause major refactoring of critical pieces of hive like ExecDriver and MapRedTask 2) going to be very disruptive to the efforts of other committers 3) something that may be a major architectural change, then getting the project on board with the idea is a good idea. Now I want to point something out. Here are some recent initiatives in hive: 1) At one point there was a big initiative to support oracle; after the initial work there are patches in Jira, and no one seems to care about oracle support. 2) Another such decision was the support-windows one; there are probably 4 windows patches waiting for review. 3) I still have no clue what the official hadoop1 / hadoop2 / hadoop 0.23 support perspective is, but every couple of weeks we get another jira about something not working/testing on one of those versions; it seems like several builds are broken. 4) Hive storage handlers: after the initial implementation no one cares to review any other storage handler implementation; there are 3 patches or more there, and I could not even find anyone willing to review the cassandra storage handler I spent months on. 5) ORC, Vectorization 6) Windowing: committed, with numerous check-style violations. We have !!!160+!!! PATCH_AVAILABLE Jira issues. Few active committers. We are spread very thin, and embarking on another side project not involved with core hive seems like the wrong direction at the moment. On Mon, Jul 15, 2013 at 8:37 PM, Alan Gates ga...@hortonworks.com wrote: [previous message quoted above]
Re: Tez branch and tez based patches
On Jul 17, 2013, at 1:41 PM, Edward Capriolo wrote: In my opinion we should limit the amount of tez related optimizations and refactoring on trunk. Refactoring that cleans up code is good, but as you have pointed out there won't be a tez release until sometime this fall, and this branch will be open for an extended period of time. Thus code cleanups and other tez related refactoring do not need to be disruptive to trunk. I agree with this, though I suspect people will end up arguing about the meaning of "code cleanup" and "disruptive". In my discussions with Gunther he said he was doing code cleanup and it was not disruptive. You obviously disagreed. I've already suggested that any future patches that break lots of others should have their checkin preceded by a few hours' notice that the patch will break things, so others can say something if they are about to check in too. I'd also be interested to hear from Gunther how much more general cleanup he feels is necessary on trunk. I have another relevant question, which I probably already know the answer to, but I will ask it anyway. Because tez is a YARN application, does this mean that Tez will be the first hive feature that will require YARN? (It seems like the answer is yes.) Yes, it will only work in the Hadoop 2.x world. So obviously all this work needs to be done in a way that still allows Hive to use the MR execution engine in the Hadoop 1.x world. Alan.
Re: HIVE-4266 - Refactor HCatalog code to org.apache.hive.hcatalog
It won't be committed Friday. The patch will need to sit a few days for people to look over it. I doubt it will be committed before 8/5. Alan. On Jul 24, 2013, at 10:37 AM, Eugene Koifman wrote: I'm hoping to have the patch ready Friday, not sure if it will get committed the same day or not. On Wed, Jul 24, 2013 at 7:27 AM, Brock Noland br...@cloudera.com wrote: Hi, What day do you plan on doing this? I am working on a change which I don't believe will be ready this week. Cheers! Brock On Tue, Jul 23, 2013 at 1:09 PM, Eugene Koifman ekoif...@hortonworks.com wrote: I'm planning to change the package name of all hcatalog classes sometime this week (as was promised for 0.12). This is likely to affect any outstanding hcatalog patches on trunk. Please try to have them checked in as soon as possible. Thanks, Eugene -- Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
Re: [Discuss] project chop up
I'm not sure how this is different from what hcat does today. It needs Hive's jars to compile, so it's one of the last things in the compile step. Would moving the other modules you note to be in the same category be enough? Did you want to also make it so that the default ant target doesn't compile those? Alan. On Jul 26, 2013, at 4:09 PM, Edward Capriolo wrote: My mistake on saying hcat was a fork of the metastore. I had a brain fart for a moment. One way we could do this is create a folder called downstream. In our release step we can execute the downstream builds and then copy the files we need back. So nothing downstream will be on the classpath of the main project. This could help us break up ql as well. Things like exotic file formats, and things that are pluggable like zk locking, can go here. That might be overkill. For now we can focus on building downstream, and hive thrift 1 might be the first thing to try to downstream. On Friday, July 26, 2013, Thejas Nair the...@hortonworks.com wrote: +1 to the idea of making the build of core hive and other downstream components independent. bq. I was under the impression that Hcat and hive-metastore was supposed to merge up somehow. The metastore code was never forked. Hcat was just using hive-metastore and making the metadata available to the rest of hadoop (pig, java MR..). A lot of the changes that were driven by hcat goals were being made in hive-metastore. You can think of hcat as a set of libraries that let pig and java MR use the hive metastore. Since hcat is closely tied to hive-metastore, it makes sense to have them in the same project. On Fri, Jul 26, 2013 at 6:33 AM, Edward Capriolo edlinuxg...@gmail.com wrote: Also I believe hcatalog web can fall into the same designation. Question: hcatalog was initially a big hive-metastore fork. I was under the impression that Hcat and hive-metastore were supposed to merge up somehow. What is the status on that? I remember that was one of the core reasons we brought it in.
On Friday, July 26, 2013, Edward Capriolo edlinuxg...@gmail.com wrote: I prefer option 3 as well. On Fri, Jul 26, 2013 at 12:52 AM, Brock Noland br...@cloudera.com wrote: On Thu, Jul 25, 2013 at 9:48 PM, Edward Capriolo edlinuxg...@gmail.com wrote: I have been developing on a dual-core 2 GB RAM laptop for years now. With the addition of hcatalog, hive-thrift2, and some other growth, trying to develop hive in Eclipse on this machine crawls, especially if 'build automatically' is turned on. As we look to add on more things this is only going to get worse. I am also noticing issues like this: https://issues.apache.org/jira/browse/HIVE-4849 What I think we should do is strip down/out optional parts of hive. 1) Hive Hbase: This should really be its own project; to do this right we really have to have multiple branches, since hbase is not backwards compatible. 2) Hive Web Interface: Not really a big project, but not really critical; it can just as easily be built separately. 3) hive thrift 1: We have hive thrift 2 now; it is time for the sun to set on hive thrift 1. 4) odbc: Not entirely convinced about this one, but it is really not critical to running hive. What I think we should do is create sub-projects for the above things or simply move them into directories that do not build with hive. Ideally they would use maven to pull dependencies. What does everyone think? I agree that projects like the HBase handler and probably others as well should somehow be downstream projects which simply depend on the hive jars. I see a couple alternatives for this: * Take the module in question to the Apache Incubator * Move the module in question to the Apache Extras * Break up the projects within our own source tree I'd prefer the third option at this point. Brock
Re: Tez branch and tez based patches
Which talk are you referencing here? AFAIK all the Hive code we've written is being pushed back into the Tez branch, so you should be able to see it there. Alan. On Jul 29, 2013, at 9:02 PM, Edward Capriolo wrote: At ~25:00: There is a working prototype of hive which is using tez as the targeted runtime. Can I get a look at that code? Is it on github? Edward On Wed, Jul 17, 2013 at 3:35 PM, Alan Gates ga...@hortonworks.com wrote: Answers to some of your questions inlined. Alan. On Jul 16, 2013, at 10:20 PM, Edward Capriolo wrote: There are some points I want to bring up. First, I am on the PMC. Here is something I find relevant: http://www.apache.org/foundation/how-it-works.html -- The role of the PMC from a Foundation perspective is oversight. The main role of the PMC is not code and not coding - but to ensure that all legal issues are addressed, that procedure is followed, and that each and every release is the product of the community as a whole. That is key to our litigation protection mechanisms. Secondly the role of the PMC is to further the long term development and health of the community as a whole, and to ensure that balanced and wide scale peer review and collaboration does happen. Within the ASF we worry about any community which centers around a few individuals who are working virtually uncontested. We believe that this is detrimental to quality, stability, and robustness of both code and long term social structures. https://blogs.apache.org/comdev/entry/what_makes_apache_projects_different - All other decisions happen on the dev list, discussions on the private list are kept to a minimum. If it didn't happen on the dev list, it didn't happen - which leads to: a) Elections of committers and PMC members are published on the dev list once finalized. b) Out-of-band discussions (IRC etc.) are summarized on the dev list as soon as they have impact on the project, code or community.
- https://issues.apache.org/jira/browse/HIVE-4660, ironically titled Let there be Tez, has not been +1'ed by any committer. It was never discussed on the dev or the user list (as far as I can tell). As all JIRA creations and updates are sent to dev@hive, creating a JIRA is de facto posting to the list. As a PMC member I feel we need more discussion on Tez on the dev list, along with a wiki-fied design document. Topics of discussion should include: I talked with Gunther and he's working on posting a design doc on the wiki. He has a PDF on the JIRA but he doesn't have write permissions yet on the wiki. 1) What is tez? In Hadoop 2.0, YARN opens up the ability to have multiple execution frameworks in Hadoop. Hadoop apps are no longer tied to MapReduce as the only execution option. Tez is an effort to build an execution engine that is optimized for relational data processing, such as Hive and Pig. The biggest change here is to move away from only Map and Reduce as processing options and to allow alternate combinations of processing, such as map - reduce - reduce, or tasks that take multiple inputs, or shuffles that avoid sorting when it isn't needed. For a good intro to Tez, see Arun's presentation on it at the recent Hadoop Summit (video http://www.youtube.com/watch?v=9ZLLzlsz7h8 slides http://www.slideshare.net/Hadoop_Summit/murhty-saha-june26255pmroom212) 2) How is tez different from oozie, http://code.google.com/p/hop/, http://cs.brown.edu/~backman/cmr.html , and other DAG and/or streaming map reduce tools/frameworks? Why should we use this and not those? Oozie is a completely different thing. Oozie is a workflow engine and a scheduler. Its core competencies are the ability to coordinate workflows of disparate job types (MR, Pig, Hive, etc.) and to schedule them. It is not intended as an execution engine for apps such as Pig and Hive.
I am not familiar with these other engines, but the short answer is that Tez is built to work on YARN, which works well for Hive since it is tied to Hadoop. 3) When can we expect the first tez release? I don't know, but I hope sometime this fall. 4) How much effort is involved in integrating hive and tez? Covered in the design doc. 5) Who is ready to commit to this effort? I'll let people speak for themselves on that one. 6) can we expect this work to be done in one hive release? Unlikely. Initial integration will be done in one release, but as Tez is a new project I expect it will be adding features in the future that Hive will want to take advantage of. In my opinion we should not start any work on this tez-hive until these questions are answered to the satisfaction of the hive developers. Can we change this to not commit patches? We can't tell willing people not to work
Re: Tez branch and tez based patches
On Jul 29, 2013, at 9:53 PM, Edward Capriolo wrote: Also watched http://www.ustream.tv/recorded/36323173 I definitely see the win in being able to stream inter-stage output. I see some cases where small intermediate results can be kept in memory. But I was somewhat under the impression that the map reduce spill settings kept stuff in memory; isn't that what the spill settings are for? No. MapReduce always writes shuffle data to local disk. And intermediate results between MR jobs are always persisted to HDFS, as there's no other option. When we talk of being able to keep intermediate results in memory, we mean getting rid of both of these disk writes/reads when appropriate (meaning not always; there's a trade-off between speed and error handling to be made here, see below for more details). There are a few bullet points that came up repeatedly that I do not follow: Something was said to the effect of Container reuse makes X faster. Hadoop has jvm reuse. Not following what the difference is here? Not everyone has a 10K node cluster. Sharing JVMs across users is inherently insecure (we can't guarantee what code the first user left behind that may interfere with later users). As I understand container re-use in Tez, it constrains the re-use to one user for security reasons, but still avoids additional JVM start-up costs. But this is a question that the Tez guys could answer better on the Tez lists (d...@tez.incubator.apache.org) Joins in map reduce are hard Really? I mean some of them are I guess, but the typical join is very easy. Just shuffle by the join key. There was not really enough low-level detail here saying why joins are better in tez. Join is not a natural operation in MapReduce. MR gives you one input and one output. You end up having to bend the rules to have multiple inputs. The idea here is that Tez can provide operators that naturally work with joins and other operations that don't fit the one input/one output model (e.g. unions).
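The "shuffle by the join key" approach mentioned above can be sketched in miniature. This toy Python snippet (not Hive or Hadoop code; the record data and function name are made up for illustration) simulates the classic MapReduce reduce-side join: records from both inputs are tagged with their source, grouped by the join key, and paired in the "reducer":

```python
from collections import defaultdict

# Hypothetical records: (join_key, payload) from two separate inputs.
orders = [("u1", "order-1"), ("u2", "order-2"), ("u1", "order-3")]
users = [("u1", "alice"), ("u2", "bob")]

def reduce_side_join(left, right):
    """Simulate a reduce-side join: tag each record with its source,
    shuffle by the join key, then pair tagged values per key."""
    shuffled = defaultdict(list)
    for key, value in left:
        shuffled[key].append(("L", value))   # tag to tell the inputs apart
    for key, value in right:
        shuffled[key].append(("R", value))
    joined = []
    for key, tagged in shuffled.items():     # one "reduce" call per key
        lefts = [v for tag, v in tagged if tag == "L"]
        rights = [v for tag, v in tagged if tag == "R"]
        for lv in lefts:
            for rv in rights:
                joined.append((key, lv, rv))
    return sorted(joined)

print(reduce_side_join(orders, users))
```

The "bending the rules" Alan refers to is exactly the tagging step: MR itself sees only one logical input, so the job must smuggle the source of each record into the value.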
Choosing the number of maps and reduces is hard Really? I do not find it that hard. I think there are times when it's not perfect, but I do not find it hard. The talk did not really offer anything technical here on how tez makes this better, other than that it could make it better. Perhaps manual would be a better term here than hard. In our experience it takes quite a bit of engineer trial and error to determine the optimal numbers. This may be ok if you're going to invest the time once and then run the same query every day for 6 months. But obviously it doesn't work for the ad hoc case. Even in the batch case it's not optimal, because every once in a while an engineer has to go back and re-optimize the query to deal with changing data sizes, data characteristics, etc. We want the optimizer to handle this without human intervention. The presentations mentioned streaming data; how do two nodes stream data between tasks, and how is it reliable? If the sender or receiver dies does the entire process have to start again? If the sender or receiver dies then the query has to be restarted from some previous point where data was persisted to disk. The idea here is that speed vs error recovery trade-offs should be made by the optimizer. If the optimizer estimates that a query will complete in 5 seconds it can stream everything, and if a node fails it just re-runs the whole query. If it estimates that a particular phase of a query will run for an hour, it can choose to persist the results to HDFS so that in the event of a failure downstream the long phase need not be re-run. Again we want this to be done automatically by the system so the user doesn't need to control this level of detail. Again, one of the talks implied there is a prototype out there that launches hive jobs into tez. I would like to see that; it might answer more questions than a PowerPoint, and I could profile some common queries.
As mentioned in a previous email, afaik Gunther's pushed all these changes to the Tez branch in Hive. Alan. Random late night thoughts over, Ed
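On the "choosing the number of reduces" question discussed above: Hive's usual answer is a bytes-per-reducer heuristic (the hive.exec.reducers.bytes.per.reducer setting, capped by hive.exec.reducers.max). This Python sketch mirrors the style of that estimate; the function name and default values are illustrative, not Hive's actual implementation:

```python
import math

def estimate_reducers(input_bytes, bytes_per_reducer=1_000_000_000, max_reducers=999):
    """One reducer per bytes_per_reducer of estimated input, capped at
    max_reducers. A sketch of the hive.exec.reducers.bytes.per.reducer
    style of heuristic; the defaults here are illustrative."""
    if input_bytes <= 0:
        return 1
    return max(1, min(max_reducers, math.ceil(input_bytes / bytes_per_reducer)))

print(estimate_reducers(5_500_000_000))  # 5.5 GB of input -> 6 reducers
```

The "manual" pain Alan describes is that input_bytes is only a static estimate: as data sizes and skew change, the constants have to be re-tuned by hand, which is exactly what a cost-based optimizer is meant to absorb.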
Re: [VOTE] Apache Hive 0.12.0 Release Candidate 0
There's already a JIRA for this, https://issues.apache.org/jira/browse/HIVE-4731; it just needs a patch. Given that Brock is working to move the build to Maven, we should wait until that is done before adding this to the build. Alan. On Oct 8, 2013, at 5:13 PM, Mark Grover wrote: Thejas, Thanks for working on the Hive 0.12 release! I work on Apache Bigtop http://bigtop.apache.org and we build rpm and deb packages by building and packaging the source tarballs. Most components (if not all) release a source tarball. Releasing a source tarball would make Hive consistent with other projects in terms of what is released, and would make life easier for those users who may not want binaries (like some Hive developers and Bigtop). I don't know how much work it will be, but both I personally and the larger Bigtop community would greatly appreciate it if Hive released a source tarball for the 0.12 release. Would love to hear what you think. Thanks again! Mark On Tue, Oct 8, 2013 at 3:56 PM, Thejas Nair the...@hortonworks.com wrote: On Tue, Oct 8, 2013 at 8:18 AM, Brock Noland br...@cloudera.com wrote: Hi Thejas, Again thank you very much for all the hard work! Two items of discussion: The tag contains .gitignore files so I believe the source tarball (src/ directory) should as well. It is strange that other files with a . prefix do get included (.checkstyle, .arcconfig), but .gitignore doesn't get included. This might be a wider item than the current release. However, our source tarball actually contains all the hive-*.jar files in addition to all the libraries. Beyond that, the source tarball doesn't match the tag structure; the src directory of the source tarball does. I think we should change this at some point so the source tarball structure exactly matches the tag. Yes, I think we should address this for the next release. It might take some time to get this done right.
Brock On Mon, Oct 7, 2013 at 11:02 PM, Thejas Nair the...@hortonworks.com wrote: Carl pointed out some issues with the RC. I will be rolling out a new RC to address those (hopefully sometime tomorrow). If anybody finds additional issues, please let me know, so that I can address those as well in the next RC. HIVE-5489 - NOTICE copyright dates are out of date HIVE-5488 - some files are missing apache license headers On Mon, Oct 7, 2013 at 4:38 PM, Thejas Nair the...@hortonworks.com wrote: Yes, that is the correct tag. Thanks for pointing it out. I also updated the tag as it was a little behind what is in the RC (found some issues with maven-publish). I have also updated the release vote email template in the hive HowToRelease wiki page, to include a note about the tag. Thanks, Thejas On Mon, Oct 7, 2013 at 4:26 PM, Brock Noland br...@cloudera.com wrote: Hi Thejas, Thank you very much for the hard work! I believe the vote email should contain a link to the tag we are voting on. I assume the tag is: release-0.12.0-rc0 ( http://svn.apache.org/viewvc/hive/tags/release-0.12.0-rc0/). Is that correct? Brock On Mon, Oct 7, 2013 at 6:02 PM, Thejas Nair the...@hortonworks.com wrote: Apache Hive 0.12.0 Release Candidate 0 is available here: http://people.apache.org/~thejas/hive-0.12.0-rc0/ Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-138/ This release has 406 issues fixed. This includes several new features such as the data types date and varchar, optimizer improvements, ORC format improvements, and many bug fixes. Hcatalog packages have now moved to org.apache.hive.hcatalog (from org.apache.hcatalog), and the maven packages are published under org.apache.hive.hcatalog. Voting will conclude in 72 hours. Hive PMC Members: Please test and vote.
Thanks, Thejas -- Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
Re: [DISCUSS] HCatalog becoming a subproject of Hive
Changes made. Alan. On Jan 21, 2013, at 12:39 PM, Carl Steinbach wrote: Hi Alan, Overall this looks good to me. I have a couple small suggestions: * Replace occurrences of Hive's subversion repository with Hive's source code repository. * In the Actions table the sentence This also covers the creation of new sub-projects within the project should be changed to This also covers the creation of new sub-projects and sub-modules within the project. Thanks. Carl On Fri, Jan 18, 2013 at 4:42 PM, Alan Gates ga...@hortonworks.com wrote: I've created a wiki page for my proposed changes at https://cwiki.apache.org/confluence/display/Hive/Proposed+Changes+to+Hive+Bylaws+for+Submodule+Committers Text to be removed is struck through. Text to be added is in italics. Any recommended changes before we vote? Alan. On Jan 17, 2013, at 2:08 PM, Carl Steinbach wrote: Sounds like a good plan to me. Since Ashutosh is a member of both the Hive and HCatalog PMCs it probably makes more sense for him to call the vote, but I'm willing to do it too. On Wed, Jan 16, 2013 at 8:24 AM, Alan Gates ga...@hortonworks.com wrote: If you think that's the best path forward that's fine. I can't call a vote I don't think, since I'm not part of the Hive PMC. But I'm happy to draft a resolution for you and then let you call the vote. Should I do that? Alan. On Jan 11, 2013, at 4:34 PM, Carl Steinbach wrote: Hi Alan, I agree that submitting this for a vote is the best option. If anyone has additional proposed modifications please make them. Otherwise I propose that the Hive PMC vote on this proposal. 
In order for the Hive PMC to be able to vote on these changes they need to be expressed in terms of one or more of the actions listed at the end of the Hive project bylaws: https://cwiki.apache.org/confluence/display/Hive/Bylaws So I think we first need to amend the bylaws in order to define the rights and privileges of a submodule committer, and then separately vote the HCatalog committers in as Hive submodule committers. Does this make sense? Thanks. Carl
Re: [VOTE] Amend Hive Bylaws + Add HCatalog Submodule
Most excellent. I'll start the vote in the HCatalog PPMC to approve this, and assuming that passes I'll then start a vote in the IPMC per the guidelines at http://incubator.apache.org/guides/graduation.html#subproject Alan. On Feb 4, 2013, at 2:27 PM, Carl Steinbach wrote: The following active Hive PMC members have cast votes: Carl Steinbach: +1, +1 Ashutosh Chauhan: +1, +1 Edward Capriolo: +1, +1 Ashish Thusoo: +1, +1 Yongqiang He: +1, +1 Namit Jain: +1, +1 Three active PMC members have abstained from voting. Over the last week the following four Hive PMC members requested that their status be changed from active to emeritus member: jvs, prasadc, zhao, pauly. Voting on these measures is now closed. Both measures have been approved with the required 2/3 majority of active Hive PMC members. Thanks. Carl On Thu, Jan 31, 2013 at 2:04 PM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: +1 and +1 non-binding. Great to see this happen! Thanks, +Vinod On Thu, Jan 31, 2013 at 12:14 AM, Namit Jain nj...@fb.com wrote: +1 and +1 On 1/30/13 6:53 AM, Gunther Hagleitner ghagleit...@hortonworks.com wrote: +1 and +1 Thanks, Gunther. On Tue, Jan 29, 2013 at 5:18 PM, Edward Capriolo edlinuxg...@gmail.com wrote: Measure 1: +1 Measure 2: +1 On Mon, Jan 28, 2013 at 2:47 PM, Carl Steinbach c...@apache.org wrote: I am calling a vote on the following two measures.
Measure 1: Amend Hive Bylaws to Define Submodules and Submodule Committers If this measure passes the Apache Hive Project Bylaws will be amended with the following changes: https://cwiki.apache.org/confluence/display/Hive/Proposed+Changes+to+Hive+Bylaws+for+Submodule+Committers The motivation for these changes is discussed in the following email thread which appeared on the hive-dev and hcatalog-dev mailing lists: http://markmail.org/thread/u5nap7ghvyo7euqa Measure 2: Create HCatalog Submodule and Adopt HCatalog Codebase This measure provides for 1) the establishment of an HCatalog submodule in the Apache Hive Project, 2) the adoption of the Apache HCatalog codebase into the Hive HCatalog submodule, and 3) adding all currently active HCatalog committers as submodule committers on the Hive HCatalog submodule. Passage of this measure depends on the passage of Measure 1. Voting: Both measures require +1 votes from 2/3 of active Hive PMC members in order to pass. All participants in the Hive project are encouraged to vote on these measures, but only votes from active Hive PMC members are binding. The voting period commences immediately and shall last a minimum of six days. Voting is carried out by replying to this email thread. You must indicate which measure you are voting on in order for your vote to be counted. More details about the voting process can be found in the Apache Hive Project Bylaws: https://cwiki.apache.org/confluence/display/Hive/Bylaws -- +Vinod Hortonworks Inc. http://hortonworks.com/
Fwd: [VOTE] Graduate HCatalog from the incubator and become part of Hive
FYI. Alan. Begin forwarded message: From: Alan Gates ga...@hortonworks.com Date: February 4, 2013 10:18:09 PM PST To: hcatalog-...@incubator.apache.org Subject: [VOTE] Graduate HCatalog from the incubator and become part of Hive The Hive PMC has voted to accept HCatalog as a submodule of Hive. You can see the vote thread at http://mail-archives.apache.org/mod_mbox/hive-dev/201301.mbox/%3cCACf6RrzktBYD0suZxn3Pfv8XkR=vgwszrzyb_2qvesuj2vh...@mail.gmail.com%3e . We now need to vote to graduate from the incubator and become a submodule of Hive. This entails the following: 1) the establishment of an HCatalog submodule in the Apache Hive Project; 2) the adoption of the Apache HCatalog codebase into the Hive HCatalog submodule; and 3) adding all currently active HCatalog committers as submodule committers on the Hive HCatalog submodule. Definitions for all these can be found in the (now adopted) Hive bylaws at https://cwiki.apache.org/confluence/display/Hive/Proposed+Changes+to+Hive+Bylaws+for+Submodule+Committer. This vote will stay open for at least 72 hours (thus 23:00 PST on 2/7/13). PPMC members' votes are binding in this vote, though input from all is welcome. If this vote passes the next step will be to submit the graduation motion to the Incubator PMC. Here's my +1. Alan.
Merging HCatalog into Hive
Alright, our vote has passed; it's time to get on with merging HCatalog into Hive. Here are the things I can think of that we need to deal with. Please add additional issues I've missed: 1) Moving the code 2) Dealing with domain names in the code 3) The mailing lists 4) The JIRA 5) The website 6) Committer rights 7) Make a proposal for how HCat is released going forward 8) Publish an FAQ Proposals for how we handle these: Below I propose an approach for how to handle each of these. Feedback welcome. 1) Moving the code I propose that HCat move into a subdirectory of Hive. This fits nicely into Hive's structure since it already has metastore, ql, etc. We'd just add 'hcatalog' as a new directory. This directory would contain hcatalog as it is today. It does not follow Hive's standard build model, so we'd need to do some work to make it so that building Hive also builds HCat, but this should be minimal. 2) Dealing with domain names HCat code currently is under org.apache.hcatalog. Do we want to change it? In time we probably should change it to match the rest of Hive (org.apache.hadoop.hive.hcatalog). We need to do this in a backward compatible way. I propose we leave it as is for now, and if we decide to in the future we can move the actual code to org.apache.hadoop.hive.hcatalog and create shell classes under org.apache.hcatalog. 3) The mailing lists Given that our goal is to merge the projects and not create a subproject, we should merge the mailing lists rather than keep hcat-specific lists. We can ask infra to remove hcatalog-*@incubator.apache.org and forward any new mail to the appropriate Hive lists. We need to find out if they can auto-subscribe people from the hcat lists to the hive lists. Given that traffic on the Hive lists is an order of magnitude higher, we should warn people before we auto-subscribe them and allow them a chance to get off. 4) JIRA We can create an hcatalog component in Hive's JIRA. All new HCat issues could be filed there.
I don't know if there's a way to upload existing JIRAs into Hive's JIRA, but I think it would be better to leave them where they are. We should see if infra can turn off the ability to create new JIRAs in hcatalog. 5) Website We will need to integrate HCatalog's website with Hive's. This should be easy except for the documentation. HCat uses forrest for docs, Hive uses wiki. We will need to put links under 'Documentation' for older versions of HCat docs so users can find them. As far as how docs are handled for the next version of HCatalog, I think that depends on the answer to question 7 (next release of HCat), but I propose that HCat needs to conform to the way Hive does docs on wiki. Though I would strongly encourage the HCat docs to be version specific (that is, have a set of wiki pages for each version). incubator.apache.org/hcatalog should be changed to forward to hive.apache.org. 6) Committer rights Carl will need to set up committer rights for all the new HCat committers. Based on our discussion of making active HCat committers Hive submodule committers this would add the following set: Alan, Sushanth, Francis, Daniel, Vandana, Travis, and Mithun. Ashutosh and Paul are already Hive committers, and neither Devaraj nor Mac have been active in HCat in over a year. 7) Future releases We need to discuss how future releases will happen, as I think this will help developers and users know how to respond to the merge. I propose that HCat will simply become part of future Hive releases. Thus Hive 0.11 (or whatever the next major release is) will include HCatalog. If there are issues found we may need to make HCatalog 0.5.x releases from Hive, which should be fine. But I propose there would not be an HCat 0.6. To be clear I am not proposing that HCat functionality would be subsumed into Hive jars. Just that the existing hcat jars would become part of Hive's release. 
8) Communicate all of this We should put up an FAQ page that has this information, as well as tracks our progress while we work on getting these things done. Alan.
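The "shell classes" approach in item 2 of the proposal above is a plain delegation pattern: a class kept under the old package name that extends (and adds nothing to) its relocated counterpart. A sketch in Python for brevity; in Hive this would of course be Java classes under org.apache.hcatalog extending their org.apache.hive.hcatalog counterparts, and the class and method names here are hypothetical:

```python
import warnings

# Stand-in for the relocated implementation (think org.apache.hive.hcatalog).
class HCatLoader:
    def load(self, table):
        return f"loading {table}"

# Shell class under the old name (think org.apache.hcatalog): it adds no
# behavior of its own, it just keeps existing client code working while
# nudging users toward the new package.
class LegacyHCatLoader(HCatLoader):
    def __init__(self):
        warnings.warn("org.apache.hcatalog is deprecated; use "
                      "org.apache.hive.hcatalog instead", DeprecationWarning)
        super().__init__()

client = LegacyHCatLoader()   # an old call site, unchanged
print(client.load("web_logs"))
```

The key property for backward compatibility is that the shell is a subtype of the real class, so old code that instantiates or type-checks against the old name keeps working against the moved implementation.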
Re: Merging HCatalog into Hive
On Feb 24, 2013, at 12:22 PM, Brock Noland wrote: Looks good from my perspective and I'm glad to see this moving forward. Regarding #4 (JIRA): I don't know if there's a way to upload existing JIRAs into Hive's JIRA, but I think it would be better to leave them where they are. JIRA has a bulk move feature, but I am curious as to why we would leave them under the old project? There might be good reason to orphan them, but my first thought is that it would be nice to have them under the HIVE project simply for search purposes. I was thinking it would be hard for people who had bookmarks or pointers to the existing JIRAs. Also, since it would change all the JIRA numbers on closed JIRAs, it would make records from previous releases a mess. But I see what you're saying about making search hard. Maybe there's a way to leave the historical info where it is while importing any active JIRAs into Hive so people can search them. Alan. Brock
Re: Merging HCatalog into Hive
Alright, I've gotten some feedback from Brock around the JIRA stuff, and Carl in a live conversation expressed his desire to move hcat into the Hive namespace sooner rather than later. So the proposal is that we'd move the code to org.apache.hive.hcatalog, though we would create shell classes and interfaces in org.apache.hcatalog for all public classes and interfaces so that it will be backward compatible. I'm fine with doing this now. So, let's get started. Carl, could you create an hcatalog directory under trunk/hive and grant the listed hcat committers karma on it? Then I'll get started on moving the actual code. Alan. On Feb 24, 2013, at 12:22 PM, Brock Noland wrote: Looks good from my perspective and I'm glad to see this moving forward. Regarding #4 (JIRA): I don't know if there's a way to upload existing JIRAs into Hive's JIRA, but I think it would be better to leave them where they are. JIRA has a bulk move feature, but I am curious as to why we would leave them under the old project? There might be a good reason to orphan them, but my first thought is that it would be nice to have them under the HIVE project simply for search purposes. Brock On Fri, Feb 22, 2013 at 7:12 PM, Alan Gates ga...@hortonworks.com wrote: Alright, our vote has passed, it's time to get on with merging HCatalog into Hive. Here are the things I can think of that we need to deal with. Please add additional issues I've missed:
1) Moving the code
2) Dealing with domain names in the code
3) The mailing lists
4) The JIRA
5) The website
6) Committer rights
7) Make a proposal for how HCat is released going forward
8) Publish an FAQ
Below I propose an approach for how to handle each of these. Feedback welcome.
1) Moving the code: I propose that HCat move into a subdirectory of Hive. This fits nicely into Hive's structure since it already has metastore, ql, etc. We'd just add 'hcatalog' as a new directory. This directory would contain hcatalog as it is today. It does not follow Hive's standard build model, so we'd need to do some work to make it so that building Hive also builds HCat, but this should be minimal.
2) Dealing with domain names: HCat code currently is under org.apache.hcatalog. Do we want to change it? In time we probably should change it to match the rest of Hive (org.apache.hadoop.hive.hcatalog). We need to do this in a backward compatible way. I propose we leave it as is for now; if we decide to in the future, we can move the actual code to org.apache.hadoop.hive.hcatalog and create shell classes under org.apache.hcatalog.
3) The mailing lists: Given that our goal is to merge the projects and not create a subproject, we should merge the mailing lists rather than keep hcat-specific lists. We can ask infra to remove hcatalog-*@incubator.apache.org and forward any new mail to the appropriate Hive lists. We need to find out if they can auto-subscribe people from the hcat lists to the hive lists. Given that traffic on the Hive lists is an order of magnitude higher, we should warn people before we auto-subscribe them and give them a chance to opt out.
4) JIRA: We can create an hcatalog component in Hive's JIRA. All new HCat issues could be filed there. I don't know if there's a way to upload existing JIRAs into Hive's JIRA, but I think it would be better to leave them where they are. We should see if infra can turn off the ability to create new JIRAs in hcatalog.
5) Website: We will need to integrate HCatalog's website with Hive's. This should be easy except for the documentation. HCat uses Forrest for docs; Hive uses the wiki. We will need to put links under 'Documentation' for older versions of HCat docs so users can find them. As far as how docs are handled for the next version of HCatalog, I think that depends on the answer to question 7 (next release of HCat), but I propose that HCat conform to the way Hive does docs on the wiki, though I would strongly encourage the HCat docs to be version specific (that is, have a set of wiki pages for each version). incubator.apache.org/hcatalog should be changed to forward to hive.apache.org.
6) Committer rights: Carl will need to set up committer rights for all the new HCat committers. Based on our discussion of making active HCat committers Hive submodule committers, this would add the following set: Alan, Sushanth, Francis, Daniel, Vandana, Travis, and Mithun. Ashutosh and Paul are already Hive committers, and neither Devaraj nor Mac have been active in HCat in over a year.
7) Future releases: We need to discuss how future releases will happen, as I think this will help developers and users know how to respond to the merge. I propose that HCat simply become part of future Hive releases. Thus Hive 0.11 (or whatever the next major release is) will include HCatalog. If there are issues found we may need to make
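The backward-compatible "shell classes" mentioned above can be sketched as follows (illustrative only: the class names are stand-ins for actual HCatalog classes, and real code would put them in org.apache.hcatalog and org.apache.hive.hcatalog rather than in one file):

```java
// Stand-in for a class at its new home, org.apache.hive.hcatalog.*
class NewHCatRecord {
    String get() { return "data"; }
}

// Stand-in for the shell class left behind at org.apache.hcatalog.*:
// an empty subclass, so code compiled against the old package keeps working.
@Deprecated
class OldHCatRecord extends NewHCatRecord {
}

public class ShellClassDemo {
    public static void main(String[] args) {
        // Old call sites still compile and behave identically,
        // because the old type is-a new type.
        NewHCatRecord r = new OldHCatRecord();
        System.out.println(r.get()); // prints "data"
    }
}
```

Because the shell class carries no logic of its own, all real maintenance happens in the new namespace while old binaries and sources keep resolving.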
Re: Review Request: HIVE-4145. Create hcatalog stub directory and add it to the build
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/9848/#review17816 ---
Ship it!
- Alan Gates
On March 11, 2013, 4:27 a.m., Carl Steinbach wrote:
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/9848/ ---
(Updated March 11, 2013, 4:27 a.m.) Review request for hive and Ashutosh Chauhan.
Description: This patch creates an hcatalog stub directory. Alan requested this. Once the patch is committed I will contact ASFINFRA and request that they grant karma on the directory to the hcatalog submodule committers. This addresses bug HIVE-4145. https://issues.apache.org/jira/browse/HIVE-4145
Diffs:
- build-common.xml e68ecea
- build.properties 2d293a6
- build.xml b5c69d3
- hcatalog/build.xml PRE-CREATION
- hcatalog/ivy.xml PRE-CREATION
- hcatalog/src/java/org/apache/hive/hcatalog/package-info.java PRE-CREATION
Diff: https://reviews.apache.org/r/9848/diff/
Thanks, Carl Steinbach
Re: Merging HCatalog into Hive
Proposed changes look good to me. And you don't need an infra ticket to grant karma. Since you're Hive VP you can do it. See http://www.apache.org/dev/pmc.html#SVNaccess Alan. On Mar 10, 2013, at 9:29 PM, Carl Steinbach wrote: Hi Alan, I submitted a patch that creates the hcatalog directory and makes some other necessary changes here: https://issues.apache.org/jira/browse/HIVE-4145 Once this is committed I will contact ASFINFRA and ask them to grant the HCatalog committers karma. Thanks. Carl
Re: subscribe to your lists
Send email to user-subscr...@hive.apache.org and dev-subscr...@hive.apache.org. Alan. On Mar 7, 2013, at 7:17 AM, Lin Picouleau wrote: Hi, I would like to subscribe to your lists to get involved with Hive project. Thank you! Lin Picouleau
Re: Merging HCatalog into Hive
Excellent, thank you Carl. I'll start on the process to move the code then. Alan. On Mar 15, 2013, at 5:26 PM, Carl Steinbach wrote: Hi Alan, I committed HIVE-4145, created an HCatalog component on JIRA, and updated the asf-authorization-template to give the HCatalog committers karma on the hcatalog subdirectory. At this point I think everything should be ready to go. Let me know if you run into any problems. Thanks. Carl
Re: Getting Started
Check out https://cwiki.apache.org/confluence/display/Hive/HowToContribute Alan. On Mar 21, 2013, at 4:05 PM, Kole Reece wrote: Hi, What would be the best way to get started getting familiar and making contributions.
Re: Merging HCatalog into Hive
There's an issue with the permissions here. In the authorization file you granted permission to hcatalog committers on a directory /hive/hcatalog. But in Hive you created /hive/trunk/hcatalog, which none of the hcatalog committers can access. In the authorization file you'll need to change hive-hcatalog to have authorization /hive/trunk/hcatalog. There is also a scalability issue. Every time Hive branches you'll have to add a line for that branch as well. Also, this will prohibit any dev branches for hcatalog users, or access to any dev branches done in Hive. I suspect you'll find it much easier to give the hive-hcatalog group access to /hive and then use community mores to enforce that no hcat committers commit outside the hcat directory. Alan.
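For readers unfamiliar with Subversion path-based authorization files, the two schemes discussed above look roughly like this (a sketch only; the group name hive-hcatalog comes from the message, but the actual contents of the asf-authorization-template are an assumption):

```
# Per-directory grant: must be repeated for every branch Hive creates,
# and blocks hcat committers from any dev branches
[/hive/trunk/hcatalog]
@hive-hcatalog = rw

# Whole-tree grant (the approach Alan suggests): one line, with the
# "commit only under hcatalog/" rule enforced by community convention
[/hive]
@hive-hcatalog = rw
```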
Re: Merging HCatalog into Hive
Cool, it works now. Thanks for the fast response. Alan. On Mar 26, 2013, at 2:58 PM, Carl Steinbach wrote: Hi Alan, I agree that it will probably be too painful to enforce the rules with SVN, so I went ahead and gave all of the HCatalog committers RW access to /hive. Please follow the rules. If I receive any complaints about this I'll revert back to the old scheme. Thanks. Carl
Re: Moving code to Hive NOW
I've moved the code. I'll be moving a lot of other code around over the next few days as I do what we discussed in https://issues.apache.org/jira/browse/HIVE-4198 so don't rebase your patches just yet. Alan. On Mar 26, 2013, at 3:14 PM, Alan Gates wrote: I am going to move the HCatalog code to Hive in the next few minutes. Please don't check anything into HCatalog until this is done. All patches will be invalidated by this move. I'll send an all clear when this is done. Alan.
Where to put hcatalog branches and site code
Right after I moved the hcat code to hive/trunk/hcatalog, Owen pointed out that the problem with this is that now everyone who checks out Hive pulls _all_ of the hcat code, including the old branches and the site code. This isn't what we want. The site code I propose we integrate with Hive's site code; I'll put up a patch for this shortly. The branches we could either move into Hive's branches directory (renaming them to hcatalog-branch-0.x) or we could create a /hive/hcatalog-historical and put them there. I'm fine with either. Thoughts? Alan.
Re: HCatalog to Hive Committership
Asking for volunteers is probably better than making assignments. We have 7 HCatalog committers (http://hive.apache.org/credits.html). If current Hive committers/PMC members could volunteer to mentor a committer we can start assigning mentors to pupils. Seem reasonable? Alan. On Apr 16, 2013, at 6:10 PM, Carl Steinbach wrote: HCatalog committers will be assigned shepherds - How do we go about getting assigned one? I recommend asking Alan about this since he wrote the proposal. Alan, who is responsible for making these assignments? Thanks. Carl
Re: Is HCatalog stand-alone going to die ?
On Apr 29, 2013, at 10:25 PM, Rodrigo Trujillo wrote: Hi, I have followed the discussion about the merging of HCatalog into Hive. However, it is not clear to me whether new stand-alone versions of HCatalog are going to be released. Is 0.5.0-incubating the last? 0.5.0 is the last planned stand-alone release of HCatalog. The next version of HCatalog will be included in Hive 0.11. Will it be possible to build only hcatalog from the Hive tree? No, building HCatalog already depends on building Hive. Alan. Regards, Rodrigo Trujillo
Review Request: HIVE-4500 HS2 holding too many file handles of hive_job_log_hive_*.txt files
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/10954/ ---
Review request for hive and Carl Steinbach.
Description: HS2 holding too many file handles of hive_job_log_hive_*.txt files. This addresses bug HIVE-4500. https://issues.apache.org/jira/browse/HIVE-4500
Diffs:
- trunk/ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistory.java 1478219
- trunk/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 1478219
- trunk/service/src/java/org/apache/hive/service/cli/operation/HiveCommandOperation.java 1478219
- trunk/service/src/java/org/apache/hive/service/cli/operation/Operation.java 1478219
- trunk/service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 1478219
Diff: https://reviews.apache.org/r/10954/diff/
Thanks, Alan Gates
Re: [VOTE] Apache Hive 0.11.0 Release Candidate 2
+1. Downloaded and built it. Ran the HCat and webhcat system tests (which include a number of Hive tests). Everything looks good. Alan. On May 11, 2013, at 10:33 AM, Owen O'Malley wrote: Based on feedback from everyone, I have respun the release candidate, RC2. Please take a look. We've fixed the following problems with the previous RC:
* Release notes were incorrect
* HIVE-4018 - MapJoin failing with Distributed Cache error
* HIVE-4421 - Improve memory usage by ORC dictionaries
* HIVE-4500 - Ensure that HiveServer 2 closes log files
* HIVE-4494 - ORC map columns get class cast exception in some contexts
* HIVE-4498 - Fix TestBeeLineWithArgs failure
* HIVE-4505 - Hive can't load transforms with remote scripts
* HIVE-4527 - Fix the eclipse template
Source tag for RC2 is at: https://svn.apache.org/repos/asf/hive/tags/release-0.11.0rc2 Source tar ball and convenience binary artifacts can be found at: http://people.apache.org/~omalley/hive-0.11.0rc2/ This release has many goodies including HiveServer2, integrated hcatalog, windowing and analytical functions, decimal data type, better query planning, performance enhancements and various bug fixes. In total, we resolved more than 350 issues. Full list of fixed issues can be found at: http://s.apache.org/8Fr Voting will conclude in 72 hours. Hive PMC Members: Please test and vote. Thanks, Owen
Re: VOTE: Remove phabricator instructions from hive-development guide (wiki), officially only support Apache's review board.
Major +1 (non-binding). Using 3rd party tools where we have no option for support or help is not good. Alan. On Oct 16, 2013, at 5:32 PM, Edward Capriolo wrote: Our wiki has instructions for posting to phabricator for code reviews. https://cwiki.apache.org/confluence/display/Hive/PhabricatorCodeReview Phabricator now requires an external facebook account to review patches, and we have no technical support contact where phabricator is hosted. It also seems like some of the phabricator features are no longer working. Apache has a review board system many people are already using. https://reviews.apache.org/account/login/?next_page=/dashboard/ This vote is to remove the phabricator instructions from the wiki. The instructions will reference review board and that will be the only system that Hive supports for patch review process. +1 is a vote for removing the phabricator instructions from the wiki. Thank you, Edward -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Maven unit test question
I was attempting to write unit tests for changes I'm making to HiveMetaStoreClient as part of the ACID transaction work (see https://issues.apache.org/jira/browse/HIVE-5843). When I added the tests and attempted to run them using mvn test -Dtest=TestHiveMetaStoreClient -Phadoop-1, it failed with: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/thrift/TUGIContainingTransport$Factory This class is contained in the hive-shims jar. The error surprised me because, according to metastore/pom.xml, hive-shims is a dependency of hive-metastore. When I ran Maven with -X to get debug information, I found that the classpath included /Users/gates/git/apache/hive/shims/assembly/target/classes. I'm guessing that rather than use the shims jar (which has been built by this time) it's trying to use the compiled classes, but failing in this case because the shims jar is constructed not by directly conglomerating a set of class files but by picking and choosing from several shim jar versions and then building a single jar. But I could not figure out how to tell Maven that it should use the already-built shims jar rather than the classes. To test my theory I took the shims jar and unpacked it in the path Maven was looking in, and sure enough my tests ran once I did that. The existing unit test TestMetastoreExpr in ql seems to have the same issue. I tried to use it as a model, but when I ran it, it failed with the same error, and unpacking the jar resolved it in the same way. Am I doing something wrong, or is there a change needed in the pom.xml to get it to look in the jar instead of the .class files for shims dependencies? Alan.
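One workaround consistent with the diagnosis above (an untested sketch relying on general Maven behavior, not anything Hive-specific): when a module is built from its own directory rather than as part of the multi-module reactor, Maven resolves sibling dependencies from the local repository instead of their target/classes directories, so installing the assembled shims jar first should let the tests see the real jar:

```
# Install all modules, including the assembled shims jar, into ~/.m2
mvn clean install -DskipTests -Phadoop-1

# Then run the single test from the module that owns it, so the
# hive-shims dependency resolves to the installed jar
cd metastore
mvn test -Dtest=TestHiveMetaStoreClient -Phadoop-1
```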
Re: Maven unit test question
There's a patch on https://issues.apache.org/jira/browse/HIVE-5843 that has the code. Unfortunately the patch is huge because my work involves changes to the thrift interface. The git repo I'm working off of last pulled from Apache on Nov 20th (commit 1de88eedc69af1e7e618fc4f5eac045f69c02973 is the last one it has) so you may need to go back a bit in your repo to get a version that the patch will apply against. The test in question is TestHiveMetaStoreClient. Also, I had to move the class from metastore to ql as it turns out instantiating the HiveMetaStoreClient needs the hive-exec jar. I couldn't add a dependency on hive-exec in the metastore package as hive-exec depends on hive-metastore. Thanks for your help. Alan. On Dec 9, 2013, at 3:53 PM, Brock Noland wrote: Can you share the change with me so I can debug? On Dec 9, 2013 5:15 PM, Alan Gates ga...@hortonworks.com wrote: I was attempting to write unit tests for changes I'm making to HiveMetaStoreClient as part of the ACID transaction work (see https://issues.apache.org/jira/browse/HIVE-5843). When I added the tests and attempted to run them using mvn tests -Dtest=TestHiveMetaStoreClient -Phadoop-1 it failed with: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/thrift/TUGIContainingTransport$Factory This class is contained in the hive-shims jar. The error surprised me because according to metastore/pom.xml, hive-shims is a dependency of hive-metastore. When I ran maven with -X to get debug information, I found that in the classpath it was including /Users/gates/git/apache/hive/shims/assembly/target/classes. I'm guessing that rather than use the shims jar (which has been built by this time) it's trying to use the compiled classes, but failing in this case because the shims jar is actually constructed not by directly conglomerating a set of class files but by picking and choosing from several shim jar versions and then constructing a single jar. 
But I could not figure out how to communicate to maven that it should use the already built shims jar rather than the classes. To test my theory I took the shims jar and unpacked it in the path maven was looking in, and sure enough my tests ran once I did that. The existing unit test TestMetastoreExpr in ql seems to have the same issue. I tried to use it as a model, but when I ran it, it failed with the same error, and unpacking the jar resolved it in the same way. Am I doing something wrong, or is there a change needed in the pom.xml to get it to look in the jar instead of the .class files for shims dependencies? Alan.
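For context, the dependency the email describes would look roughly like this in metastore/pom.xml (a sketch with assumed coordinates and version property, not copied from the repo); the reported problem is that the in-reactor build resolved it to shims/assembly/target/classes rather than the assembled jar:

```xml
<!-- Sketch (assumed coordinates): hive-shims declared as a dependency
     of hive-metastore. Despite this declaration, the reactor build put
     shims/assembly/target/classes on the test classpath instead of the
     picked-and-chosen assembly jar, causing the NoClassDefFoundError. -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-shims</artifactId>
  <version>${project.version}</version>
</dependency>
```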
Re: adding ANSI flag for hive
A couple of thoughts on this: 1) If we did this I think we should have one flag, not many. As Thejas points out, your test matrix goes insane when you have too many flags and hence things don't get properly tested. 2) We could do this in an incremental way, where we create this new ANSI flag and are clear with users that for a while this will be evolving. That is, as we find new issues with data types, semantics, whatever, we will continue to change the behavior of this flag. At some point in the future (as Thejas suggests, at a 1.0 release) we could make this the default behavior. This avoids having to do a full sweep now to find everything that we want to change and make ANSI compliant, and living with whatever we miss. Alan. On Dec 11, 2013, at 5:14 PM, Thejas Nair wrote: Having too many configs complicates things for the user, and also complicates the code, and you also end up having many untested combinations of config flags. I think we should identify a bunch of incompatible changes that we think are important, fix them in a branch, and make a major version release (say 1.x). This is also related to HIVE-5875, where there is a discussion on switching the defaults for some of the configs to more desirable, but non-backward-compatible, values. On Wed, Dec 11, 2013 at 4:33 PM, Sergey Shelukhin ser...@hortonworks.com wrote: Hi. There's recently been some discussion about data type changes in Hive (double to decimal), and result changes for special cases like division by zero, etc., to bring it into compliance with MySQL (that's what the JIRAs use as an example; I am assuming ANSI SQL is meant). The latter are non-controversial (I guess), but for the former, performance may suffer and/or backward compat may be broken if Hive is brought into compliance. If fuller ANSI compat is sought in the future, there may be some even hairier issues, such as double-quoted identifiers.
In light of that, and also following MySQL, I wonder if we should add a flag, or set of flags, to Hive to be able to force ANSI compliance. When this flag (or these flags) is not set, for example, int/int division could return double for backward compat/perf, and vectorization could skip the special-case handling for division by zero, etc. Wdyt?
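The int/int division example in the thread can be sketched outside Hive. This is an illustrative Python model of the two semantics under discussion, not actual Hive behavior; the "ansi" flag and the NULL-on-divide-by-zero handling are assumptions for illustration only:

```python
# Hedged sketch: models the two int/int division semantics discussed in
# the thread. The "ansi" flag and NULL-on-divide-by-zero handling are
# illustrative assumptions, not real Hive configs or guaranteed behavior.
def divide(a: int, b: int, ansi: bool = False):
    if b == 0:
        return None              # SQL-style: x / 0 yields NULL
    if ansi:
        return a // b            # ANSI-style: int / int stays an int
    return a / b                 # backward-compatible: result is a double

print(divide(7, 2))              # 3.5
print(divide(7, 2, ansi=True))   # 3
print(divide(7, 0))              # None
```

A single flag (option 1 in Alan's reply) would correspond to one boolean like this controlling all such behaviors at once, rather than one knob per behavior.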
Re: Mail bounces from ebuddy.com
Anyone who is an admin on the list (I don't know who the admins are) can do this by doing user-unsubscribe-USERNAME=ebuddy@hive.apache.org where USERNAME is the name of the bouncing user (see http://untroubled.org/ezmlm/ezman/ezman1.html ) Alan. Thejas Nair mailto:the...@hortonworks.com August 17, 2014 at 17:02 I don't know how to do this. Carl, Ashutosh, Do you guys know how to remove these two invalid emails from the mailing list ? Lars Francke mailto:lars.fran...@gmail.com August 17, 2014 at 15:41 Hmm great, I see others mentioning this as well. I'm happy to contact INFRA but I'm not sure if they are even needed or if someone from the Hive team can do this? On Fri, Aug 8, 2014 at 3:43 AM, Lefty Leverenz leftylever...@gmail.com Lefty Leverenz mailto:leftylever...@gmail.com August 7, 2014 at 18:43 (Excuse the spam.) Actually I'm getting two bounces per message, but gmail concatenates them so I didn't notice the second one. -- Lefty On Thu, Aug 7, 2014 at 9:36 PM, Lefty Leverenz leftylever...@gmail.com Lefty Leverenz mailto:leftylever...@gmail.com August 7, 2014 at 18:36 Curious, I've only been getting one bounce per message. Anyway thanks for bringing this up. -- Lefty Lars Francke mailto:lars.fran...@gmail.com August 7, 2014 at 4:38 Hi, every time I send a mail to dev@ I get two bounce mails from two people at ebuddy.com. I don't want to post the E-Mail addresses publicly but I can send them on if needed (and it can be triggered easily by just replying to this mail I guess). Could we maybe remove them from the list? Cheers, Lars
Re: Timeline for release of Hive 0.14
+1, Eugene and I are working on getting HIVE-5317 (insert, update, delete) done and would like to get it in. Alan. Nick Dimiduk mailto:ndimi...@gmail.com August 20, 2014 at 12:27 It'd be great to get HIVE-4765 included in 0.14. The proposed changes are a big improvement for us HBase folks. Would someone mind having a look in that direction? Thanks, Nick Thejas Nair mailto:the...@hortonworks.com August 19, 2014 at 15:20 +1 Sounds good to me. It's already almost 4 months since the last release. It is time to start preparing for the next one. Thanks for volunteering! Vikram Dixit mailto:vik...@hortonworks.com August 19, 2014 at 14:02 Hi Folks, I was thinking that it was about time that we had a release of hive 0.14 given our commitment to having a release of hive on a periodic basis. We could cut a branch and start working on a release in say 2 weeks time, around September 5th (Friday). After branching, we can focus on stabilizing for the release and hopefully have an RC in about 2 weeks post that. I would like to volunteer myself for the duties of the release manager for this version if the community agrees. Thanks Vikram.
Re: Review Request 25245: Support dynamic service discovery for HiveServer2
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25245/#review52171 --- service/src/java/org/apache/hive/service/server/HiveServer2.java https://reviews.apache.org/r/25245/#comment90921 It seems like we want more than warn here if we fail to create the parent node. In this case we'll be unable to create the node for this instance, and clients will be unable to find the server. I would think this should be fatal. service/src/java/org/apache/hive/service/server/HiveServer2.java https://reviews.apache.org/r/25245/#comment90922 Agree we should have a clean shutdown case. The timeout was 3 minutes I think, which means it will be a while after the system shuts down that clients keep trying to contact it. - Alan Gates On Sept. 2, 2014, 10:05 a.m., Vaibhav Gumashta wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25245/ --- (Updated Sept. 2, 2014, 10:05 a.m.) Review request for hive, Alan Gates, Navis Ryu, Szehon Ho, and Thejas Nair. 
Bugs: HIVE-7935 https://issues.apache.org/jira/browse/HIVE-7935 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-7935 Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7f4afd9 jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java cbcfec7 jdbc/src/java/org/apache/hive/jdbc/ZooKeeperHiveClientHelper.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java 46044d0 ql/src/java/org/apache/hadoop/hive/ql/util/ZooKeeperHiveHelper.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/lockmgr/zookeeper/TestZookeeperLockManager.java 59294b1 service/src/java/org/apache/hive/service/cli/CLIService.java 08ed2e7 service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 21c33bc service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java bc0a02c service/src/java/org/apache/hive/service/cli/session/SessionManager.java d573592 service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java 37b05fc service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 027931e service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpCLIService.java c380b69 service/src/java/org/apache/hive/service/server/HiveServer2.java 0864dfb service/src/test/org/apache/hive/service/cli/session/TestSessionGlobalInitFile.java 66fc1fc Diff: https://reviews.apache.org/r/25245/diff/ Testing --- Manual testing + test cases. Thanks, Vaibhav Gumashta
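The first review comment argues that failing to create the parent ZooKeeper node should be fatal rather than a warning. A minimal sketch of that fail-fast pattern, with ZooKeeper stubbed out as a plain callable (none of these names are from the actual patch):

```python
# Hedged sketch of the review point above: if creating the parent znode
# fails, server registration should abort rather than merely log a
# warning, because clients could never discover this instance afterward.
# ZooKeeper is stubbed as a callable; all names here are hypothetical.
class FatalStartupError(RuntimeError):
    pass

def register_server(zk_create, parent_path, instance_name):
    # zk_create stands in for a ZooKeeper create() call that returns
    # True on success and False on failure.
    if not zk_create(parent_path):
        raise FatalStartupError("could not create " + parent_path)
    return parent_path + "/" + instance_name

print(register_server(lambda path: True, "/hiveserver2", "server-1"))
# -> /hiveserver2/server-1
```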
Review Request 25341: HIVE-7078 Need file sink operators that work with ACID
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25341/ --- Review request for hive and Prasanth_J. Bugs: HIVE-7078 https://issues.apache.org/jira/browse/HIVE-7078 Repository: hive-git Description --- Changes FileSinkOperator to use RecordUpdater in cases where an ACID write is being done. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java d4e61d8 ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java f584926 ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java c3a83d4 ql/src/java/org/apache/hadoop/hive/ql/plan/FileSinkDesc.java 301dde5 ql/src/test/org/apache/hadoop/hive/ql/exec/TestFileSinkOperator.java PRE-CREATION Diff: https://reviews.apache.org/r/25341/diff/ Testing --- Added a new unit test TestFileSinkOperator that tests writing of standard (non-ACID) data via RecordWriter and acid data via RecordUpdater, in both partitioned and non-partitioned cases. Thanks, Alan Gates
Review Request 25343: HIVE-7899 txnMgr should be session specific
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25343/ --- Review request for hive and Ashutosh Chauhan. Bugs: HIVE-7899 https://issues.apache.org/jira/browse/HIVE-7899 Repository: hive-git Description --- This patch moves the TxnManager instance from Driver to SessionState, since multiple queries can share a single session, and it is convenient in other parts of the code to be able to get to the transaction manager. It also stores the current transaction id and whether we are in autocommit in SessionState. Diffs - ql/src/java/org/apache/hadoop/hive/ql/Driver.java 0533ae8 ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java df66f83 Diff: https://reviews.apache.org/r/25343/diff/ Testing --- Ran existing transaction manager tests. Thanks, Alan Gates
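The design change described above, moving the transaction manager from the driver to the session so that all queries in one session share a single instance, can be sketched as a session-scoped lazy singleton. Class and method names below are illustrative, not the actual Hive classes:

```python
# Hedged sketch of the HIVE-7899 design: the transaction manager hangs
# off the session rather than the driver, so every query in a session
# sees the same instance, along with the current txn id and autocommit
# state. Names here are illustrative, not from the patch.
class TxnManager:
    def __init__(self):
        self.current_txn_id = None
        self.autocommit = True

class SessionState:
    def __init__(self):
        self._txn_mgr = None

    def get_txn_mgr(self):
        if self._txn_mgr is None:        # created lazily, once per session
            self._txn_mgr = TxnManager()
        return self._txn_mgr

session = SessionState()
# Two "queries" in the same session share one manager instance.
assert session.get_txn_mgr() is session.get_txn_mgr()
```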
Re: Review Request 25341: HIVE-7078 Need file sink operators that work with ACID
On Sept. 5, 2014, 8:21 a.m., Prasanth_J wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java, line 211 https://reviews.apache.org/r/25341/diff/1/?file=676838#file676838line211 I don't see needToRename being used elsewhere. So can you replace this chunk with if (fs.exists(outPaths[idx]) && !fs.rename(outPaths[idx], finalPaths[idx])) {..} ? But that would change the behavior in the standard case. This way the behavior is only changed in the update and delete case. I didn't want to add an extra stat for every type of write. On Sept. 5, 2014, 8:21 a.m., Prasanth_J wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java, line 556 https://reviews.apache.org/r/25341/diff/1/?file=676838#file676838line556 Can you add a comment about what is happening here? Are you just stripping off the _attemptId from taskId_attemptId? If so can you use Utilities.getTaskIdFromFilename() instead? Switched to Utilities.getTaskIdFromFilename() as requested. - Alan --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25341/#review52339 --- On Sept. 4, 2014, 3:49 p.m., Alan Gates wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25341/ --- (Updated Sept. 4, 2014, 3:49 p.m.) Review request for hive and Prasanth_J. Bugs: HIVE-7078 https://issues.apache.org/jira/browse/HIVE-7078 Repository: hive-git Description --- Changes FileSinkOperator to use RecordUpdater in cases where an ACID write is being done.
Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java d4e61d8 ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java f584926 ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java c3a83d4 ql/src/java/org/apache/hadoop/hive/ql/plan/FileSinkDesc.java 301dde5 ql/src/test/org/apache/hadoop/hive/ql/exec/TestFileSinkOperator.java PRE-CREATION Diff: https://reviews.apache.org/r/25341/diff/ Testing --- Added a new unit test TestFileSinkOperator that tests writing of standard (non-ACID) data via RecordWriter and acid data via RecordUpdater, in both partitioned and non-partitioned cases. Thanks, Alan Gates
Re: Review Request 25341: HIVE-7078 Need file sink operators that work with ACID
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25341/ --- (Updated Sept. 5, 2014, 4:36 p.m.) Review request for hive and Prasanth_J. Bugs: HIVE-7078 https://issues.apache.org/jira/browse/HIVE-7078 Repository: hive-git Description --- Changes FileSinkOperator to use RecordUpdater in cases where an ACID write is being done. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java d4e61d8 ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java f584926 ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java c3a83d4 ql/src/java/org/apache/hadoop/hive/ql/plan/FileSinkDesc.java 301dde5 ql/src/test/org/apache/hadoop/hive/ql/exec/TestFileSinkOperator.java PRE-CREATION Diff: https://reviews.apache.org/r/25341/diff/ Testing --- Added a new unit test TestFileSinkOperator that tests writing of standard (non-ACID) data via RecordWriter and acid data via RecordUpdater, in both partitioned and non-partitioned cases. Thanks, Alan Gates
Review Request 25414: HIVE-7788 Generate plans for insert, update, and delete
PRE-CREATION ql/src/test/queries/clientpositive/update_where_non_partitioned.q PRE-CREATION ql/src/test/queries/clientpositive/update_where_partitioned.q PRE-CREATION ql/src/test/results/clientnegative/delete_not_acid.q.out PRE-CREATION ql/src/test/results/clientnegative/update_not_acid.q.out PRE-CREATION ql/src/test/results/clientnegative/update_partition_col.q.out PRE-CREATION ql/src/test/results/clientpositive/delete_all_non_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/delete_all_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/delete_orig_table.q.out PRE-CREATION ql/src/test/results/clientpositive/delete_tmp_table.q.out PRE-CREATION ql/src/test/results/clientpositive/delete_where_no_match.q.out PRE-CREATION ql/src/test/results/clientpositive/delete_where_non_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/delete_where_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/delete_whole_partition.q.out PRE-CREATION ql/src/test/results/clientpositive/insert_orig_table.q.out PRE-CREATION ql/src/test/results/clientpositive/insert_update_delete.q.out PRE-CREATION ql/src/test/results/clientpositive/insert_values_dynamic_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/insert_values_non_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/insert_values_orig_table.q.out PRE-CREATION ql/src/test/results/clientpositive/insert_values_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/insert_values_tmp_table.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/delete_all_non_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/delete_all_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/delete_orig_table.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/delete_tmp_table.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/delete_where_no_match.q.out PRE-CREATION 
ql/src/test/results/clientpositive/tez/delete_where_non_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/delete_where_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/delete_whole_partition.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/insert_orig_table.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/insert_update_delete.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/insert_values_dynamic_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/insert_values_non_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/insert_values_orig_table.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/insert_values_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/insert_values_tmp_table.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/update_after_multiple_inserts.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/update_all_non_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/update_all_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/update_all_types.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/update_orig_table.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/update_tmp_table.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/update_two_cols.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/update_where_no_match.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/update_where_non_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/update_where_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/update_after_multiple_inserts.q.out PRE-CREATION ql/src/test/results/clientpositive/update_all_non_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/update_all_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/update_all_types.q.out PRE-CREATION 
ql/src/test/results/clientpositive/update_orig_table.q.out PRE-CREATION ql/src/test/results/clientpositive/update_tmp_table.q.out PRE-CREATION ql/src/test/results/clientpositive/update_two_cols.q.out PRE-CREATION ql/src/test/results/clientpositive/update_where_no_match.q.out PRE-CREATION ql/src/test/results/clientpositive/update_where_non_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/update_where_partitioned.q.out PRE-CREATION Diff: https://reviews.apache.org/r/25414/diff/ Testing --- Many tests included in the patch, including insert/values, update, and delete all tested against: non-partitioned tables, partitioned tables, and temp tables. Thanks, Alan Gates
Re: Review Request 25414: HIVE-7788 Generate plans for insert, update, and delete
/clientpositive/update_where_no_match.q PRE-CREATION ql/src/test/queries/clientpositive/update_where_non_partitioned.q PRE-CREATION ql/src/test/queries/clientpositive/update_where_partitioned.q PRE-CREATION ql/src/test/results/clientnegative/delete_not_acid.q.out PRE-CREATION ql/src/test/results/clientnegative/update_not_acid.q.out PRE-CREATION ql/src/test/results/clientnegative/update_partition_col.q.out PRE-CREATION ql/src/test/results/clientpositive/delete_all_non_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/delete_all_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/delete_orig_table.q.out PRE-CREATION ql/src/test/results/clientpositive/delete_tmp_table.q.out PRE-CREATION ql/src/test/results/clientpositive/delete_where_no_match.q.out PRE-CREATION ql/src/test/results/clientpositive/delete_where_non_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/delete_where_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/delete_whole_partition.q.out PRE-CREATION ql/src/test/results/clientpositive/insert_orig_table.q.out PRE-CREATION ql/src/test/results/clientpositive/insert_update_delete.q.out PRE-CREATION ql/src/test/results/clientpositive/insert_values_dynamic_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/insert_values_non_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/insert_values_orig_table.q.out PRE-CREATION ql/src/test/results/clientpositive/insert_values_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/insert_values_tmp_table.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/delete_all_non_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/delete_all_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/delete_orig_table.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/delete_tmp_table.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/delete_where_no_match.q.out PRE-CREATION 
ql/src/test/results/clientpositive/tez/delete_where_non_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/delete_where_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/delete_whole_partition.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/insert_orig_table.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/insert_update_delete.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/insert_values_dynamic_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/insert_values_non_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/insert_values_orig_table.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/insert_values_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/insert_values_tmp_table.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/update_after_multiple_inserts.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/update_all_non_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/update_all_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/update_all_types.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/update_orig_table.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/update_tmp_table.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/update_two_cols.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/update_where_no_match.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/update_where_non_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/update_where_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/update_after_multiple_inserts.q.out PRE-CREATION ql/src/test/results/clientpositive/update_all_non_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/update_all_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/update_all_types.q.out PRE-CREATION 
ql/src/test/results/clientpositive/update_orig_table.q.out PRE-CREATION ql/src/test/results/clientpositive/update_tmp_table.q.out PRE-CREATION ql/src/test/results/clientpositive/update_two_cols.q.out PRE-CREATION ql/src/test/results/clientpositive/update_where_no_match.q.out PRE-CREATION ql/src/test/results/clientpositive/update_where_non_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/update_where_partitioned.q.out PRE-CREATION Diff: https://reviews.apache.org/r/25414/diff/ Testing --- Many tests included in the patch, including insert/values, update, and delete all tested against: non-partitioned tables, partitioned tables, and temp tables. Thanks, Alan Gates
Re: Review Request 25414: HIVE-7788 Generate plans for insert, update, and delete
On Sept. 6, 2014, 5:43 p.m., Brock Noland wrote: I obviously don't have context here but I do have a few items which I think should be addressed. Thx! Thanks for the review. On Sept. 6, 2014, 5:43 p.m., Brock Noland wrote: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 305 https://reviews.apache.org/r/25414/diff/1/?file=682007#file682007line305 I think this should be HIVE_IN_TEZ_TEST :), will fix On Sept. 6, 2014, 5:43 p.m., Brock Noland wrote: ql/src/java/org/apache/hadoop/hive/ql/Context.java, line 105 https://reviews.apache.org/r/25414/diff/1/?file=682012#file682012line105 there is a setter/getter for this field so I think it can be private. Ok. On Sept. 6, 2014, 5:43 p.m., Brock Noland wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java, line 275 https://reviews.apache.org/r/25414/diff/1/?file=682016#file682016line275 assert is almost never enabled. Should we use preconditions? I put this here as a way to test while I was developing, and left it because it helped make clear to later maintainers what I was expecting. I avoided doing an explicit instanceof check for performance. If you think it's important I can put it in there without the assert and then throw an exception. On Sept. 6, 2014, 5:43 p.m., Brock Noland wrote: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java, line 52 https://reviews.apache.org/r/25414/diff/1/?file=682019#file682019line52 constants should be all caps. If we fix this one can we fix bucketFileFilter as well. I'm fine to change it, except that all of the other filters in the file aren't, so I was matching existing style. We might want to file a separate JIRA to fix them all, which should be a quick patch. On Sept. 6, 2014, 5:43 p.m., Brock Noland wrote: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 2404 https://reviews.apache.org/r/25414/diff/1/?file=682024#file682024line2404 seems like we might want to log the exception here Agreed, will fix. On Sept.
6, 2014, 5:43 p.m., Brock Noland wrote: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 2417 https://reviews.apache.org/r/25414/diff/1/?file=682024#file682024line2417 seems like we might want to log the exception here Agreed, will fix. On Sept. 6, 2014, 5:43 p.m., Brock Noland wrote: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 2436 https://reviews.apache.org/r/25414/diff/1/?file=682024#file682024line2436 hmm, why not just log as INFO or DEBUG? Will add an INFO message. On Sept. 6, 2014, 5:43 p.m., Brock Noland wrote: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 2449 https://reviews.apache.org/r/25414/diff/1/?file=682024#file682024line2449 no need for stringifyException here, you can pass e as a second arg Will fix. On Sept. 6, 2014, 5:43 p.m., Brock Noland wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java, line 99 https://reviews.apache.org/r/25414/diff/1/?file=682028#file682028line99 looks like this can be final Sure, but why? On Sept. 6, 2014, 5:43 p.m., Brock Noland wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java, line 747 https://reviews.apache.org/r/25414/diff/1/?file=682029#file682029line747 why not log the exception as well Do you mean the exception stack or the exception name? The exception message is getting logged, since it's in errMsg. I'm throwing a SemanticException that includes the caught exception, so I'm assuming the stack will be printed when that is dumped. On Sept. 6, 2014, 5:43 p.m., Brock Noland wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java, line 333 https://reviews.apache.org/r/25414/diff/1/?file=682031#file682031line333 I think we should change this to IllegalStateException The danger of using comedy in your error messages is that you'll forget to go back and put something useful in. Will fix. - Alan --- This is an automatically generated e-mail. 
To reply, visit: https://reviews.apache.org/r/25414/#review52542 --- On Sept. 6, 2014, 4:32 p.m., Alan Gates wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25414/ --- (Updated Sept. 6, 2014, 4:32 p.m.) Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jason Dere, and Thejas Nair. Bugs: HIVE-7788 https://issues.apache.org/jira/browse/HIVE-7788 Repository: hive-git Description --- This patch adds plan generation as well as making modifications to some of the exec operators to make insert/value, update, and delete work
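The assert-versus-precondition tradeoff debated above can be sketched as follows. This is an illustrative sketch only — `RowCheck`, `Row`, and `AcidRow` are made-up names, not the actual ReduceSinkOperator code. The point is that a plain `assert` is skipped unless the JVM runs with `-ea`, so it documents intent at zero production cost but enforces nothing, while an explicit check always runs at the cost of one `instanceof` test per call (the performance concern raised in the thread).

```java
// Hypothetical stand-ins for the real Hive types under discussion.
public class RowCheck {
  interface Row {}
  static class AcidRow implements Row {}

  // Variant 1: assert. Compiled in but skipped unless the JVM starts with
  // -ea, so in production a bad input only surfaces later, as a plain
  // ClassCastException with no explanatory message.
  static AcidRow castWithAssert(Row r) {
    assert r instanceof AcidRow : "expected AcidRow, got " + r.getClass();
    return (AcidRow) r;
  }

  // Variant 2: explicit check. Always enforced, at the cost of one
  // instanceof test per call, and fails with a descriptive exception.
  static AcidRow castWithCheck(Row r) {
    if (!(r instanceof AcidRow)) {
      throw new IllegalStateException("expected AcidRow, got " + r.getClass());
    }
    return (AcidRow) r;
  }

  public static void main(String[] args) {
    // prints "AcidRow"
    System.out.println(castWithCheck(new AcidRow()).getClass().getSimpleName());
  }
}
```

A middle ground, mentioned in the reply, is to keep the explicit check only on paths where the cast can actually fail and let the JIT absorb the cost elsewhere.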
Re: Timeline for release of Hive 0.14
I'll review that. I just need the time to test it against mysql, oracle, and hopefully sqlserver. But I think we can do this post branch if we need to, as it's a bug fix rather than a feature. Alan. Damien Carol mailto:dca...@blitzbs.com September 8, 2014 at 3:19 Same request for https://issues.apache.org/jira/browse/HIVE-7689 I already provided a patch, re-based it many times and I'm waiting for a review. Regards, On 08/09/2014 12:08, amareshwarisr . wrote: amareshwarisr . mailto:amareshw...@gmail.com September 8, 2014 at 3:08 Would like to include https://issues.apache.org/jira/browse/HIVE-2390 and https://issues.apache.org/jira/browse/HIVE-7936. I can review and merge them. Thanks Amareshwari Vikram Dixit mailto:vik...@hortonworks.com September 5, 2014 at 17:53 Hi Folks, I am going to start consolidating the items mentioned in this list and create a wiki page to track it. I will wait till the end of next week to create the branch taking into account Ashutosh's request. Thanks Vikram. On Fri, Sep 5, 2014 at 5:39 PM, Ashutosh Chauhan hashut...@apache.org Ashutosh Chauhan mailto:hashut...@apache.org September 5, 2014 at 17:39 Vikram, Some of us are working on stabilizing the cbo branch and trying to get it merged into trunk. We feel we are close. May I request to defer cutting the branch for a few more days? Folks interested in this can track our progress here: https://issues.apache.org/jira/browse/HIVE-7946 Thanks, Ashutosh On Fri, Aug 22, 2014 at 4:09 PM, Lars Francke lars.fran...@gmail.com Lars Francke mailto:lars.fran...@gmail.com August 22, 2014 at 16:09 Thank you for volunteering to do the release. I think a 0.14 release is a good idea. I have a couple of issues I'd like to get in too: * Either HIVE-7107[0] (Fix an issue in the HiveServer1 JDBC driver) or HIVE-6977[1] (Delete HiveServer1). 
The former needs a review, the latter a patch. * HIVE-6123[2] Checkstyle in Maven needs a review. HIVE-7622[3] and HIVE-7543[4] are waiting for any reviews or comments on my previous thread[5]. I'd still appreciate any helpers for reviews or even just comments. I'd feel very sad if I had done all that work for nothing. Hoping this thread gives me a wider audience. Both patches fix up issues that should have been caught in earlier reviews, as they are almost all Checkstyle or other style violations, but they make for huge patches. I could also create hundreds of small issues or stop doing these things entirely. [0] https://issues.apache.org/jira/browse/HIVE-7107 [1] https://issues.apache.org/jira/browse/HIVE-6977 [2] https://issues.apache.org/jira/browse/HIVE-6123 [3] https://issues.apache.org/jira/browse/HIVE-7622 [4] https://issues.apache.org/jira/browse/HIVE-7543 On Fri, Aug 22, 2014 at 11:01 PM, John Pullokkaran -- Sent with Postbox http://www.getpostbox.com -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: Review Request 25414: HIVE-7788 Generate plans for insert, update, and delete
/update_where_non_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/update_where_partitioned.q.out PRE-CREATION Diff: https://reviews.apache.org/r/25414/diff/ Testing --- Many tests included in the patch, including insert/values, update, and delete all tested against: non-partitioned tables, partitioned tables, and temp tables. Thanks, Alan Gates
Re: Review Request 25414: HIVE-7788 Generate plans for insert, update, and delete
/hive/ql/session/SessionState.java, line 1288 https://reviews.apache.org/r/25414/diff/1/?file=682035#file682035line1288 Could a Session be shared across threads? Answered by Thejas' comments. - Alan --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25414/#review52655 --- On Sept. 8, 2014, 11:37 p.m., Alan Gates wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25414/ --- (Updated Sept. 8, 2014, 11:37 p.m.) Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jason Dere, and Thejas Nair. Bugs: HIVE-7788 https://issues.apache.org/jira/browse/HIVE-7788 Repository: hive-git Description --- This patch adds plan generation as well as making modifications to some of the exec operators to make insert/value, update, and delete work. The patch is large, but about 2/3 of that are tests. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 31aeba9 data/conf/tez/hive-site.xml 0b3877c itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/history/TestHiveHistory.java 1a84024 itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java 9807497 itests/src/test/resources/testconfiguration.properties 99049ca metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java f1697bb ql/src/java/org/apache/hadoop/hive/ql/Context.java 7fcbe3c ql/src/java/org/apache/hadoop/hive/ql/Driver.java 9953919 ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 4246d68 ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 7477199 ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java f018ca0 ql/src/java/org/apache/hadoop/hive/ql/hooks/ReadEntity.java e3bc3b1 ql/src/java/org/apache/hadoop/hive/ql/hooks/WriteEntity.java 7f1d71b ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java b1c4441 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 913d3ac ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java 264052f 
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java 8354ad9 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java 32d2f7a ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 2b1a345 ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 4acafba ql/src/java/org/apache/hadoop/hive/ql/optimizer/BucketingSortingReduceSinkOptimizer.java 96a5d78 ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java 5c711cf ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 5195748 ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java 911ac8a ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 97fa52c ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java 026efe8 ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java 2dbf1c8 ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java 6dce30c ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 5695f35 ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 47fe508 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java 789c780 ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 63ecb8d ql/src/test/org/apache/hadoop/hive/ql/parse/TestUpdateDeleteSemanticAnalyzer.java PRE-CREATION ql/src/test/queries/clientnegative/acid_overwrite.q PRE-CREATION ql/src/test/queries/clientnegative/delete_not_acid.q PRE-CREATION ql/src/test/queries/clientnegative/update_not_acid.q PRE-CREATION ql/src/test/queries/clientnegative/update_partition_col.q PRE-CREATION ql/src/test/queries/clientpositive/delete_all_non_partitioned.q PRE-CREATION ql/src/test/queries/clientpositive/delete_all_partitioned.q PRE-CREATION ql/src/test/queries/clientpositive/delete_orig_table.q PRE-CREATION ql/src/test/queries/clientpositive/delete_tmp_table.q PRE-CREATION ql/src/test/queries/clientpositive/delete_where_no_match.q PRE-CREATION 
ql/src/test/queries/clientpositive/delete_where_non_partitioned.q PRE-CREATION ql/src/test/queries/clientpositive/delete_where_partitioned.q PRE-CREATION ql/src/test/queries/clientpositive/delete_whole_partition.q PRE-CREATION ql/src/test/queries/clientpositive/insert_orig_table.q PRE-CREATION ql/src/test/queries/clientpositive/insert_update_delete.q PRE-CREATION ql/src/test/queries/clientpositive/insert_values_dynamic_partitioned.q PRE-CREATION ql/src/test/queries
Re: Review Request 25414: HIVE-7788 Generate plans for insert, update, and delete
On Sept. 10, 2014, 12:11 a.m., Thejas Nair wrote: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 305 https://reviews.apache.org/r/25414/diff/1/?file=682007#file682007line305 Pass another boolean param true to exclude it from the auto generated hive-default.xml.template . See HIVE_IN_TEST constructor use. Done. On Sept. 10, 2014, 12:11 a.m., Thejas Nair wrote: ql/src/java/org/apache/hadoop/hive/ql/Driver.java, line 930 https://reviews.apache.org/r/25414/diff/2/?file=682947#file682947line930 This error message is assuming that only getUser will throw IOException, but there might be other code added in future that might result in IOException being thrown, and it will be easy to forget to update this error message. How about creating a separate try catch block for conf.getUser ? try/catch moved to just surround conf.getUser On Sept. 10, 2014, 12:11 a.m., Thejas Nair wrote: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 2420 https://reviews.apache.org/r/25414/diff/2/?file=682958#file682958line2420 The logging here does not look necessary as an exception is being thrown. But if we do log, I think it's better to also log the exception here. LOG.error(msg,e); logging removed. On Sept. 10, 2014, 12:11 a.m., Thejas Nair wrote: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 2439 https://reviews.apache.org/r/25414/diff/2/?file=682958#file682958line2439 logging the exception would be useful, in case it fails for a reason other than dir already exists. Exception added to log message. On Sept. 10, 2014, 12:11 a.m., Thejas Nair wrote: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 2454 https://reviews.apache.org/r/25414/diff/2/?file=682958#file682958line2454 if we are logging, let's log the exception as well. (I don't see any added value of this log, as top level exception log would show the whole stack trace) Logging removed. On Sept. 
10, 2014, 12:11 a.m., Thejas Nair wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java, line 686 https://reviews.apache.org/r/25414/diff/2/?file=682964#file682964line686 Use SemanticException instead of RTE here? done On Sept. 10, 2014, 12:11 a.m., Thejas Nair wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java, line 724 https://reviews.apache.org/r/25414/diff/2/?file=682964#file682964line724 It's safer to use {} for if-else. Remember the Apple security issue? https://www.imperialviolet.org/2014/02/22/applebug.html Which is why I never put it on a separate line. On Sept. 10, 2014, 12:11 a.m., Thejas Nair wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java, line 735 https://reviews.apache.org/r/25414/diff/2/?file=682964#file682964line735 fillDefaultStorageFormat uses the value of ConfVars.HIVEDEFAULTFILEFORMAT. If it is set to something other than TextFile, this will break. Also, that function does not set serde at present. Good points. Is there a different function you suggest or should I just do the same work manually? - Alan --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25414/#review52559 --- On Sept. 8, 2014, 11:37 p.m., Alan Gates wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25414/ --- (Updated Sept. 8, 2014, 11:37 p.m.) Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jason Dere, and Thejas Nair. Bugs: HIVE-7788 https://issues.apache.org/jira/browse/HIVE-7788 Repository: hive-git Description --- This patch adds plan generation as well as making modifications to some of the exec operators to make insert/value, update, and delete work. The patch is large, but about 2/3 of that are tests. 
Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 31aeba9 data/conf/tez/hive-site.xml 0b3877c itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/history/TestHiveHistory.java 1a84024 itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java 9807497 itests/src/test/resources/testconfiguration.properties 99049ca metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java f1697bb ql/src/java/org/apache/hadoop/hive/ql/Context.java 7fcbe3c ql/src/java/org/apache/hadoop/hive/ql/Driver.java 9953919 ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 4246d68 ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 7477199 ql/src/java/org/apache/hadoop/hive/ql/exec
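The brace advice in the thread above refers to Apple's 2014 "goto fail" bug, where an unbraced conditional plus one accidentally duplicated line silently disabled a security check. A minimal Java rendition of that failure class (illustrative only, not Hive code):

```java
// Indentation suggests both assignments are guarded by the if, but without
// braces only the first statement is -- the duplicate runs for every input.
public class BraceDemo {
  static boolean looksValidBuggy(String s) {
    boolean ok = true;
    if (s == null)
      ok = false;
      ok = false;   // accidentally duplicated line: executes unconditionally
    return ok;      // always false, even for valid input
  }

  // With braces, an accidental duplicate stays inside the guarded block
  // and the method still behaves as intended for non-null input.
  static boolean looksValidBraced(String s) {
    boolean ok = true;
    if (s == null) {
      ok = false;
    }
    return ok;
  }

  public static void main(String[] args) {
    System.out.println(looksValidBuggy("x"));   // prints false -- the bug
    System.out.println(looksValidBraced("x"));  // prints true
  }
}
```

Putting the guarded statement on the same line as the `if`, as Alan describes doing, avoids the misleading indentation but loses the mechanical safety that braces give against a later edit.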
Re: Review Request 25414: HIVE-7788 Generate plans for insert, update, and delete
On Sept. 10, 2014, 2:52 a.m., Thejas Nair wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java, line 74 https://reviews.apache.org/r/25414/diff/2/?file=682966#file682966line74 throw SemanticException here ? To me that's a RuntimeException. We should never get to that point, it indicates an internal error. I would think semantic exceptions are for user errors that make it past the parser. On Sept. 10, 2014, 2:52 a.m., Thejas Nair wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java, line 226 https://reviews.apache.org/r/25414/diff/2/?file=682966#file682966line226 I thought row__ids are stored in ascending order. Why is the sort in descending order ? You're correct that row_ids are stored in ascending order. For reasons I didn't investigate the results come out the opposite of whatever is requested here. The issue isn't in RecordIdentifier because requesting ascending order in SortedDynPartitioner in the optimizer produces the correct results. On Sept. 10, 2014, 2:52 a.m., Thejas Nair wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java, line 260 https://reviews.apache.org/r/25414/diff/2/?file=682966#file682966line260 add this here ? // - TOK_SORTBY done. - Alan --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25414/#review52810 --- On Sept. 8, 2014, 11:37 p.m., Alan Gates wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25414/ --- (Updated Sept. 8, 2014, 11:37 p.m.) Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jason Dere, and Thejas Nair. Bugs: HIVE-7788 https://issues.apache.org/jira/browse/HIVE-7788 Repository: hive-git Description --- This patch adds plan generation as well as making modifications to some of the exec operators to make insert/value, update, and delete work. The patch is large, but about 2/3 of that are tests. 
Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 31aeba9 data/conf/tez/hive-site.xml 0b3877c itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/history/TestHiveHistory.java 1a84024 itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java 9807497 itests/src/test/resources/testconfiguration.properties 99049ca metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java f1697bb ql/src/java/org/apache/hadoop/hive/ql/Context.java 7fcbe3c ql/src/java/org/apache/hadoop/hive/ql/Driver.java 9953919 ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 4246d68 ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 7477199 ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java f018ca0 ql/src/java/org/apache/hadoop/hive/ql/hooks/ReadEntity.java e3bc3b1 ql/src/java/org/apache/hadoop/hive/ql/hooks/WriteEntity.java 7f1d71b ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java b1c4441 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 913d3ac ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java 264052f ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java 8354ad9 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java 32d2f7a ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 2b1a345 ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 4acafba ql/src/java/org/apache/hadoop/hive/ql/optimizer/BucketingSortingReduceSinkOptimizer.java 96a5d78 ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java 5c711cf ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 5195748 ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java 911ac8a ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 97fa52c ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java 026efe8 ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java PRE-CREATION 
ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java 2dbf1c8 ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java 6dce30c ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 5695f35 ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 47fe508 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java 789c780 ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 63ecb8d ql/src/test/org/apache/hadoop/hive/ql/parse/TestUpdateDeleteSemanticAnalyzer.java PRE-CREATION ql
Re: Review Request 25414: HIVE-7788 Generate plans for insert, update, and delete
On Sept. 9, 2014, 9:33 p.m., Eugene Koifman wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java, line 6108 https://reviews.apache.org/r/25414/diff/1/?file=682029#file682029line6108 Would ROWID.getTypeInfo() work? Seems to. On Sept. 9, 2014, 9:33 p.m., Eugene Koifman wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java, line 11872 https://reviews.apache.org/r/25414/diff/1/?file=682029#file682029line11872 does this work if of implements a subclass of AcidOutputFormat? Perhaps Class.isAssignableFrom() is a safer choice Changed. On Sept. 9, 2014, 9:33 p.m., Eugene Koifman wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java, line 305 https://reviews.apache.org/r/25414/diff/1/?file=682031#file682031line305 Would the outputs.size() check hold in the partitioned case, since dynamic partition insert is used? Could this test be more obvious, like checking getTyp()==TABLE or something like that? Good catch. The way I was doing this was definitely buggy. I'll re-implement it to check explicitly for partitions in the inputs. - Alan --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25414/#review52655 --- On Sept. 8, 2014, 11:37 p.m., Alan Gates wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25414/ --- (Updated Sept. 8, 2014, 11:37 p.m.) Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jason Dere, and Thejas Nair. Bugs: HIVE-7788 https://issues.apache.org/jira/browse/HIVE-7788 Repository: hive-git Description --- This patch adds plan generation as well as making modifications to some of the exec operators to make insert/value, update, and delete work. The patch is large, but about 2/3 of that are tests. 
Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 31aeba9 data/conf/tez/hive-site.xml 0b3877c itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/history/TestHiveHistory.java 1a84024 itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java 9807497 itests/src/test/resources/testconfiguration.properties 99049ca metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java f1697bb ql/src/java/org/apache/hadoop/hive/ql/Context.java 7fcbe3c ql/src/java/org/apache/hadoop/hive/ql/Driver.java 9953919 ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 4246d68 ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 7477199 ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java f018ca0 ql/src/java/org/apache/hadoop/hive/ql/hooks/ReadEntity.java e3bc3b1 ql/src/java/org/apache/hadoop/hive/ql/hooks/WriteEntity.java 7f1d71b ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java b1c4441 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 913d3ac ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java 264052f ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java 8354ad9 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java 32d2f7a ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 2b1a345 ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 4acafba ql/src/java/org/apache/hadoop/hive/ql/optimizer/BucketingSortingReduceSinkOptimizer.java 96a5d78 ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java 5c711cf ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 5195748 ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java 911ac8a ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 97fa52c ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java 026efe8 ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java PRE-CREATION 
ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java 2dbf1c8 ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java 6dce30c ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 5695f35 ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 47fe508 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java 789c780 ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 63ecb8d ql/src/test/org/apache/hadoop/hive/ql/parse/TestUpdateDeleteSemanticAnalyzer.java PRE-CREATION ql/src/test/queries/clientnegative/acid_overwrite.q PRE-CREATION ql/src/test/queries/clientnegative/delete_not_acid.q PRE-CREATION ql/src/test/queries/clientnegative/update_not_acid.q PRE
Re: Review Request 25414: HIVE-7788 Generate plans for insert, update, and delete
-CREATION ql/src/test/results/clientpositive/update_where_partitioned.q.out PRE-CREATION Diff: https://reviews.apache.org/r/25414/diff/ Testing --- Many tests included in the patch, including insert/values, update, and delete all tested against: non-partitioned tables, partitioned tables, and temp tables. Thanks, Alan Gates
Re: Review Request 25414: HIVE-7788 Generate plans for insert, update, and delete
On Sept. 9, 2014, 9:33 p.m., Eugene Koifman wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java, line 11872 https://reviews.apache.org/r/25414/diff/1/?file=682029#file682029line11872 does this work if of implements a subclass of AcidOutputFormat? Perhaps Class.isAssignableFrom() is a safer choice Alan Gates wrote: Changed. Actually, I had to back this out. Making this change made it so that it said all output formats were acid. - Alan --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25414/#review52655 --- On Sept. 11, 2014, 2:17 p.m., Alan Gates wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25414/ --- (Updated Sept. 11, 2014, 2:17 p.m.) Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jason Dere, and Thejas Nair. Bugs: HIVE-7788 https://issues.apache.org/jira/browse/HIVE-7788 Repository: hive-git Description --- This patch adds plan generation as well as making modifications to some of the exec operators to make insert/value, update, and delete work. The patch is large, but about 2/3 of that are tests. 
Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 5d2e6b0 data/conf/tez/hive-site.xml 0b3877c itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/history/TestHiveHistory.java 1a84024 itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java 9807497 itests/src/test/resources/testconfiguration.properties 99049ca metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java f1697bb ql/src/java/org/apache/hadoop/hive/ql/Context.java 7fcbe3c ql/src/java/org/apache/hadoop/hive/ql/Driver.java 9953919 ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 4246d68 ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 7477199 ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java f018ca0 ql/src/java/org/apache/hadoop/hive/ql/hooks/ReadEntity.java e3bc3b1 ql/src/java/org/apache/hadoop/hive/ql/hooks/WriteEntity.java 7f1d71b ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java b1c4441 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java 264052f ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java 8354ad9 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java 32d2f7a ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 2b1a345 ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 4acafba ql/src/java/org/apache/hadoop/hive/ql/optimizer/BucketingSortingReduceSinkOptimizer.java 96a5d78 ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java 5c711cf ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 5195748 ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java 911ac8a ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 496f6a6 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java 3e3926e ql/src/java/org/apache/hadoop/hive/ql/parse/StorageFormat.java ad91b0f ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java PRE-CREATION 
ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java 2dbf1c8 ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java 6dce30c ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 5695f35 ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 5164b16 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java 789c780 ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 63ecb8d ql/src/test/org/apache/hadoop/hive/ql/parse/TestUpdateDeleteSemanticAnalyzer.java PRE-CREATION ql/src/test/queries/clientnegative/acid_overwrite.q PRE-CREATION ql/src/test/queries/clientnegative/delete_not_acid.q PRE-CREATION ql/src/test/queries/clientnegative/update_not_acid.q PRE-CREATION ql/src/test/queries/clientnegative/update_partition_col.q PRE-CREATION ql/src/test/queries/clientpositive/delete_all_non_partitioned.q PRE-CREATION ql/src/test/queries/clientpositive/delete_all_partitioned.q PRE-CREATION ql/src/test/queries/clientpositive/delete_orig_table.q PRE-CREATION ql/src/test/queries/clientpositive/delete_tmp_table.q PRE-CREATION ql/src/test/queries/clientpositive/delete_where_no_match.q PRE-CREATION ql/src/test/queries/clientpositive/delete_where_non_partitioned.q PRE-CREATION ql/src/test/queries/clientpositive/delete_where_partitioned.q PRE-CREATION ql/src/test/queries/clientpositive
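The back-out described above ("it said all output formats were acid") is consistent with a common `isAssignableFrom` pitfall: the receiver and argument are easy to flip, and the flipped form answers a different question. The sketch below uses stand-in classes, not Hive's real OutputFormat hierarchy, and the actual cause in the patch may have been different; it only illustrates the direction hazard.

```java
public class AssignableDemo {
  static class OutputFormat {}
  static class AcidOutputFormat extends OutputFormat {}
  static class PlainOutputFormat extends OutputFormat {}

  // Intended question: is cls AcidOutputFormat or a subclass of it?
  static boolean isAcid(Class<?> cls) {
    return AcidOutputFormat.class.isAssignableFrom(cls);
  }

  // Flipped receiver and argument: now asks whether an AcidOutputFormat
  // instance could be assigned to a variable of type cls -- true for
  // AcidOutputFormat itself and for every superclass of it, so a generic
  // base format wrongly qualifies as acid.
  static boolean isAcidFlipped(Class<?> cls) {
    return cls.isAssignableFrom(AcidOutputFormat.class);
  }

  public static void main(String[] args) {
    System.out.println(isAcid(AcidOutputFormat.class));     // prints true
    System.out.println(isAcid(PlainOutputFormat.class));    // prints false
    System.out.println(isAcidFlipped(OutputFormat.class));  // prints true -- the misfire
  }
}
```

A strict class-equality check, by contrast, rejects legitimate subclasses, which was Eugene's original concern in the review.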
Re: Review Request 25414: HIVE-7788 Generate plans for insert, update, and delete
/results/clientpositive/update_where_partitioned.q.out PRE-CREATION Diff: https://reviews.apache.org/r/25414/diff/ Testing --- Many tests included in the patch, including insert/values, update, and delete all tested against: non-partitioned tables, partitioned tables, and temp tables. Thanks, Alan Gates
Broken build
There are 7 tez qfile tests that are failing on every HiveQA run. They fail for me when I run them on trunk. I've filed: https://issues.apache.org/jira/browse/HIVE-8093 for them. I'm guessing this is related to the recent checkin for HIVE-7704, since many of those tests were added in that commit. Alan.
Review Request 25616: HIVE-7790 Update privileges to check for update and delete
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25616/ --- Review request for hive and Thejas Nair. Bugs: HIVE-7790 https://issues.apache.org/jira/browse/HIVE-7790 Repository: hive-git Description --- Adds update and delete as action and adds checks for authorization during update and delete. Also adds passing of updated columns in case authorizer wishes to check them. Diffs - itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/authorization/plugin/TestHiveAuthorizerCheckInvocation.java 53d88b0 ql/src/java/org/apache/hadoop/hive/ql/Driver.java 298f429 ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java b2f66e0 ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java 3aaa09c ql/src/java/org/apache/hadoop/hive/ql/security/authorization/AuthorizationUtils.java 93df9f4 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HivePrivilegeObject.java 093b4fd ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/Operation2Privilege.java 3236341 ql/src/test/queries/clientnegative/authorization_delete_nodeletepriv.q PRE-CREATION ql/src/test/queries/clientnegative/authorization_update_noupdatepriv.q PRE-CREATION ql/src/test/queries/clientpositive/authorization_delete.q PRE-CREATION ql/src/test/queries/clientpositive/authorization_delete_own_table.q PRE-CREATION ql/src/test/queries/clientpositive/authorization_update.q PRE-CREATION ql/src/test/queries/clientpositive/authorization_update_own_table.q PRE-CREATION ql/src/test/results/clientnegative/authorization_delete_nodeletepriv.q.out PRE-CREATION ql/src/test/results/clientnegative/authorization_update_noupdatepriv.q.out PRE-CREATION ql/src/test/results/clientpositive/authorization_delete.q.out PRE-CREATION ql/src/test/results/clientpositive/authorization_delete_own_table.q.out PRE-CREATION ql/src/test/results/clientpositive/authorization_update.q.out PRE-CREATION 
ql/src/test/results/clientpositive/authorization_update_own_table.q.out PRE-CREATION Diff: https://reviews.apache.org/r/25616/diff/ Testing --- Added tests, both positive and negative, for update and delete, including ability to update and delete tables created by user. Also added tests for passing correct update columns. Thanks, Alan Gates
Re: Review Request 25616: HIVE-7790 Update privileges to check for update and delete
On Sept. 14, 2014, 7:13 a.m., Thejas Nair wrote: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/authorization/plugin/TestHiveAuthorizerCheckInvocation.java, line 272 https://reviews.apache.org/r/25616/diff/1/?file=688987#file688987line272 It would be good to also verify the input columns being passed here. But I don't put the input columns in the list. You don't need read permissions to update, so I'm not adding these to a list to be checked. On Sept. 14, 2014, 7:13 a.m., Thejas Nair wrote: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/authorization/plugin/TestHiveAuthorizerCheckInvocation.java, line 273 https://reviews.apache.org/r/25616/diff/1/?file=688987#file688987line273 A similar test for delete would also be useful, specially for testing the input columns being passed. Same as above on update, I'm not checking read permissions, so there's no list of input columns. On Sept. 14, 2014, 7:13 a.m., Thejas Nair wrote: ql/src/java/org/apache/hadoop/hive/ql/Driver.java, line 506 https://reviews.apache.org/r/25616/diff/1/?file=688988#file688988line506 Can you also change the variable name of tab2cols to indicate that it is the table to input column mapping (since we have updateTab2Cols) ? maybe selectTab2Cols or inputTab2Cols Changed to selectTab2Cols - Alan --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25616/#review53277 --- On Sept. 14, 2014, 4:30 a.m., Alan Gates wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25616/ --- (Updated Sept. 14, 2014, 4:30 a.m.) Review request for hive and Thejas Nair. Bugs: HIVE-7790 https://issues.apache.org/jira/browse/HIVE-7790 Repository: hive-git Description --- Adds update and delete as action and adds checks for authorization during update and delete. Also adds passing of updated columns in case authorizer wishes to check them. 
Diffs - itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/authorization/plugin/TestHiveAuthorizerCheckInvocation.java 53d88b0 ql/src/java/org/apache/hadoop/hive/ql/Driver.java 298f429 ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java b2f66e0 ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java 3aaa09c ql/src/java/org/apache/hadoop/hive/ql/security/authorization/AuthorizationUtils.java 93df9f4 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HivePrivilegeObject.java 093b4fd ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/Operation2Privilege.java 3236341 ql/src/test/queries/clientnegative/authorization_delete_nodeletepriv.q PRE-CREATION ql/src/test/queries/clientnegative/authorization_update_noupdatepriv.q PRE-CREATION ql/src/test/queries/clientpositive/authorization_delete.q PRE-CREATION ql/src/test/queries/clientpositive/authorization_delete_own_table.q PRE-CREATION ql/src/test/queries/clientpositive/authorization_update.q PRE-CREATION ql/src/test/queries/clientpositive/authorization_update_own_table.q PRE-CREATION ql/src/test/results/clientnegative/authorization_delete_nodeletepriv.q.out PRE-CREATION ql/src/test/results/clientnegative/authorization_update_noupdatepriv.q.out PRE-CREATION ql/src/test/results/clientpositive/authorization_delete.q.out PRE-CREATION ql/src/test/results/clientpositive/authorization_delete_own_table.q.out PRE-CREATION ql/src/test/results/clientpositive/authorization_update.q.out PRE-CREATION ql/src/test/results/clientpositive/authorization_update_own_table.q.out PRE-CREATION Diff: https://reviews.apache.org/r/25616/diff/ Testing --- Added tests, both positive and negative, for update and delete, including ability to update and delete tables created by user. Also added tests for passing correct update columns. Thanks, Alan Gates
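As a rough illustration of the privilege model discussed in this review — an UPDATE is authorized against the columns being written (the SET list), and no input-column list is collected because read permission is not required to update — here is a minimal sketch. The class and method names are hypothetical stand-ins, not the Hive authorizer API:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the column-level check described in the review:
// only the columns an UPDATE writes (the SET list) are presented to the
// authorizer; columns read in the WHERE clause are deliberately not
// collected, since reading them requires no update privilege.
public class UpdateAuthSketch {
    // table name -> columns the user may update (stand-in for real grants)
    private final Map<String, List<String>> updateGrants = new HashMap<>();

    public void grantUpdate(String table, String... cols) {
        updateGrants.put(table, Arrays.asList(cols));
    }

    /** Returns true if every updated column is covered by an update grant. */
    public boolean checkUpdate(String table, List<String> updatedCols) {
        List<String> granted = updateGrants.get(table);
        return granted != null && granted.containsAll(updatedCols);
    }

    public static void main(String[] args) {
        UpdateAuthSketch auth = new UpdateAuthSketch();
        auth.grantUpdate("acid_tbl", "value", "comment");
        // For "UPDATE acid_tbl SET value = ... WHERE key = ...", only the
        // SET column "value" is checked, never the WHERE column "key".
        System.out.println(auth.checkUpdate("acid_tbl", Arrays.asList("value")));  // true
        System.out.println(auth.checkUpdate("acid_tbl", Arrays.asList("key")));    // false
    }
}
```

This also mirrors why the test discussed above has no input-column list to verify: for update and delete, nothing but the written columns reaches the check.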
Review Request 19149: Stand alone metastore fails to start if new transaction values not defined in config
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/19149/ --- Review request for hive and Ashutosh Chauhan. Bugs: HIVE-6606 https://issues.apache.org/jira/browse/HIVE-6606 Repository: hive-git Description --- The metastore creates instances of TxnHandler. The constructor of this class will fail if the config value for the jdbc string it expects is not defined in the config file. Fixed this by changing transaction connection to use the same JDBC connection string as the rest of the metastore. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java edc3d38 metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java bbb0d28 metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 4441c2f metastore/src/test/org/apache/hadoop/hive/metastore/txn/TestTxnHandler.java 560fd5a Diff: https://reviews.apache.org/r/19149/diff/ Testing --- Ran unit tests plus ran on cluster to assure issue not seen when transaction handling turned off. Thanks, Alan Gates
Review Request 19161: Heartbeats are not being sent when DbLockMgr is used and an operation holds locks
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/19161/ --- Review request for hive and Ashutosh Chauhan. Bugs: HIVE-6635 https://issues.apache.org/jira/browse/HIVE-6635 Repository: hive-git Description --- Added a thread to Driver to send heartbeats. This thread only runs during the main loop in Driver.execute. I added this in a separate thread because otherwise I would have needed to add threads in every task to see if heartbeats needed to be sent. This would be very invasive, and also it's not clear it would be possible to cover all cases, as there are actions that may simply take a long time (like certain metastore operations). The downside is that a query will keep running even after it has found out its locks were aborted, and will only be terminated at the end. Diffs - metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 4441c2f ql/src/java/org/apache/hadoop/hive/ql/Driver.java 7dbb8be ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java 535912f ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java 7773f66 Diff: https://reviews.apache.org/r/19161/diff/ Testing --- Ran unit tests specific to transaction operations, as well as manual system testing. Thanks, Alan Gates
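A minimal sketch of the heartbeat design described above: one background thread beats on behalf of the whole Driver.execute loop, so individual tasks need no heartbeat logic of their own. The class name and the callback here are illustrative stand-ins, not the actual DbLockManager/DbTxnManager code:

```java
// Sketch (assumed names, not Hive code): a single daemon thread that sends
// lock heartbeats at a fixed interval until the query's main loop stops it.
public class HeartbeaterSketch implements Runnable {
    private final Runnable sendHeartbeat;   // stand-in for the lock-manager heartbeat call
    private final long intervalMs;
    private volatile boolean stopped = false;

    public HeartbeaterSketch(Runnable sendHeartbeat, long intervalMs) {
        this.sendHeartbeat = sendHeartbeat;
        this.intervalMs = intervalMs;
    }

    @Override
    public void run() {
        while (!stopped) {
            sendHeartbeat.run();            // keep the locks alive
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public void stop() { stopped = true; }

    public static void main(String[] args) throws InterruptedException {
        final java.util.concurrent.atomic.AtomicInteger beats =
            new java.util.concurrent.atomic.AtomicInteger();
        HeartbeaterSketch hb = new HeartbeaterSketch(beats::incrementAndGet, 10);
        Thread t = new Thread(hb);
        t.setDaemon(true);   // don't keep the JVM alive past the query
        t.start();
        Thread.sleep(100);   // the "query" runs here
        hb.stop();
        t.join();
        System.out.println("heartbeats sent: " + (beats.get() > 0));
    }
}
```

The trade-off noted in the description holds in the sketch too: the loop keeps beating until stop() is called when execution ends, even if the locks were already aborted mid-query.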
Re: Review Request 19149: Stand alone metastore fails to start if new transaction values not defined in config
On March 12, 2014, 8:21 p.m., Ashutosh Chauhan wrote: metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java, line 205 https://reviews.apache.org/r/19149/diff/1/?file=517656#file517656line205 Do we need to synchronize this method? This is really intended for use only in testing. It's only in the src area rather than test so that it can be picked up cross package for things like streaming and hive client tests. So I'm not too worried about synchronization or performance (for the next comment). I can add comments on the methods to make this clear so no one uses it when they shouldn't. On March 12, 2014, 8:21 p.m., Ashutosh Chauhan wrote: metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java, line 215 https://reviews.apache.org/r/19149/diff/1/?file=517656#file517656line215 You created prop object but didn't make use of it. Don't you want to use that prop here, instead of new Properties? Oops. Will fix. - Alan --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/19149/#review36974 --- On March 12, 2014, 7:20 p.m., Alan Gates wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/19149/ --- (Updated March 12, 2014, 7:20 p.m.) Review request for hive and Ashutosh Chauhan. Bugs: HIVE-6606 https://issues.apache.org/jira/browse/HIVE-6606 Repository: hive-git Description --- The metastore creates instances of TxnHandler. The constructor of this class will fail if the config value for the jdbc string it expects is not defined in the config file. Fixed this by changing transaction connection to use the same JDBC connection string as the rest of the metastore. 
Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java edc3d38 metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java bbb0d28 metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 4441c2f metastore/src/test/org/apache/hadoop/hive/metastore/txn/TestTxnHandler.java 560fd5a Diff: https://reviews.apache.org/r/19149/diff/ Testing --- Ran unit tests plus ran on cluster to assure issue not seen when transaction handling turned off. Thanks, Alan Gates
Re: [VOTE] Apache Hive 0.13.0 Release Candidate 2
+1 (non-binding) Alan. On Apr 17, 2014, at 7:15 PM, Thejas Nair the...@hortonworks.com wrote: +1 - Verified the md5 checksums and gpg keys - Checked LICENSE, README.txt, NOTICE, RELEASE_NOTES.txt files - Build src tar.gz - Ran local mode queries with new build. I had run unit test suite with rc1 and they looked good. On Tue, Apr 15, 2014 at 2:06 PM, Harish Butani rhbut...@apache.org wrote: Apache Hive 0.13.0 Release Candidate 2 is available here: http://people.apache.org/~rhbutani/hive-0.13.0-candidate-2 Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-1011 Source tag for RCN is at: https://svn.apache.org/repos/asf/hive/tags/release-0.13.0-rc2/ Voting will conclude in 72 hours. Hive PMC Members: Please test and vote. Thanks. -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Modify/add component in Hive JIRA
For the new transactions work that went in as part of HIVE-5317 I’ve been using “Locking” as the component for JIRAs. This work is related to locking, but that label doesn’t really cover all the transaction management being done. Could we add a label for “Transaction Management” or modify the “Locking” label to be “Locking/Transaction Management”? Alan.
Re: [DISCUSS] Proposed Changes to the Apache Hive Project Bylaws
One other benefit in rotating chairs is that it exposes more of Hive’s PMC members to the board and other Apache old timers. This is helpful in getting better integrated into Apache and becoming a candidate for Apache membership. It is also an excellent education in the Apache Way for those who serve. Alan. On Dec 31, 2013, at 3:30 PM, Lefty Leverenz leftylever...@gmail.com wrote: Okay, I'm convinced that one-year terms for the chair are reasonable. Thanks for the reassurance, Edward and Thejas. Is 24h rule is needed at all? In other projects, I've seen patches simply reverted by author (or someone else). It's a rare occurrence, and it should be possible to revert a patch if someone -1s it after commit, esp. within the same 24 hours when not many other changes are in. Sergey makes a good point, but the 24h rule seems helpful in prioritizing tasks. We're all deadline-driven, right? I'm the chief culprit of seeing patch available and ignoring it until it has been committed. Then if I find some minor typo or doc issue, I'm embarrassed at posting a comment after the commit because nobody wants to revert a patch just for documentation. -- Lefty On Sun, Dec 29, 2013 at 12:06 PM, Thejas Nair the...@hortonworks.comwrote: On Sun, Dec 29, 2013 at 12:06 AM, Lefty Leverenz leftylever...@gmail.com wrote: Let's discuss annual rotation of the PMC chair a bit more. Although I agree with the points made in favor, I wonder about frequent loss of expertise and needing to establish new relationships. What's the ramp-up time? The ramp up time is not significant, as you can see from the list of responsibilities mentioned here - http://www.apache.org/dev/pmc.html#chair . We have enough people in PMC who have been involved with Apache project for long time and are familiar with apache bylaws and way of doing things. Also, the former PMC chairs are likely to be around to help as needed. Could a current chair be chosen for another consecutive term? 
Could two chairs alternate years indefinitely? I would take the meaning of rotation to mean that we have a new chair for the next term. I think it should be OK to have the same chair in alternate years. 2 years is a long time and it sounds reasonable given the size of the community ! :) Do many other projects have annual rotations? Yes, at least the hadoop and pig projects have that. I could not find by-laws pages easily for other projects. Would it be inconvenient to change chairs in the middle of a release? No. The PMC Chair position does not have any special role in a release. And now to trivialize my comments: while making other changes, let's fix this typo: Membership of the PMC can be revoked by an unanimous vote ... *(should be a unanimous ... just like a university because the rule is based on sound, not spelling)*. I think you should feel free to fix such typos in this wiki without a vote on it ! :)
Re: How do you run single query test(s) after mavenization?
The rest of the ant instances are okay because the MVN section afterwards gives the alternative, but should we keep ant or make the replacements? - 9. Now you can run the ant 'thriftif' target ... - 11. ant thriftif -Dthrift.home=... - 15. ant thriftif - 18. ant clean package - The maven equivalent of ant thriftif is: mvn clean install -Pthriftif -DskipTests -Dthrift.home=/usr/local I have not generated the thrift stuff recently. It would be great if Alan or someone else who has would update this section. I can take a look at this. It works with pretty minimal changes. Alan.
Re: How do you run single query test(s) after mavenization?
Ok, I’ve updated it to just have the maven instructions, since I’m assuming no one cares about the ant ones anymore. Alan. On Jan 3, 2014, at 3:46 PM, Alan Gates ga...@hortonworks.com wrote: The rest of the ant instances are okay because the MVN section afterwards gives the alternative, but should we keep ant or make the replacements? - 9. Now you can run the ant 'thriftif' target ... - 11. ant thriftif -Dthrift.home=... - 15. ant thriftif - 18. ant clean package - The maven equivalent of ant thriftif is: mvn clean install -Pthriftif -DskipTests -Dthrift.home=/usr/local I have not generated the thrift stuff recently. It would be great if Alan or someone else who has would update this section. I can take a look at this. It works with pretty minimal changes. Alan.
Re: Parquet support (HIVE-5783)
Gunther, is it the case that there is anything extra that needs to be done to ship Parquet code with Hive right now? If I read the patch correctly the Parquet jars were added to the pom and thus will be shipped as part of Hive. As long as it works out of the box when a user says “create table … stored as parquet” why do we care whether the parquet jar is owned by Hive or another project? The concern about feature mismatch in Parquet versus Hive is valid, but I’m not sure what to do about it other than assure that there are good error messages. Users will often want to use non-Hive based storage formats (Parquet, Avro, etc.). This means we need a good way to detect at SQL compile time that the underlying storage doesn’t support the indicated data type and throw a good error. Also, it’s important to be clear going forward about what Hive as a project is signing up for. If tomorrow someone decides to add a new datatype or feature we need to be clear that we expect the contributor to make this work for Hive owned formats (text, RC, sequence, ORC) but not necessarily for external formats (Parquet, Avro). Alan. On Feb 17, 2014, at 7:03 PM, Gunther Hagleitner ghagleit...@hortonworks.com wrote: Brock, I'm not trying to pick winners, I'm merely trying to say that the documentation/code should match what's actually there, so folks can make informed decisions. The issue I have with the word native is that people have expectations when they hear it and I think these are not met. I've had folks ask me why we're switching the default of hive to Parquet. This isn't the case obviously, but native to most people means just that: Hive's primary format. That's why I was asking for a title of Add Parquet SerDe for the jira. That's the exact same thing that was done for Avro under the exact same circumstances: https://issues.apache.org/jira/browse/HIVE-895. Native also has other associations a) it supports the full data model/feature set and b) it's part of hive. 
Neither is the case and I don't think that's just a superficial difference. Support and usability will be different. That's why I think the documentation should delineate between RC/ORC/etc on one side and Parquet/Avro/etc on the other. As mentioned in the jira STORED AS was reserved for what's actually part of hive (or hadoop core in the case of sequence file as you point out). I think there are reasons for that: a) being part of the grammar implies native as above b) you need to ship the code bundled in hive-exec for this to work (which is *broken* right now) and c) like you said we shouldn't pick winners by letting some of them become a keyword and others not. For these reasons I think Parquet should use the old syntax at this point. If you have a pluggable/configurable way great, but right now we don't have that. Finally, yes, I am late to this party and I apologize for that. I'm happy to make the suggested changes myself, if that's the concern. Thanks, Gunther. On Sun, Feb 16, 2014 at 7:40 PM, Brock Noland br...@cloudera.com wrote: Hi Gunther, Please find my response inline. On Sat, Feb 15, 2014 at 5:52 PM, Gunther Hagleitner gunt...@apache.org wrote: I read through the ticket, patch and documentation Thank you very much for reading through these items! and would like to suggest some changes. There was ample time to suggest these changes prior to commit. The JIRA was created three months ago, and the title you object to and the patch was up there over two months ago. As far as I can tell this basically adds parquet SerDes to hive, but the file format remains external to hive. There is no way for hive devs to makes changes, fix bugs add, change datatypes, add features to parquet itself. As stated in many locations including the JIRA discussed here, we shouldn't be picking winner/loser file formats. We use many external libraries, none of which, all Hive developers have the ability to modify. 
For example most Hive developers do not have the ability to modify Sequence File. Tez is also an external library which few Hive developers can change. So: - I suggest we document it as one of the built-in SerDes and not as a native format like here: https://cwiki.apache.org/confluence/display/Hive/Parquet (and here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual) - I vote for the jira to say Add parquet SerDes to Hive and not Native support The change provides the ability to create a parquet table with Hive, natively. Therefore I don't see the issue you have with the word native. - I think we should revert the change to the grammar to allow STORED AS PARQUET until we have a mechanism to do that for all SerDes, i.e.: someone picks up: HIVE-5976. (I also don't think this actually works properly unless we bundle parquet in hive-exec, which I don't think we want.) Again, you could have provided this feedback many moons ago.
Re: Timeline for the Hive 0.13 release?
Sure. I’d really like to get the work related to HIVE-5317 in 0.13. HIVE-5843 is patch available and hopefully can be checked in today. There are several more that depend on that one and can’t be made patch available until then (HIVE-6060, HIVE-6319, HIVE-6460, and HIVE-5687). I don’t want to hold up the branching, but are you ok with those going in after the branch? Alan. On Mar 3, 2014, at 7:53 PM, Harish Butani hbut...@hortonworks.com wrote: I plan to create the branch 5pm PST tomorrow. Ok with everybody? regards, Harish. On Feb 21, 2014, at 5:44 PM, Lefty Leverenz leftylever...@gmail.com wrote: That's appropriate -- let the Hive release march forth on March 4th. -- Lefty On Fri, Feb 21, 2014 at 4:04 PM, Harish Butani hbut...@hortonworks.comwrote: Ok,let’s set it for March 4th . regards, Harish. On Feb 21, 2014, at 12:14 PM, Brock Noland br...@cloudera.com wrote: Might as well make it March 4th or 5th. Otherwise folks will burn weekend time to get patches in. On Fri, Feb 21, 2014 at 2:10 PM, Harish Butani hbut...@hortonworks.com wrote: Yes makes sense. How about we postpone the branching until 10am PST March 3rd, which is the following Monday. Don’t see a point of setting the branch time to a Friday evening. Do people agree? regards, Harish. On Feb 21, 2014, at 11:04 AM, Brock Noland br...@cloudera.com wrote: +1 On Fri, Feb 21, 2014 at 1:02 PM, Thejas Nair the...@hortonworks.com wrote: Can we wait for some few more days for the branching ? I have a few more security fixes that I would like to get in, and we also have a long pre-commit queue ahead right now. How about branching around Friday next week ? By then hadoop 2.3 should also be out as that vote has been concluded, and we can get HIVE-6037 in as well. -Thejas On Sun, Feb 16, 2014 at 5:32 PM, Brock Noland br...@cloudera.com wrote: I'd love to see HIVE-6037 in the 0.13 release. I have +1'ed it pending tests. 
Brock On Sun, Feb 16, 2014 at 7:23 PM, Navis류승우 navis@nexr.com wrote: HIVE-6037 is for generating hive-default.template file from HiveConf. Could it be included in this release? If it's not, I'll suspend further rebasing of it till next release (conflicts too frequently). 2014-02-16 20:38 GMT+09:00 Lefty Leverenz leftylever...@gmail.com : I'll try to catch up on the wikidocs backlog for 0.13.0 patches in time for the release. It's a long and growing list, though, so no promises. Feel free to do your own documentation, or hand it off to a friendly in-house writer. -- Lefty, self-appointed Hive docs maven On Sat, Feb 15, 2014 at 1:28 PM, Thejas Nair the...@hortonworks.com wrote: Sounds good to me. On Fri, Feb 14, 2014 at 7:29 PM, Harish Butani hbut...@hortonworks.com wrote: Hi, Its mid feb. Wanted to check if the community is ready to cut a branch. Could we cut the branch in a week , say 5pm PST 2/21/14? The goal is to keep the release cycle short: couple of weeks; so after the branch we go into stabilizing mode for hive 0.13, checking in only blocker/critical bug fixes. regards, Harish. On Jan 20, 2014, at 9:25 AM, Brock Noland br...@cloudera.com wrote: Hi, I agree that picking a date to branch and then restricting commits to that branch would be a less time intensive plan for the RM. Brock On Sat, Jan 18, 2014 at 4:21 PM, Harish Butani hbut...@hortonworks.com wrote: Yes agree it is time to start planning for the next release. I would like to volunteer to do the release management duties for this release(will be a great experience for me) Will be happy to do it, if the community is fine with this. regards, Harish. On Jan 17, 2014, at 7:05 PM, Thejas Nair the...@hortonworks.com wrote: Yes, I think it is time to start planning for the next release. For 0.12 release I created a branch and then accepted patches that people asked to be included for sometime, before moving a phase of accepting only critical bug fixes. This turned out to be laborious. 
I think we should instead give everyone a few weeks to get any patches they are working on to be ready, cut the branch, and take in only critical bug fixes to the branch after that. How about cutting the branch around mid-February and targeting to release in a week or two after that. Thanks, Thejas On Fri, Jan 17, 2014 at 4:39 PM, Carl Steinbach c...@apache.org wrote: I was wondering what people think about setting a tentative date for the Hive 0.13 release? At an old Hive Contrib meeting we agreed that Hive should follow a time-based release model with new releases every four months. If we follow that schedule we're due for the next release in mid-February. Thoughts? Thanks. Carl
Re: [VOTE] Apache Hive 0.13.1 Release Candidate 1
So this isn’t a technical issue, just concern about the delays in the mailing list? Why not just extend the voting period then, until say Monday? Alan. On May 15, 2014, at 3:17 PM, Sushanth Sowmyan khorg...@gmail.com wrote: Hi Folks, I'm canceling this vote and withdrawing the RC1 candidate for the following reasons: a) I've talked to a couple of other people who haven't seen my mail updates to this thread, and saw my initial vote mail a bit late too. b) There's at least one other person that has attempted to reply to this thread, and I don't see the replies yet. Thus, when the mailing list channel isn't reliably working, the ability for people to +1 or -1 is taken away, and this does not work. (We don't want a situation where 3 people go ahead and +1, and that arrives before today evening, thus making the release releasable, while someone else discovers a breaking issue that should stop it, but is not able to have their objection or -1 appear in time.) I'm open to suggestions on how to proceed with the voting process. We could wait out this week and hope the ASF mailing list issues are resolved, but if it takes too much longer than that, we also have the issue of delaying an important bugfix release. Thoughts? -Sushanth (3:15PM PDT, May 15 2014) On Thu, May 15, 2014 at 11:46 AM, Sushanth Sowmyan khorg...@gmail.com wrote: The apache dev list seems to still be a little wonky, Prasanth mailed me saying he'd replied to this thread with the following content, that I don't see in this thread: Hi Sushanth https://issues.apache.org/jira/browse/HIVE-7067 This bug is critical as it returns wrong results for min(), max(), and join queries that use date/timestamp columns from an ORC table. The reason for this issue is that for these datatypes ORC returns java objects, whereas for all other types ORC returns writables. When get() is performed on their corresponding object inspectors, writables return a new object, whereas java objects return a reference.
This will cause issues when any operator performs comparisons on date/timestamp values (references will be overwritten with the next values). More information is provided in the description of the jira. I think the severity of this bug is critical and it should be included as part of 0.13.1. Can you please include this patch in RC2?” I think this meets the bar for criticality (actual bug in core feature, no workaround) and severity (incorrect results, effectively data corruption when used as source for other data), and I'm willing to spin an RC2 for this, but I would still like to follow the process I set up for jira inclusion though, to make sure I'm not being biased about this, so I would request two other +1s to champion this bug's inclusion into the release. Also, another thought here is whether it makes sense for us to try to have a VOTE with a 72 hour deadline when the mailing list still seems iffy and is delaying mails by multiple hours. Any thoughts on how we should proceed? (In case this mail goes out much later than I send it out, I'm sending it out at 11:45AM PDT, Thu May 15 2014) On Thu, May 15, 2014 at 10:06 AM, Sushanth Sowmyan khorg...@gmail.com wrote: Eugene, do you know if these two failures happen on 0.13.0 as well? I would assume that TestHive_7 is an issue on 0.13.0 as well, given that the fix for it went into trunk. What is your sense for how important it is that we fix this? i.e., per my understanding, (a) It does not cause a crash or adversely affect the ability for webhcat to continue operating, and (b) It means that the feature does not work (at all, but in isolation), and that there is no workaround for it. This means I treat it as critical (valid bug without workaround) but not severe (breaks product, affects other features from being used). Thus, I'm willing to include HIVE-6521 in an RC2 if we have 2 more committers +1 an inclusion request for this. As for TestHeartbeat_1, that's an interesting failure.
Do you have logs on what command-line options org.apache.hive.hcatalog.templeton.LauncherDelegator sent along that caused it to break? Would that affect other job launches? On Tue, May 13, 2014 at 8:14 PM, Eugene Koifman ekoif...@hortonworks.com wrote: TestHive_7 is explained by https://issues.apache.org/jira/browse/HIVE-6521, which is in trunk but not 0.13.1. On Tue, May 13, 2014 at 6:50 PM, Eugene Koifman ekoif...@hortonworks.com wrote: I downloaded the src tar, built it, and ran the WebHCat e2e tests. I see 2 failures (which I don't see on trunk): TestHive_7 fails with got percentComplete map 100% reduce 0%, expected map 100% reduce 100%. TestHeartbeat_1 fails to even launch the job. This looks like the root cause: ERROR | 13 May 2014 18:24:00,394 | org.apache.hive.hcatalog.templeton.CatchallExceptionMapper | java.lang.NullPointerException at
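The failure mode Prasanth describes above — a reused mutable object returned by reference gets overwritten before an operator finishes comparing it against later rows, while a writable's get() hands back a fresh copy — can be illustrated with a minimal, Hive-free sketch. The class and method names below are invented for illustration; this is not ORC code:

```java
// Minimal sketch of the reference-reuse bug pattern (hypothetical names).
public class ReferenceReuseDemo {

    // Simulates a reader that reuses one mutable date holder for every row.
    static final class MutableDate {
        long days;
        MutableDate set(long d) { this.days = d; return this; }
        long get() { return days; }
    }

    // "Reference" style: get() hands back the shared live object, as a plain
    // Java object would. Retaining it across rows aliases the reader's buffer.
    static long minByReference(long[] rows) {
        MutableDate shared = new MutableDate();
        MutableDate best = null;
        for (long r : rows) {
            MutableDate cur = shared.set(r);   // same object every row
            if (best == null || cur.get() < best.get()) {
                best = cur;                    // best now aliases `shared`
            }
        }
        return best.get();                     // reflects the LAST row written
    }

    // "Copy" style: the value is snapshotted out, as get() on a writable does.
    static long minByCopy(long[] rows) {
        MutableDate shared = new MutableDate();
        Long best = null;
        for (long r : rows) {
            long snapshot = shared.set(r).get();  // value copied out per row
            if (best == null || snapshot < best) best = snapshot;
        }
        return best;
    }

    public static void main(String[] args) {
        long[] rows = {5, 2, 9};
        // The aliased version silently returns the last value, not the min.
        System.out.println("by reference: " + minByReference(rows)); // 9 (wrong)
        System.out.println("by copy:      " + minByCopy(rows));      // 2 (right)
    }
}
```

This is why the bug surfaces as wrong min()/max()/join results only for date/timestamp columns: those are the types for which a live reference, rather than a copy, escapes into the operator.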
Re: [VOTE] Apache Hive 0.13.1 Release Candidate 1
On May 16, 2014, at 10:51 PM, Lefty Leverenz leftylever...@gmail.com wrote: Any thoughts on how we should proceed? 1. Is the mail archive accurate now? Perhaps it could be used for vote verification. 2. What if we voted in comments on a JIRA ticket? (Lately I'm checking comment order on JIRAs because my inbox receives messages out of order.) No, it has to use mail as the primary medium, I think. But the archives are accurate. Alan. The JIRA is connected to the mailing list, so it might comply with the vote-by-email rule. -- Lefty On Fri, May 16, 2014 at 2:53 PM, Alan Gates ga...@hortonworks.com wrote: So this isn’t a technical issue, just concern about the delays in the mailing list? Why not just extend the voting period then, until say Monday? Alan. 
Re: [VOTE] Apache Hive 0.13.1 Release Candidate 1
The vote-by-mail requirement is an Apache one, which trumps any Hive bylaws. I really think Apache is going to frown on voting via JIRA. Alan. On May 17, 2014, at 9:15 PM, Lefty Leverenz leftylever...@gmail.com wrote: The Hive bylaws (https://cwiki.apache.org/confluence/display/Hive/Bylaws#Bylaws-Voting) say the mailing list is used for voting, but as I recall bylaws have some wiggle room. Decisions regarding the project are made by votes on the primary project development mailing list (u...@hive.apache.org u...@pig.apache.org). Where necessary, PMC voting may take place on the private Hive PMC mailing list. Votes are clearly indicated by a subject line starting with [VOTE]. Votes may contain multiple items for approval and these should be clearly separated. Voting is carried out by replying to the vote mail. (Hm, the text says primary project development mailing list but then user@hive is shown in parentheses -- is that a typo in the bylaws?) Would people be willing to vote simultaneously by mail and on a JIRA? It's inconvenient but shouldn't be necessary after this release. -- Lefty On Sat, May 17, 2014 at 7:30 PM, Sushanth Sowmyan khorg...@gmail.com wrote: There is a technical issue as well now, as raised by Prashant. But there is also the issue that people aren't reliably able to respond/object/approve, and don't know if/when it'll go through. I think I like Lefty's JIRA proposal - we could open a JIRA for it and address votes there; I think I'll do that for RC2. On Fri, May 16, 2014 at 2:53 PM, Alan Gates ga...@hortonworks.com wrote: So this isn’t a technical issue, just concern about the delays in the mailing list? Why not just extend the voting period then, until say Monday? Alan. 
Re: [VOTE] Apache Hive 0.13.1 Release Candidate 1
Per the Apache infra tweet stream, mail delivery times should be back to normal as of today. I believe Sushanth decided to roll a new RC anyway. Once that is done we should be able to vote in the normal manner. Alan. Sent from my iPhone On May 19, 2014, at 17:59, Lefty Leverenz leftylever...@gmail.com wrote: Gotta side with Alan about voting by JIRA: although it's convenient for the moment, the indirect mail records wouldn't be labeled [VOTE]. Some future Hive historian could come to grief (and have to drop out of grad school). The community needs to see the votes as they come in, and shouldn't have to look in the archives or JIRA comments. What to do? When can we expect the mailing list to get back to normal? -- Lefty On Mon, May 19, 2014 at 8:00 AM, Alan Gates ga...@hortonworks.com wrote: The vote-by-mail requirement is an Apache one, which trumps any Hive bylaws. I really think Apache is going to frown on voting via JIRA. Alan. 
Re: [VOTE] Apache Hive 0.13.1 Release Candidate 2
+1 (non-binding) - Built it, checked the signature and md5, and ran some basic tests. Alan. On May 23, 2014, at 1:45 AM, Sushanth Sowmyan khorg...@apache.org wrote: Apache Hive 0.13.1 Release Candidate 2 is available here: http://people.apache.org/~khorgath/releases/0.13.1_RC2/ Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-1014 The source tag for RC2 is at: https://svn.apache.org/viewvc/hive/tags/release-0.13.1-rc2/ Hive PMC Members: Please test and vote. Thanks, -Sushanth
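The verification Alan mentions (check the signature and md5, then build) looks roughly like the sketch below. The artifact names and the `verify_md5` helper are hypothetical; substitute the actual RC files, and note that published `.md5` files vary in format, so the helper compares bare digests rather than relying on `md5sum -c`:

```shell
# Sketch of RC verification steps (hypothetical file names). Assumes
# gnupg and coreutils are installed.

verify_md5() {
  # Compare the digest published in "$1.md5" against a locally computed one.
  local file="$1"
  local published computed
  published=$(awk '{print $1}' "${file}.md5")
  computed=$(md5sum "${file}" | awk '{print $1}')
  [ "$published" = "$computed" ]
}

# Against a real RC one would then run (not executed here):
#   gpg --import KEYS
#   gpg --verify apache-hive-0.13.1-src.tar.gz.asc apache-hive-0.13.1-src.tar.gz
#   verify_md5 apache-hive-0.13.1-src.tar.gz
#   tar xzf apache-hive-0.13.1-src.tar.gz && cd apache-hive-0.13.1-src && build

# Demonstrate the digest check on a locally generated dummy artifact:
printf 'example payload' > demo-artifact.tar.gz
md5sum demo-artifact.tar.gz | awk '{print $1}' > demo-artifact.tar.gz.md5
if verify_md5 demo-artifact.tar.gz; then echo "md5 OK"; else echo "md5 MISMATCH"; fi
```

Comparing bare digests sidesteps the historical variation in Apache checksum file layouts (`hash`, `hash  file`, `hash *file`), any of which would break a naive `md5sum -c`.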
Re: Review Request 22996: HIVE-7090 Support session-level temporary tables in Hive
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22996/#review47005 --- What will happen if a user tries to create a view over a temp table? I'm not sure whether the creation will fail (since there's no table in the database) or succeed but fail later when the user tries to use the view. Ideally it would give a nice error message, e.g. "views not supported over temp tables". ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java https://reviews.apache.org/r/22996/#comment82599 AFAICT there are no security checks here on who can create tables in which database. Not sure how we should handle this, as you'd like users to be able to create temp tables even when they don't own a database in which they could create them. But explicitly creating them in databases they don't have permission on is going to look like a security breach. ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java https://reviews.apache.org/r/22996/#comment82600 Same as the comment above on being able to create a temp table in any db: this allows moving a temp table into any db. - Alan Gates On June 28, 2014, 12:35 a.m., Jason Dere wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22996/ --- (Updated June 28, 2014, 12:35 a.m.) Review request for hive, Gunther Hagleitner, Navis Ryu, and Harish Butani. Bugs: HIVE-7090 https://issues.apache.org/jira/browse/HIVE-7090 Repository: hive-git Description --- Temp tables are managed in memory by SessionState. SessionHiveMetaStoreClient overrides table-related methods in HiveMetaStore to access the temp tables saved in the SessionState when appropriate. 
Diffs - itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcWithMiniMr.java 9fb7550 itests/qtest/testconfiguration.properties 1462ecd metastore/if/hive_metastore.thrift cc802c6 metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 9e8d912 ql/src/java/org/apache/hadoop/hive/ql/Context.java abc4290 ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java d8d900b ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 4d35176 ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 3df2690 ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 1270520 ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g f934ac4 ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java 71471f4 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 83d09c0 ql/src/java/org/apache/hadoop/hive/ql/plan/CreateTableDesc.java 2537b75 ql/src/java/org/apache/hadoop/hive/ql/plan/CreateTableLikeDesc.java cb5d64c ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 2143d0c ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezTask.java 43125f7 ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager.java 98c3cc3 ql/src/test/org/apache/hadoop/hive/ql/parse/TestMacroSemanticAnalyzer.java 91de8da ql/src/test/org/apache/hadoop/hive/ql/parse/authorization/TestHiveAuthorizationTaskFactory.java 20d08b3 ql/src/test/queries/clientnegative/temp_table_authorize_create_tbl.q PRE-CREATION ql/src/test/queries/clientnegative/temp_table_column_stats.q PRE-CREATION ql/src/test/queries/clientnegative/temp_table_create_like_partitions.q PRE-CREATION ql/src/test/queries/clientnegative/temp_table_index.q PRE-CREATION ql/src/test/queries/clientnegative/temp_table_partitions.q PRE-CREATION ql/src/test/queries/clientnegative/temp_table_rename.q PRE-CREATION ql/src/test/queries/clientpositive/show_create_table_temp_table.q PRE-CREATION 
ql/src/test/queries/clientpositive/stats19.q 51514bd ql/src/test/queries/clientpositive/temp_table.q PRE-CREATION ql/src/test/queries/clientpositive/temp_table_external.q PRE-CREATION ql/src/test/queries/clientpositive/temp_table_gb1.q PRE-CREATION ql/src/test/queries/clientpositive/temp_table_join1.q PRE-CREATION ql/src/test/queries/clientpositive/temp_table_names.q PRE-CREATION ql/src/test/queries/clientpositive/temp_table_options1.q PRE-CREATION ql/src/test/queries/clientpositive/temp_table_precedence.q PRE-CREATION ql/src/test/queries/clientpositive/temp_table_subquery1.q PRE-CREATION ql/src/test/queries/clientpositive/temp_table_windowing_expressions.q PRE-CREATION ql/src/test/results/clientnegative/temp_table_authorize_create_tbl.q.out PRE-CREATION ql/src/test/results
Version specific Hive docs
Recently a JIRA was opened in Hive to move some of the Hive documents from the wiki to version control ( https://issues.apache.org/jira/browse/HIVE-3039 ). Edward commented on the JIRA: This issue was opened and died a long time ago. After I moved the documentation from the wiki to the source everyone refused to update it and it was a dead issue. I would advise not going down this very frustrating path again. ... I marked pages on the wiki that I moved as 'DO NOT EDIT THIS PAGE THIS IS NOW IN XDOCS'. People ran into this and complained they explicitly did not want to use in-line documentation. You can find the history on the mailing list. The value of version-specific documentation is very clear to me (and apparently to others, since I cannot think of a single large project that does not do it), so I am trying to figure out whether Hive is really opposed to it or just wants to keep the wikis open. I've been going over the old JIRAs and mailing list archives as Edward suggested to understand what the decision at the time was. Here's what I have found. The initial approach was covered in the notes from the July 2010 Hive contributor meetup ( https://cwiki.apache.org/confluence/display/Hive/Development+ContributorsMeetings+HiveContributorsMinutes100706 ): • There was a discussion about the plan to move the documentation off of the wiki and into version control. • Several people voiced concerns that developers/users are less likely to update the documentation if doing so requires them to submit a patch. • The new proposal for documentation reached at the meeting is as follows: • The trunk version of the documentation will be maintained on the wiki. • As part of the release process the documentation will be copied off of the wiki and converted to xdoc, and then checked into svn. • HTML documentation generated from the xdoc will be posted to the Hive webpage when the new release is posted. 
• Carl is going to investigate the feasibility of writing a tool that converts documentation directly from !MoinMoin wiki markup to xdoc. There was some email discussion generated by these notes which did not change the general view: http://mail-archives.apache.org/mod_mbox/hive-dev/201007.mbox/%3CAANLkTin3EWUKWj65ZzDApGlrEEWYyg9sVrgGzDaFWO7T%40mail.gmail.com%3E In a later email thread Joydeep rather forcefully argued that the wiki pages should be left open even though there are xdocs: http://mail-archives.apache.org/mod_mbox/hive-dev/201008.mbox/ajax/%3CB4F4475C5A97594A87B283C91F9E873A017FB35F%40sc-mbx05.TheFacebook.com%3E Then in August it appears it was discussed again, with the outcome that docs should still be kept in source control ( https://cwiki.apache.org/confluence/display/Hive/Development+ContributorsMeetings+HiveContributorsMinutes100808 ): Discussed moving the documentation from the wiki to version control. • Probably not practical to maintain the trunk version of the docs on the wiki and roll over to version control at release time, so the trunk version of the docs will be maintained in vcs. • It was agreed that feature patches should include updates to the docs, but it is also acceptable to file a doc ticket if there is time pressure to commit. • Will maintain an errata page on the wiki for collecting updates/corrections from users. These notes will be rolled into the documentation in vcs on a monthly basis. Also relevant are the two older JIRA issues covering the abortive move: https://issues.apache.org/jira/browse/HIVE-1135 https://issues.apache.org/jira/browse/HIVE-1446 It appears to me that there was a lack of clarity about how to move information from the wiki into the version-controlled docs. There was also opposition expressed to locking off the wiki. As far as I can tell, no one was opposed to version control of the docs per se. So, I propose we let this, and similar patches that propose version-specific docs in version control, go forward. 
There's no need to close off the wiki. It will impose a tax on feature developers to add docs if they want users to know about their features, but that seems to me a good thing rather than a bad one. Thoughts? Alan.
[DISCUSS] HCatalog becoming a subproject of Hive
Hello Hive community. It is time for HCatalog to graduate from the Apache Incubator. Given the heavy dependence of HCatalog on Hive the HCatalog community agreed it made sense to explore graduating from the Incubator to become a subproject of Hive (see http://mail-archives.apache.org/mod_mbox/incubator-hcatalog-user/201209.mbox/%3C08C40723-8D4D-48EB-942B-8EE4327DD84A%40hortonworks.com%3E and http://mail-archives.apache.org/mod_mbox/incubator-hcatalog-user/201210.mbox/%3CCABN7xTCRM5wXGgJKEko0PmqDXhuAYpK%2BD-H57T29zcSGhkwGQw%40mail.gmail.com%3E ). To help both communities understand what HCatalog is and hopes to become we also developed a roadmap that summarizes HCatalog's current features, planned features, and other possible features under discussion: https://cwiki.apache.org/confluence/display/HCATALOG/HCatalog+Roadmap So we are now approaching you to see if there is agreement in the Hive community that HCatalog graduating into Hive would make sense. Alan.
Re: [DISCUSS] HCatalog becoming a subproject of Hive
On Nov 4, 2012, at 8:35 PM, Namit Jain wrote: I like the idea of HCatalog becoming a Hive sub-project. The enhancements/bug fixes in the serde/metastore areas can indirectly benefit the Hive community, and it will be easier for the fixes to be in one place. Having said that, I don't see serde/metastore moving out of Hive into a separate component. Things are tied too closely together. I am assuming that no new committers would be automatically added to Hive as part of this, and that both Hive and HCatalog will continue to have their own committers. One thing we'd like to discuss is the HCatalog committers having commit access to the metastore sections of the Hive code. That doesn't mean it has to move into HCatalog's code base. But more and more, the fixes and changes we're doing in HCatalog are really in Hive's metastore. So we believe it would make sense to give HCat committers access to that component as well as HCat. Alan. Thanks, -namit On 11/3/12 2:22 AM, Alan Gates ga...@hortonworks.com wrote: Hello Hive community. It is time for HCatalog to graduate from the Apache Incubator.
Re: [DISCUSS] HCatalog becoming a subproject of Hive
I would suggest looking over the patch history of the HCat committers. I think most of them have already contributed a number of patches to the metastore. All are certainly aware of how to run Hive unit tests and have an understanding of how Hive works. So I don't think it's fair to say they would be unsafe with access to the metastore. And the Hive PMC is there to ensure problems do not arise; if there are issues, I am sure it can deal with them. Alan. On Nov 6, 2012, at 8:06 PM, Namit Jain wrote: Alan, that would not be a good idea. Metastore code is part of the Hive code, and it would be safer if only Hive committers had commit access to it. On 11/6/12 11:25 PM, Alan Gates ga...@hortonworks.com wrote: On Nov 4, 2012, at 8:35 PM, Namit Jain wrote: I like the idea of HCatalog becoming a Hive sub-project. The enhancements/bug fixes in the serde/metastore areas can indirectly benefit the Hive community, and it will be easier for the fixes to be in one place. Having said that, I don't see serde/metastore moving out of Hive into a separate component. Things are tied too closely together. I am assuming that no new committers would be automatically added to Hive as part of this, and that both Hive and HCatalog will continue to have their own committers. One thing we'd like to discuss is the HCatalog committers having commit access to the metastore sections of the Hive code. That doesn't mean it has to move into HCatalog's code base. But more and more, the fixes and changes we're doing in HCatalog are really in Hive's metastore. So we believe it would make sense to give HCat committers access to that component as well as HCat. Alan. Thanks, -namit On 11/3/12 2:22 AM, Alan Gates ga...@hortonworks.com wrote: Hello Hive community. It is time for HCatalog to graduate from the Apache Incubator. 
Re: [DISCUSS] HCatalog becoming a subproject of Hive
I am not sure where we are on this discussion. So far those who have chimed in seem generally positive (Namit, Edward, Clark, Alexander). Namit and I have different visions for what the committership might look like, so I'd like to hear from other Hive PMC members what their view is on this. I have to say that from an HCatalog perspective the proposition is much less attractive without some commit rights. On a related note, people should be aware of these threads on the Incubator list: http://mail-archives.apache.org/mod_mbox/incubator-general/201211.mbox/%3CCAGU5spdWHNtJxgQ8f%3DnPEXx9xNLjyjOYaFfnSw4EyAjgm1c46w%40mail.gmail.com%3E http://mail-archives.apache.org/mod_mbox/incubator-general/201211.mbox/%3CCAKQbXgDZj_zMj4qSodXjMHV7xQZxpcY1-35cvq959YKLNd6tJQ%40mail.gmail.com%3E For those not inclined to read all the mails in the threads, I will summarize (though I urge all PMC members of Hive and PPMC members of HCat to read both mail threads, because this is highly relevant to what we are discussing). There are two salient points in these threads: 1) It is not wise to build a subproject that is distinct from the main project in the sense that it has separate community members interested in it. Bertrand, Arun, Chris Mattmann, and Greg Stein all spoke against this, and all are long-time Apache contributors with a lot of experience. They were all of the opinion that it was reasonable for one project to release separate products. 2) It is not wise to have committers who have access to some parts of a project but not others. Greg and Bertrand argued (and Arun seemed to imply) that splitting up committer lists by sections of the code did not work out well. These insights cause me to question what we mean by subproject. I had originally envisioned something that looked like Pig and Hive did when they were subprojects of Hadoop. But this violates both 1 and 2 above. 
Given this input from many of the wise old timers of Apache I think we should consider what we mean when we say subproject and how tightly we are willing to integrate these projects. Personally I think it makes sense to continue to pursue integration, as I think HCat is really a set of interfaces on top of Hive and it makes sense to coalesce those into one project. I guess this would mean HCat becomes just another set of jars that Hive releases when it releases, rather than a stand alone entity. But I'm curious to hear what others think. Alan. On Nov 14, 2012, at 10:22 PM, Namit Jain wrote: The same criteria should be applied to all Hive committers. Only a committer should be able to commit code. I don't think we should bend this rule. Metastore is not a separate project, but an integral part of hive. -namit On 11/12/12 10:32 PM, Alan Gates ga...@hortonworks.com wrote: I would suggest looking over the patch history of HCat committers. I think most of them have already contributed a number of patches to the metastore. All are certainly aware of how to run Hive unit tests and have an understanding of how Hive works. So I don't think it's fair to say they would be unsafe with access to the metastore. And the Hive PMC is there to assure this does not happen. If there are issues I am sure they can deal with them. Alan. On Nov 6, 2012, at 8:06 PM, Namit Jain wrote: Alan, that would not be a good idea. Metastore code is part of hive code, and it would be safer if only Hive committers had commit access to that. On 11/6/12 11:25 PM, Alan Gates ga...@hortonworks.com wrote: On Nov 4, 2012, at 8:35 PM, Namit Jain wrote: I like the idea of Hcatalog becoming a Hive sub-project. The enhancements/bugs in the serde/metastore areas can indirectly benefit the hive community, and it will be easier for the fix to be in one place. Having said that, I don't see serde/metastore moving out of hive into a separate component. Things are tied too closely together. 
I am assuming that no new committers would be automatically added to Hive as part of this, and both Hive and HCatalog will continue to have their own committers. One thing in this we'd like to discuss is the HCatalog committers having commit access to the metastore sections of Hive code. That doesn't mean it has to move into HCatalog's code base. But more and more the fixes and changes we're doing in HCatalog are really in Hive's metastore. So we believe it would make sense to give HCat committers access to that component as well as HCat. Alan. Thanks, -namit On 11/3/12 2:22 AM, Alan Gates ga...@hortonworks.com wrote: Hello Hive community. It is time for HCatalog to graduate from the Apache Incubator. Given the heavy dependence of HCatalog on Hive the HCatalog community agreed it made sense to explore graduating from the Incubator to become a subproject of Hive (see http://mail-archives.apache.org/mod_mbox/incubator-hcatalog-user/201209.mbox/%3C08C40723-8D4D-48EB-942B
Re: [DISCUSS] HCatalog becoming a subproject of Hive
are only based on what hive needs. Which I believe is the wrong way to look at this situation. I thought to reply to this thread because I have been following this Jira: https://issues.apache.org/jira/browse/HIVE-3752 On a high level I do not like this duplication of effort and code. If hive is compatible with hcatalog I do not see why we put off merging the two at all. Hive users would get an immediate benefit if Hive used hcatalog with no apparent downside. Meanwhile we are putting this off and staying in this awkward transition phase. Personally, I do not have a problem being a hive committer and not having hcatalog commit. None of the hive work I have done has ever touched the metastore. Also, of the thousands of jiras and features we have added, only a small portion require metastore changes. As long as a couple active users have commit on hive and the suggested hcatalog subproject I do not think not having commit will be a roadblock in moving hive forward. On Mon, Dec 3, 2012 at 6:22 PM, Alan Gates ga...@hortonworks.com wrote: I am not sure where we are on this discussion. So far those who have chimed in seemed generally positive (Namit, Edward, Clark, Alexander). Namit and I have different visions for what the committership might look like, so I'd like to hear from other Hive PMC members what their view is on this. I have to say from an HCatalog perspective the proposition is much less attractive without some commit rights. 
On a related note, people should be aware of these threads in the Incubator list: http://mail-archives.apache.org/mod_mbox/incubator-general/201211.mbox/%3CCAGU5spdWHNtJxgQ8f%3DnPEXx9xNLjyjOYaFfnSw4EyAjgm1c46w%40mail.gmail.com%3E http://mail-archives.apache.org/mod_mbox/incubator-general/201211.mbox/%3CCAKQbXgDZj_zMj4qSodXjMHV7xQZxpcY1-35cvq959YKLNd6tJQ%40mail.gmail.com%3E For those not inclined to read all the mails in the threads I will summarize (though I urge all PMC members of Hive and PPMC members of HCat to read both mail threads because this is highly relevant to what we are discussing). There are two salient points in these threads: 1) It is not wise to build a subproject that is distinct from the main project in the sense that it has separate community members interested in it. Bertrand, Arun, Chris Mattman, and Greg Stein all spoke against this, and all are long time Apache contributors with a lot of experience. They were all of the opinion that it was reasonable for one project to release separate products. 2) It is not wise to have committers that have access to parts of a project but not others. Greg and Bertrand argued (and Arun seemed to imply) that splitting up committer lists by sections of the code did not work out well. These insights cause me to question what we mean by subproject. I had originally envisioned something that looked like Pig and Hive did when they were subprojects of Hadoop. But this violates both 1 and 2 above. Given this input from many of the wise old timers of Apache I think we should consider what we mean when we say subproject and how tightly we are willing to integrate these projects. Personally I think it makes sense to continue to pursue integration, as I think HCat is really a set of interfaces on top of Hive and it makes sense to coalesce those into one project. I guess this would mean HCat becomes just another set of jars that Hive releases when it releases, rather than a stand alone entity. 
But I'm curious to hear what others think. Alan. On Nov 14, 2012, at 10:22 PM, Namit Jain wrote: The same criteria should be applied to all Hive committers. Only a committer should be able to commit code. I don't think we should bend this rule. Metastore is not a separate project, but an integral part of hive. -namit On 11/12/12 10:32 PM, Alan Gates ga...@hortonworks.com wrote: I would suggest looking over the patch history of HCat committers. I think most of them have already contributed a number of patches to the metastore. All are certainly aware of how to run Hive unit tests and have an understanding of how Hive works. So I don't think it's fair to say they would be unsafe with access to the metastore. And the Hive PMC is there to assure this does not happen. If there are issues I am sure they can deal with them. Alan. On Nov 6, 2012, at 8:06 PM, Namit Jain wrote: Alan, that would not be a good idea. Metastore code is part of hive code, and it would be safer if only Hive committers had commit access to that. On 11/6/12 11:25 PM, Alan Gates ga...@hortonworks.com wrote: On Nov 4, 2012, at 8:35 PM, Namit Jain wrote: I like the idea of Hcatalog becoming a Hive sub-project. The enhancements/bugs in the serde/metastore areas can indirectly benefit the hive community, and it will be easier for the fix to be in one place. Having said that, I don't see
Re: [DISCUSS] HCatalog becoming a subproject of Hive
Namit, I was not proposing that promotion to full committership would be automatic. I assume it would still be done via a vote by the PMC. I agree that we cannot _guarantee_ committership for HCat committers in 6-9 months. But I am trying to lay out a clear path they can follow. If they don't follow the path then they won't be committers. I am also trying to make it non-preferential in that I am setting the criteria to be what I believe the Hive PMC would expect any prospective Hive committer to do. The only intended preferential part of the proposal is the Hive shepherds, which we have all agreed is a good idea. Alan. On Dec 19, 2012, at 8:23 PM, Namit Jain wrote: I don't agree with the proposal. It is impractical to have a Hcat committer with commit access to Hcat only portions of Hive. We cannot guarantee that a Hcat committer will become a Hive committer in 6-9 months, that depends on what they do in the next 6-9 months. The current Hcat committers should spend more time in reviewing patches, work on non-Hcat areas in Hive, and then gradually become a hive committer. They should not be given any preferential treatment, and the process should be the same as it would be for any other hive contributor currently. Given the expertise of the Hcat committers, they should be in line for becoming a hive committer if they continue to work in hive, but that cannot be guaranteed. I agree that some Hive committers should try and help the existing Hcat patches, and again that is voluntary and different committers cannot be assigned to different parts of the code. Thanks, -namit On 12/20/12 1:03 AM, Carl Steinbach cwsteinb...@gmail.com wrote: Alan's proposal sounds like a good idea to me. +1 On Dec 18, 2012 5:36 PM, Travis Crawford traviscrawf...@gmail.com wrote: Alan, I think your proposal sounds great. 
--travis On Tue, Dec 18, 2012 at 1:13 PM, Alan Gates ga...@hortonworks.com wrote: Carl, speaking just for myself and not as a representative of the HCat PPMC at this point, I am coming to agree with you that HCat integrating with Hive fully makes more sense. However, this makes the committer question even thornier. Travis and Namit, I think the shepherd proposal needs to lay out a clear and time bounded path to committership for HCat committers. Having HCat committers as second class Hive citizens for the long run will not be healthy. I propose the following as a starting point for discussion: All active HCat committers (those who have contributed or committed a patch in the last 6 months) will be made committers in the HCat portion only of Hive. In addition those committers will be assigned a particular shepherd who is a current Hive committer and who will be responsible for mentoring them towards full Hive committership. As a part of this mentorship the HCat committer will review patches of other contributors, contribute patches to Hive (both inside and outside of HCatalog), respond to user issues on the mailing lists, etc. It is intended that as a result of this mentorship program HCat committers can become full Hive committers in 6-9 months. No new HCat only committers will be elected in Hive after this. All Hive committers will automatically also have commit rights on HCatalog. Alan. On Dec 14, 2012, at 10:05 AM, Carl Steinbach wrote: On a functional level I don't think there is going to be much of a difference between the subproject option proposed by Travis and the other option where HCatalog becomes a TLP. In both cases HCatalog and Hive will have separate committers, separate code repositories, separate release cycles, and separate project roadmaps. 
Aside from ASF bureaucracy, I think the only major difference between the two options is that the subproject route will give the rest of the community the false impression that the two projects have coordinated roadmaps and a process to prevent overlapping functionality from appearing in both projects. Consequently, if these are the only two options then I would prefer that HCatalog become a TLP. On the other hand, I also agree with many of the sentiments that have already been expressed in this thread, namely that the two projects are closely related and that it would benefit the community at large if the two projects could be brought closer together. Up to this point the major source of pain for the HCatalog team has been the frequent necessity of making changes on both the Hive and HCatalog sides when implementing new features in HCatalog. This situation is compounded by the ASF requirement that release artifacts may not depend on snapshot artifacts from other ASF projects. Furthermore, if Hive adds a dependency on HCatalog then it will be subject to these same problems (in addition to the gross circular dependency!). I think the best way to avoid these problems is for HCatalog to become a Hive
Re: [VOTE] Apache Hive 0.10.0 Release Candidate 0
+1 (non-binding) Checked the check sums and key signatures. Installed it and ran a few queries. All looked good. As a note Hive should be offering a src only release and a convenience binary rather than two binaries, one with the source and one without. See the thread on general@incubator discussing this: http://mail-archives.apache.org/mod_mbox/incubator-general/201203.mbox/%3CCAOFYJNY%3DEjVHrWVvAedR3OKwCv-BkTaCbEu0ufp7OZR_gpCTiA%40mail.gmail.com%3E I think this can be solved later and need not block this release. Alan. On Dec 18, 2012, at 10:23 PM, Ashutosh Chauhan wrote: Apache Hive 0.10.0 Release Candidate 0 is available here: http://people.apache.org/~hashutosh/hive-0.10.0-rc0/ Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-049/org/apache/hive/ Release notes are available at: https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12320745&styleName=Text&projectId=12310843&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED%7C70f39c6dd3cf337eaa0e3a0359687cf608903879%7Clin Voting will conclude in 72 hours. Hive PMC Members: Please test and vote. Thanks, Ashutosh
Re: [DISCUSS] HCatalog becoming a subproject of Hive
sense then why don't we try to get rid of the procedural elements that would only slow down that transition? If there is angst about specific people on Hcat committers list on the Hive committers side (are there any?), then I think that should be addressed on a case by case basis but why enforce a general rule. In the same vein why have a rule saying in 6-9 months a Hcat committer becomes a Hive committer - how is that helpful? If they are changing the Hcat subproject in Hive are they not already Hive committers? And if they gain the expertise to review and commit code in the SemanticAnalyzer in a few months should they not be able to do that before 9 months are over? And if they don't get that expertise in 9 months would they really review and commit anything in the SemanticAnalyzer - I mean there are Hive committers who don't touch that piece of code today, no? Ashish On Wed, Dec 19, 2012 at 8:23 PM, Namit Jain nj...@fb.com wrote: I don't agree with the proposal. It is impractical to have a Hcat committer with commit access to Hcat only portions of Hive. We cannot guarantee that a Hcat committer will become a Hive committer in 6-9 months, that depends on what they do in the next 6-9 months. The current Hcat committers should spend more time in reviewing patches, work on non-Hcat areas in Hive, and then gradually become a hive committer. They should not be given any preferential treatment, and the process should be the same as it would be for any other hive contributor currently. Given the expertise of the Hcat committers, they should be in line for becoming a hive committer if they continue to work in hive, but that cannot be guaranteed. I agree that some Hive committers should try and help the existing Hcat patches, and again that is voluntary and different committers cannot be assigned to different parts of the code. Thanks, -namit On 12/20/12 1:03 AM, Carl Steinbach cwsteinb...@gmail.com wrote: Alan's proposal sounds like a good idea to me. 
+1 On Dec 18, 2012 5:36 PM, Travis Crawford traviscrawf...@gmail.com wrote: Alan, I think your proposal sounds great. --travis On Tue, Dec 18, 2012 at 1:13 PM, Alan Gates ga...@hortonworks.com wrote: Carl, speaking just for myself and not as a representative of the HCat PPMC at this point, I am coming to agree with you that HCat integrating with Hive fully makes more sense. However, this makes the committer question even thornier. Travis and Namit, I think the shepherd proposal needs to lay out a clear and time bounded path to committership for HCat committers. Having HCat committers as second class Hive citizens for the long run will not be healthy. I propose the following as a starting point for discussion: All active HCat committers (those who have contributed or committed a patch in the last 6 months) will be made committers in the HCat portion only of Hive. In addition those committers will be assigned a particular shepherd who is a current Hive committer and who will be responsible for mentoring them towards full Hive committership. As a part of this mentorship the HCat committer will review patches of other contributors, contribute patches to Hive (both inside and outside of HCatalog), respond to user issues on the mailing lists, etc. It is intended that as a result of this mentorship program HCat committers can become full Hive committers in 6-9 months. No new HCat only committers will be elected in Hive after this. All Hive committers will automatically also have commit rights on HCatalog. Alan. On Dec 14, 2012, at 10:05 AM, Carl Steinbach wrote: On a functional level I don't think there is going to be much of a difference between the subproject option proposed by Travis and the other option where HCatalog becomes a TLP. In both cases HCatalog and Hive will have separate committers, separate code repositories, separate release cycles, and separate project roadmaps. 
Aside from ASF bureaucracy, I think the only major difference between the two options is that the subproject route will give the rest of the community the false impression that the two projects have coordinated roadmaps and a process to prevent overlapping functionality from appearing in both projects. Consequently, If these are the only two options then I would prefer that HCatalog become a TLP. On the other hand, I also agree with many of the sentiments that have already been expressed in this thread, namely that the two projects are closely related and that it would benefit the community at large if the two projects
Re: [DISCUSS] HCatalog becoming a subproject of Hive
If you think that's the best path forward that's fine. I don't think I can call a vote, since I'm not part of the Hive PMC. But I'm happy to draft a resolution for you and then let you call the vote. Should I do that? Alan. On Jan 11, 2013, at 4:34 PM, Carl Steinbach wrote: Hi Alan, I agree that submitting this for a vote is the best option. If anyone has additional proposed modifications please make them. Otherwise I propose that the Hive PMC vote on this proposal. In order for the Hive PMC to be able to vote on these changes they need to be expressed in terms of one or more of the actions listed at the end of the Hive project bylaws: https://cwiki.apache.org/confluence/display/Hive/Bylaws So I think we first need to amend the bylaws in order to define the rights and privileges of a submodule committer, and then separately vote the HCatalog committers in as Hive submodule committers. Does this make sense? Thanks. Carl
Re: [DISCUSS] HCatalog becoming a subproject of Hive
I've created a wiki page for my proposed changes at https://cwiki.apache.org/confluence/display/Hive/Proposed+Changes+to+Hive+Bylaws+for+Submodule+Committers Text to be removed is struck through. Text to be added is in italics. Any recommended changes before we vote? Alan. On Jan 17, 2013, at 2:08 PM, Carl Steinbach wrote: Sounds like a good plan to me. Since Ashutosh is a member of both the Hive and HCatalog PMCs it probably makes more sense for him to call the vote, but I'm willing to do it too. On Wed, Jan 16, 2013 at 8:24 AM, Alan Gates ga...@hortonworks.com wrote: If you think that's the best path forward that's fine. I don't think I can call a vote, since I'm not part of the Hive PMC. But I'm happy to draft a resolution for you and then let you call the vote. Should I do that? Alan. On Jan 11, 2013, at 4:34 PM, Carl Steinbach wrote: Hi Alan, I agree that submitting this for a vote is the best option. If anyone has additional proposed modifications please make them. Otherwise I propose that the Hive PMC vote on this proposal. In order for the Hive PMC to be able to vote on these changes they need to be expressed in terms of one or more of the actions listed at the end of the Hive project bylaws: https://cwiki.apache.org/confluence/display/Hive/Bylaws So I think we first need to amend the bylaws in order to define the rights and privileges of a submodule committer, and then separately vote the HCatalog committers in as Hive submodule committers. Does this make sense? Thanks. Carl
Re: Review Request 25616: HIVE-7790 Update privileges to check for update and delete
On Sept. 15, 2014, 7:24 a.m., Thejas Nair wrote: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/authorization/plugin/TestHiveAuthorizerCheckInvocation.java, line 272 https://reviews.apache.org/r/25616/diff/1/?file=688987#file688987line272 Wouldn't select permissions for using column j in where clause be needed ? In most databases, you get to know the number of rows getting updated. Using that information, with the query in the test, you could find number of columns where j = 3. I haven't verified what SQL spec says about this (privileges needed for including columns in where clause in update statement.) Postgres says it is needed : http://www.postgresql.org/docs/9.2/static/sql-update.html You must have the UPDATE privilege on the table, or at least on the column(s) that are listed to be updated. You must also have the SELECT privilege on any column whose values are read in the expressions or condition. I looked through the SQL spec and couldn't figure it out one way or another. I chose this route because it seemed odd to require SELECT privileges for UPDATE and DELETE. I see what you're saying about being able to tease out information such as how many rows match a where clause. If we want to require SELECT privileges for these operations that's ok. I'll just need to rework a few pieces of the patch and some of the tests. - Alan --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25616/#review53316 --- On Sept. 14, 2014, 4:30 a.m., Alan Gates wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25616/ --- (Updated Sept. 14, 2014, 4:30 a.m.) Review request for hive and Thejas Nair. Bugs: HIVE-7790 https://issues.apache.org/jira/browse/HIVE-7790 Repository: hive-git Description --- Adds update and delete as action and adds checks for authorization during update and delete. Also adds passing of updated columns in case authorizer wishes to check them. 
Diffs - itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/authorization/plugin/TestHiveAuthorizerCheckInvocation.java 53d88b0 ql/src/java/org/apache/hadoop/hive/ql/Driver.java 298f429 ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java b2f66e0 ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java 3aaa09c ql/src/java/org/apache/hadoop/hive/ql/security/authorization/AuthorizationUtils.java 93df9f4 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HivePrivilegeObject.java 093b4fd ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/Operation2Privilege.java 3236341 ql/src/test/queries/clientnegative/authorization_delete_nodeletepriv.q PRE-CREATION ql/src/test/queries/clientnegative/authorization_update_noupdatepriv.q PRE-CREATION ql/src/test/queries/clientpositive/authorization_delete.q PRE-CREATION ql/src/test/queries/clientpositive/authorization_delete_own_table.q PRE-CREATION ql/src/test/queries/clientpositive/authorization_update.q PRE-CREATION ql/src/test/queries/clientpositive/authorization_update_own_table.q PRE-CREATION ql/src/test/results/clientnegative/authorization_delete_nodeletepriv.q.out PRE-CREATION ql/src/test/results/clientnegative/authorization_update_noupdatepriv.q.out PRE-CREATION ql/src/test/results/clientpositive/authorization_delete.q.out PRE-CREATION ql/src/test/results/clientpositive/authorization_delete_own_table.q.out PRE-CREATION ql/src/test/results/clientpositive/authorization_update.q.out PRE-CREATION ql/src/test/results/clientpositive/authorization_update_own_table.q.out PRE-CREATION Diff: https://reviews.apache.org/r/25616/diff/ Testing --- Added tests, both positive and negative, for update and delete, including ability to update and delete tables created by user. Also added tests for passing correct update columns. Thanks, Alan Gates
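The privilege question discussed in the review above can be illustrated with a short sketch. It follows the PostgreSQL rule quoted in the review (UPDATE privilege on the columns being set, plus SELECT on any column read in SET expressions or the WHERE clause). The table name, user name, and column names here are invented for illustration, and the GRANT statements only loosely approximate Hive's SQL standard based authorization syntax.

```sql
-- Hypothetical table acid_tbl(i INT, j INT) and user user1.
GRANT UPDATE ON TABLE acid_tbl TO USER user1;

-- With only UPDATE granted, an unfiltered update reads no columns,
-- so under the Postgres-style rule no SELECT privilege is needed:
UPDATE acid_tbl SET j = NULL;

-- This update reads column j in the WHERE clause; the stricter rule
-- being debated would additionally require SELECT (at least on j):
UPDATE acid_tbl SET i = 5 WHERE j = 3;   -- would fail without SELECT

GRANT SELECT ON TABLE acid_tbl TO USER user1;
UPDATE acid_tbl SET i = 5 WHERE j = 3;   -- now permitted
```

The concern raised in the thread is that without the SELECT requirement, the row count reported by a filtered UPDATE leaks how many rows match the WHERE predicate, effectively revealing information about columns the user cannot read.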
Re: Review Request 25616: HIVE-7790 Update privileges to check for update and delete
On Sept. 16, 2014, 6:42 a.m., Thejas Nair wrote: ql/src/java/org/apache/hadoop/hive/ql/Driver.java, line 741 https://reviews.apache.org/r/25616/diff/2/?file=690379#file690379line741 should we skip it from ReadEntity if none of the columns are being used ? Though, that case is not going to be common. eg a query like 'update table set j=null;' should not require select privileges on the table, as there are no columns in the where clause or value expression. Note that this is a change that we can also make in future without breaking users. (making a change in future to require fewer privileges will not break users). Ie, it does not have to be addressed in this patch. I don't think that makes any sense. If I have delete permissions but not select permissions I can delete all rows from a table but not some rows? That definitely violates the law of least astonishment. On Sept. 16, 2014, 6:42 a.m., Thejas Nair wrote: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/authorization/plugin/TestHiveAuthorizerCheckInvocation.java, line 265 https://reviews.apache.org/r/25616/diff/2/?file=690378#file690378line265 can you also use a column in the i = expression to make sure that also gets included in the input list. eg (add another column l to table definition) update ... set i = 5 + l where j = 3; Done. - Alan --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25616/#review53485 --- On Sept. 16, 2014, 3:35 a.m., Alan Gates wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25616/ --- (Updated Sept. 16, 2014, 3:35 a.m.) Review request for hive and Thejas Nair. Bugs: HIVE-7790 https://issues.apache.org/jira/browse/HIVE-7790 Repository: hive-git Description --- Adds update and delete as action and adds checks for authorization during update and delete. Also adds passing of updated columns in case authorizer wishes to check them. 
Diffs - itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/authorization/plugin/TestHiveAuthorizerCheckInvocation.java 53d88b0 ql/src/java/org/apache/hadoop/hive/ql/Driver.java 298f429 ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java b2f66e0 ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnAccessInfo.java a4df8b4 ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java 3aaa09c ql/src/java/org/apache/hadoop/hive/ql/security/authorization/AuthorizationUtils.java 93df9f4 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HivePrivilegeObject.java 093b4fd ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/Operation2Privilege.java 3236341 ql/src/test/queries/clientnegative/authorization_delete_nodeletepriv.q PRE-CREATION ql/src/test/queries/clientnegative/authorization_update_noupdatepriv.q PRE-CREATION ql/src/test/queries/clientpositive/authorization_delete.q PRE-CREATION ql/src/test/queries/clientpositive/authorization_delete_own_table.q PRE-CREATION ql/src/test/queries/clientpositive/authorization_update.q PRE-CREATION ql/src/test/queries/clientpositive/authorization_update_own_table.q PRE-CREATION ql/src/test/results/clientnegative/authorization_delete_nodeletepriv.q.out PRE-CREATION ql/src/test/results/clientnegative/authorization_update_noupdatepriv.q.out PRE-CREATION ql/src/test/results/clientpositive/authorization_delete.q.out PRE-CREATION ql/src/test/results/clientpositive/authorization_delete_own_table.q.out PRE-CREATION ql/src/test/results/clientpositive/authorization_update.q.out PRE-CREATION ql/src/test/results/clientpositive/authorization_update_own_table.q.out PRE-CREATION Diff: https://reviews.apache.org/r/25616/diff/ Testing --- Added tests, both positive and negative, for update and delete, including ability to update and delete tables created by user. Also added tests for passing correct update columns. Thanks, Alan Gates
Re: Review Request 25616: HIVE-7790 Update privileges to check for update and delete
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25616/ --- (Updated Sept. 16, 2014, 7:37 p.m.) Review request for hive and Thejas Nair. Bugs: HIVE-7790 https://issues.apache.org/jira/browse/HIVE-7790 Repository: hive-git Description --- Adds update and delete as action and adds checks for authorization during update and delete. Also adds passing of updated columns in case authorizer wishes to check them. Diffs (updated) - itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/authorization/plugin/TestHiveAuthorizerCheckInvocation.java 53d88b0 ql/src/java/org/apache/hadoop/hive/ql/Driver.java 298f429 ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java b2f66e0 ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnAccessInfo.java a4df8b4 ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java 3aaa09c ql/src/java/org/apache/hadoop/hive/ql/security/authorization/AuthorizationUtils.java 93df9f4 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HivePrivilegeObject.java 093b4fd ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/Operation2Privilege.java 3236341 ql/src/test/queries/clientnegative/authorization_delete_nodeletepriv.q PRE-CREATION ql/src/test/queries/clientnegative/authorization_update_noupdatepriv.q PRE-CREATION ql/src/test/queries/clientpositive/authorization_delete.q PRE-CREATION ql/src/test/queries/clientpositive/authorization_delete_own_table.q PRE-CREATION ql/src/test/queries/clientpositive/authorization_update.q PRE-CREATION ql/src/test/queries/clientpositive/authorization_update_own_table.q PRE-CREATION ql/src/test/results/clientnegative/authorization_delete_nodeletepriv.q.out PRE-CREATION ql/src/test/results/clientnegative/authorization_update_noupdatepriv.q.out PRE-CREATION ql/src/test/results/clientpositive/authorization_delete.q.out PRE-CREATION ql/src/test/results/clientpositive/authorization_delete_own_table.q.out 
PRE-CREATION ql/src/test/results/clientpositive/authorization_update.q.out PRE-CREATION ql/src/test/results/clientpositive/authorization_update_own_table.q.out PRE-CREATION Diff: https://reviews.apache.org/r/25616/diff/ Testing --- Added tests, both positive and negative, for update and delete, including ability to update and delete tables created by user. Also added tests for passing correct update columns. Thanks, Alan Gates
Re: Timeline for release of Hive 0.14
Are you wanting to track all JIRAs here, or only feature JIRAs? Once you branch, are you open to porting bug fix patches to the branch, or are you looking to lock it down and release it very quickly?

Regarding HIVE-7689, which is on the list, I have some concerns about that JIRA. See the dialogue on the JIRA.

Alan.

Vikram Dixit mailto:vik...@hortonworks.com September 23, 2014 at 12:16
Hi Folks,
I have created a wiki page for tracking the 0.14 release: https://cwiki.apache.org/confluence/display/Hive/Hive+0.14+release+status
Please take a look and let me know if I have to add any more jiras to the list. Given that the CBO branch is close to getting merged and the progress being made there, I will branch in a day or so, once that commit goes in.
Thanks
Vikram.

Navis류승우 mailto:navis@nexr.com September 12, 2014 at 0:15
Hi,
I'd really appreciate it if HIVE-5690 can be included, which is becoming harder and harder to rebase. The other 79 patches assigned to me can be held.
Thanks,
Navis

Vaibhav Gumashta mailto:vgumas...@hortonworks.com September 11, 2014 at 3:54
Hi Vikram,
Can we also add:
https://issues.apache.org/jira/browse/HIVE-6799
https://issues.apache.org/jira/browse/HIVE-7935
to the list.
Thanks,
--Vaibhav

Satish Mittal mailto:satish.mit...@inmobi.com September 10, 2014 at 0:18
Hi,
Can you please include HIVE-7892 (Thrift Set type not working with Hive) as well? It is under code review.
Regards,
Satish

Suma Shivaprasad mailto:sumasai.shivapra...@gmail.com September 9, 2014 at 1:40
Please include https://issues.apache.org/jira/browse/HIVE-7694 as well. It is currently under review by Amareshwari and should be done in the next couple of days.
Thanks
Suma

-- Sent with Postbox http://www.getpostbox.com

-- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: Timeline for release of Hive 0.14
I'd like to add two to the list: HIVE-8203 and HIVE-8239. The fix version has been set to 0.14 on both.

Alan.

Vikram Dixit mailto:vik...@hortonworks.com September 23, 2014 at 13:39
Hi Folks,
I have added all the bugs to the list that had the affects version set to 0.14.0, similar to the status page for 0.13.0. This will help track and nail down the fixes we need to get the release going. I request all devs to go through their jiras and let me know if any of them are critical or blockers that need to be included in 0.14.0. Please mark the fix version as 0.14.0 for those.
Thanks
Vikram.

Jason Dere mailto:jd...@hortonworks.com September 23, 2014 at 12:53
Would like to see HIVE-8102 and HIVE-7971 in if possible.
Thanks,
Jason

Lars Francke mailto:lars.fran...@gmail.com September 23, 2014 at 12:33
Hi Vikram,
I'd like to add HIVE-7107[1] to the list. Ashutosh voted against including it, but that was based on the premise that HiveServer1 would be removed, which doesn't seem to be happening in this release. So I'd like to get this in, as it's a very annoying issue to debug in production. It is in need of a review. Any volunteers? I'll take a look at the patch and will rebase if needed.
Cheers,
Lars
[1] https://issues.apache.org/jira/browse/HIVE-7107
Re: Patches to release branches
So, combining Mithun's proposal with the input from Sergey and Gopal, I propose:

1) When a contributor provides a patch for a high priority bug (data corruption, wrong results, crashes), he or she should also provide a patch against the branch of the latest feature release. For example, once Hive 0.14 is released, that will mean providing patches for trunk and the 0.14 branch. I believe the test infrastructure already supports running the tests against alternate branches (is that correct, Brock?), so the patches can be tested against both trunk and the release branch.

2) The release manager of the feature release (e.g. Hive 0.14) will be responsible for maintaining the branch with these patch fixes. It is his or her call whether a given bug merits inclusion on the branch. If a contributor provides a patch for trunk which, in the release manager's opinion, should also be on the branch, the release manager can ask the contributor to also provide a patch for the branch. Since whoever manages the feature release may not want to, or be able to, continue managing the branch post release, these release manager duties are transferable. But the transfer should be clear and announced on the dev list.

3) In order to make these patch fixes available to Hive users, we should strive for frequent maintenance releases. The frequency will depend on the number of bug fixes going into the branch, but 6-8 weeks seems like a good goal.

Hive 0.14 could be the test run of this process, to see what works and what doesn't. Seem reasonable?

Alan.

Mithun Radhakrishnan mailto:mithun.radhakrish...@yahoo.com.INVALID September 15, 2014 at 11:16
Hey, Gopal. Thank you, that makes sense. I'll concede that delaying the initial commit till a patch is available for the recent-most release-branch won't always be viable. While I'd expect it to be easier to patch the release-branch early than late, if we (the community) would prefer a cloned JIRA in a separate queue, of course I'll go along.
Anything to make the release-branch usable out of the box, without further patching. Forgive my ignorance of the relevant protocol... Would this be a change in the release/patch process? Does this need codifying? I'm not sure if this needs voting on, or even who might call a vote on this.
Mithun

Gopal V mailto:gop...@apache.org September 11, 2014 at 15:15
This is a very sensible proposal. As a start, I think we need to have people open backport JIRAs for such issues - even if a direct merge might be hard to do with the same patch. Immediately cherry-picking the same patch should be done if it applies with very little modification - but reworking the patch for an older release is a significant overhead for the initial commit. At the very least, we need to get past the unknowns that currently surround the last point release against the bugs already fixed in trunk. Once we have a backport queue, I'm sure the RMs in charge of the branch can moderate the community on the complexity and risk factors involved.
Cheers,
Gopal
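As a rough illustration of the backport flow in points 1) and 2) above: the thread's trunk/branch terminology suggests Subversion, but the same flow expressed in git terms can be sketched in a throwaway repository (all branch, file, and JIRA names below are hypothetical):

```shell
# Sketch of the trunk-plus-release-branch patch flow, in a temp git repo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email dev@example.com
git config user.name dev
echo base > file.txt
git add file.txt && git commit -qm "initial state on trunk"
git branch branch-0.14                 # release branch cut at this point
echo fix >> file.txt
git commit -qam "HIVE-XXXX: high priority fix (lands on trunk first)"
fix_rev=$(git rev-parse HEAD)
git checkout -q branch-0.14
git cherry-pick "$fix_rev"             # same fix applied to the release branch
grep fix file.txt                      # fix is now present on branch-0.14
```

In practice (point 2), the cherry-pick is done only when the release manager judges the bug to merit inclusion; if the patch does not apply cleanly, the contributor is asked for a branch-specific patch instead.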
Re: Review Request 25682: HIVE-6586 - Add new parameters to HiveConf.java after commit HIVE-6037
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25682/#review55256
---

trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
https://reviews.apache.org/r/25682/#comment95645
Rather than saying "to turn on Hive transactions", this should read "as part of turning on Hive transactions". This change alone won't turn on transactions.

trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
https://reviews.apache.org/r/25682/#comment95647
Same comment as above about turning on transactions.

- Alan Gates

On Oct. 1, 2014, 7:27 a.m., Lefty Leverenz wrote:
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25682/
---
(Updated Oct. 1, 2014, 7:27 a.m.)

Review request for hive, Carl Steinbach, Alan Gates, Navis Ryu, Prasad Mujumdar, and Sergey Shelukhin.

Bugs: HIVE-6586 https://issues.apache.org/jira/browse/HIVE-6586

Repository: hive

Description
---
HIVE-6586 kept track of new configuration parameters and changes to parameter descriptions when HIVE-6037 moved parameter descriptions into HiveConf.java from hive-default.xml.template. HIVE-6586.patch addresses all the fixes listed in the JIRA comments (except ones that had already been fixed), tidies up some line breaks, and makes minor edits to parameter descriptions. It also revises the descriptions of hive.txn.xxx, hive.compactor.xxx, hive.server2.async.exec.shutdown.timeout, and hive.security.authorization.createtable.owner.grants.

Diffs
---
trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1628586

Diff: https://reviews.apache.org/r/25682/diff/

Testing
---
Generated hive-default.xml.template (attached to HIVE-6586) from the new HiveConf.java and reviewed the changed parameter descriptions.

File Attachments
---
Patch 2, rebased and fixed some issues: https://reviews.apache.org/media/uploaded/files/2014/10/01/8e4b539e-2590-4d8e-b3b5-45175a051f9d__HIVE-6586.2.patch

Thanks,
Lefty Leverenz
Re: Review Request 25682: HIVE-6586 - Add new parameters to HiveConf.java after commit HIVE-6037
On Oct. 2, 2014, 10:08 p.m., Alan Gates wrote:
trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 1322
https://reviews.apache.org/r/25682/diff/2/?file=710033#file710033line1322
Rather than saying "to turn on Hive transactions", this should read "as part of turning on Hive transactions". This change alone won't turn on transactions.

Lefty Leverenz wrote:
Agreed. How about hive.txn.manager? (To turn on Hive transactions, set to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.) Should all 3 parameters list the others that are required? Oh ... make that 4 parameters, including hive.support.concurrency. For example: "Set this to true on one instance of the Thrift metastore service as part of turning on Hive transactions. The parameters hive.txn.manager, hive.compactor.worker.threads, and hive.support.concurrency must also be set appropriately to turn on Hive transactions." I don't know how verbose you want to get in HiveConf.java.

It might make sense to detail all the required parameters under hive.txn.manager and then have pointers to it from the compactor ones. I don't think you need pointers from hive.support.concurrency, since that has other uses beyond just turning on transactions.

- Alan

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25682/#review55256
---
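For reference, the group of settings under discussion could be sketched as a hive-site.xml fragment. This is only an illustration assembled from the thread, not an authoritative configuration: the initiator parameter name (hive.compactor.initiator.on) is inferred from the "set this to true on one instance of the Thrift metastore service" description, and values should be checked against the released hive-default.xml.template:

```xml
<!-- Illustrative sketch of turning on Hive transactions; values and the
     initiator parameter name are assumptions based on the thread above. -->
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
  <!-- assumed name; "true on one metastore instance only" per the thread -->
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>1</value>
</property>
```

This mirrors Lefty's point: no single parameter turns transactions on, so each description should say "as part of turning on Hive transactions" and point to the full set.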
Re: [VOTE] officially stop supporting hadoop 0.20.x in hive 0.14 ?
+1.

Alan.

Thejas Nair mailto:the...@hortonworks.com October 7, 2014 at 15:53
I think it is time to revisit hive's support for hadoop 0.20. Trying to maintain support for it puts an additional burden on hive contributors.

The last hadoop 0.20.x version was released in Feb 2010. Hadoop 1.0 was released in Dec 2011. I believe most users have moved on to hadoop 2.x, or at least hadoop 1.x. Any users still on hadoop 0.20 probably don't tend to upgrade their hive versions either.

With the move to maven for builds in hive 0.13, we don't have the ability to compile against hadoop 0.20. (Nobody has complained about that, AFAIK.) I am not sure if hive 0.13 works well against hadoop 0.20, as it is not clear that combination is in use. Also, most commercial vendors seem to be focusing on testing against hadoop 2.x.

I think it is time to do away with the added burden of attempting to support hadoop 0.20.x versions. Here is my +1 for officially stopping support for hadoop 0.20.x in hive 0.14.

Thanks,
Thejas
Re: Build appears to be broken by http://www.datanucleus.org/
It appears that the jars we need are in maven central. I tried removing datanucleus completely from my maven cache, then commenting out the datanucleus repository in pom.xml, and the jars were properly fetched from maven central. Should I just put up a patch for this so we can get building again?

Alan.

Brock Noland mailto:br...@cloudera.com October 8, 2014 at 10:28
http://www.datanucleus.org/ is not accessible:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process (default) on project hive-exec: Error resolving project artifact: Could not transfer artifact net.hydromatic:linq4j:pom:0.4 from/to datanucleus (http://www.datanucleus.org/downloads/maven2): Access denied to: http://www.datanucleus.org/downloads/maven2/net/hydromatic/linq4j/0.4/linq4j-0.4.pom, ReasonPhrase: Forbidden. for project net.hydromatic:linq4j:jar:0.4 - [Help 1]
Re: Build appears to be broken by http://www.datanucleus.org/
Yeah, I found what it was: javax.jms. I'm trying to see if there's another repo I can pick it up from.

Alan.

Brock Noland mailto:br...@cloudera.com October 8, 2014 at 11:25
That works for me. IIRC, when I did the maven build I learned we used the DN repo for something not DN related. Can you try the build with -Dmaven.repo.local=/tmp/maven and ensure it builds without any cache? If so, yes, let's get rid of that repo. Also, I found it works now with the -o flag.
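A minimal sketch of the pom.xml change discussed in this thread: the repository id and URL are taken from the error message above, but the exact shape of Hive's <repositories> section (and any other repositories it declares) is assumed. With the entry commented out, Maven falls back to Central for these artifacts; -Dmaven.repo.local=/tmp/maven can be used to verify the build resolves everything from a clean cache.

```xml
<!-- Sketch only: comment out the datanucleus repository so artifacts
     (including the non-DN ones like linq4j) resolve from Maven Central. -->
<repositories>
  <!--
  <repository>
    <id>datanucleus</id>
    <url>http://www.datanucleus.org/downloads/maven2</url>
  </repository>
  -->
</repositories>
```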