Re: Tez branch and tez based patches

2013-08-16 Thread Edward Capriolo
I still am not sure we are doing this the ideal way. I am not a believer in a commit-then-review branch. This issue is an example. https://issues.apache.org/jira/browse/HIVE-5108 I ask myself these questions: Does this currently work? Are there tests? If so, which ones are broken? How does the

Re: Tez branch and tez based patches

2013-08-16 Thread Edward Capriolo
Commit then review, and self commit, destroy the good things we get from our normal system. http://anna.gs/blog/2013/08/12/code-review-ftw/ I am most worried about silos of knowledge, lax testing policies, and code quality. Which I now have seen on several occasions when something is

Re: Tez branch and tez based patches

2013-08-05 Thread Alan Gates
Which talk are you referencing here? AFAIK all the Hive code we've written is being pushed back into the Tez branch, so you should be able to see it there. Alan. On Jul 29, 2013, at 9:02 PM, Edward Capriolo wrote: At ~25:00 There is a working prototype of hive which is using tez as the

Re: Tez branch and tez based patches

2013-08-05 Thread Alan Gates
On Jul 29, 2013, at 9:53 PM, Edward Capriolo wrote: Also watched http://www.ustream.tv/recorded/36323173 I definitely see the win in being able to stream inter-stage output. I see some cases where small intermediate results can be kept in memory. But I was somewhat under the impression

Re: Tez branch and tez based patches

2013-07-29 Thread Edward Capriolo
At ~25:00 There is a working prototype of hive which is using tez as the targeted runtime Can I get a look at that code? Is it on github? Edward On Wed, Jul 17, 2013 at 3:35 PM, Alan Gates ga...@hortonworks.com wrote: Answers to some of your questions inlined. Alan. On Jul 16, 2013, at

Re: Tez branch and tez based patches

2013-07-29 Thread Edward Capriolo
Also watched http://www.ustream.tv/recorded/36323173 I definitely see the win in being able to stream inter-stage output. I see some cases where small intermediate results can be kept in memory. But I was somewhat under the impression that the map reduce spill settings kept stuff in memory,

Re: Tez branch and tez based patches

2013-07-22 Thread Gunther Hagleitner
I have finally gotten access to wiki and added the design doc: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez I've also added links to it from the jira and in general overhauled the design. Please let me know if you feel there's still stuff missing from the document. Possibly we

Re: Tez branch and tez based patches

2013-07-20 Thread Edward Capriolo
I agree we are getting into a grey area with the term disruptive. For reference (I have not been doing this all the time, bad on me) we are supposed to +1 and wait a day. I am not familiar with these other engines, but the short answer is that Tez is built to work on YARN, which works well for

Re: Tez branch and tez based patches

2013-07-17 Thread Alan Gates
Answers to some of your questions inlined. Alan. On Jul 16, 2013, at 10:20 PM, Edward Capriolo wrote: There are some points I want to bring up. First, I am on the PMC. Here is something I find relevant: http://www.apache.org/foundation/how-it-works.html --

Re: Tez branch and tez based patches

2013-07-17 Thread Edward Capriolo
As all JIRA creations and updates are sent to dev@hive, creating a JIRA is de facto posting to the list. Agreed (although several ticket names are non-descriptive). Possibly more out-of-band discussions need to be summarized on list. Yes. I will restart this: In my opinion we should not start

Re: Tez branch and tez based patches

2013-07-17 Thread Alan Gates
On Jul 17, 2013, at 1:41 PM, Edward Capriolo wrote: In my opinion we should limit the amount of tez related optimizations to and trunk Refactoring that cleans up code is good, but as you have pointed out there won't be a tez release until sometime this fall, and this branch will be open for

Re: Tez branch and tez based patches

2013-07-17 Thread Ashutosh Chauhan
On Wed, Jul 17, 2013 at 1:41 PM, Edward Capriolo edlinuxg...@gmail.com wrote: In my opinion we should limit the amount of tez related optimizations to and trunk Refactoring that cleans up code is good, but as you have pointed out there won't be a tez release until sometime this fall, and this

Re: Tez branch and tez based patches

2013-07-16 Thread Alan Gates
Ed, I'm not sure I understand your argument, so I'm going to try to restate it. Please tell me if I understand it correctly. I think you're saying we should not embark on big projects in Hive because: 1) There were big projects in the past that were abandoned or are not currently making

Re: Tez branch and tez based patches

2013-07-16 Thread Edward Capriolo
Alan, I agree with all your statements, with the exception of one. Second, the way Apache works is that contributors scratch the itch that bothers them. So to argue "We shouldn't do X because we never finished Y" or "We shouldn't do X because we're doing Y" (where X and Y are independent) is

Re: Tez branch and tez based patches

2013-07-16 Thread Edward Capriolo
There are some points I want to bring up. First, I am on the PMC. Here is something I find relevant: http://www.apache.org/foundation/how-it-works.html -- The role of the PMC from a Foundation perspective is oversight. The main role of the PMC is not code and not

Re: Tez branch and tez based patches

2013-07-15 Thread Alan Gates
On Jul 13, 2013, at 9:48 AM, Edward Capriolo wrote: I have started to see several refactoring patches around tez. https://issues.apache.org/jira/browse/HIVE-4843 This is the only mention on the hive list I can find with tez: Makes sense. I will create the branch soon. Thanks, Ashutosh

Re: Tez branch and tez based patches

2013-07-15 Thread Edward Capriolo
The Hive bylaws, https://cwiki.apache.org/confluence/display/Hive/Bylaws, lay out what votes are needed for what. I don't see anything there about needing 3 +1s for a branch. Branching would seem to fall under code change, which requires one vote and a minimum length of 1 day. You could argue