Re: Review Request: HIVE-1636: Implement SHOW TABLES {FROM | IN} db_name
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/323/#review145 ---

ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java
https://reviews.apache.org/r/323/#comment302

    Nitpicking: I think a switch statement would be more appropriate here.

ql/src/test/queries/clientpositive/show_tables.q
https://reviews.apache.org/r/323/#comment303

    Please add some statements that test quoting of reserved keywords, e.g.:

    CREATE DATABASE `database`;
    SHOW TABLES IN `database`;

ql/src/test/queries/clientpositive/show_tables.q
https://reviews.apache.org/r/323/#comment299

    Hive tries to mimic the syntax of MySQL as closely as possible. In MySQL the grammar for the SHOW TABLES command is:

    SHOW [FULL] TABLES [{FROM | IN} db_name] [LIKE 'pattern' | WHERE expr]

    Please modify the grammar so that Hive also requires a LIKE keyword before a pattern.
    Ref: http://dev.mysql.com/doc/refman/5.0/en/show-tables.html

ql/src/test/queries/clientpositive/show_tables.q
https://reviews.apache.org/r/323/#comment300

    Showing the tables in a nonexistent database should result in a "database does not exist" error, and this query should be moved to a clientnegative query file.

ql/src/test/queries/clientpositive/show_tables.q
https://reviews.apache.org/r/323/#comment301

    This cleanup is unnecessary. QTestUtil.clearTestSideEffects() does it for you. It gets called from TestCliDriver and TestNegativeCliDriver, so everything should be OK.

- Carl

On 2011-01-19 21:06:38, Jonathan Natkins wrote:

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/323/ ---
(Updated 2011-01-19 21:06:38)

Review request for hive.

Summary
-------
Review request for HIVE-1636. This implements the syntax SHOW TABLES [{FROM | IN} db_name] [table_pattern].

This addresses bug HIVE-1636.
https://issues.apache.org/jira/browse/HIVE-1636

Diffs
-----
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 32c6e72
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java df7e0f9
  ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 128f3a6
  ql/src/java/org/apache/hadoop/hive/ql/plan/ShowTablesDesc.java ec9e933
  ql/src/test/queries/clientpositive/database.q 2b6c911
  ql/src/test/queries/clientpositive/show_tables.q 1fa78bf
  ql/src/test/results/clientpositive/database.q.out a74f9ea
  ql/src/test/results/clientpositive/show_tables.q.out 0bbd81b

Diff: https://reviews.apache.org/r/323/diff

Testing
-------

Thanks,
Jonathan
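The grammar Carl asks for can be illustrated with a small sketch. The regex below stands in for the real ANTLR rule in Hive.g (which this sketch is not); it only shows which statement forms the MySQL-style grammar accepts once a pattern requires the LIKE keyword:

```python
import re

# Sketch of the MySQL-style grammar discussed in the review:
#   SHOW TABLES [{FROM | IN} db_name] [LIKE 'pattern']
# expressed as a regex purely for illustration; Hive's actual grammar
# is an ANTLR rule in Hive.g, not a regex.
SHOW_TABLES = re.compile(
    r"^SHOW\s+TABLES"
    r"(?:\s+(?:FROM|IN)\s+(?P<db>`?\w+`?))?"   # optional database qualifier
    r"(?:\s+LIKE\s+'(?P<pattern>[^']*)')?"     # pattern must follow LIKE
    r"\s*;?\s*$",
    re.IGNORECASE,
)

def parse_show_tables(stmt):
    """Return (db_name, pattern) if stmt is a valid SHOW TABLES, else None."""
    m = SHOW_TABLES.match(stmt.strip())
    return (m.group("db"), m.group("pattern")) if m else None

print(parse_show_tables("SHOW TABLES IN `database` LIKE 'src*';"))
```

Note that a bare pattern with no LIKE keyword (e.g. `SHOW TABLES 'src*'`) fails to parse under this grammar, which is the behavior change the review requests.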
[jira] Updated: (HIVE-1636) Implement SHOW TABLES {FROM | IN} db_name
[ https://issues.apache.org/jira/browse/HIVE-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-1636:
---------------------------------
    Status: Open  (was: Patch Available)

I left some comments on reviewboard. Thanks.

Implement SHOW TABLES {FROM | IN} db_name
    Key: HIVE-1636
    URL: https://issues.apache.org/jira/browse/HIVE-1636
    Project: Hive
    Issue Type: New Feature
    Components: Query Processor
    Reporter: Carl Steinbach
    Assignee: Jonathan Natkins
    Attachments: HIVE-1636.1.patch.txt, HIVE-1636.2.patch.txt

Make it possible to list the tables in a specific database using the following syntax borrowed from MySQL:

{noformat}
SHOW TABLES [{FROM|IN} db_name]
{noformat}

See http://dev.mysql.com/doc/refman/5.0/en/show-tables.html

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984128#action_12984128 ]

Krishna Kumar commented on HIVE-1918:
-------------------------------------

Ok. Will take care of this via a delegating ctor. A process question: I guess I should wait for more comments from other reviewers before I create another patch, in case others are reviewing the current patch?

Add export/import facilities to the hive system
    Key: HIVE-1918
    URL: https://issues.apache.org/jira/browse/HIVE-1918
    Project: Hive
    Issue Type: New Feature
    Components: Query Processor
    Reporter: Krishna Kumar
    Attachments: HIVE-1918.patch.txt

This is an enhancement request to add export/import features to hive. With this language extension, the user can export the data of a table - which may be located in different hdfs locations in the case of a partitioned table - as well as the metadata of the table into a specified output location. This output location can then be moved over to a different hadoop/hive instance and imported there.

This should work independent of the source and target metastore dbms used; for instance, between derby and mysql.

For partitioned tables, the ability to export/import a subset of the partitions must be supported.

Howl will add more features on top of this: the ability to create/use the exported data even in the absence of hive, using MR or Pig. Please see http://wiki.apache.org/pig/Howl/HowlImportExport for details.
Build failed in Hudson: Hive-trunk-h0.20 #501
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/501/

--
[...truncated 7224 lines...]
create-dirs:
init:
compile:
     [echo] Compiling: anttasks
    [javac] https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ant/build.xml:40: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
deploy-ant-tasks:
create-dirs:
init:
compile:
     [echo] Compiling: anttasks
    [javac] https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ant/build.xml:40: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
jar:
init:
install-hadoopcore:
install-hadoopcore-default:
ivy-init-dirs:
ivy-download:
      [get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar
      [get] To: https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/ivy/lib/ivy-2.1.0.jar
      [get] Not modified - so not downloaded
ivy-probe-antlib:
ivy-init-antlib:
ivy-init:
ivy-retrieve-hadoop-source:
:: loading settings :: file = https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ivy/ivysettings.xml
[ivy:retrieve] :: resolving dependencies :: org.apache.hadoop.hive#contrib;work...@vesta.apache.org
[ivy:retrieve]  confs: [default]
[ivy:retrieve]  found hadoop#core;0.20.0 in hadoop-source
[ivy:retrieve] :: resolution report :: resolve 758ms :: artifacts dl 1ms
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   1   |   0   |   0   |   0   ||   1   |   0   |
        ---------------------------------------------------------------------
[ivy:retrieve] :: retrieving :: org.apache.hadoop.hive#contrib
[ivy:retrieve]  confs: [default]
[ivy:retrieve]  0 artifacts copied, 1 already retrieved (0kB/0ms)
install-hadoopcore-internal:
setup:
compile:
     [echo] Compiling: hbase-handler
    [javac] https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build-common.xml:283: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
jar:
     [echo] Jar: hbase-handler
test:
test-shims:
test-conditions:
gen-test:
create-dirs:
compile-ant-tasks:
create-dirs:
init:
compile:
     [echo] Compiling: anttasks
    [javac] https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ant/build.xml:40: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
deploy-ant-tasks:
create-dirs:
init:
compile:
     [echo] Compiling: anttasks
    [javac] https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ant/build.xml:40: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
jar:
init:
compile:
ivy-init-dirs:
ivy-download:
      [get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar
      [get] To: https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/ivy/lib/ivy-2.1.0.jar
      [get] Not modified - so not downloaded
ivy-probe-antlib:
ivy-init-antlib:
ivy-init:
ivy-retrieve-hadoop-source:
:: loading settings :: file = https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ivy/ivysettings.xml
[ivy:retrieve] :: resolving dependencies :: org.apache.hadoop.hive#shims;work...@vesta.apache.org
[ivy:retrieve]  confs: [default]
[ivy:retrieve]  found hadoop#core;0.20.0 in hadoop-source
[ivy:retrieve]  found hadoop#core;0.20.3-CDH3-SNAPSHOT in hadoop-source
[ivy:retrieve] :: resolution report :: resolve 1974ms :: artifacts dl 1ms
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   2   |   0   |   0   |   0   ||   2   |   0   |
        ---------------------------------------------------------------------
[ivy:retrieve] :: retrieving :: org.apache.hadoop.hive#shims
[ivy:retrieve]  confs: [default]
[ivy:retrieve]  0 artifacts copied, 2 already retrieved (0kB/1ms)
install-hadoopcore-internal:
build_shims:
     [echo] Compiling shims against hadoop 0.20.0 (https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/hadoopcore/hadoop-0.20.0)
    [javac] https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/shims/build.xml:53: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
ivy-init-dirs:
ivy-download:
Re: Build failed in Hudson: Hive-trunk-h0.20 #500
I checked the console output on the failed build. The command that's used to run the tests is 'ant -Dhadoop.version=0.20.0 .'. This probably is the cause of the problem (though I haven't verified locally).

Devaraj.

On Jan 20, 2011, at 5:26 AM, Carl Steinbach c...@cloudera.com wrote:

HIVE-1696 introduced a build failure:

Running org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge
java.lang.NoSuchMethodError: org.apache.hadoop.security.UserGroupInformation.getCurrentUser()Lorg/apache/hadoop/security/UserGroupInformation;

This looks like a straightforward classpath problem (picking up 0.20.0 instead of 0.20.3-CDH3-SNAPSHOT), but when I run this locally it works, and dumping the classpath from the toplevel test target shows that 0.20.3 is getting pulled into the classpath. Anyone else have any ideas?

Thanks.

Carl

On Thu, Jan 20, 2011 at 4:54 AM, Apache Hudson Server hud...@hudson.apache.org wrote:

See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/500/

--
[...truncated 7224 lines...]
create-dirs:
init:
compile:
     [echo] Compiling: anttasks
    [javac] https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ant/build.xml:40: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
deploy-ant-tasks:
create-dirs:
init:
compile:
     [echo] Compiling: anttasks
    [javac] https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ant/build.xml:40: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
jar:
init:
install-hadoopcore:
install-hadoopcore-default:
ivy-init-dirs:
ivy-download:
      [get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar
      [get] To: https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/ivy/lib/ivy-2.1.0.jar
      [get] Not modified - so not downloaded
ivy-probe-antlib:
ivy-init-antlib:
ivy-init:
ivy-retrieve-hadoop-source:
:: loading settings :: file = https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ivy/ivysettings.xml
[ivy:retrieve] :: resolving dependencies :: org.apache.hadoop.hive#contrib;work...@vesta.apache.org
[ivy:retrieve]  confs: [default]
[ivy:retrieve]  found hadoop#core;0.20.0 in hadoop-source
[ivy:retrieve] :: resolution report :: resolve 1086ms :: artifacts dl 1ms
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   1   |   0   |   0   |   0   ||   1   |   0   |
        ---------------------------------------------------------------------
[ivy:retrieve] :: retrieving :: org.apache.hadoop.hive#contrib
[ivy:retrieve]  confs: [default]
[ivy:retrieve]  0 artifacts copied, 1 already retrieved (0kB/1ms)
install-hadoopcore-internal:
setup:
compile:
     [echo] Compiling: hbase-handler
    [javac] https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build-common.xml:283: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
jar:
     [echo] Jar: hbase-handler
test:
test-shims:
test-conditions:
gen-test:
create-dirs:
compile-ant-tasks:
create-dirs:
init:
compile:
     [echo] Compiling: anttasks
    [javac] https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ant/build.xml:40: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
deploy-ant-tasks:
create-dirs:
init:
compile:
     [echo] Compiling: anttasks
    [javac] https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ant/build.xml:40: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
jar:
init:
compile:
ivy-init-dirs:
ivy-download:
      [get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar
      [get] To: https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/ivy/lib/ivy-2.1.0.jar
      [get] Not modified - so not downloaded
ivy-probe-antlib:
ivy-init-antlib:
ivy-init:
ivy-retrieve-hadoop-source:
:: loading settings :: file = https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ivy/ivysettings.xml
[ivy:retrieve] :: resolving dependencies :: org.apache.hadoop.hive#shims;work...@vesta.apache.org
[ivy:retrieve]  confs: [default]
[ivy:retrieve]  found hadoop#core;0.20.0 in hadoop-source
[ivy:retrieve]  found hadoop#core;0.20.3-CDH3-SNAPSHOT in hadoop-source
[ivy:retrieve] :: resolution report :: resolve 1738ms :: artifacts dl 1ms
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984348#action_12984348 ]

Edward Capriolo commented on HIVE-1918:
---------------------------------------

I was not implying that we should definitely not add {noformat}Map<String, String> partParams{noformat}. What I am asking is: what is the rationale for doing it? I think we should not need to add things to the metastore to export its information, but I might be missing something.
Re: patch review process
+1 on option 1. This is standard operating practice (for all code changes, no exceptions) at Facebook, Google, and many other companies that care about code quality. (The latest HBase wiki makes an exception for patches that only involve one changed+unreviewed file, but I think that creates a bad incentive for people to squash stuff into one file, e.g. via an inner class, instead of properly refactoring in cases where it is called for.)

To better facilitate this policy, we should make sure that the workflow is as smooth as possible.

(1) Make usage of post-review much more prominent in the HowToContribute docs, and do everything possible to raise awareness of its existence. I think we should also configure our repositories and check in a wrapper script in order to make post-review usage a no-brainer: ideally, we would be able to just give it a HIVE-xyz JIRA number, and the script would wget the necessary information and include that in the post-review submission. There should be very few cases where anyone needs to go through the Review Board UI. See this section and following for more: http://www.reviewboard.org/docs/manual/dev/users/tools/post-review/#repository-configuration

(2) See if we can configure JIRA to require a review board link as part of submitting a patch. This makes the system self-enforcing. Ideally, JIRA would automatically pull the patch from review board so that the contributor does not have to do a separate patch generation + upload.

(3) Eliminate all generated code from the codebase; this causes a lot of extra friction. Now that we have moved to an official thrift release, this should be easier.

If we do all of these, I think the net result will actually be a better experience for contributors relative to what we have today.

JVS

On Jan 19, 2011, at 9:05 PM, Carl Steinbach wrote:

The system that we have in place right now places all of the burden on the reviewer. If you want to look at a patch you have to download it, apply it to a clean workspace, view it using the diff viewer of your choice, and then copy your comments back to JIRA along with line numbers and code fragments in order to provide context for the author. If there's more than one reviewer, then everyone repeats these steps individually.

From this perspective I think using ReviewBoard is a clear win. It eliminates the setup steps that are currently incumbent on the reviewer and consequently encourages more people to participate in the review process, which I think will result in higher-quality code in the end. I think the additional burden that ReviewBoard places on the contributor is very small (especially when compared to the effort invested in producing the patch in the first place) and can be mitigated by using tools like post-review (http://www.reviewboard.org/docs/manual/dev/users/tools/post-review/).

I'm +1 for option (1), meaning that I think people should be required to post a review request (or update an existing request) for every patch that they submit for review on JIRA. I also think excluding small patches from this requirement is a bad idea, because rational people can disagree about what qualifies as a small patch and what does not, and I'd like people to make ReviewBoard a habit instead of something they use occasionally. I think Yongqiang's point about scaring away new contributors with lots of requirements is valid, and I'm more than willing to post a review request for a first (or second) time contributor, but in general it's important for the contributor to create the request, since only the creator can update it.

Thanks.

Carl

On Wed, Jan 19, 2011 at 6:48 PM, yongqiang he heyongqiang...@gmail.com wrote:

+1 for option 2. In general, we as a community should be nice to all contributors, and should avoid doing things that make contributors uncomfortable, even if that requires some work from committers. Sometimes this is especially true for new contributors; we need to be more patient with new people. It seems a free-style, contribution-focused environment would better encourage people to make more contributions of different kinds.

thanks
-yongqiang

On Wed, Jan 19, 2011 at 6:37 PM, Namit Jain nj...@fb.com wrote:

It would be good to have a policy for submitting a new patch for review. If the patch is small, it is usually pretty easy to review. But if it is large, a GUI like reviewboard (https://reviews.apache.org) makes it easy. So, going forward, I would like to propose either of the following:

1. All patches must go through reviewboard.
2. If a contributor/reviewer creates a reviewboard request, all subsequent review requests should go through the reviewboard.

I would personally vote for 2, since for small patches we don't really need a reviewboard. But please vote, and based on that we can come up with a policy. Let us know if you think of some other
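The wrapper script John describes, taking just a HIVE-xyz JIRA number and deriving the post-review submission from it, could be sketched roughly as below. The JIRA fetch is stubbed out so the command-building logic stands alone; `--summary` and `--bugs-closed` are documented post-review options, but the function and its shape are a hypothetical sketch, not an actual Hive tool:

```python
import shlex

def build_post_review_cmd(issue_key, summary):
    """Hypothetical sketch: given a JIRA key (e.g. HIVE-1636) and the issue
    summary (which a real wrapper would fetch from JIRA over HTTP), build
    the argv for a post-review invocation. Repository settings would come
    from a .reviewboardrc checked into the repo."""
    return [
        "post-review",
        "--summary=%s: %s" % (issue_key, summary),
        "--bugs-closed=%s" % issue_key,
    ]

cmd = build_post_review_cmd("HIVE-1636", "Implement SHOW TABLES {FROM | IN} db_name")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

With the summary fetch wired up, a contributor would only ever type the issue key, which is the "no-brainer" workflow the proposal aims for.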
[jira] Commented: (HIVE-1817) Remove Hive dependency on unreleased commons-cli 2.0 Snapshot
[ https://issues.apache.org/jira/browse/HIVE-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984400#action_12984400 ]

Edward Capriolo commented on HIVE-1817:
---------------------------------------

Argh. This is a major buzzkill; our shims, build, and eclipse generation processes cannot easily deal with lib changes for minor versions.

Option 1: This angers me. Pushed into an update by a CLI jar, but it seems like the right thing to do.

Option 2: More bash scripting is only going to make the problem worse.

As an option 3, we do have a ticket for a hive JobShell, which would be a new shell that is not so married to bash and hadoop. I imagine this new job shell would do explicit class loading and would reference nothing except HIVE_HOME from the environment.

Option 4: We use some other CLI library.

+1 option 1. Push forward at all costs, including forcing someone with 0.20.0 to upgrade just for hive.

Remove Hive dependency on unreleased commons-cli 2.0 Snapshot
    Key: HIVE-1817
    URL: https://issues.apache.org/jira/browse/HIVE-1817
    Project: Hive
    Issue Type: Task
    Components: Build Infrastructure, CLI
    Reporter: Carl Steinbach
    Assignee: Carl Steinbach
    Priority: Blocker
    Fix For: 0.7.0
    Attachments: HIVE-1817.wip.1.patch.txt

The Hive CLI depends on commons-cli-2.0-SNAPSHOT. This branch of the commons-cli project is dead. Hive needs to use commons-cli-1.2 instead. See MAPREDUCE-767 for more information.
Re: patch review process
I would argue that how the review is done has nothing to do with code quality; not using the review board has no connection with low-quality code. The review board just provides a limited view of the diff's context, so if the reviewer is familiar with the diff's context and is confident in the patch, it is his call to decide the review process. And even with the review board, the reviewer sometimes still needs to open his workspace to look for more context, like the references and the callers of a function, etc.

The reason option 2 looks good to me is that the rule here just assumes people in this community act nicely toward each other. The committer/reviewer creates a review board request for the contributor because he is nice to the contributor, right? And the contributor should also create review board requests for subsequent patches on the same issue, because he knows the reviewer needs a review board request to review his patch, and he should be nice to the reviewer by doing that himself.

Another thing that came up in an offline discussion: do the comments on review board come back to the jira? If not, that means the comments are not searchable, and later followers, who want a complete understanding of the problem/solutions/pitfalls, need to open the patch/review board page to find all the comments from reviewers.

And I want to be clear that I am not against review board. I would suggest we move on with whatever each person feels comfortable and more productive with, and avoid enforcing rules on this.

thanks
-yongqiang

On Thu, Jan 20, 2011 at 12:16 PM, Todd Lipcon t...@cloudera.com wrote:

On Thu, Jan 20, 2011 at 12:12 PM, John Sichi jsi...@fb.com wrote:

+1 on option 1. This is standard operating practice (for all code changes, no exceptions) at Facebook, Google, and many other companies that care about code quality. (The latest HBase wiki makes an exception for patches that only involve one changed+unreviewed file, but I think that creates a bad incentive for people to squash stuff into one file, e.g. via an inner class, instead of properly refactoring in cases where it is called for.)

huh, I had no idea that was the case for HBase :) We generally follow the policy that you can Commit-Then-Review for obviously trivial things (eg a doc update or spelling fix or something of that sort)
Re: patch review process
I agree on the problem of comments. For a new reviewer, it makes the process more painful. But I think the issues are not major, and we should decide on one approach or another. Even for option 2, we can make it mandatory for the contributor to submit a reviewboard request if the reviewer asks for it.

Can others vote please?

Thanks,
-namit

On 1/20/11 2:25 PM, yongqiang he heyongqiang...@gmail.com wrote:

I would argue that how the review is done has nothing to do with code quality. No review board has no connection with low quality code. The review board just provides a limited view of the diff's context. So if the reviewer is familiar with the diff's context and is confident with the patch, it is his call to decide the review process.
Re: patch review process
On Thu, Jan 20, 2011 at 5:56 PM, Namit Jain nj...@fb.com wrote: I agree on the problem of comments. For a new reviewer, it makes the process more painful. But, I think, the issues are not major, and we should decide one approach or another. Even for option 2., we can make it mandatory for the contributor to submit a reviewboard request if the reviewer asks for it. Can others vote please ? Thanks, -namit On 1/20/11 2:25 PM, yongqiang he heyongqiang...@gmail.com wrote: I would argue that how to do the review has nothing to do with code quality. No review board has no connection with low quality code. The review board just provides a limited view of the diff's context. So if the review is familiar with the diff's context and is confident with the patch, it is his call to decide the review process. And also even with the review board, the reviewer sometimes still need to open his workspace to look for more contexts, like the references and the caller of a function etc. The reason that option 2 looks good to me are the rule here is just assuming people in this community acting nice to each other. The committer/reviewer creates a review board for the contributor because he is nice to the contributor, right? And the contributor should also create review board for following patches in the same issue because he knows the reviewer needs a review board to review his patch, and he should be nice to the reviewer by doing that himself. Another thing came up from an offline discussion is that are the comments on review board coming back to the jira? If no, that means the comments are not searchable, and the later followers, who want to get a whole understanding of the problem/solutions/pitfalls, need to open the patch/review board page to find all the comments from reviewers. And I want to clear that I am not agains review board. I would suggest let us move on with what each one feels comfortable and more productable, and avoid enforcing some rules on this. 
thanks -yongqiang

On Thu, Jan 20, 2011 at 12:16 PM, Todd Lipcon t...@cloudera.com wrote:

On Thu, Jan 20, 2011 at 12:12 PM, John Sichi jsi...@fb.com wrote:

+1 on option 1. This is standard operating practice (for all code changes, no exceptions) at Facebook, Google, and many other companies that care about code quality. (The latest HBase wiki makes an exception for patches that only involve one changed+unreviewed file, but I think that creates a bad incentive for people to squash stuff into one file, e.g. via an inner class, instead of properly refactoring in cases where it is called for.)

huh, I had no idea that was the case for HBase :) We generally follow the policy that you can Commit-Then-Review for obviously trivial things (e.g. a doc update or spelling fix or something of that sort).

To better facilitate this policy, we should make sure that the workflow is as smooth as possible.

(1) Make usage of post-review much more prominent in the HowToContribute docs, and do everything possible to raise awareness of its existence. I think we should also configure our repositories and check in a wrapper script in order to make post-review usage a no-brainer: ideally, we would be able to just give it a HIVE-xyz JIRA number, and the script would wget the necessary information and include that in the post-review submission. There should be very few cases where anyone needs to go through the Review Board UI. See this section and following for more: http://www.reviewboard.org/docs/manual/dev/users/tools/post-review/#repository-configuration

(2) See if we can configure JIRA to require a Review Board link as part of submitting a patch. This makes the system self-enforcing. Ideally, JIRA would automatically pull the patch from Review Board so that the contributor does not have to do a separate patch generation + upload.

(3) Eliminate all generated code from the codebase; this causes a lot of extra friction. Now that we have moved to an official Thrift release, this should be easier.
If we do all of these, I think the net result will actually be a better experience for contributors relative to what we have today. JVS

On Jan 19, 2011, at 9:05 PM, Carl Steinbach wrote:

The system that we have in place right now places all of the burden on the reviewer. If you want to look at a patch, you have to download it, apply it to a clean workspace, view it using the diff viewer of your choice, and then copy your comments back to JIRA along with line numbers and code fragments in order to provide context for the author. If there's more than one reviewer, then everyone repeats these steps individually. From this perspective I think using ReviewBoard is a clear win. It eliminates the setup steps that are currently incumbent on the reviewer and consequently encourages more people to participate in the review process, which I think will result in higher-quality code in the end. I think that the
[jira] Updated: (HIVE-1910) Provide config parameters to control cache object pinning
[ https://issues.apache.org/jira/browse/HIVE-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mac Yang updated HIVE-1910: --- Status: Patch Available (was: Open) https://reviews.apache.org/r/343/ Provide config parameters to control cache object pinning - Key: HIVE-1910 URL: https://issues.apache.org/jira/browse/HIVE-1910 Project: Hive Issue Type: Improvement Components: Configuration Affects Versions: 0.6.0 Reporter: Mac Yang Priority: Minor Attachments: HIVE-1910.1.patch.txt The metastore currently pins instances of the following classes in the secondary cache: MTable, MStorageDescriptor, MSerDeInfo, MPartition, MDatabase, MType, MFieldSchema, MOrder. It would be better if this behavior were configurable instead of hard-coded. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
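The idea in HIVE-1910 is to replace the hard-coded pin list with a configuration value. A minimal, hypothetical sketch of what parsing such a parameter could look like; the property value format (a comma-separated list of type names) and the helper names here are assumptions for illustration, not the actual patch:

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class Main {
    // Default mirroring the currently hard-coded pin list (without the M prefix).
    static final String DEFAULT_PIN_TYPES =
        "Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order";

    // Split a configured comma-separated value into individual type names,
    // trimming whitespace and skipping empty entries.
    static Set<String> parsePinTypes(String configured) {
        Set<String> types = new LinkedHashSet<>();
        for (String t : configured.split(",")) {
            String trimmed = t.trim();
            if (!trimmed.isEmpty()) {
                types.add(trimmed);
            }
        }
        return types;
    }

    public static void main(String[] args) {
        // A user overriding the default to pin only two object types.
        Set<String> pinned = parsePinTypes("Table, Partition");
        System.out.println(pinned);
    }
}
```

With a scheme like this, leaving the property unset would fall back to `DEFAULT_PIN_TYPES` and preserve today's behavior.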
Review Request: HIVE-1910: Provide config parameters to control cache object pinning
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/343/ --- Review request for hive. Summary --- Review for HIVE-1910 This addresses bug HIVE-1910. https://issues.apache.org/jira/browse/HIVE-1910 Diffs - http://svn.apache.org/repos/asf/hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1061438 http://svn.apache.org/repos/asf/hive/trunk/conf/hive-default.xml 1061438 http://svn.apache.org/repos/asf/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1061438 Diff: https://reviews.apache.org/r/343/diff Testing --- Thanks, Mac
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12984563#action_12984563 ] Krishna Kumar commented on HIVE-1918: -

Why export/import needs this change: It is not the export part, but rather the import part, which needs this change. While creating a partition as part of an import, we need to be able to create the partition along with its ancillary data, including partition parameters. But the first part of the existing create-partition flow (AddPartitionDesc -> DDLTask.addPartition -> Hive.createPartition) did not support specifying partition params, while the second part (metastore.api.Partition -> IMetaStoreClient.add_partition -> HiveMetaStore.HMSHandler.add_partition -> ObjectStore.addPartition) does. So I added the ability to pass the partition parameters along in the first part of the flow.

In terms of options for compatible changes, there are two I can see:

1. The solution suggested above. Add an additional ctor so that no existing code breaks.
{noformat}
public Partition(Table tbl, Map<String, String> partSpec, Path location) {
  this(tbl, partSpec, location, null);
}
public Partition(Table tbl, Map<String, String> partSpec, Path location, Map<String, String> partParams) {...}
{noformat}

2. Have only the current ctor, but in Hive.createPartition get the underlying metastore.api.Partition and set the parameters on it before passing it on to the metastoreClient.

Thoughts?

Add export/import facilities to the hive system --- Key: HIVE-1918 URL: https://issues.apache.org/jira/browse/HIVE-1918 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Krishna Kumar Attachments: HIVE-1918.patch.txt This is an enhancement request to add export/import features to hive.
With this language extension, the user can export the data of a table - which may be located in different HDFS locations in the case of a partitioned table - as well as the metadata of the table into a specified output location. This output location can then be moved over to a different Hadoop/Hive instance and imported there. This should work independently of the source and target metastore DBMS used; for instance, between Derby and MySQL. For partitioned tables, the ability to export/import a subset of the partitions must be supported. Howl will add more features on top of this: the ability to create/use the exported data even in the absence of Hive, using MR or Pig. Please see http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
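The constructor-delegation pattern in option 1 above can be sketched with a simplified stand-in class (this is not Hive's real org.apache.hadoop.hive.ql.metadata.Partition; field and class names are illustrative only):

```java
import java.util.Collections;
import java.util.Map;

public class Main {
    // Simplified stand-in for a partition object, to show the overload pattern.
    static class Partition {
        final Map<String, String> partSpec;
        final Map<String, String> partParams;

        // Existing signature, kept source-compatible: delegates with null params,
        // so no caller of the old constructor breaks.
        Partition(Map<String, String> partSpec) {
            this(partSpec, null);
        }

        // New signature carrying the partition parameters that import needs.
        Partition(Map<String, String> partSpec, Map<String, String> partParams) {
            this.partSpec = partSpec;
            this.partParams = partParams == null
                ? Collections.emptyMap() : partParams;
        }
    }

    public static void main(String[] args) {
        // Old call site: compiles unchanged, sees an empty parameter map.
        Partition existing = new Partition(Map.of("ds", "2011-01-20"));
        // New call site: the import path passes parameters explicitly.
        Partition imported = new Partition(Map.of("ds", "2011-01-20"),
                                           Map.of("transient_lastDdlTime", "0"));
        System.out.println(existing.partParams.size() + " " + imported.partParams.size());
    }
}
```

Normalizing null to an empty map in the widest constructor keeps downstream code free of null checks, which is part of why option 1 is attractive.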
[jira] Updated: (HIVE-1872) Hive process is exiting on executing ALTER query
[ https://issues.apache.org/jira/browse/HIVE-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath R updated HIVE-1872: -

Release Note: Avoid System.exit() for killing the MR process; instead, kill the tasks that are not required to be executed.
Status: Patch Available (was: Open)

Hive process is exiting on executing ALTER query
Key: HIVE-1872 URL: https://issues.apache.org/jira/browse/HIVE-1872 Project: Hive Issue Type: Bug Components: CLI, Server Infrastructure Affects Versions: 0.6.0 Environment: SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (3) Hadoop 0.20.1 Hive 0.6.0 Reporter: Bharath R Attachments: HIVE-1872.1.patch

The Hive process exits on executing the queries below, in the order listed:

1) CREATE TABLE SAMPLETABLE(IP STRING , showtime BIGINT ) partitioned by (ds string,ipz int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\040'
2) ALTER TABLE SAMPLETABLE add Partition(ds='sf') location '/user/hive/warehouse' Partition(ipz=100) location '/user/hive/warehouse'

After the second query executes, Hive throws the exception below and exits the process:

10:09:03 ERROR exec.DDLTask: FAILED: Error in metadata: table is partitioned but partition spec is not specified or tab: {ipz=100}
org.apache.hadoop.hive.ql.metadata.HiveException: table is partitioned but partition spec is not specified or tab: {ipz=100}
 at org.apache.hadoop.hive.ql.metadata.Table.isValidSpec(Table.java:341)
 at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:902)
 at org.apache.hadoop.hive.ql.exec.DDLTask.addPartition(DDLTask.java:282)
 at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:191)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
 at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:633)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:506)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:384)
 at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:114)
 at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.process(ThriftHive.java:378)
 at org.apache.hadoop.hive.service.ThriftHive$Processor.process(ThriftHive.java:366)
 at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:252)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)

The exception was thrown because the ALTER query is incorrect; ideally it should be ALTER TABLE SAMPLETABLE add Partition(ds='sf',ipz=100) location '/user/hive/warehouse'. However, it is not good to exit the Hive process when a query is incorrect. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
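The shape of the fix described in the release note - report a failure to the caller rather than killing the JVM - can be sketched as follows. The class, method, and error-code values here are illustrative stand-ins, not Hive's actual DDLTask/Driver code:

```java
public class Main {
    // Hypothetical execute(): validates a query and returns a status code.
    // The key point is that an invalid query raises an exception that is
    // caught and turned into a nonzero return code, never System.exit(),
    // so a long-running server process survives malformed queries.
    static int execute(String query) {
        try {
            if (!query.contains("ipz=")) {
                // Stand-in for Table.isValidSpec() rejecting a partial spec.
                throw new IllegalArgumentException(
                    "table is partitioned but partition spec is not specified");
            }
            return 0; // success
        } catch (IllegalArgumentException e) {
            // Log the failure and hand an error code back to the caller;
            // the surrounding server keeps serving other requests.
            System.out.println("FAILED: " + e.getMessage());
            return 9;
        }
    }

    public static void main(String[] args) {
        int rc = execute("ALTER TABLE SAMPLETABLE add Partition(ds='sf') ...");
        System.out.println("rc=" + rc + ", server still running");
    }
}
```

In a server context the returned code would be propagated to the Thrift client, leaving the decision to abort (or not) with the client rather than the server process.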
Re: patch review process
I was thinking about it. Let us do the following: for big patches (a big patch is one that involves changing more than 10 files, whether those files are test/code changes or even generated files), Review Board is mandatory. For smaller patches, Review Board is optional. However, if any reviewer asks the contributor for a Review Board request, it becomes mandatory, and the contributor needs to create the Review Board request. As good practice, also copy important review comments into the JIRA, so that they are searchable and give enough context to a new contributor. Thanks, -namit

On 1/20/11 12:16 PM, Todd Lipcon t...@cloudera.com wrote:

On Thu, Jan 20, 2011 at 12:12 PM, John Sichi jsi...@fb.com wrote:

+1 on option 1. This is standard operating practice (for all code changes, no exceptions) at Facebook, Google, and many other companies that care about code quality. (The latest HBase wiki makes an exception for patches that only involve one changed+unreviewed file, but I think that creates a bad incentive for people to squash stuff into one file, e.g. via an inner class, instead of properly refactoring in cases where it is called for.)

huh, I had no idea that was the case for HBase :) We generally follow the policy that you can Commit-Then-Review for obviously trivial things (e.g. a doc update or spelling fix or something of that sort).

To better facilitate this policy, we should make sure that the workflow is as smooth as possible.

(1) Make usage of post-review much more prominent in the HowToContribute docs, and do everything possible to raise awareness of its existence. I think we should also configure our repositories and check in a wrapper script in order to make post-review usage a no-brainer: ideally, we would be able to just give it a HIVE-xyz JIRA number, and the script would wget the necessary information and include that in the post-review submission. There should be very few cases where anyone needs to go through the Review Board UI.
See this section and following for more: http://www.reviewboard.org/docs/manual/dev/users/tools/post-review/#repository-configuration

(2) See if we can configure JIRA to require a Review Board link as part of submitting a patch. This makes the system self-enforcing. Ideally, JIRA would automatically pull the patch from Review Board so that the contributor does not have to do a separate patch generation + upload.

(3) Eliminate all generated code from the codebase; this causes a lot of extra friction. Now that we have moved to an official Thrift release, this should be easier.

If we do all of these, I think the net result will actually be a better experience for contributors relative to what we have today. JVS

On Jan 19, 2011, at 9:05 PM, Carl Steinbach wrote:

The system that we have in place right now places all of the burden on the reviewer. If you want to look at a patch, you have to download it, apply it to a clean workspace, view it using the diff viewer of your choice, and then copy your comments back to JIRA along with line numbers and code fragments in order to provide context for the author. If there's more than one reviewer, then everyone repeats these steps individually. From this perspective I think using ReviewBoard is a clear win. It eliminates the setup steps that are currently incumbent on the reviewer and consequently encourages more people to participate in the review process, which I think will result in higher quality code in the end. I think that the additional burden that ReviewBoard places on the contributor is very small (especially when compared to the effort invested in producing the patch in the first place) and can be mitigated by using tools like post-review ( http://www.reviewboard.org/docs/manual/dev/users/tools/post-review/ ). I'm +1 for option (1), meaning that I think people should be required to post a review request (or update an existing request) for every patch that they submit for review on JIRA.
I also think excluding small patches from this requirement is a bad idea, because rational people can disagree about what qualifies as a small patch and what does not, and I'd like people to make ReviewBoard a habit instead of something that they use occasionally. I think that Yongqiang's point about scaring away new contributors with lots of requirements is valid, and I'm more than willing to post a review request for a first (or second) time contributor, but in general it's important for the contributor to create the request, since only the creator can update it. Thanks. Carl

On Wed, Jan 19, 2011 at 6:48 PM, yongqiang he heyongqiang...@gmail.com wrote:

+1 for option 2. In general, we as a community should be nice to all contributors, and should avoid doing things that make contributors uncomfortable, even if that requires some work from committers. Sometimes it is especially true
Re: patch review process
On Jan 20, 2011, at 10:53 PM, Namit Jain wrote: As good practice, also copy important review comments in the jira, so that they are searchable and give enough context to a new contributor. The old review board instance did this automatically. Todd has an INFRA JIRA open to get this for reviews.apache.org too: https://issues.apache.org/jira/browse/INFRA-3153 JVS