Re: Hive 0.7.0 Release Candidate 0
Wondering if https://issues.apache.org/jira/browse/HIVE-1995 should also be considered for 0.7? Ashutosh On Thu, Feb 17, 2011 at 23:57, Carl Steinbach c...@cloudera.com wrote: http://people.apache.org/~cws/hive-0.7.0-candidate-0/ Please vote.
Re: Hive 0.7.0 Release Candidate 0
Great. Thanks, Carl. Ashutosh On Fri, Feb 18, 2011 at 14:20, Carl Steinbach c...@cloudera.com wrote: Hi Ashutosh, I backported it just now. I'll cut another RC early next week to include this. Thanks. Carl On Fri, Feb 18, 2011 at 1:37 PM, Ashutosh Chauhan hashut...@apache.org wrote: Wondering if https://issues.apache.org/jira/browse/HIVE-1995 should also be considered for 0.7? Ashutosh On Thu, Feb 17, 2011 at 23:57, Carl Steinbach c...@cloudera.com wrote: http://people.apache.org/~cws/hive-0.7.0-candidate-0/ Please vote.
hooks in metastore functions
Hi all, I have a requirement that every time some change takes place on the metastore, some logic needs to be run. For example, if a new table is getting created in the metastore, I want to send a message to a message bus. The easiest way to make this work is to add the logic in createTable(), control it by a HiveConf param, and turn it off by default. An alternative way is via hooks: put this extra logic in a hook, then load and fire the hook if it is available. Does anyone have an opinion on which of these two is preferable? The second one requires new hook loading and execution logic. I am currently interested in four functions: createTable(), dropTable(), addPartition(), and dropPartition(). Currently, the HiveMetaHook which exists in createTable() doesn't perfectly fit the bill, since it is fired only when the user asks for it in the create table statement (i.e., if a storage handler is specified). Instead, I want this logic to always run. If anything is unclear, let me know; I can post code which demonstrates my use case. Ashutosh
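The always-on listener idea described above can be illustrated with a minimal, self-contained Java sketch. The names here (MetaStoreListener, MetaStoreDemo) are hypothetical simplifications for illustration, not the actual classes added to Hive:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener interface: fired on every metastore change,
// unlike HiveMetaHook, which only fires for storage-handler tables.
interface MetaStoreListener {
    void onCreateTable(String tableName);
    void onDropTable(String tableName);
}

// Stand-in for the metastore: persists the change first, then notifies
// every registered listener.
class MetaStoreDemo {
    private final List<MetaStoreListener> listeners = new ArrayList<>();
    private final List<String> tables = new ArrayList<>();

    void addListener(MetaStoreListener l) { listeners.add(l); }

    void createTable(String name) {
        tables.add(name);                       // persist the change
        for (MetaStoreListener l : listeners) { // then observe it
            l.onCreateTable(name);
        }
    }

    void dropTable(String name) {
        tables.remove(name);
        for (MetaStoreListener l : listeners) {
            l.onDropTable(name);
        }
    }
}
```

A listener registered here could publish each event to a message bus, which is the use case described in the email.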
Re: hooks in metastore functions
It might be possible to extend and modify the HiveMetaHook interface, but I think keeping them separate is better because MetaHook and MetaStoreListener are interfaces for two different functionalities. MetaHook is for communicating with an external system if there is a need for it; MetaStoreListener observes changes on the metastore and runs some logic in response to those changes. What do you think? Ashutosh On Wed, Mar 9, 2011 at 13:36, John Sichi jsi...@fb.com wrote: Couldn't we reuse HiveMetaHook for this new purpose (with an instance loaded via global config vs. associated with the table handler)? JVS
Re: Review Request: Review request for HIVE-2038
On 2011-04-12 03:07:54, Carl Steinbach wrote: trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java, line 999 https://reviews.apache.org/r/581/diff/1/?file=15625#file15625line999 Unrelated bugfix? Related bugfix, I will say : ) Without it, when drop partition returns from the object store, the partition object doesn't contain the partition values. On 2011-04-12 03:07:54, Carl Steinbach wrote: trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 180 https://reviews.apache.org/r/581/diff/1/?file=15621#file15621line180 Please add this property to hive-default.xml along with a description of what it does. Will add it in hive-default.xml. On 2011-04-12 03:07:54, Carl Steinbach wrote: trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java, line 955 https://reviews.apache.org/r/581/diff/1/?file=15622#file15622line955 Please run checkstyle and correct any violations included in your patch. Will run checkstyle to check for any style violations. On 2011-04-12 03:07:54, Carl Steinbach wrote: trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreListener.java, line 27 https://reviews.apache.org/r/581/diff/1/?file=15623#file15623line27 Please add some javadoc explaining the intended use of this interface. * Are the methods called before or after an action completes? What happens if a metastore operation fails? * Are the methods allowed to block? Are they run in a separate thread? * Are the methods allowed to modify the catalog objects that are passed in as parameters? Will also add it in javadoc. * Methods are called after the action completes, and only if the action succeeds. They are not called if the operation fails, since in that case nothing has actually changed in the metastore. * This is up to the implementation. They can run in the same thread, or they can schedule their work in a separate thread and return immediately. * I don't see a reason to disallow modification of the passed-in parameter objects. 
But it's mostly irrelevant here, since the methods are called after the change has already been persisted in the metastore, so modifying these objects can't change any state in the metastore. On 2011-04-12 03:07:54, Carl Steinbach wrote: trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreListener.java, line 29 https://reviews.apache.org/r/581/diff/1/?file=15623#file15623line29 Instead of passing in raw Table/Partition/Database objects, it may be better to wrap these objects in containers, e.g. CreateTableEvent, DropTableEvent, etc. Eventually this interface will probably include onAlterTable() and onAlterPartition(), and programmers will probably want to access both the before and after versions of a Table/Partition, etc. What's the advantage of wrapper container objects? On 2011-04-12 03:07:54, Carl Steinbach wrote: trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java, line 1446 https://reviews.apache.org/r/581/diff/1/?file=15622#file15622line1446 No need to reference this, right? Right. Though I think using this in such cases improves code readability. On 2011-04-12 03:07:54, Carl Steinbach wrote: trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreListener.java, line 26 https://reviews.apache.org/r/581/diff/1/?file=15623#file15623line26 What do you think about changing the name to MetaStoreEventListener or CatalogEventListener? MetaStoreEventListener is fine too. - Ashutosh --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/581/#review428 --- On 2011-04-12 01:29:41, Ashutosh Chauhan wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/581/ --- (Updated 2011-04-12 01:29:41) Review request for hive, Carl Steinbach, John Sichi, and Paul Yang. Summary --- Review request for HIVE-2038 This addresses bug HIVE-2038. 
https://issues.apache.org/jira/browse/HIVE-2038 Diffs - trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1079575 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1079575 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreListener.java PRE-CREATION trunk/metastore/src/java/org/apache/hadoop/hive/metastore/NoOpListener.java PRE-CREATION trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1079575 trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyListener.java PRE-CREATION trunk/metastore/src/test/org/apache/hadoop/hive/metastore
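Carl's wrapper suggestion in the review above can be sketched as follows. The class names echo the event classes that appear in the later patch, but the fields shown here are assumptions for illustration. The payoff is that a future alter event can carry a before/after pair without changing any listener method signature:

```java
// Hedged sketch of the event-container pattern; the real HIVE-2038
// classes may carry different fields.
class ListenerEvent {
    private final boolean status;   // did the metastore operation succeed?
    ListenerEvent(boolean status) { this.status = status; }
    boolean getStatus() { return status; }
}

class CreateTableEvent extends ListenerEvent {
    private final String table;
    CreateTableEvent(String table, boolean status) {
        super(status);
        this.table = table;
    }
    String getTable() { return table; }
}

// A later onAlterTable() can expose both versions of the table simply by
// adding fields to its event class -- the listener API stays stable.
class AlterTableEvent extends ListenerEvent {
    private final String oldTable;
    private final String newTable;
    AlterTableEvent(String oldTable, String newTable, boolean status) {
        super(status);
        this.oldTable = oldTable;
        this.newTable = newTable;
    }
    String getOldTable() { return oldTable; }
    String getNewTable() { return newTable; }
}
```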
Review Request: Removed finalizePartition() from the patch
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/648/ --- Review request for hive and Carl Steinbach. Summary --- 1. Removed finalizePartition(). Will file a separate jira for it. 2. Added container objects for the different event types. 3. Changed MetaStoreEventListener from an interface to an abstract class. 4. Modifications to allow a list of listeners instead of just one. This addresses bug HIVE-2038. https://issues.apache.org/jira/browse/HIVE-2038 Diffs - trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1096112 trunk/conf/hive-default.xml 1096112 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1096112 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java PRE-CREATION trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 1096112 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1096112 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/events/AddPartitionEvent.java PRE-CREATION trunk/metastore/src/java/org/apache/hadoop/hive/metastore/events/CreateDatabaseEvent.java PRE-CREATION trunk/metastore/src/java/org/apache/hadoop/hive/metastore/events/CreateTableEvent.java PRE-CREATION trunk/metastore/src/java/org/apache/hadoop/hive/metastore/events/DropDatabaseEvent.java PRE-CREATION trunk/metastore/src/java/org/apache/hadoop/hive/metastore/events/DropPartitionEvent.java PRE-CREATION trunk/metastore/src/java/org/apache/hadoop/hive/metastore/events/DropTableEvent.java PRE-CREATION trunk/metastore/src/java/org/apache/hadoop/hive/metastore/events/ListenerEvent.java PRE-CREATION trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyListener.java PRE-CREATION trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMetaStoreEventListener.java PRE-CREATION Diff: https://reviews.apache.org/r/648/diff Testing --- Thanks, Ashutosh
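Point 4 above (a list of listeners instead of just one) is typically wired up from a comma-separated configuration value. A minimal reflection-based sketch, assuming a hypothetical ListenerLoader helper; the actual loading code in Hive may differ:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns a comma-separated list of class names into
// instances, one per name, via each class's no-arg constructor.
class ListenerLoader {
    static List<Object> load(String classNames) throws Exception {
        List<Object> result = new ArrayList<>();
        if (classNames == null || classNames.trim().isEmpty()) {
            return result;               // no listeners configured
        }
        for (String name : classNames.split(",")) {
            result.add(Class.forName(name.trim())
                            .getDeclaredConstructor()
                            .newInstance());
        }
        return result;
    }
}
```

Each loaded instance would then be notified in turn for every metastore event, so multiple independent listeners can coexist.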
Review Request: Get rid of System.exit
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/668/ --- Review request for hive, Carl Steinbach, John Sichi, and Paul Yang. Summary --- See HIVE-2034 for details. Diffs - trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1096871 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 1096871 Diff: https://reviews.apache.org/r/668/diff Testing --- Since this patch doesn't add/delete any functionality, no new tests are required. Passing of existing test cases will suffice. Thanks, Ashutosh
Review Request: Refactor HiveMetaStore to make it maintainable
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/669/ --- Review request for hive, Carl Steinbach, John Sichi, and Paul Yang. Summary --- See HIVE-2135 This addresses bug HIVE-2035. https://issues.apache.org/jira/browse/HIVE-2035 Diffs - trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1096976 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreCommand.java PRE-CREATION trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 1096976 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/URLConnectionUpdater.java PRE-CREATION Diff: https://reviews.apache.org/r/669/diff Testing --- Since this is a refactoring patch, no new tests are required. Ran all the tests in metastore. All of them passed. Thanks, Ashutosh
Re: Review Request: Refactor HiveMetaStore to make it maintainable
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/669/ --- (Updated 2011-04-27 18:24:42.458082) Review request for hive, Carl Steinbach, John Sichi, and Paul Yang. Changes --- Mistyped jira number. Summary --- See HIVE-2135 This addresses bug HIVE-2135. https://issues.apache.org/jira/browse/HIVE-2135 Diffs - trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1096976 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreCommand.java PRE-CREATION trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 1096976 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/URLConnectionUpdater.java PRE-CREATION Diff: https://reviews.apache.org/r/669/diff Testing --- Since this is a refactoring patch, no new tests are required. Ran all the tests in metastore. All of them passed. Thanks, Ashutosh
Re: ANNOUNCE: New PMC Member Carl Steinbach
Congrats, Carl ! On Thu, Apr 28, 2011 at 05:39, Ashish Thusoo athu...@fb.com wrote: Congratulations Carl.. Ashish On Apr 27, 2011, at 7:09 PM, John Sichi wrote: Hi all, The Hive Project Management Committee is happy to announce that Carl Steinbach has been voted in as a new PMC member. Carl is currently a very active committer and has successfully managed two Hive releases (0.6 and 0.7). His work on running Hive contributor meetups has helped foster an ever-growing development community. Congratulations, Carl! JVS
Review Request: HIVE-2147 : Add api to send / receive message to metastore
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/738/ --- Review request for hive and Carl Steinbach. Summary --- Updated patch to include missing ASF license and generated thrift code. This addresses bug HIVE-2147. https://issues.apache.org/jira/browse/HIVE-2147 Diffs - trunk/metastore/if/hive_metastore.thrift 1102450 trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 1102450 trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp 1102450 trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp 1102450 trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java 1102450 trunk/metastore/src/gen/thrift/gen-php/hive_metastore/ThriftHiveMetastore.php 1102450 trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote 1102450 trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 1102450 trunk/metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb 1102450 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1102450 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 1102450 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 1102450 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java 1102450 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/events/MessageEvent.java PRE-CREATION trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyListener.java 1102450 trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMetaStoreEventListener.java 1102450 Diff: https://reviews.apache.org/r/738/diff Testing --- Updated TestMetaStoreEventListener to test new api. Thanks, Ashutosh
Re: Review Request: HIVE-2160 : Few code improvements in the metastore, hwi and ql packages.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/742/#review667 --- Ship it! Thanks, Chinna, for the cleanup work. Looks good to me. - Ashutosh On 2011-05-13 11:07:56, chinna wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/742/ --- (Updated 2011-05-13 11:07:56) Review request for hive. Summary --- A few code improvements in the metastore, hwi, and ql packages. 1) Little performance improvements 2) Effective variable management. This addresses bug HIVE-2160. https://issues.apache.org/jira/browse/HIVE-2160 Diffs - http://svn.apache.org/repos/asf/hive/trunk/hwi/src/java/org/apache/hadoop/hive/hwi/HWISessionItem.java 1101752 http://svn.apache.org/repos/asf/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1101752 http://svn.apache.org/repos/asf/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1101752 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java 1101752 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java 1101752 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 1101752 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/unionproc/UnionProcessor.java 1101752 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/FetchWork.java 1101752 Diff: https://reviews.apache.org/r/742/diff Testing --- Ran all tests Thanks, chinna
Re: Review Request: HIVE-2147 : Add api to send / receive message to metastore
() call. - Ashutosh --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/738/#review713 --- On 2011-05-12 21:03:29, Ashutosh Chauhan wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/738/ --- (Updated 2011-05-12 21:03:29) Review request for hive and Carl Steinbach. Summary --- Updated patch to include missing ASF license and generated thrift code. This addresses bug HIVE-2147. https://issues.apache.org/jira/browse/HIVE-2147 Diffs - trunk/metastore/if/hive_metastore.thrift 1102450 trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 1102450 trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp 1102450 trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp 1102450 trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java 1102450 trunk/metastore/src/gen/thrift/gen-php/hive_metastore/ThriftHiveMetastore.php 1102450 trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote 1102450 trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 1102450 trunk/metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb 1102450 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1102450 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 1102450 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 1102450 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java 1102450 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/events/MessageEvent.java PRE-CREATION trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyListener.java 1102450 trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMetaStoreEventListener.java 1102450 Diff: https://reviews.apache.org/r/738/diff Testing --- Updated TestMetaStoreEventListener to test new api. 
Thanks, Ashutosh
Re: Review Request: HIVE-2188: Add a function to retrieve multiple tables on trip to the hive metastore
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/831/#review753 --- trunk/metastore/if/hive_metastore.thrift https://reviews.apache.org/r/831/#comment1571 How about calling it get_multi_table instead? multi_get_table sounds a little confusing to me. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java https://reviews.apache.org/r/831/#comment1572 You can write this more concisely using the commons-lang utility method: StringUtils.join(tbls, ','); trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java https://reviews.apache.org/r/831/#comment1576 You can get rid of the tables.get(i) == null check, which will never be true. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java https://reviews.apache.org/r/831/#comment1573 Instead of throwing RuntimeException, create a MetaException and throw that. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java https://reviews.apache.org/r/831/#comment1574 Please add javadocs for the new methods introduced in the interface. Also see my first comment regarding the name. trunk/service/src/test/org/apache/hadoop/hive/service/TestHiveServer.java https://reviews.apache.org/r/831/#comment1575 This test really belongs in TestMetastore or some such in the metastore dir, not in HiveServer. - Ashutosh On 2011-06-02 23:01:00, Sohan Jain wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/831/ --- (Updated 2011-06-02 23:01:00) Review request for hive, Paul Yang and Ashutosh Chauhan. Summary --- Created a function multi_get_table that retrieves multiple tables in one trip to the hive metastore, saving round-trip time. This addresses bug HIVE-2188. 
https://issues.apache.org/jira/browse/HIVE-2188 Diffs - trunk/metastore/if/hive_metastore.thrift 1130342 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1130342 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1130342 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1130342 trunk/service/src/test/org/apache/hadoop/hive/service/TestHiveServer.java 1130342 Diff: https://reviews.apache.org/r/831/diff Testing --- Added a test case to testMetasore() in TestHiveServer. Also tested for speed improvements in a client session. Thanks, Sohan
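The review comments above (fetch tables in one batch, throw a checked MetaException rather than a RuntimeException, and join the names when reporting) can be combined in one small hedged sketch. TableStore and this local MetaException are stand-ins, not the real Hive metastore types, and String.join is used as the JDK equivalent of the commons-lang StringUtils.join(tbls, ',') call mentioned in the review:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Local stand-in for Hive's checked metastore exception.
class MetaException extends Exception {
    MetaException(String msg) { super(msg); }
}

class TableStore {
    private final Map<String, String> store = new HashMap<>();

    void put(String name, String descriptor) { store.put(name, descriptor); }

    // One call fetches many tables, saving client round trips. Missing
    // names are reported via a checked exception, not RuntimeException.
    List<String> getMultiTable(List<String> names) throws MetaException {
        List<String> result = new ArrayList<>();
        List<String> missing = new ArrayList<>();
        for (String n : names) {
            String t = store.get(n);
            if (t == null) {
                missing.add(n);
            } else {
                result.add(t);
            }
        }
        if (!missing.isEmpty()) {
            // JDK equivalent of StringUtils.join(missing, ',')
            throw new MetaException("Tables not found: "
                + String.join(",", missing));
        }
        return result;
    }
}
```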
Review Request: HIVE-2215
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/883/ --- Review request for hive and John Sichi. Summary --- Follow-up for HIVE-2147. This addresses bug HIVE-2215. https://issues.apache.org/jira/browse/HIVE-2215 Diffs - trunk/metastore/if/hive_metastore.thrift 1134443 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1134443 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 1134443 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 1134443 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java 1134443 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1134443 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1134443 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/events/MarkPartitionEvent.java PRE-CREATION trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionEvent.java PRE-CREATION trunk/metastore/src/model/package.jdo 1134443 trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyListener.java 1134443 trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMarkPartitionSet.java PRE-CREATION trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMetaStoreEventListener.java 1134443 Diff: https://reviews.apache.org/r/883/diff Testing --- Added test cases for new api. Thanks, Ashutosh
Re: Review Request: HIVE-2215
On 2011-06-13 21:47:25, John Sichi wrote: trunk/metastore/src/model/package.jdo, line 670 https://reviews.apache.org/r/883/diff/1/?file=20978#file20978line670 Does indexing actually work on a LONGVARCHAR field across all DB's of interest? No, it doesn't. So, I reverted it back to VARCHAR. If the rest of the patch looks alright, I will attach a new patch with this change. - Ashutosh --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/883/#review822 --- On 2011-06-10 21:24:13, Ashutosh Chauhan wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/883/ --- (Updated 2011-06-10 21:24:13) Review request for hive and John Sichi. Summary --- Follow-up for HIVE-2147. This addresses bug HIVE-2215. https://issues.apache.org/jira/browse/HIVE-2215 Diffs - trunk/metastore/if/hive_metastore.thrift 1134443 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1134443 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 1134443 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 1134443 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java 1134443 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1134443 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1134443 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/events/MarkPartitionEvent.java PRE-CREATION trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionEvent.java PRE-CREATION trunk/metastore/src/model/package.jdo 1134443 trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyListener.java 1134443 trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMarkPartitionSet.java PRE-CREATION trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMetaStoreEventListener.java 1134443 Diff: https://reviews.apache.org/r/883/diff Testing --- Added 
test cases for new api. Thanks, Ashutosh
Re: Review Request: HIVE-2215
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/883/ --- (Updated 2011-06-14 20:51:53.179968) Review request for hive and John Sichi. Changes --- Updated patch with Carl's comments. Carl, can you take a look? Summary --- Follow-up for HIVE-2147. This addresses bug HIVE-2215. https://issues.apache.org/jira/browse/HIVE-2215 Diffs (updated) - trunk/metastore/if/hive_metastore.thrift 1135779 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1135779 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 1135779 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 1135779 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java 1135779 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1135779 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1135779 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/events/LoadPartitionDoneEvent.java PRE-CREATION trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionEvent.java PRE-CREATION trunk/metastore/src/model/package.jdo 1135779 trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyListener.java 1135779 trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMarkPartition.java PRE-CREATION trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMarkPartitionRemote.java PRE-CREATION trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMetaStoreEventListener.java 1135779 Diff: https://reviews.apache.org/r/883/diff Testing --- Added test cases for new api. Thanks, Ashutosh
Re: Review Request: HIVE-2215
, Carl Steinbach wrote: trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMarkPartitionSet.java, line 36 https://reviews.apache.org/r/883/diff/1/?file=20980#file20980line36 Can you subclass this with a remote and embedded version? Done. On 2011-06-14 01:02:20, Carl Steinbach wrote: trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMetaStoreEventListener.java, line 80 https://reviews.apache.org/r/883/diff/1/?file=20981#file20981line80 Any reason in particular why you switched to always running this test in local mode? If we can only test one scenario, then I think there's more value in focusing on the standalone client/server setup. I reverted those changes. - Ashutosh --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/883/#review824 --- On 2011-06-14 20:51:53, Ashutosh Chauhan wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/883/ --- (Updated 2011-06-14 20:51:53) Review request for hive and John Sichi. Summary --- Follow-up for HIVE-2147. This addresses bug HIVE-2215. 
https://issues.apache.org/jira/browse/HIVE-2215 Diffs - trunk/metastore/if/hive_metastore.thrift 1135779 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1135779 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 1135779 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 1135779 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java 1135779 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1135779 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1135779 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/events/LoadPartitionDoneEvent.java PRE-CREATION trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionEvent.java PRE-CREATION trunk/metastore/src/model/package.jdo 1135779 trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyListener.java 1135779 trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMarkPartition.java PRE-CREATION trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMarkPartitionRemote.java PRE-CREATION trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMetaStoreEventListener.java 1135779 Diff: https://reviews.apache.org/r/883/diff Testing --- Added test cases for new api. Thanks, Ashutosh
Review Request: Review request for HIVE-2225
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/940/ --- Review request for hive, Carl Steinbach and John Sichi. Summary --- This addresses HIVE-2225 This addresses bug HIVE-2225. https://issues.apache.org/jira/browse/HIVE-2225 Diffs - trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1138099 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1138099 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1138099 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1138099 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/events/EventCleanerThread.java PRE-CREATION trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMarkPartition.java 1138099 Diff: https://reviews.apache.org/r/940/diff Testing --- updated a test case which exercises this code path. Thanks, Ashutosh
Re: Review Request: Review request for HIVE-2225
On 2011-06-22 23:07:05, John Sichi wrote: trunk/metastore/src/java/org/apache/hadoop/hive/metastore/events/EventCleanerThread.java, line 1 https://reviews.apache.org/r/940/diff/1/?file=21415#file21415line1 New files need Apache headers Added. - Ashutosh --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/940/#review888 --- On 2011-06-21 17:34:28, Ashutosh Chauhan wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/940/ --- (Updated 2011-06-21 17:34:28) Review request for hive, Carl Steinbach and John Sichi. Summary --- This addresses HIVE-2225 This addresses bug HIVE-2225. https://issues.apache.org/jira/browse/HIVE-2225 Diffs - trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1138099 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1138099 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1138099 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1138099 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/events/EventCleanerThread.java PRE-CREATION trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMarkPartition.java 1138099 Diff: https://reviews.apache.org/r/940/diff Testing --- updated a test case which exercises this code path. Thanks, Ashutosh
Re: Review Request: Review request for HIVE-2225
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/940/ --- (Updated 2011-06-23 02:55:08.540561) Review request for hive, Carl Steinbach and John Sichi. Changes --- Updated the patch per John's comments. Summary --- This addresses HIVE-2225 This addresses bug HIVE-2225. https://issues.apache.org/jira/browse/HIVE-2225 Diffs (updated) - trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1138719 trunk/conf/hive-default.xml 1138719 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1138719 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1138719 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1138719 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/events/EventCleanerTask.java PRE-CREATION trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMarkPartition.java 1138719 Diff: https://reviews.apache.org/r/940/diff Testing --- updated a test case which exercises this code path. Thanks, Ashutosh
Re: Review Request: Review request for HIVE-2225
On 2011-06-22 23:07:46, John Sichi wrote: trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 221 https://reviews.apache.org/r/940/diff/1/?file=21411#file21411line221 If you agree about making this disabled by default, we could use a special value such as 0 for the frequency to indicate disabled. Done. Timer is now created only if this property has non-zero value. - Ashutosh --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/940/#review889 --- On 2011-06-23 02:55:08, Ashutosh Chauhan wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/940/ --- (Updated 2011-06-23 02:55:08) Review request for hive, Carl Steinbach and John Sichi. Summary --- This addresses HIVE-2225 This addresses bug HIVE-2225. https://issues.apache.org/jira/browse/HIVE-2225 Diffs - trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1138719 trunk/conf/hive-default.xml 1138719 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1138719 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1138719 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1138719 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/events/EventCleanerTask.java PRE-CREATION trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMarkPartition.java 1138719 Diff: https://reviews.apache.org/r/940/diff Testing --- updated a test case which exercises this code path. Thanks, Ashutosh
Re: Review Request: Review request for HIVE-2225
On 2011-06-22 23:18:43, John Sichi wrote: trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java, line 259 https://reviews.apache.org/r/940/diff/1/?file=21412#file21412line259 Why is this using a Thread instead of a Timer? Agreed, a Timer is better suited here than a Thread. Changed to a Timer. On 2011-06-22 23:18:43, John Sichi wrote: trunk/metastore/src/java/org/apache/hadoop/hive/metastore/events/EventCleanerThread.java, line 33 https://reviews.apache.org/r/940/diff/1/?file=21415#file21415line33 6 hrs is actually configurable, right? Yup, it is. - Ashutosh --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/940/#review891 --- On 2011-06-23 02:55:08, Ashutosh Chauhan wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/940/ --- (Updated 2011-06-23 02:55:08) Review request for hive, Carl Steinbach and John Sichi. Summary --- This addresses bug HIVE-2225. https://issues.apache.org/jira/browse/HIVE-2225 Diffs - trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1138719 trunk/conf/hive-default.xml 1138719 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1138719 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1138719 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1138719 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/events/EventCleanerTask.java PRE-CREATION trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMarkPartition.java 1138719 Diff: https://reviews.apache.org/r/940/diff Testing --- Updated a test case which exercises this code path. Thanks, Ashutosh
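The pattern the review settles on (a java.util.Timer started only when the configured cleanup frequency is non-zero, so a value of 0 disables the cleaner entirely) can be sketched roughly as below. The class and method names here are hypothetical illustrations, not the actual EventCleanerTask code:

```java
import java.util.Timer;
import java.util.TimerTask;

public class CleanerScheduling {
    // Hypothetical helper: schedule the cleaner only when the configured
    // frequency (milliseconds) is positive; zero means "disabled", and in
    // that case no Timer (and no background thread) is created at all.
    static Timer maybeStartCleaner(long frequencyMillis, TimerTask task) {
        if (frequencyMillis <= 0) {
            return null; // disabled: nothing scheduled
        }
        Timer timer = new Timer("EventCleaner", true); // daemon thread
        timer.scheduleAtFixedRate(task, frequencyMillis, frequencyMillis);
        return timer;
    }

    public static void main(String[] args) {
        TimerTask noop = new TimerTask() { public void run() { } };
        // Frequency 0: feature is off, no timer created.
        System.out.println(maybeStartCleaner(0, noop) == null);
        // Positive frequency: a daemon Timer is running.
        Timer t = maybeStartCleaner(60_000L, new TimerTask() { public void run() { } });
        System.out.println(t != null);
        if (t != null) {
            t.cancel();
        }
    }
}
```

Using a daemon Timer also means the metastore JVM can shut down without waiting on the cleaner, which a plain non-daemon Thread would block unless managed explicitly.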
Re: Review Request: HIVE-1537 - Allow users to specify LOCATION in CREATE DATABASE statement
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/949/#review898 --- trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java https://reviews.apache.org/r/949/#comment1938 This may not always be successful. You may fail to create dirs for a number of reasons, so this needs to be handled gracefully: the transaction needs to roll back in such a case, and the create database DDL needs to fail. For more info, look at the first comment of Devaraj and also his attached partial patch. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java https://reviews.apache.org/r/949/#comment1941 As noted above, mkdirs() can fail, so handle it the same way as in createDatabase(). trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java https://reviews.apache.org/r/949/#comment1942 Please also add a test for the case where create database fails because a FS operation fails. In such a case no metadata should get created. One way to simulate that is to make the location unwritable and then try to create a database at that location. - Ashutosh On 2011-06-23 09:55:50, Thiruvel Thirumoolan wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/949/ --- (Updated 2011-06-23 09:55:50) Review request for hive, Ning Zhang and Amareshwari Sriramadasu. Summary --- Usage: create database location 'path1'; alter database location 'path2'; After 'alter', only newly created tables will be located under the new location. Tables created before 'alter' will be under 'path1'. Notes: -- 1. I have moved getDefaultDatabasePath() to HiveMetaStore and made it private. There should only be one API to obtain the location of a database and it has to accept 'Database' as an arg and hence the new method in Warehouse 'getDatabasePath()' and similarly 'getTablePath()'. The usages of older API also has been changed. Hope that should be fine. 2. 
One could argue why have getDatabasePath() as location can be obtained by db.getLocationUri(). I wanted to retain this method to do any additional processing if necessary (getDns or whatever). This addresses bug HIVE-1537. https://issues.apache.org/jira/browse/HIVE-1537 Diffs - trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java 1138011 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1138011 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1138011 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1138011 trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1138011 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1138011 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java 1138011 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1138011 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1138011 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java 1138011 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1138011 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java 1138011 trunk/ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHive.java 1138011 trunk/ql/src/test/queries/clientpositive/database_location.q PRE-CREATION trunk/ql/src/test/results/clientpositive/database_location.q.out PRE-CREATION Diff: https://reviews.apache.org/r/949/diff Testing --- 1. Updated TestHiveMetaStore.java for testing the functionality - database creation, alteration and table's locations as TestCliDriver outputs ignore locations. 2. Added database_location.q for testing the grammar primarily. Thanks, Thiruvel Thanks, Thiruvel
Re: Review Request: HIVE-1537 - Allow users to specify LOCATION in CREATE DATABASE statement
On 2011-06-23 16:49:59, Ashutosh Chauhan wrote: trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java, line 591 https://reviews.apache.org/r/949/diff/1/?file=21560#file21560line591 This may not always be successful. You may fail to create dirs for a number of reasons, so this needs to be handled gracefully: the transaction needs to roll back in such a case, and the create database DDL needs to fail. For more info, look at the first comment of Devaraj and also his attached partial patch. Thiruvel Thirumoolan wrote: I requested Devaraj offline to handle it in a separate JIRA. I am not sure about other methods having the same issue. That said, I introduced the same bug with alter_database. Will fix it for create and alter databases. Actually, the problem exists in createDatabase even now, without your patch, so you are not making it any worse. I am fine if you prefer to address it in a follow-up jira. About alter database, I am not sure if there is any real use case for it. Having a database spread across multiple locations is not regular semantics. The first concern is clean rollback semantics. Another is what happens to drop database in such scenarios: which directories are deleted when you drop a database, the current one, all of them, or one you specify in the drop database DDL? You potentially need to persist all the locations of a database in the ObjectStore for deletion or for other purposes, which means a list of locationUris instead of a single string. Given all this, you might want to defer alter database to a new jira. Apart from allowing a better understanding of the use cases and semantics for alter database, doing it in two different jiras will make this patch smaller and thus easier to get committed. - Ashutosh --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/949/#review898 --- On 2011-06-23 09:55:50, Thiruvel Thirumoolan wrote: --- This is an automatically generated e-mail. 
To reply, visit: https://reviews.apache.org/r/949/ --- (Updated 2011-06-23 09:55:50) Review request for hive, Ning Zhang and Amareshwari Sriramadasu. Summary --- Usage: create database location 'path1'; alter database location 'path2'; After 'alter', only newly created tables will be located under the new location. Tables created before 'alter' will be under 'path1'. Notes: -- 1. I have moved getDefaultDatabasePath() to HiveMetaStore and made it private. There should only be one API to obtain the location of a database and it has to accept 'Database' as an arg and hence the new method in Warehouse 'getDatabasePath()' and similarly 'getTablePath()'. The usages of older API also has been changed. Hope that should be fine. 2. One could argue why have getDatabasePath() as location can be obtained by db.getLocationUri(). I wanted to retain this method to do any additional processing if necessary (getDns or whatever). This addresses bug HIVE-1537. https://issues.apache.org/jira/browse/HIVE-1537 Diffs - trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java 1138011 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1138011 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1138011 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1138011 trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1138011 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1138011 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java 1138011 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1138011 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1138011 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java 1138011 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1138011 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java 1138011 
trunk/ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHive.java 1138011 trunk/ql/src/test/queries/clientpositive/database_location.q PRE-CREATION trunk/ql/src/test/results/clientpositive/database_location.q.out PRE-CREATION Diff: https://reviews.apache.org/r/949/diff Testing --- 1. Updated TestHiveMetaStore.java for testing the functionality - database creation, alteration and table's locations as TestCliDriver outputs ignore locations. 2. Added database_location.q for testing the grammar primarily. Thanks, Thiruvel Thanks, Thiruvel
Re: Hive projects for Google Summer of code 2012 ?
Hey Bharath, Great to see your enthusiasm for Hive! I would be happy to mentor you for the project. For a start, you can take a look at https://cwiki.apache.org/confluence/display/Hive/Roadmap for a list of open projects in Hive. The document is a bit dated, so some of those projects may not be relevant, but it's a good source to start with to see if any of these projects excite you. Hope it helps, Ashutosh On Sat, Feb 4, 2012 at 08:47, bharath vissapragada bharathvissapragada1...@gmail.com wrote: Hey list, devs, Google Summer of Code 2012's notification [1] has been released and mentoring organizations can submit their proposals to Google for opensource projects. Any of the devs interested in mentoring students on Hive projects (any critical jiras etc.)? It would be great if any of the devs (dev list cc'ed) can do that on behalf of the ASF. It would be a great opportunity for many students to contribute patches to Hadoop and Hive and make their summer vacation fruitful. [1] http://google-melange.appspot.com/gsoc/events/google/gsoc2012 Thanks and Regards, Bharath .V w:http://researchweb.iiit.ac.in/~bharath.v
Re: Hive projects for Google Summer of code 2012 ?
Hi Alexis, Great to see your interest. Feel free to come up with a concrete proposal and submit it to GSoC. It's certainly heartening to see folks interested in making contributions to the Hive project. Ashutosh On Sat, Feb 4, 2012 at 10:48, Alexis De La Cruz Toledo alexis...@gmail.com wrote: Hi Ashutosh, I'm interested in Hive. I'd like to improve the compilation process; I have seen that the query plan tree generated by Hive can be optimized, and I'd like to participate in Google Summer of Code 2012. What do you say? Regards. On February 4, 2012 at 12:29, Ashutosh Chauhan hashut...@apache.org wrote: Hey Bharath, Great to see your enthusiasm for Hive! I would be happy to mentor you for the project. For a start, you can take a look at https://cwiki.apache.org/confluence/display/Hive/Roadmap for a list of open projects in Hive. The document is a bit dated, so some of those projects may not be relevant, but it's a good source to start with to see if any of these projects excite you. Hope it helps, Ashutosh On Sat, Feb 4, 2012 at 08:47, bharath vissapragada bharathvissapragada1...@gmail.com wrote: Hey list, devs, Google Summer of Code 2012's notification [1] has been released and mentoring organizations can submit their proposals to Google for opensource projects. Any of the devs interested in mentoring students on Hive projects (any critical jiras etc.)? It would be great if any of the devs (dev list cc'ed) can do that on behalf of the ASF. It would be a great opportunity for many students to contribute patches to Hadoop and Hive and make their summer vacation fruitful. [1] http://google-melange.appspot.com/gsoc/events/google/gsoc2012 Thanks and Regards, Bharath .V w:http://researchweb.iiit.ac.in/~bharath.v -- Ing. Alexis de la Cruz Toledo. Av. Instituto Politécnico Nacional No. 2508, Col. San Pedro Zacatenco. México, D.F., 07360. CINVESTAV, DF.
Re: Hive build is back to green
Good idea! Created HIVE-2811 for this. Thanks, Ashutosh On Thu, Feb 16, 2012 at 14:41, Carl Steinbach c...@cloudera.com wrote: Great news! Thanks for fixing this Ashutosh! comparisons. Fix was to do export LANG=en_US.UTF-8 in environment of build machine. Is this something that we can set in Hive's build.xml file? Thanks. Carl
Re: 'arc diff' failing with Invalid or missing field 'Test Plan': You must provide a test plan.
Hi Carl, Include the following line in your git commit message: "Test Plan: <your test plan here>". Arc looks for the string "Test Plan" in your commit message and fails if it can't find one. Hope it helps, Ashutosh On Thu, Mar 1, 2012 at 14:45, Carl Steinbach c...@cloudera.com wrote: Hey, Today I started getting the following error when I try to create a phabricator review request using arc: % arc diff --jira HIVE-2831 Exception: Invalid or missing field 'Test Plan': You must provide a test plan. (Run with --trace for a full exception trace.) Here's the complete trace: % arc --trace diff --jira HIVE-2831 Loading phutil library 'arc_jira_lib' from '/Users/carl/Work/repos/hive4/.arc_jira_lib'... [0] conduit conduit.connect() [0] conduit 318,295 us [1] exec $ (cd '/Users/carl/Work/repos/hive4'; git rev-parse --show-cdup) [1] exec 14,662 us [2] exec $ (cd '/Users/carl/Work/repos/hive4/'; git rev-parse --verify HEAD^) [2] exec 16,343 us [3] exec $ (cd '/Users/carl/Work/repos/hive4/'; git log --first-parent --format=medium 'HEAD^'..HEAD) [3] exec 15,040 us [4] conduit differential.parsecommitmessage() [4] conduit 547,222 us Fatal error: Uncaught exception 'ArcanistDifferentialCommitMessageParserException' with message 'Invalid or missing field 'Test Plan': You must provide a test plan.' 
in /Users/carl/.local/pkg/arcanist/src/differential/commitmessage/ArcanistDifferentialCommitMessage.php:88 Stack trace: #0 /Users/carl/Work/repos/hive4/.arc_jira_lib/arcanist/ArcJIRAConfiguration.php(88): ArcanistDifferentialCommitMessage-pullDataFromConduit(Object(ConduitClient)) #1 /Users/carl/Work/repos/hive4/.arc_jira_lib/arcanist/ArcJIRAConfiguration.php(364): ArcJIRAConfiguration-willRunDiffWorkflow() #2 /Users/carl/.local/pkg/arcanist/scripts/arcanist.php(264): ArcJIRAConfiguration-willRunWorkflow('diff', Object(ArcanistDiffWorkflow)) #3 {main} thrown in /Users/carl/.local/pkg/arcanist/src/differential/commitmessage/ArcanistDifferentialCommitMessage.php on line 88 Anyone know what's going on here? Thanks. Carl
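In other words, arc's commit-message parser requires a "Test Plan" field to be present and non-empty. A rough sketch of that kind of check is below; the class and method names are hypothetical, and this is not Arcanist's actual implementation (which also accepts multi-line field bodies):

```java
public class CommitMessageCheck {
    // Hypothetical sketch: accept a commit message only if it contains a
    // "Test Plan:" line with some content after the colon.
    static boolean hasTestPlan(String commitMessage) {
        for (String line : commitMessage.split("\n")) {
            if (line.startsWith("Test Plan:")
                    && line.trim().length() > "Test Plan:".length()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String bad = "HIVE-2831. Fix the thing.";
        String good = "HIVE-2831. Fix the thing.\nTest Plan: ran unit tests locally";
        System.out.println(hasTestPlan(bad));   // message without the field
        System.out.println(hasTestPlan(good));  // message with the field
    }
}
```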
Re: 'arc diff' failing with Invalid or missing field 'Test Plan': You must provide a test plan.
I don't know if there is a way to disable it. But then, I don't know much about the phabricator/arc infra. On Thu, Mar 1, 2012 at 15:15, Carl Steinbach c...@cloudera.com wrote: Thanks for the tip. Is there any way to disable this behavior? On Thu, Mar 1, 2012 at 2:56 PM, Ashutosh Chauhan hashut...@apache.org wrote: Hi Carl, Include the following line in your git commit message: "Test Plan: <your test plan here>". Arc looks for the string "Test Plan" in your commit message and fails if it can't find one. Hope it helps, Ashutosh On Thu, Mar 1, 2012 at 14:45, Carl Steinbach c...@cloudera.com wrote: Hey, Today I started getting the following error when I try to create a phabricator review request using arc: % arc diff --jira HIVE-2831 Exception: Invalid or missing field 'Test Plan': You must provide a test plan. (Run with --trace for a full exception trace.) Here's the complete trace: % arc --trace diff --jira HIVE-2831 Loading phutil library 'arc_jira_lib' from '/Users/carl/Work/repos/hive4/.arc_jira_lib'... [0] conduit conduit.connect() [0] conduit 318,295 us [1] exec $ (cd '/Users/carl/Work/repos/hive4'; git rev-parse --show-cdup) [1] exec 14,662 us [2] exec $ (cd '/Users/carl/Work/repos/hive4/'; git rev-parse --verify HEAD^) [2] exec 16,343 us [3] exec $ (cd '/Users/carl/Work/repos/hive4/'; git log --first-parent --format=medium 'HEAD^'..HEAD) [3] exec 15,040 us [4] conduit differential.parsecommitmessage() [4] conduit 547,222 us Fatal error: Uncaught exception 'ArcanistDifferentialCommitMessageParserException' with message 'Invalid or missing field 'Test Plan': You must provide a test plan.' 
in /Users/carl/.local/pkg/arcanist/src/differential/commitmessage/ArcanistDifferentialCommitMessage.php:88 Stack trace: #0 /Users/carl/Work/repos/hive4/.arc_jira_lib/arcanist/ArcJIRAConfiguration.php(88): ArcanistDifferentialCommitMessage-pullDataFromConduit(Object(ConduitClient)) #1 /Users/carl/Work/repos/hive4/.arc_jira_lib/arcanist/ArcJIRAConfiguration.php(364): ArcJIRAConfiguration-willRunDiffWorkflow() #2 /Users/carl/.local/pkg/arcanist/scripts/arcanist.php(264): ArcJIRAConfiguration-willRunWorkflow('diff', Object(ArcanistDiffWorkflow)) #3 {main} thrown in /Users/carl/.local/pkg/arcanist/src/differential/commitmessage/ArcanistDifferentialCommitMessage.php on line 88 Anyone know what's going on here? Thanks. Carl
Re: Automatic/parallel patch testing
I am very much in favor of https://issues.apache.org/jira/browse/HIVE-1175, since it publishes results back to JIRA and everyone is on the same page. I think the Project VP has access to Apache Hudson. John / Namit, can you set this up for Hive? Ashutosh On Wed, Mar 7, 2012 at 13:35, Edward Capriolo edlinuxg...@gmail.com wrote: I am trying to get myself more involved in the patch review and committing process, but running ant tests takes multiple hours. Two ideas: https://issues.apache.org/jira/browse/HIVE-1175 https://cwiki.apache.org/Hive/unit-test-parallel-execution.html Can we get a farm of test servers to get testing done faster? What are other committers currently doing? I would not mind committing resources servers/$ to the cause. Thanks, Edward
Re: Potential bug around hive merging of small files
This does look like a bug. Shrijeet, mind opening a jira and attaching your patch there? Thanks, Ashutosh On Mon, Mar 12, 2012 at 16:29, Shrijeet Paliwal shrij...@rocketfuel.com wrote: I had a typo in my last email. The settings are as follows: hive> set mapred.min.split.size.per.node=10; hive> set mapred.min.split.size.per.rack=10; hive> set mapred.max.split.size=10; hive> set hive.merge.size.per.task=10; hive> set hive.merge.smallfiles.avgsize=10; hive> set hive.merge.size.smallfiles.avgsize=10; hive> set hive.merge.mapfiles=true; hive> set hive.merge.mapredfiles=true; hive> set hive.mergejob.maponly=false; On Mon, Mar 12, 2012 at 4:27 PM, Shrijeet Paliwal shrij...@rocketfuel.com wrote: Hive Version: Hive 0.8 (last commit SHA b581a6192b8d4c544092679d05f45b2e50d42b45) Hadoop version: cdh3u0 I am trying to use the hive merge small file feature by setting all the necessary params. I am disabling use of CombineHiveInputFormat since my input is compressed text. hive> set mapred.min.split.size.per.node=10; hive> set mapred.min.split.size.per.rack=10; hive> set mapred.max.split.size=10; hive> set hive.merge.size.per.task=10; hive> set hive.merge.smallfiles.avgsize=10; hive> set hive.merge.size.smallfiles.avgsize=10; hive> set hive.merge.mapfiles=false; hive> set hive.merge.mapredfiles=true; The plan decides to launch two MR jobs, but after the first job succeeds I get a run time error: java.lang.RuntimeException: Plan invalid, Reason: Reducers == 0 but reduce operator specified I think the problem can be fixed by using this patch I came up with: https://gist.github.com/2025303 Of course my understanding and hence this patch can be totally wrong. Please provide feedback.
Hive 0.9 release
Hi all, Branch for 0.8-r2 was created on Dec 7, almost four months ago. Between then and now lots of cool stuff has landed in trunk waiting to be released and get to users. I think it's a good time now to get the ball rolling for the 0.9 release. If this sounds good, I would propose to cut a branch for 0.9 later this week. Then we can focus on stabilizing the branch and the subsequent release from it. Thoughts? Thanks, Ashutosh
Re: Hive 0.9 release
Here is a list of jiras which I plan to get in 0.9: HIVE-2084 HIVE-2822 HIVE-2764 HIVE-538 I will work with the authors of these patches to see if these can get in. Others, please feel free to add to this list. Thanks, Ashutosh On Mon, Apr 2, 2012 at 18:39, Carl Steinbach c...@cloudera.com wrote: I'm +1 on doing an 0.9.0 release, but would also like to suggest that we put together a list of 0.9.0 blockers before cutting the release branch. In the past we have frequently underestimated the amount of work required to get trunk into a releasable state, with the consequence that we end up wasting time doing a lot of backports from trunk to the release branch. It would be great if we could avoid all of that this time around. Thanks. Carl On Mon, Apr 2, 2012 at 6:33 PM, Ashutosh Chauhan hashut...@apache.org wrote: Hi all, Branch for 0.8-r2 was created on Dec 7, almost four months ago. Between then and now lots of cool stuff has landed in trunk waiting to be released and get to users. I think it's a good time now to get the ball rolling for the 0.9 release. If this sounds good, I would propose to cut a branch for 0.9 later this week. Then we can focus on stabilizing the branch and the subsequent release from it. Thoughts? Thanks, Ashutosh
Re: Hive 0.9 release
Hi All, Seems like we have an agreement. So, unless someone has other ideas, I will cut the branch for 0.9 on 4/9. Below is the consolidated list which people have requested for 0.9; as and when these get checked into trunk, we will merge them back into 0.9. 2084 2764 538 2646 2777 2585 2883 2926 Thanks, Ashutosh On Mon, Apr 2, 2012 at 19:02, Thomas Weise t...@yahoo-inc.com wrote: Would be great to get HIVE-2646 included. Thanks, Thomas On 4/2/12 6:59 PM, Ashutosh Chauhan hashut...@apache.org wrote: Here is a list of jiras which I plan to get in 0.9: HIVE-2084 HIVE-2822 HIVE-2764 HIVE-538 I will work with the authors of these patches to see if these can get in. Others, please feel free to add to this list. Thanks, Ashutosh On Mon, Apr 2, 2012 at 18:39, Carl Steinbach c...@cloudera.com wrote: I'm +1 on doing an 0.9.0 release, but would also like to suggest that we put together a list of 0.9.0 blockers before cutting the release branch. In the past we have frequently underestimated the amount of work required to get trunk into a releasable state, with the consequence that we end up wasting time doing a lot of backports from trunk to the release branch. It would be great if we could avoid all of that this time around. Thanks. Carl On Mon, Apr 2, 2012 at 6:33 PM, Ashutosh Chauhan hashut...@apache.org wrote: Hi all, Branch for 0.8-r2 was created on Dec 7, almost four months ago. Between then and now lots of cool stuff has landed in trunk waiting to be released and get to users. I think it's a good time now to get the ball rolling for the 0.9 release. If this sounds good, I would propose to cut a branch for 0.9 later this week. Then we can focus on stabilizing the branch and the subsequent release from it. Thoughts? Thanks, Ashutosh
Re: Hive 0.9 release
As per the plan I am going to create a branch now. Please hold off any commits till I send an email for all-clear. Thanks, Ashutosh On Fri, Apr 6, 2012 at 15:00, Owen O'Malley omal...@apache.org wrote: I think we also need to get the RAT report cleaned up. Apache projects aren't supposed to release while they have files without the Apache header. I've filed HIVE-2930 to fix all of the issues. While working on it, I found that one of the files was added by HIVE-2246. HIVE-2246 was contributed by Sohan Jain, who hasn't filed an ICLA, and doesn't have the jira box checked for contribution. Does someone know him and can ask him to state on the jira that he intended to contribute it? Failing that, I believe he was working at Facebook at the time, so someone else who is still there can upload the patch to the jira? All of this brings up a challenge in that Phabricator and the Apache review tool upload patches to jira without providing a way to check the contribute to Apache box. Without the checkbox we should only commit patches from people who have filed ICLAs. Is there a way to add an option the arc command that will check the box? Even having it *always* check the box is better than having it not check the box. (Although it should warn users that it is doing so.) Thoughts? -- Owen
Re: Hive 0.9 release
All-clear. Trunk is now open for commits. Since HIVE-2929 has resulted in intermittent test failures (see HIVE-2937), I branched right before HIVE-2929. Additionally, I merged HIVE-2764 into 0.9. I also added version 0.10 on jira, so any commits on trunk now must have 0.10 as the fix version. Thanks, Ashutosh On Mon, Apr 9, 2012 at 17:01, Ashutosh Chauhan hashut...@apache.org wrote: As per the plan I am going to create a branch now. Please hold off any commits till I send an email for all-clear. Thanks, Ashutosh On Fri, Apr 6, 2012 at 15:00, Owen O'Malley omal...@apache.org wrote: I think we also need to get the RAT report cleaned up. Apache projects aren't supposed to release while they have files without the Apache header. I've filed HIVE-2930 to fix all of the issues. While working on it, I found that one of the files was added by HIVE-2246. HIVE-2246 was contributed by Sohan Jain, who hasn't filed an ICLA, and doesn't have the jira box checked for contribution. Does someone know him and can ask him to state on the jira that he intended to contribute it? Failing that, I believe he was working at Facebook at the time, so someone else who is still there can upload the patch to the jira? All of this brings up a challenge in that Phabricator and the Apache review tool upload patches to jira without providing a way to check the contribute to Apache box. Without the checkbox we should only commit patches from people who have filed ICLAs. Is there a way to add an option to the arc command that will check the box? Even having it *always* check the box is better than having it not check the box. (Although it should warn users that it is doing so.) Thoughts? -- Owen
Re: Looking at the columns table
Hey Ed, Your thinking is correct and has been implemented in https://issues.apache.org/jira/browse/HIVE-2246 Time to upgrade to 0.8 :) Thanks, Ashutosh On Wed, Apr 11, 2012 at 07:53, Edward Capriolo edlinuxg...@gmail.comwrote: Hey all. Our metastore in mysql is fairly large over 12GB. All the storage here is the columns table. It seems that each column is stored for each partition/storage descriptor as a one-many relationship. In our case all the partitions have the same column definition. My thinking. Should the relationship from columns-partition/storage descriptor be a many-many? In this way we only store the column once and the current column table can reference the primary key of this column. This should bring the size of this table down really drastically. Since every other table in the metastore is so small this huge columns table looks like the only scalability choke point we have. Edward
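The normalization Edward proposes (and which HIVE-2246 applies to shared column descriptors) amounts to interning identical column lists so each distinct schema is stored once and referenced by many partitions. A toy sketch of the idea is below; all class, field, and method names here are hypothetical illustrations, not the actual metastore schema:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ColumnDedup {
    // Toy model: instead of storing a full column list per partition
    // (one-to-many), store each distinct column list once and have
    // partitions reference it by id (many-to-many via a shared descriptor).
    static Map<List<String>, Integer> schemaIds = new HashMap<>();
    static List<List<String>> schemas = new ArrayList<>();

    static int internSchema(List<String> columns) {
        Integer id = schemaIds.get(columns);
        if (id == null) {
            id = schemas.size();
            schemas.add(columns);
            schemaIds.put(columns, id);
        }
        return id;
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("ds string", "userid bigint");
        int p1 = internSchema(new ArrayList<>(cols));
        int p2 = internSchema(new ArrayList<>(cols)); // identical columns
        System.out.println(p1 == p2);       // both partitions share one id
        System.out.println(schemas.size()); // the column list is stored once
    }
}
```

With thousands of partitions sharing one schema, storage for column definitions drops from O(partitions x columns) to O(distinct schemas x columns), which is why the COLUMNS table shrinks so drastically after the upgrade.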
Re: Problems with Arc/Phabricator
+1 on moving away from arc/phabricator. It works great when it works, but most of the time it doesn't work. Ashutosh On Wed, Apr 11, 2012 at 11:57, Owen O'Malley omal...@apache.org wrote: On Wed, Apr 11, 2012 at 11:48 AM, Edward Capriolo edlinuxg...@gmail.com wrote: If we are going to switch from Phabricator we just might as well go back to not using anything. Review Board was really clunky and confusing. I'm mostly +1 to that. If no one is supporting Phabricator, then it won't work for long. Personally, I'd love it if we could move Hive to git completely. Has anyone used Gerrit? The videos of it make it look better than sliced bread. -- Owen
Re: Problems with Arc/Phabricator
Is Mac the only supported OS? Arc doesn't work for me on Linux, which is unfortunate since that's where I do all my testing. Thanks, Ashutosh On Wed, Apr 11, 2012 at 17:37, John Sichi jsi...@gmail.com wrote: CC'ing David Recordon, who can probably help with a point of contact for coordinating future Phabricator upgrades. It looks like the test plan problem mentioned below (which affects git, but not svn) was introduced when the reviews.facebook.net Phabricator server was upgraded Feb 23. I've committed a change to the arc-jira module which should deal with it: https://github.com/facebook/arc-jira/commit/b62b5976ec9a974ed102c2f55b530edde48cfaa5 So if you run ant arc-setup in your Hive sandbox, you should be good to go. JVS On Wed, Apr 11, 2012 at 3:37 PM, Carl Steinbach c...@cloudera.com wrote: Hi John, Regarding the test plans: Carl, could you be more specific about what is going wrong so I can attempt to reproduce the problem? At some point Arc started requiring that the commit message contain a Test Plan string, or maybe this has always been a requirement and it was just automatically added before? Anyway, right now you have to manually add this or you get the following error: % git log -1 commit 2649ca167182bb02823b3fb00bbe7602f591717e Author: Carl Steinbach c...@cloudera.com Date: Wed Apr 11 15:12:53 2012 -0700 HIVE-2947. Test Phabricator % arc diff --trace --jira HIVE-2947 Loading phutil library 'arc_jira_lib' from '/Users/carl/Work/repos/hive-test/.arc_jira_lib'... 
[0] conduit conduit.connect() [0] conduit 329,414 us [1] exec $ (cd '/Users/carl/Work/repos/hive-test'; git rev-parse --show-cdup) [1] exec 16,731 us [2] exec $ (cd '/Users/carl/Work/repos/hive-test/'; git rev-parse --verify HEAD^) [2] exec 20,879 us [3] exec $ (cd '/Users/carl/Work/repos/hive-test/'; git log --first-parent --format=medium 'HEAD^'..HEAD) [3] exec 17,852 us [4] conduit differential.parsecommitmessage() [4] conduit 558,248 us Fatal error: Uncaught exception 'ArcanistDifferentialCommitMessageParserException' with message 'Invalid or missing field 'Test Plan': You must provide a test plan.' in /Users/carl/.local/pkg/arcanist/src/differential/commitmessage/ArcanistDifferentialCommitMessage.php:88 Stack trace: #0 /Users/carl/Work/repos/hive-test/.arc_jira_lib/arcanist/ArcJIRAConfiguration.php(88): ArcanistDifferentialCommitMessage-pullDataFromConduit(Object(ConduitClient)) #1 /Users/carl/Work/repos/hive-test/.arc_jira_lib/arcanist/ArcJIRAConfiguration.php(368): ArcJIRAConfiguration-willRunDiffWorkflow() #2 /Users/carl/.local/pkg/arcanist/scripts/arcanist.php(264): ArcJIRAConfiguration-willRunWorkflow('diff', Object(ArcanistDiffWorkflow)) #3 {main} thrown in /Users/carl/.local/pkg/arcanist/src/differential/commitmessage/ArcanistDifferentialCommitMessage.php on line 88 Thanks. Carl
[VOTE] Apache Hive 0.9.0 Release Candidate 0
Hey all, Apache Hive 0.9.0-rc0 is out and available at http://people.apache.org/~hashutosh/hive-0.9.0-rc0/ Release notes are available at: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310843&version=12317742 Please give it a try and let us know. Hive PMC members: Please test and vote. Thanks, Ashutosh
Re: [VOTE] Apache Hive 0.9.0 Release Candidate 0
A couple more points: Maven artifacts are available at https://repository.apache.org/content/repositories/orgapachehive-043/ for folks to try out. The vote runs for 3 business days, so it will expire on Wednesday, 4/18. Thanks, Ashutosh On Fri, Apr 13, 2012 at 11:50, Ashutosh Chauhan hashut...@apache.org wrote: Hey all, Apache Hive 0.9.0-rc0 is out and available at http://people.apache.org/~hashutosh/hive-0.9.0-rc0/ Release notes are available at: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310843&version=12317742 Please give it a try, let us know. Hive PMC members: Please test and vote. Thanks, Ashutosh
Re: [VOTE] Apache Hive 0.9.0 Release Candidate 0
Hey Lars, Thanks for taking a look. HIVE-1634 introduced a new storage type for HBase tables, namely binary, and the bug manifests itself only for the binary storage type. This doesn't count as a regression, since the binary storage functionality itself was added through HIVE-1634. Because it is not a regression of existing functionality, it won't count as a blocker for the 0.9 release. Nonetheless, other folks have found other problems in RC0, so I have to respin; thus, I will consider the HIVE-2958 fix for RC1. Thanks, Ashutosh On Tue, Apr 17, 2012 at 23:46, Lars Francke lars.fran...@gmail.com wrote: Hey, thanks for putting up the RC. We tried it yesterday and we stumbled across HIVE-2958, which seems like a bug that should be fixed before release because it was introduced with HIVE-1634, which is new to 0.9 too, and breaks GROUP BY queries on HBase which were working before. -1 (non-binding) Thanks, Lars
Re: Hive 0.9 now broken on HBase 0.90 ?
Hi Tim, Sorry that it broke your setup. Decision to move to hbase-0.92 was made in https://issues.apache.org/jira/browse/HIVE-2748 Thanks, Ashutosh On Wed, Apr 18, 2012 at 11:42, Tim Robertson timrobertson...@gmail.comwrote: Hi all, This is my first post to hive-dev so please go easy on me... I built Hive from trunk (0.90) a couple of weeks ago and have been using it against HBase, and today patched it with the offering of HIVE-2958 and it all worked fine. I just tried an Oozie workflow, built using Maven and the Apache snapshot repository to get the 0.90 snapshot. It fails with the following: java.lang.NoSuchMethodError: org.apache.hadoop.hbase.mapred.TableMapReduceUtil.initCredentials(Lorg/apache/hadoop/mapred/JobConf;)V at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getSplits(HiveHBaseTableInputFormat.java:419) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:292) I believe the source of the issue could be this commit which happened after I built from trunk a couple weeks ago: http://mail-archives.apache.org/mod_mbox/hive-commits/201204.mbox/%3c20120409202655.bdb5d2388...@eris.apache.org%3E Is there a decision to make hive 0.9 require HBase 0.92.0+ ? It would be awesome if it still worked on 0.90.4 since CDH3 uses that. Hope this makes sense, Tim (suffering classpath hell)
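Since Hive 0.9 is compiled against the HBase 0.92 API (per HIVE-2748 above), one way to avoid the NoSuchMethodError in a Maven-built workflow is to make sure the runtime classpath carries a matching HBase. This is an untested sketch; the coordinates are the standard HBase ones of that era, so verify the exact version against your build:

```xml
<!-- pom.xml fragment (sketch): pin HBase to the 0.92 line that the
     Hive 0.9 HBase handler was compiled against, so that methods such as
     TableMapReduceUtil.initCredentials(JobConf) exist at runtime. -->
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase</artifactId>
  <version>0.92.0</version>
</dependency>
```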
Re: [VOTE] Apache Hive 0.9.0 Release Candidate 0
This vote stands cancelled because of various problems people have found in RC0. Thanks to all who tried RC0. I will respin RC1 shortly. Thanks, Ashutosh On Fri, Apr 13, 2012 at 11:50, Ashutosh Chauhan hashut...@apache.orgwrote: Hey all, Apache Hive 0.9.0-rc0 is out and available at http://people.apache.org/~hashutosh/hive-0.9.0-rc0/ Release notes are available at: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310843version=12317742 Please give it a try, let us know. Hive PMC members: Please test and vote. Thanks, Ashutosh
hive 0.9.0 RC1
RC0 failed to pass for a variety of reasons. In the meantime, various folks have requested the inclusion of other fixes in 0.9. The following list is for committers to review these patches and get them committed to the 0.9 branch; these are in Patch Available status. HIVE-2958 HIVE-2777 HIVE-2646 HIVE-2883 HIVE-2585 HIVE-538 HIVE-2904 The following list is for contributors/committers to contribute patches; these are in Open status. HIVE-2961 HIVE-2965 HIVE-2966 Thanks, Ashutosh
Re: Problems with Arc/Phabricator
Hit a new problem with arc today: Fatal error: Uncaught exception 'Exception' with message 'Host returned HTTP/200, but invalid JSON data in response to a Conduit method call: <br /><b>Warning</b>: Unknown: POST Content-Length of 9079953 bytes exceeds the limit of 8388608 bytes in <b>Unknown</b> on line <b>0</b><br /> for(;;);{"result":null,"error_code":"ERR-INVALID-SESSION","error_info":"Session key is not present."}' in /Users/ashutosh/work/hive/libphutil/src/conduit/client/ConduitFuture.php:48 Stack trace: #0 /Users/ashutosh/work/hive/libphutil/src/future/proxy/FutureProxy.php(62): ConduitFuture->didReceiveResult(Array) #1 /Users/ashutosh/work/hive/libphutil/src/future/proxy/FutureProxy.php(39): FutureProxy->getResult() #2 /Users/ashutosh/work/hive/libphutil/src/conduit/client/ConduitClient.php(52): FutureProxy->resolve() #3 /Users/ashutosh/work/hive/arcanist/src/workflow/diff/ArcanistDiffWorkflow.php(341): ConduitClient->callMethodSynchronous('differential.cr...', Array) #4 /Users/ashutosh/work/hive/arcanist/scripts/arcanist.php(266): ArcanistDiffWo in /Users/ashutosh/work/hive/libphutil/src/conduit/client/ConduitFuture.php on line 48 Any ideas how to solve this? Thanks, Ashutosh On Wed, Apr 11, 2012 at 18:37, Edward Capriolo edlinuxg...@gmail.com wrote: I think the most practical solution is to try to use arc/phab and then, if there is a problem, fall back to Jira and do it the old way. Edward On Wed, Apr 11, 2012 at 7:17 PM, Carl Steinbach c...@cloudera.com wrote: +1 to switching over to Git. As for the rest of the Phabricator/Gerrit/Reviewboard discussion, I think we should pick this up again at the contributor meeting on Wednesday. Thanks. Carl On Wed, Apr 11, 2012 at 12:19 PM, Ashutosh Chauhan hashut...@apache.org wrote: +1 on moving away from arc/phabricator. It works great when it works, but most of the time it doesn't work. 
Ashutosh On Wed, Apr 11, 2012 at 11:57, Owen O'Malley omal...@apache.org wrote: On Wed, Apr 11, 2012 at 11:48 AM, Edward Capriolo edlinuxg...@gmail.com wrote: If we are going to switch from fabricator we just might as well go back to not using anything. Review board was really clunky and confusing. I'm mostly +1 to that. If no one is supporting phabricator, then it won't work for long. Personally, I'd love it if we could move Hive to git completely. Has anyone used gerrit? The videos of it make it look better than sliced bread. -- Owen
Re: hive 0.9.0 RC1
Release is not going to be blocked on it. If the patch gets committed on trunk by the time I roll RC1 I will merge it in. But, first step is to have it in trunk. Thanks, Ashutosh On Thu, Apr 19, 2012 at 17:32, Edward Capriolo edlinuxg...@gmail.comwrote: Am I missing something about? https://issues.apache.org/jira/browse/HIVE-2777 I see no way to access this feature from the hive QL language. Should we delay releases on features not usable? Edward On Thu, Apr 19, 2012 at 1:28 PM, Ashutosh Chauhan hashut...@apache.org wrote: RC0 failed to pass because of variety of reasons. In the meantime, various folks have requested for inclusion of other fixes in 0.9. Following is the list. Following list is for committers to review these patches and to get them committed in 0.9 branch. These are in Patch Available status. HIVE-2958 HIVE-2777 HIVE-2646 HIVE-2883 HIVE-2585 HIVE-538 HIVE-2904 Following list is for contributors/committers to contribute patches. These are in Open status. HIVE-2961 HIVE-2965 HIVE-2966 Thanks, Ashutosh
Re: POST limit
Hey John, Yeah, this is an exceptionally large patch. https://issues.apache.org/jira/browse/HIVE-2965 Thanks, Ashutosh On Thu, Apr 19, 2012 at 23:19, John Sichi jsi...@gmail.com wrote: Ashutosh, are you submitting an exceptionally large patch of some kind? http://stackoverflow.com/questions/6279897/php-post-content-length-of-11933650-bytes-exceeds-the-limit-of-8388608-bytes We could try bumping up that limit on the server side, but first it would be good to find out whether that is really the problem (and if so what is contributing to such a big size). JVS On Thu, Apr 19, 2012 at 7:35 PM, Ashutosh Chauhan hashut...@apache.org wrote: Hit a new problem with arc today: Fatal error: Uncaught exception 'Exception' with message 'Host returned HTTP/200, but invalid JSON data in response to a Conduit method call: <br /><b>Warning</b>: Unknown: POST Content-Length of 9079953 bytes exceeds the limit of 8388608 bytes in <b>Unknown</b> on line <b>0</b><br /> for(;;);{"result":null,"error_code":"ERR-INVALID-SESSION","error_info":"Session key is not present."}' in /Users/ashutosh/work/hive/libphutil/src/conduit/client/ConduitFuture.php:48 Stack trace: #0 /Users/ashutosh/work/hive/libphutil/src/future/proxy/FutureProxy.php(62): ConduitFuture->didReceiveResult(Array) #1 /Users/ashutosh/work/hive/libphutil/src/future/proxy/FutureProxy.php(39): FutureProxy->getResult() #2 /Users/ashutosh/work/hive/libphutil/src/conduit/client/ConduitClient.php(52): FutureProxy->resolve() #3 /Users/ashutosh/work/hive/arcanist/src/workflow/diff/ArcanistDiffWorkflow.php(341): ConduitClient->callMethodSynchronous('differential.cr...', Array) #4 /Users/ashutosh/work/hive/arcanist/scripts/arcanist.php(266): ArcanistDiffWo in /Users/ashutosh/work/hive/libphutil/src/conduit/client/ConduitFuture.php on line 48 Any ideas how to solve this? Thanks, Ashutosh
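On the server side, the 8388608-byte (8M) limit in the error matches PHP's default post_max_size, so "bumping up that limit" as John suggests would typically mean raising that directive. This is a sketch; the right values depend on the Phabricator host's configuration:

```ini
; php.ini (sketch) - raise the POST body cap that the 9079953-byte diff hit
post_max_size = 32M
; usually raised together with the upload cap
upload_max_filesize = 32M
```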
[VOTE] Apache Hive 0.9.0 Release Candidate 1
Hey all, Apache Hive 0.9.0 Release Candidate 1 is available here: http://people.apache.org/~hashutosh/hive-0.9.0-rc1/ Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-084/ Change List is available here: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310843&version=12317742 Voting will conclude in 72 hours. Hive PMC Members: Please test and vote. Thanks, Ashutosh
FixVersion in jira
In case you are wondering what's up with the deluge of emails from jira, here is what's going on. Committers: FixVersion is intended for use by committers and should be set by a committer at the time of commit. In most cases the commit will be only on trunk, in which case the next release version number should be picked. If the commit is also made on an already released branch, then the next release version number of that branch should be added in addition. For example, if you make a commit on trunk only now, the fixVersion is 0.10.0. If you make a commit on trunk and the 0.9 branch, then the fixVersion is 0.9.1 and 0.10.0. So the request to committers is to please mark the fixVersion while committing patches. Contributors: Please don't set the fixVersion. When submitting a bug report, please use Affects Version to indicate which version you have tested, and leave the fixVersion empty. Setting it creates a lot of confusion when generating release notes as to what's in the release and what's not. Thanks, Ashutosh
Re: [VOTE] Apache Hive 0.9.0 Release Candidate 1
Unfortunately, there is a small problem in RC1: the version string in build.properties is 0.9.0-SNAPSHOT instead of 0.9.0, so I have to rescind this vote. I will respin RC2 shortly. Thanks, Ashutosh On Mon, Apr 23, 2012 at 10:47, Ashutosh Chauhan hashut...@apache.org wrote: Hey all, Apache Hive 0.9.0 Release Candidate 1 is available here: http://people.apache.org/~hashutosh/hive-0.9.0-rc1/ Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-084/ Change List is available here: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310843&version=12317742 Voting will conclude in 72 hours. Hive PMC Members: Please test and vote. Thanks, Ashutosh
[VOTE] Apache Hive 0.9.0 Release Candidate 2
Hey all, Apache Hive 0.9.0 Release Candidate 2 is available here: http://people.apache.org/~hashutosh/hive-0.9.0-rc2/ Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-094/ Change List is available here: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310843&version=12317742 Voting will conclude in 72 hours. Hive PMC Members: Please test and vote. Thanks, Ashutosh
Re: [VOTE] Apache Hive 0.9.0 Release Candidate 2
Downloaded the bits, installed them on a 5-node cluster, created a table, ran basic queries, and ran the unit tests. All looks good. +1 Thanks, Ashutosh On Tue, Apr 24, 2012 at 12:29, Ashutosh Chauhan hashut...@apache.org wrote: Hey all, Apache Hive 0.9.0 Release Candidate 2 is available here: http://people.apache.org/~hashutosh/hive-0.9.0-rc2/ Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-094/ Change List is available here: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310843&version=12317742 Voting will conclude in 72 hours. Hive PMC Members: Please test and vote. Thanks, Ashutosh
Re: [VOTE] Apache Hive 0.9.0 Release Candidate 2
The RC2 vote is closed now, and I am excited to report that the vote has passed. I will send out the note about the availability of the 0.9 bits once I finish publishing them. Thanks to all who tested and voted on the release. Ashutosh On Tue, Apr 24, 2012 at 12:29, Ashutosh Chauhan hashut...@apache.org wrote: Hey all, Apache Hive 0.9.0 Release Candidate 2 is available here: http://people.apache.org/~hashutosh/hive-0.9.0-rc2/ Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-094/ Change List is available here: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310843&version=12317742 Voting will conclude in 72 hours. Hive PMC Members: Please test and vote. Thanks, Ashutosh
[ANNOUNCE] Apache Hive 0.9.0 Released
The Apache Hive team is proud to announce the release of Apache Hive version 0.9.0. The Apache Hive (TM) data warehouse software facilitates querying and managing large datasets residing in distributed storage. Built on top of Apache Hadoop (TM), it provides: * Tools to enable easy data extract/transform/load (ETL) * A mechanism to impose structure on a variety of data formats * Access to files stored either directly in Apache HDFS (TM) or in other data storage systems such as Apache HBase (TM) * Query execution via MapReduce For Hive release details and downloads, please visit: http://hive.apache.org/releases.html Hive 0.9.0 Release Notes are available here: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310843&version=12317742 We would like to thank the many contributors who made this release possible. Regards, The Apache Hive Team
Re: Branch 0.9 maven snapshots?
Hey Carl, Build is reported to be failing, though all tests are passing. Any ideas? Thanks, Ashutosh On Wed, Apr 25, 2012 at 13:40, Thomas Weise t...@yahoo-inc.com wrote: Thanks Carl! On 4/25/12 1:12 PM, Carl Steinbach c...@cloudera.com wrote: Hi Thomas, I created the job: https://builds.apache.org/job/Hive-0.9.0-SNAPSHOT-h0.21/ Thanks. Carl On Tue, Apr 24, 2012 at 6:48 PM, Thomas Weise t...@yahoo-inc.com wrote: Looks like there is no maven snapshots build setup for the 0.9 branch yet. Can we have this setup similar to what is available for the 0.8 branch? Or switch that to build 0.9 instead? Thanks, Thomas
Re: Problems with Arc/Phabricator
Made some progress on using arc/phab on Ubuntu. epriestley helped a ton over at the #phabricator IRC channel. Thanks, Evan! I am now able to make arc work on Ubuntu, but it seems like the JIRA integration is broken. Hit the following problem: $ arc diff --jira HIVE-3008 PHP Fatal error: Class 'ArcanistDifferentialRevisionRef' not found in /home/ashutosh/workspace/.arc_jira_lib/arcanist/ArcJIRAConfiguration.php on line 201 Fatal error: Class 'ArcanistDifferentialRevisionRef' not found in /home/ashutosh/workspace/.arc_jira_lib/arcanist/ArcJIRAConfiguration.php on line 201 Even with this error the diff did get generated, but it was not posted back on the jira. Evan is working on a patch to fix this. He is also discussing with Facebook folks how to tackle these issues in the long term. The discussion is going on at https://secure.phabricator.com/T1206 I will request that people who are actively working on Hive follow the discussion on this ticket. Thanks, Ashutosh On Thu, Apr 19, 2012 at 5:24 PM, Ashutosh Chauhan hashut...@apache.org wrote: Problem while using arc on Ubuntu: $ arc patch D2871 ARC: Cannot mix P and A UNIX: No such file or directory Any ideas what's up there. Thanks, Ashutosh On Thu, Apr 19, 2012 at 17:19, Edward Capriolo edlinuxg...@gmail.com wrote: Just throwing this out there. The phabricator IRC has more people and is usually more active than the Hive IRC. #JustSaying... 
On Thu, Apr 19, 2012 at 7:35 PM, Ashutosh Chauhan hashut...@apache.org wrote: Hit a new problem with arc today: Fatal error: Uncaught exception 'Exception' with message 'Host returned HTTP/200, but invalid JSON data in response to a Conduit method call: br / bWarning/b: Unknown: POST Content-Length of 9079953 bytes exceeds the limit of 8388608 bytes in bUnknown/b on line b0/bbr / for(;;);{result:null,error_code:ERR-INVALID-SESSION,error_info:Session key is not present.}' in /Users/ashutosh/work/hive/libphutil/src/conduit/client/ConduitFuture.php:48 Stack trace: #0 /Users/ashutosh/work/hive/libphutil/src/future/proxy/FutureProxy.php(62): ConduitFuture-didReceiveResult(Array) #1 /Users/ashutosh/work/hive/libphutil/src/future/proxy/FutureProxy.php(39): FutureProxy-getResult() #2 /Users/ashutosh/work/hive/libphutil/src/conduit/client/ConduitClient.php(52): FutureProxy-resolve() #3 /Users/ashutosh/work/hive/arcanist/src/workflow/diff/ArcanistDiffWorkflow.php(341): ConduitClient-callMethodSynchronous('differential.cr...', Array) #4 /Users/ashutosh/work/hive/arcanist/scripts/arcanist.php(266): ArcanistDiffWo in /Users/ashutosh/work/hive/libphutil/src/conduit/client/ConduitFuture.php on line 48 Any ideas how to solve this? Thanks, Ashutosh On Wed, Apr 11, 2012 at 18:37, Edward Capriolo edlinuxg...@gmail.com wrote: I think the most practical solution is try and use arc/phab and then if there is a problem fall back to Jira and do it the old way. Edward On Wed, Apr 11, 2012 at 7:17 PM, Carl Steinbach c...@cloudera.com wrote: +1 to switching over to Git. As for the rest of the Phabricator/Gerrit/Reviewboard discussion, I think we should pick this up again at the contributor meeting on Wednesday. Thanks. Carl On Wed, Apr 11, 2012 at 12:19 PM, Ashutosh Chauhan hashut...@apache.org wrote: +1 on moving away from arc/phabricator. It works great when it works, but most of the time it doesnt work. 
Ashutosh On Wed, Apr 11, 2012 at 11:57, Owen O'Malley omal...@apache.org wrote: On Wed, Apr 11, 2012 at 11:48 AM, Edward Capriolo edlinuxg...@gmail.com wrote: If we are going to switch from fabricator we just might as well go back to not using anything. Review board was really clunky and confusing. I'm mostly +1 to that. If no one is supporting phabricator, then it won't work for long. Personally, I'd love it if we could move Hive to git completely. Has anyone used gerrit? The videos of it make it look better than sliced bread. -- Owen
Re: Problems with Arc/Phabricator
Doesn't work for me either; I see the error message. Ashutosh On Wed, May 9, 2012 at 12:24 AM, Carl Steinbach c...@cloudera.com wrote: Actually, I take that back. After logging in I'm now back to the original error message. On Wed, May 9, 2012 at 12:22 AM, Carl Steinbach c...@cloudera.com wrote: Hi John, Thanks for checking. I got the page to load again after clearing my browser's cache. Carl On Wed, May 9, 2012 at 12:17 AM, John Sichi jsi...@gmail.com wrote: Regarding the reviews.facebook.net website, I tried just now and it seems to be working for me; here's a screenshot of what I get for https://reviews.facebook.net/D3075: http://i.imgur.com/umHlB.png JVS
Re: new feature in hive: links
To kickstart the review, I did a quick pass over the doc. A few questions popped out to me, which I asked, and Sambavi was kind enough to come back with replies for them. I am continuing to look into it and would encourage other folks to look into it as well. Thanks, Ashutosh Begin Forward Message Hi Ashutosh Thanks for looking through the design and providing your feedback! Responses below: * What exactly is contained in tracking capacity usage? One is disk space. That I presume you are going to track via summing size under the database directory. Are you also thinking of tracking resource usage in terms of CPU/memory/network utilization for different teams? Right now the capacity usage in Hive we will track is the disk space (managed tables that belong to the namespace + imported tables). We will track the mappers and reducers that the namespace utilizes directly from Hadoop. * Each namespace (ns) will have exactly one database. If so, then users are not allowed to create/use databases in such a deployment? Not necessarily a problem, just trying to understand the design. Yes, you are correct – this is a limitation of the design. Introducing a new concept seemed heavyweight, so you can instead think of this as “self-contained” databases. But it means that a given namespace cannot have sub-databases in it. * How are you going to keep metadata consistent across two ns? If metadata gets updated in the remote ns, will it get automatically updated in the user's local ns? If yes, how will this be implemented? If no, then every time a user needs to use data from a remote ns, she has to bring metadata up to date in her ns. How will she do it? Metadata will be kept in sync for linked tables. We will make alter table on the remote table (source of the link) cause an update to the target of the link. Note that from a Hive perspective, the metadata for the source and target of a link is in the same metastore. 
* Is it even required that the metadata of two linked tables be consistent? It seems like the user has to run alter link add partition herself for each partition. She can choose to add only a few partitions. In this case, the tables in the two ns have a different number of partitions and thus different data. What you say above is true for static links. For dynamic links, add and drop partition on the source of the link will cause the target to get those partitions as well (we trap alter table add/drop partition to provide this behavior). * Who is allowed to create links? Any user on the database who has create/all privileges on the database. We could potentially create a new privilege for this, but I think create privilege should suffice. We can similarly map alter, drop privileges to the appropriate operations. * Once a user creates a link, who can use it? If everyone is allowed to access it, then I don't see how it is different from the problem that you are outlining in the first alternative design option, wherein a user having access to two ns via roles has access to data in both ns. The link creates metadata in the target database. So you can only access data that has been linked into this database (access is via the T@Y or Y.T syntax depending on the chosen design option). Note that this is different than having a role that a user maps to, since in that case there is no local metadata in the target database specifying if the imported data is accessible from this database. * If links are first class concepts, then the authorization model also needs to understand them? I don't see any mention of that. Yes, you are correct. We need to account for the authorization model. * I see there is an HDFS jira for implementing hard links of files in the HDFS layer, so that takes care of linking physical data on HDFS. What about tables whose data is stored in external systems? For example, hbase. 
Does hbase also need to implement a feature of hard-linking its tables for Hive to make use of this feature? What about other storage handlers like cassandra, mongodb, etc.? The link does not create a link on HDFS. It just points to the source table/partitions. You can think of it as a Hive-level link, so there is no need for any changes/features from the other storage handlers. * Migration will involve a two-step process of distcp'ing data from one cluster to another and then replicating one mysql instance to another. Are there any other steps? Do you plan to (later) build tools to automate this process of migration? Yes, we will be building tools to enable migration of a namespace. Migration will involve replicating the metadata and the data as you mention above. * When migrating a ns from one datacenter to another, will links be dropped or are they also preserved? We will preserve them – by copying the data for the links to the other datacenter. Hope that helps. Please ask any more questions that come up as
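For readers skimming the thread, the operations discussed above (links, static partition imports, the T@Y access syntax) might look roughly like the following. This is purely illustrative pseudocode reconstructed from the proposal; none of these statements existed in HiveQL:

```sql
-- Hypothetical statements sketched from the proposal above; not real HiveQL.
CREATE LINK TO TABLE remote_ns.clicks;                        -- import metadata into the local ns
ALTER LINK remote_ns.clicks ADD PARTITION (ds='2012-05-01');  -- static link: partitions added by hand
SELECT * FROM clicks@remote_ns WHERE ds='2012-05-01';         -- access via the proposed T@Y syntax
```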
Re: non-string partition columns
Some discussion of this has happened on https://issues.apache.org/jira/browse/HIVE-2702 Is the underlying problem the same as the one which I described on that jira? Thanks, Ashutosh On Thu, May 24, 2012 at 10:59 PM, Namit Jain nj...@fb.com wrote: Should we disallow non-string partition columns completely? Does anyone depend on that? On 5/24/12 6:49 PM, Namit Jain nj...@fb.com wrote: http://svn.apache.org/viewvc?view=revision&revision=1308427 The patch above broke drop partitions if the partition happens to be non-string. This is due to a JDO issue with non-string columns. Is anyone using non-string partition columns? Should we force the partition columns to be only of type string? The documentation probably does not specify anything clearly. If someone is dependent on non-string partition columns, we need to revert this patch, or make a special case for string partition columns. Thanks, -namit
Re: non-string partition columns
FWIW, HCatalog only allows partition columns of type string, precisely because in the backend datastore type information is not recorded. In my opinion, the partition column type should be restricted to string until we fix this problem; otherwise it gives unexpected behavior to end users and/or bug reports. One possibility is to introduce a config variable hive.partition.column.type and have its value set to string by default. This ensures that new users get the expected behavior of string-only partition columns. Users who already use other types can reset this config value to all in their deployment when they upgrade to a newer version of Hive (assuming the new version comes out without a proper fix). This extra step of resetting the default config will help them understand the risk they are taking by changing the default value. Thanks, Ashutosh On Tue, May 29, 2012 at 10:02 AM, Namit Jain nj...@fb.com wrote: OK, I will keep the support. Add special casing for string columns in DDLTask On 5/29/12 9:27 AM, Edward Capriolo edlinuxg...@gmail.com wrote: We use them too; we store our dates as integers like 20120130. This allows us to do partition pruning with ranges. On Tue, May 29, 2012 at 4:10 AM, Aniket Mokashi aniket...@gmail.com wrote: We are using non-string partition columns in production as well. Thanks, Aniket On Sat, May 26, 2012 at 1:20 AM, Philip Tromans philip.j.trom...@gmail.com wrote: We're using non-string partition columns in production. I think non-string partition columns are a good thing to have - it allows you to do all sorts of date range calculations etc. AFAIK, MySQL's partition columns can be of any type. Phil. On May 26, 2012 7:55 AM, Namit Jain nj...@fb.com wrote: Should I go ahead and file a jira to disallow non-string partition columns? Or does someone depend on that functionality? On 5/25/12 10:01 AM, Namit Jain nj...@fb.com wrote: Yes, but the meta-question is: Is anyone dependent on non-string partition columns? Should we drop the support for non-string partition columns? 
Thanks, -namit On 5/24/12 11:21 PM, Ashutosh Chauhan hashut...@apache.org wrote: Some discussion for this has happened on https://issues.apache.org/jira/browse/HIVE-2702 Is the underlying problem same as the one which I described on that jira ? Thanks, Ashutosh On Thu, May 24, 2012 at 10:59 PM, Namit Jain nj...@fb.com wrote: Should we disallow non-string partition columns completely ? Does anyone depend on that ? On 5/24/12 6:49 PM, Namit Jain nj...@fb.com wrote: http://svn.apache.org/viewvc?view=revisionrevision=1308427 The patch above broke drop partitions if the partition happens to be non-string. This is due to a JDO issue with non-string columns. Is anyone using non-string partition columns ? Should be force the partition columns to be only of type string ? The documentation probably does not specify anything clearly. If someone is dependent on non-string partition column, we need to revert this patch, or make a special case for string partition columns. Thanks, -namit -- ...:::Aniket:::... Quetzalco@tl
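The config-variable idea discussed in this thread could look like the following hive-site.xml fragment. Note this is a sketch of the *proposed* setting only; hive.partition.column.type did not exist at the time of writing, and the property name and values are hypothetical:

```xml
<!-- hive-site.xml (hypothetical): the proposed guard for partition column types -->
<property>
  <name>hive.partition.column.type</name>
  <value>string</value>
  <description>Proposed default: only string partition columns allowed.
    Deployments relying on int/date partition columns would set this to
    "all" to keep the old behavior.</description>
</property>
```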
Re: Behavior of Hive 2837: insert into external tables should not be allowed
Hi Mark, I understand your concern w.r.t backward compatibility. But as Ed pointed out there is a config variable and by default semantic is unchanged so you can continue to insert into your external table. I have a question though. Why are you creating all your tables as external tables ? Why not regular tables? Thanks, Ashutosh On Thu, May 31, 2012 at 9:35 PM, Mark Grover grover.markgro...@gmail.comwrote: Hi folks, I have a question regarding HIVE 2837( https://issues.apache.org/jira/browse/HIVE-2837) that deals with disallowing external table from using insert into queries. From looking at the JIRA, it seems like it applies to external tables on HDFS as well. Technically, insert into should be ok for external tables on HDFS (and S3 as well). Seems like a storage file system level thing to specify whether insert into is applied and implement it. Historically, there hasn't been any real difference between creating an external table on HDFS vs creating a managed one. However, if we disallow insert into on external tables, that would mean that folks with external tables on HDFS wouldn't be able to make use of insert into functionality even though they should be able to. Do we want to allow insert into on HDFS tables regardless of whether they are external or not? Mark
Re: test errors
Works for me. $ svn up svn At revision 1348932. $ svn st $ ant clean package test -Dtestcase=TestZooKeeperTokenStore [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 1.783 sec Any logs to look at? Ashutosh On Mon, Jun 11, 2012 at 5:33 AM, Namit Jain nj...@fb.com wrote: I am seeing the following errors on a fresh hive trunk ? [junit] Running org.apache.hadoop.hive.thrift.TestZooKeeperTokenStore [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec [junit] Test org.apache.hadoop.hive.thrift.TestZooKeeperTokenStore FAILED (crashed) Is anyone else getting the same error ? Thanks, -namit
Re: Review Request: Allow to download resources from any external File Systems to local machine.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/5687/#review8776 --- http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java https://reviews.apache.org/r/5687/#comment18552 Instead of regex, it might be better to use URI to parse the string. String scheme = new Path(value).toURI().getScheme(); return (scheme != null) && !scheme.equalsIgnoreCase("file"); - Ashutosh Chauhan On June 30, 2012, 6:15 p.m., Kanna Karanam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/5687/ --- (Updated June 30, 2012, 6:15 p.m.) Review request for hive, Carl Steinbach, Edward Capriolo, and Ashutosh Chauhan. Description --- Instead of restricting resources download to s3, s3n, hdfs make it open for any external file systems. This addresses bug HIVE-3146. https://issues.apache.org/jira/browse/HIVE-3146 Diffs - http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1355510 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 1355510 Diff: https://reviews.apache.org/r/5687/diff/ Testing --- Yes. All unit tests passed. Thanks, Kanna Karanam
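The suggested scheme-based check can be sketched in plain Java as follows. Note this uses java.net.URI directly instead of Hadoop's Path (a simplification to keep the example self-contained; Path additionally normalizes relative and Windows-style paths), and the method name isRemote is made up for illustration:

```java
import java.net.URI;

public class UriSchemeCheck {
    // A resource needs to be downloaded when its URI carries a scheme
    // other than "file". A bare local path like /tmp/udf.jar parses with
    // a null scheme, so it is treated as local.
    public static boolean isRemote(String value) {
        String scheme = URI.create(value).getScheme();
        return scheme != null && !scheme.equalsIgnoreCase("file");
    }

    public static void main(String[] args) {
        System.out.println(isRemote("hdfs://namenode:8020/tmp/udf.jar")); // remote
        System.out.println(isRemote("file:///tmp/udf.jar"));              // local
        System.out.println(isRemote("/tmp/udf.jar"));                     // local, no scheme
    }
}
```

Compared with the hard-coded s3/s3n/hdfs whitelist, this accepts any remote scheme, which is the point of the change under review.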
Re: Review Request: Resource Leak: Fix the File handle leak in EximUtil.java
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/5777/#review9021 --- Ship it! Ship It! http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/EximUtil.java https://reviews.apache.org/r/5777/#comment19184 path here is used to construct URI object later. URI constructor javadocs says that path should either begin with '/' or should be empty, irrespective of OS. So, looks like check for path.startsWith('/') makes most sense here. As Kanna noted his changes make path to /D:/hive/etc from D:/hive/etc thus making URI constructor happy. - Ashutosh Chauhan On July 5, 2012, 7:50 p.m., Kanna Karanam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/5777/ --- (Updated July 5, 2012, 7:50 p.m.) Review request for hive, Carl Steinbach, Edward Capriolo, and Ashutosh Chauhan. Description --- 1) Not closing the file handle EximUtil after reading the metadata from the file. 2) Nit: Get the path from URI to handle the Windows paths. This addresses bug HIVE-3232. https://issues.apache.org/jira/browse/HIVE-3232 Diffs - http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/EximUtil.java 1357818 Diff: https://reviews.apache.org/r/5777/diff/ Testing --- Yes Thanks, Kanna Karanam
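The `URI` constructor rule cited in the review above can be demonstrated directly: the multi-argument constructor rejects a path component that is non-empty and does not begin with '/', on any operating system, which is why prefixing a Windows path like `D:/hive/etc` with a slash satisfies it. The `UriPathRule` class name is illustrative.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriPathRule {
    // The URI(scheme, host, path, fragment) constructor requires the path
    // to be empty or to start with '/'; otherwise it throws.
    static boolean accepts(String path) {
        try {
            new URI("hdfs", "localhost", path, null);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("D:/hive/etc"));  // false: relative path in absolute URI
        System.out.println(accepts("/D:/hive/etc")); // true: leading '/' added
    }
}
```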
Re: Review Request: Remove the Unix specific absolute path of “Cat” utility in several .q files to make them run on Windows with CygWin in path.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6313/#review9874 --- Ship it! Ship It! - Ashutosh Chauhan On Aug. 2, 2012, 4:51 a.m., Kanna Karanam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6313/ --- (Updated Aug. 2, 2012, 4:51 a.m.) Review request for hive, Carl Steinbach, Edward Capriolo, and Ashutosh Chauhan. Description --- Several .q files have Unix absolute paths for Cat utility so all of them are failing on Windows even with CygWin support. This addresses bug HIVE-3327. https://issues.apache.org/jira/browse/HIVE-3327 Diffs - trunk/contrib/src/test/queries/clientpositive/serde_typedbytes.q 1368192 trunk/contrib/src/test/queries/clientpositive/serde_typedbytes2.q 1368192 trunk/contrib/src/test/queries/clientpositive/serde_typedbytes3.q 1368192 trunk/contrib/src/test/queries/clientpositive/serde_typedbytes4.q 1368192 trunk/contrib/src/test/results/clientpositive/serde_typedbytes.q.out 1368192 trunk/contrib/src/test/results/clientpositive/serde_typedbytes2.q.out 1368192 trunk/contrib/src/test/results/clientpositive/serde_typedbytes3.q.out 1368192 trunk/contrib/src/test/results/clientpositive/serde_typedbytes4.q.out 1368192 trunk/ql/src/test/queries/clientnegative/clusterbydistributeby.q 1368192 trunk/ql/src/test/queries/clientnegative/clusterbyorderby.q 1368192 trunk/ql/src/test/queries/clientnegative/clusterbysortby.q 1368192 trunk/ql/src/test/queries/clientnegative/orderbysortby.q 1368192 trunk/ql/src/test/queries/clientpositive/input14.q 1368192 trunk/ql/src/test/queries/clientpositive/input14_limit.q 1368192 trunk/ql/src/test/queries/clientpositive/input17.q 1368192 trunk/ql/src/test/queries/clientpositive/input18.q 1368192 trunk/ql/src/test/queries/clientpositive/input34.q 1368192 trunk/ql/src/test/queries/clientpositive/input35.q 1368192 trunk/ql/src/test/queries/clientpositive/input36.q 1368192 trunk/ql/src/test/queries/clientpositive/input38.q 1368192 
trunk/ql/src/test/queries/clientpositive/input5.q 1368192 trunk/ql/src/test/queries/clientpositive/mapreduce1.q 1368192 trunk/ql/src/test/queries/clientpositive/mapreduce2.q 1368192 trunk/ql/src/test/queries/clientpositive/mapreduce3.q 1368192 trunk/ql/src/test/queries/clientpositive/mapreduce4.q 1368192 trunk/ql/src/test/queries/clientpositive/mapreduce7.q 1368192 trunk/ql/src/test/queries/clientpositive/mapreduce8.q 1368192 trunk/ql/src/test/queries/clientpositive/newline.q 1368192 trunk/ql/src/test/queries/clientpositive/nullscript.q 1368192 trunk/ql/src/test/queries/clientpositive/partcols1.q 1368192 trunk/ql/src/test/queries/clientpositive/ppd_transform.q 1368192 trunk/ql/src/test/queries/clientpositive/query_with_semi.q 1368192 trunk/ql/src/test/queries/clientpositive/regexp_extract.q 1368192 trunk/ql/src/test/queries/clientpositive/select_transform_hint.q 1368192 trunk/ql/src/test/queries/clientpositive/transform_ppr1.q 1368192 trunk/ql/src/test/queries/clientpositive/transform_ppr2.q 1368192 trunk/ql/src/test/results/clientpositive/input14.q.out 1368192 trunk/ql/src/test/results/clientpositive/input14_limit.q.out 1368192 trunk/ql/src/test/results/clientpositive/input17.q.out 1368192 trunk/ql/src/test/results/clientpositive/input18.q.out 1368192 trunk/ql/src/test/results/clientpositive/input34.q.out 1368192 trunk/ql/src/test/results/clientpositive/input35.q.out 1368192 trunk/ql/src/test/results/clientpositive/input36.q.out 1368192 trunk/ql/src/test/results/clientpositive/input38.q.out 1368192 trunk/ql/src/test/results/clientpositive/input5.q.out 1368192 trunk/ql/src/test/results/clientpositive/mapreduce1.q.out 1368192 trunk/ql/src/test/results/clientpositive/mapreduce2.q.out 1368192 trunk/ql/src/test/results/clientpositive/mapreduce3.q.out 1368192 trunk/ql/src/test/results/clientpositive/mapreduce4.q.out 1368192 trunk/ql/src/test/results/clientpositive/mapreduce7.q.out 1368192 trunk/ql/src/test/results/clientpositive/mapreduce8.q.out 1368192 
trunk/ql/src/test/results/clientpositive/newline.q.out 1368192 trunk/ql/src/test/results/clientpositive/nullscript.q.out 1368192 trunk/ql/src/test/results/clientpositive/partcols1.q.out 1368192 trunk/ql/src/test/results/clientpositive/ppd_transform.q.out 1368192 trunk/ql/src/test/results/clientpositive/query_with_semi.q.out 1368192 trunk/ql/src/test/results/clientpositive/regexp_extract.q.out 1368192 trunk/ql/src/test/results/clientpositive/select_transform_hint.q.out 1368192 trunk
Re: Review Request: This function overloads the current DateDiff(expr1, expr2) by adding another parameter to specify the units.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6027/#review9924 --- trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateDiff.java https://reviews.apache.org/r/6027/#comment21120 Instead of these enums, can we use these ints instead http://docs.oracle.com/javase/6/docs/api/constant-values.html#java.text.DateFormat ? Also, I don't think microseconds make sense, we don't have that precision in any case. trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateDiff.java https://reviews.apache.org/r/6027/#comment21121 Let's get rid of the formatter variable, add the default format (yyyy-MM-dd) as the first format in dateFormats and use formatLong() for all formats trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateDiff.java https://reviews.apache.org/r/6027/#comment21126 Instead of doing instanceOf later on, use toDate() / toTimeStamp() depending on unit here itself. Then, have evaluateObj(Date, Date). trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateDiff.java https://reviews.apache.org/r/6027/#comment21122 Avoid unnecessary object creation. Do, Date date1 = resolveDate(dateObj1, unit) which is more appropriate. Similarly for date2. trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateDiff.java https://reviews.apache.org/r/6027/#comment21118 Looks like this function is not used anywhere. Please remove it. trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateDiff.java https://reviews.apache.org/r/6027/#comment21119 Looks like this function is not used anywhere. Please, remove it. - Ashutosh Chauhan On July 18, 2012, 12:56 a.m., Shefali Vohra wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6027/ --- (Updated July 18, 2012, 12:56 a.m.) Review request for hive and Ashutosh Chauhan. Description --- Parameters This function overloads the current DateDiff(expr1, expr2) by adding another parameter to specify the units. It takes 3 parameters. 
The first two are timestamps, and the formats accepted are: yyyy-MM-dd yyyy-MM-dd HH:mm:ss yyyy-MM-dd HH:mm:ss.milli These are the formats accepted by the current DateDiff(expr1, expr2) function and allow for that consistency. The accepted data types for the timestamp will be Text, TimestampWritable, Date, and String, just as with the already existing function. The third parameter is the units the user wants the response to be in. Acceptable units are: Microsecond Millisecond Second Minute Hour Day Week Month Quarter Year When calculating the difference, the full timestamp is used when the specified unit is hour or smaller (microsecond, millisecond, second, minute, hour), and only the date part is used if the unit is day or larger (day, week, month, quarter, year). If for the smaller units the time is not specified and the format yyyy-MM-dd is used, the time 00:00:00.0 is used. Leap years are accounted for by the Calendar class in Java, which inherently addresses the issue. The assumption is made that all these time parameters are in the same time zone. Return Value The function returns expr1 - expr2 expressed as an int in the units specified. Hive vs. SQL SQL also has a DateDiff() function with some more acceptable units. The order of parameters is different between SQL and Hive. The reason for this is that Hive already has a DateDiff() function with the same first two parameters, and having this order here allows for that consistency within Hive. Example Query hive> DATEDIFF(DATE_FIELD, '2012-06-01', 'day'); Diagnostic Error Messages Invalid table alias or column name reference Table not found This addresses bug HIVE-3216. 
https://issues.apache.org/jira/browse/HIVE-3216 Diffs - trunk/data/files/datetable.txt PRE-CREATION trunk/data/files/timestamptable.txt PRE-CREATION trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateDiff.java 1362724 trunk/ql/src/test/queries/clientnegative/udf_datediff.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/udf_datediff.q 1362724 trunk/ql/src/test/results/clientnegative/udf_datediff.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/udf_datediff.q.out 1362724 Diff: https://reviews.apache.org/r/6027/diff/ Testing --- positive and negative test cases included Thanks, Shefali Vohra
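The unit rule described above (full timestamp for hour-or-smaller units, date part only for day-or-larger units) can be sketched as follows. This uses the modern java.time API for brevity, whereas the patch itself targets the Calendar class; the `DateDiffSketch` name and method signatures are illustrative only.

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

public class DateDiffSketch {
    // Difference t1 - t2 expressed in the requested unit. For units of a day
    // or larger, only the date part of each timestamp is compared.
    static long diff(LocalDateTime t1, LocalDateTime t2, ChronoUnit unit) {
        if (unit.getDuration().compareTo(Duration.ofDays(1)) >= 0) {
            LocalDate d1 = t1.toLocalDate();   // drop the time-of-day part
            LocalDate d2 = t2.toLocalDate();
            return unit.between(d2, d1);
        }
        return unit.between(t2, t1);           // full timestamp for sub-day units
    }

    public static void main(String[] args) {
        LocalDateTime a = LocalDateTime.parse("2012-06-08T23:30:00");
        LocalDateTime b = LocalDateTime.parse("2012-06-01T00:00:00");
        System.out.println(diff(a, b, ChronoUnit.DAYS));  // 7 (dates only)
        System.out.println(diff(a, b, ChronoUnit.HOURS)); // 191 (full timestamps)
    }
}
```

Note how the day-based answer (7) ignores the 23:30 time component, while the hour-based answer uses it; this is exactly the behavioral split the description calls out.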
Re: [ANNOUNCE] New Hive Committer - Navis Ryu
Congrats, Navis! Well deserved. Welcome, aboard! Ashutosh On Fri, Aug 10, 2012 at 11:10 AM, Carl Steinbach c...@cloudera.com wrote: Congratulations Navis! This is very well deserved. Looking forward to many more patches from you. On Fri, Aug 10, 2012 at 8:10 AM, Bejoy KS bejoy...@yahoo.com wrote: Congrats Navis.. :) Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: alo alt wget.n...@gmail.com Date: Fri, 10 Aug 2012 17:08:07 To: u...@hive.apache.org Reply-To: u...@hive.apache.org Cc: dev@hive.apache.org; navis@nexr.com Subject: Re: [ANNOUNCE] New Hive Committer - Navis Ryu Congratulations! Well done :) cheers, ALex On Aug 10, 2012, at 11:58 AM, John Sichi jsi...@gmail.com wrote: The Apache Hive PMC has passed a vote to make Navis Ryu a new committer on the project. JIRA is currently down, so I can't send out a link with his contribution list at the moment, but if you have an account at reviews.facebook.net, you can see his activity here: https://reviews.facebook.net/p/navis/ Navis, please submit your CLA to the Apache Software Foundation as described here: http://www.apache.org/licenses/#clas Congratulations! JVS -- Alexander Alten-Lorenz http://mapredit.blogspot.com German Hadoop LinkedIn Group: http://goo.gl/N8pCF
Re: Review Request: HIVE-3409. Increase test.timeout value
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6780/#review10775 --- Ship it! Ship It! - Ashutosh Chauhan On Aug. 27, 2012, 10:09 a.m., Carl Steinbach wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6780/ --- (Updated Aug. 27, 2012, 10:09 a.m.) Review request for hive and Ashutosh Chauhan. Description --- commit bfb69f3cb607a2daadcff07dd87b8924ae19ae2b Author: Carl Steinbach c...@cloudera.com Date: Mon Aug 27 03:05:19 2012 -0700 Add test.junit.timeout to build.properties, and double value build-common.xml | 3 +-- build.properties | 6 ++ common/build.xml | 2 +- 3 files changed, 8 insertions(+), 3 deletions(-) This addresses bug HIVE-3409. https://issues.apache.org/jira/browse/HIVE-3409 Diffs - build-common.xml f2697e1 build.properties ff9eba9 common/build.xml 2712c03 Diff: https://reviews.apache.org/r/6780/diff/ Testing --- Thanks, Carl Steinbach
Re: Review Request: HIVE-3409. Increase test.timeout value
On Aug. 27, 2012, 3:07 p.m., Ashutosh Chauhan wrote: Ship It! +1 Please commit. - Ashutosh --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6780/#review10775 --- On Aug. 27, 2012, 10:09 a.m., Carl Steinbach wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6780/ --- (Updated Aug. 27, 2012, 10:09 a.m.) Review request for hive and Ashutosh Chauhan. Description --- commit bfb69f3cb607a2daadcff07dd87b8924ae19ae2b Author: Carl Steinbach c...@cloudera.com Date: Mon Aug 27 03:05:19 2012 -0700 Add test.junit.timeout to build.properties, and double value build-common.xml | 3 +-- build.properties | 6 ++ common/build.xml | 2 +- 3 files changed, 8 insertions(+), 3 deletions(-) This addresses bug HIVE-3409. https://issues.apache.org/jira/browse/HIVE-3409 Diffs - build-common.xml f2697e1 build.properties ff9eba9 common/build.xml 2712c03 Diff: https://reviews.apache.org/r/6780/diff/ Testing --- Thanks, Carl Steinbach
Re: [VOTE] Apache Hive 0.11.0 Release Candidate 2
+1 Built from sources, ran few unit tests and some simple queries against 1-node cluster. Thanks, Ashutosh On Wed, May 15, 2013 at 5:30 PM, Navis류승우 navis@nexr.com wrote: +1 - built from source, passed all tests (without assertion fail or conversion to backup task) - working good with queries from running-site - some complaints on missing HIVE-4172 (void type for JDBC2) Thanks 2013/5/12 Owen O'Malley omal...@apache.org: Based on feedback from everyone, I have respun release candidate, RC2. Please take a look. We've fixed 7 problems with the previous RC: * Release notes were incorrect * HIVE-4018 - MapJoin failing with Distributed Cache error * HIVE-4421 - Improve memory usage by ORC dictionaries * HIVE-4500 - Ensure that HiveServer 2 closes log files. * HIVE-4494 - ORC map columns get class cast exception in some contexts * HIVE-4498 - Fix TestBeeLineWithArgs failure * HIVE-4505 - Hive can't load transforms with remote scripts * HIVE-4527 - Fix the eclipse template Source tag for RC2 is at: https://svn.apache.org/repos/asf/hive/tags/release-0.11.0rc2 Source tar ball and convenience binary artifacts can be found at: http://people.apache.org/~omalley/hive-0.11.0rc2/ This release has many goodies including HiveServer2, integrated hcatalog, windowing and analytical functions, decimal data type, better query planning, performance enhancements and various bug fixes. In total, we resolved more than 350 issues. Full list of fixed issues can be found at: http://s.apache.org/8Fr Voting will conclude in 72 hours. Hive PMC Members: Please test and vote. Thanks, Owen
Re: Review Request: HIVE-4489: beeline always return the same error message twice
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/10917/#review21291 --- Ship it! +1 - Ashutosh Chauhan On May 3, 2013, 11:01 p.m., Chaoyu Tang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/10917/ --- (Updated May 3, 2013, 11:01 p.m.) Review request for hive. Description --- Beeline always returns the same error message twice -- because the error is logged out both in an exception catch block and its outer re-catch block. This addresses bug HIVE-4489. https://issues.apache.org/jira/browse/HIVE-4489 Diffs - beeline/src/java/org/apache/hive/beeline/Commands.java 8e2a52f Diff: https://reviews.apache.org/r/10917/diff/ Testing --- Have done the tests. Thanks, Chaoyu Tang
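The bug pattern described above, where an inner catch block logs an error and rethrows it, only for the outer catch to log it again, can be reproduced in miniature. All names here are illustrative; the real fix in HIVE-4489 removes one of the two reporting sites in Commands.java.

```java
import java.util.ArrayList;
import java.util.List;

public class DoubleLog {
    static final List<String> LOG = new ArrayList<String>();

    static void inner() throws Exception {
        try {
            throw new Exception("connection refused");
        } catch (Exception e) {
            LOG.add(e.getMessage()); // reported here...
            throw e;                 // ...then rethrown
        }
    }

    // Returns how many times the single failure ends up in the log.
    static int run() {
        LOG.clear();
        try {
            inner();
        } catch (Exception e) {
            LOG.add(e.getMessage()); // ...and reported again by the outer catch
        }
        return LOG.size();
    }

    public static void main(String[] args) {
        System.out.println(run()); // 2 -- one error, surfaced twice
    }
}
```

The fix is to report at exactly one level: either log-and-rethrow without logging again at the outer site, or rethrow silently and let the outermost handler do the reporting.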
Re: Review Request: Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 , if all the column values larger than 0.0 (or if all column values smaller than 0.0)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11172/#review21293 --- Request for comments. http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java https://reviews.apache.org/r/11172/#comment44172 Can you add a comment about why we need to set it MAX value instead of 0, since its not apparent? http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java https://reviews.apache.org/r/11172/#comment44173 Similarly can you add a comment why we need to set max to -ve infinity and not 0, since its counter intuitive? - Ashutosh Chauhan On May 15, 2013, 7:11 a.m., Zhuoluo Yang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11172/ --- (Updated May 15, 2013, 7:11 a.m.) Review request for hive, Carl Steinbach, Carl Steinbach, and fangkun cao. Description --- An initialization error. Make double and long initialize correctly. Would you review that and assign the issue to me? This addresses bug HIVE-4561. https://issues.apache.org/jira/browse/HIVE-4561 Diffs - http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java 1482697 Diff: https://reviews.apache.org/r/11172/diff/ Testing --- Thanks, Zhuoluo Yang
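The initialization bug under review is easy to state concretely: if the running minimum starts at 0, an all-positive column reports LOW_VALUE = 0.0000 because no value ever beats the bogus seed; symmetrically for the maximum with all-negative columns. A minimal sketch of the corrected seeding (the `MinMaxStats` class is illustrative, not Hive's GenericUDAFComputeStats):

```java
public class MinMaxStats {
    // Seed the running min at the largest representable value and the running
    // max at negative infinity, so the first observed value always replaces
    // the seed. Seeding either at 0 would "win" against one-sided data.
    static double[] minMax(double[] values) {
        double lo = Double.MAX_VALUE;
        double hi = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            if (v < lo) lo = v;
            if (v > hi) hi = v;
        }
        return new double[] { lo, hi };
    }

    public static void main(String[] args) {
        double[] pos = minMax(new double[] { 3.5, 1.2, 9.0 });
        System.out.println(pos[0] + " " + pos[1]); // 1.2 9.0 -- not 0.0 9.0
        double[] neg = minMax(new double[] { -3.0, -1.5 });
        System.out.println(neg[0] + " " + neg[1]); // -3.0 -1.5 -- not -3.0 0.0
    }
}
```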
Re: Review Request: HIVE-4513 - disable hivehistory logs by default
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11029/#review21352 --- data/conf/hive-site.xml https://reviews.apache.org/r/11029/#comment44263 Is there a reason for this to be set to true for tests? Unless there is, we should set config in tests to the default values, since we should test default configs. ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistory.java https://reviews.apache.org/r/11029/#comment44264 doesn't read right. I guess you wanted ... statistics into a file. ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistory.java https://reviews.apache.org/r/11029/#comment44266 This is an existing comment which doesn't read right. But since we are doing major surgery on HiveHistory, it will be good to update it to make it more sensible. ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistory.java https://reviews.apache.org/r/11029/#comment44268 I think the word job is not required in this comment. ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistory.java https://reviews.apache.org/r/11029/#comment44269 I think query is a better word than job here. ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistory.java https://reviews.apache.org/r/11029/#comment44270 Better worded as Called at the end of query. ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistory.java https://reviews.apache.org/r/11029/#comment44271 Again, use of the word job is confusing; we should use query here as well. ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistory.java https://reviews.apache.org/r/11029/#comment44272 Incorrect comment. ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistory.java https://reviews.apache.org/r/11029/#comment44274 Function name is IdtoTable, but comment says table to id. One of these needs to be corrected. 
ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryImpl.java https://reviews.apache.org/r/11029/#comment44275 Similar comment as in HiveHistory.java ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryImpl.java https://reviews.apache.org/r/11029/#comment44277 Should this be hive.ql.exec.HiveHistoryImpl to avoid confusion? ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryImpl.java https://reviews.apache.org/r/11029/#comment44278 and instead of an ? ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryImpl.java https://reviews.apache.org/r/11029/#comment44280 In case of incorrect config, should this throw an exception instead of silently returning? Otherwise there will be errors later when something is written to the history file. ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryImpl.java https://reviews.apache.org/r/11029/#comment44281 Same comment as above. ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryImpl.java https://reviews.apache.org/r/11029/#comment44283 This should be a static class variable, otherwise nextInt() will return the same value for each invocation. ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryImpl.java https://reviews.apache.org/r/11029/#comment44284 Instead of /, we should use File.separator ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryImpl.java https://reviews.apache.org/r/11029/#comment44287 Consider using File.createNewFile here. ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryImpl.java https://reviews.apache.org/r/11029/#comment44288 Use System.getProperty("line.separator") instead of \n ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryImpl.java https://reviews.apache.org/r/11029/#comment44289 start of query ? 
ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryUtil.java https://reviews.apache.org/r/11029/#comment44291 Missing apache header ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryViewer.java https://reviews.apache.org/r/11029/#comment44292 HiveHistoryViewer.class - Ashutosh Chauhan On May 13, 2013, 10:12 p.m., Thejas Nair wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11029/ --- (Updated May 13, 2013, 10:12 p.m.) Review request for hive. Description --- HiveHistory log files (hive_job_log_hive_*.txt files) store information about hive query such as query string, plan , counters and MR job progress information. There is no mechanism to delete these files and as a result they get accumulated over time, using up lot of disk space. I don't think this is used by most people, so I think it would better to turn this off by default. Jobtracker logs already capture most of this information, though it is not as structured as history logs
Re: Review Request: Review Request for HIVE-4554 Failed to create a table from existing file if file path has spaces
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11335/#review21366 --- Patch looks good, apart from one comment. ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java https://reviews.apache.org/r/11335/#comment44301 Apart from this change, all other changes are contained within if(isLocal) block. Because of this it seems its possible it might be triggered for non-local paths as well. Can you test it for hdfs:// path which has spaces. If its easy, it will be good to add it in test, else manual test is fine as well. - Ashutosh Chauhan On June 3, 2013, 10:18 p.m., Xuefu Zhang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11335/ --- (Updated June 3, 2013, 10:18 p.m.) Review request for hive and Ashutosh Chauhan. Description --- Patch includes fix and new test case. This addresses bug HIVE-4554. https://issues.apache.org/jira/browse/HIVE-4554 Diffs - data/files/person PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java bd8d252 ql/src/test/queries/clientpositive/load_file_with_space_in_the_name.q PRE-CREATION ql/src/test/results/clientpositive/load_file_with_space_in_the_name.q.out PRE-CREATION Diff: https://reviews.apache.org/r/11335/diff/ Testing --- Thanks, Xuefu Zhang
Re: Review Request: Review Request for HIVE-4554 Failed to create a table from existing file if file path has spaces
On June 3, 2013, 11:15 p.m., Ashutosh Chauhan wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java, line 273 https://reviews.apache.org/r/11335/diff/3/?file=300140#file300140line273 Apart from this change, all other changes are contained within if(isLocal) block. Because of this it seems its possible it might be triggered for non-local paths as well. Can you test it for hdfs:// path which has spaces. If its easy, it will be good to add it in test, else manual test is fine as well. Xuefu Zhang wrote: I tried to add a testcase loading file at HDFS into a table without a success. Doing this requires an HDFS accessible from the test machine. Please let me know if you think there is mechanism. However, I did manually test the case, and it works fine for me. (It fails w/o the patch.) Glad that its working. You can add this test-case for MinmrCliDriver . Just write a regular .q test file and then include that within minimr.query.files parameter in build-common.xml . Those testcases run against minicluster so you can access hdfs:// there. - Ashutosh --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11335/#review21366 --- On June 3, 2013, 10:18 p.m., Xuefu Zhang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11335/ --- (Updated June 3, 2013, 10:18 p.m.) Review request for hive and Ashutosh Chauhan. Description --- Patch includes fix and new test case. This addresses bug HIVE-4554. https://issues.apache.org/jira/browse/HIVE-4554 Diffs - data/files/person PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java bd8d252 ql/src/test/queries/clientpositive/load_file_with_space_in_the_name.q PRE-CREATION ql/src/test/results/clientpositive/load_file_with_space_in_the_name.q.out PRE-CREATION Diff: https://reviews.apache.org/r/11335/diff/ Testing --- Thanks, Xuefu Zhang
Re: Review Request: Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 , if all the column values larger than 0.0 (or if all column values smaller than 0.0)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11172/#review21480 --- Ship it! +1 - Ashutosh Chauhan On June 5, 2013, 2:06 p.m., Zhuoluo Yang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11172/ --- (Updated June 5, 2013, 2:06 p.m.) Review request for hive, Carl Steinbach, Carl Steinbach, Ashutosh Chauhan, Shreepadma Venugopalan, and fangkun cao. Description --- An initialization error. Make double and long initialize correctly. Would you review that and assign the issue to me? This addresses bug HIVE-4561. https://issues.apache.org/jira/browse/HIVE-4561 Diffs - http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java 1489292 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/compute_stats_empty_table.q.out 1489292 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/compute_stats_long.q.out 1489292 Diff: https://reviews.apache.org/r/11172/diff/ Testing --- ant test -Dtestcase=TestCliDriver -Dqfile=compute_stats_long.q ant test -Dtestcase=TestCliDriver -Dqfile=compute_stats_double.q done. Thanks, Zhuoluo Yang
Re: Review Request: HIVE-4712: Fix TestCliDriver.truncate_* on 0.23
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11801/#review21708 --- Ship it! +1 - Ashutosh Chauhan On June 11, 2013, 7:18 a.m., Brock Noland wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11801/ --- (Updated June 11, 2013, 7:18 a.m.) Review request for hive and Ashutosh Chauhan. Description --- Queries just needed and order by to be deterministic. This addresses bug HIVE-4712. https://issues.apache.org/jira/browse/HIVE-4712 Diffs - ql/src/test/queries/clientpositive/truncate_column.q f172bae ql/src/test/queries/clientpositive/truncate_column_merge.q 20ef643 ql/src/test/results/clientpositive/truncate_column.q.out f8af6d0 ql/src/test/results/clientpositive/truncate_column_merge.q.out 64a917b Diff: https://reviews.apache.org/r/11801/diff/ Testing --- Passes with both 0.20S and 0.23. Thanks, Brock Noland
Re: Branch for HIVE-4660
Makes sense. I will create the branch soon. Thanks, Ashutosh On Tue, Jun 11, 2013 at 7:44 PM, Gunther Hagleitner ghagleit...@hortonworks.com wrote: Hi, I am starting to work on integrating Tez into Hive (see HIVE-4660, design doc has already been uploaded - any feedback will be much appreciated). This will be a fair amount of work that will take time to stabilize/test. I'd like to propose creating a branch in order to be able to do this incrementally and collaboratively. In order to progress rapidly with this, I would also like to go commit-then-review. Thanks, Gunther.
Re: Review Request: Initialize object inspectors with union of table properties and partition properties
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11632/#review22081 --- Ship it! +1 - Ashutosh Chauhan On June 4, 2013, 6:01 p.m., Mark Wagner wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11632/ --- (Updated June 4, 2013, 6:01 p.m.) Review request for hive and Ashutosh Chauhan. Description --- Change the initialization of object inspectors and deserializers to use the union of partition properties and table properties for partitioned tables. There is no change for unpartitioned tables. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java 9422bf7 ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java f0b16e4 ql/src/test/queries/clientpositive/avro_partitioned.q PRE-CREATION ql/src/test/results/clientpositive/avro_partitioned.q.out PRE-CREATION Diff: https://reviews.apache.org/r/11632/diff/ Testing --- I've done manual end-to-end testing with various queries/tables and have created a .q test for reading partitioned Avro tables. Thanks, Mark Wagner
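The "union of table properties and partition properties" described above can be sketched with plain `java.util.Properties`. This is an illustration only: the class name is mine, and the assumption that a partition-level value overrides the table-level value on a key collision is mine as well; the actual precedence lives in Hive's PartitionDesc/MapOperator changes.

```java
import java.util.Properties;

public class PropsUnion {
    // Union of table-level and partition-level properties. Partition values
    // win on collision here (an assumption -- see the lead-in note).
    static Properties union(Properties table, Properties partition) {
        Properties merged = new Properties();
        merged.putAll(table);
        merged.putAll(partition); // later put overwrites duplicate keys
        return merged;
    }

    public static void main(String[] args) {
        Properties t = new Properties();
        t.setProperty("serialization.format", "1");          // table-only key
        t.setProperty("avro.schema.literal", "{...table...}");
        Properties p = new Properties();
        p.setProperty("avro.schema.literal", "{...part...}"); // overrides table
        Properties u = union(t, p);
        System.out.println(u.getProperty("serialization.format")); // from table
        System.out.println(u.getProperty("avro.schema.literal"));  // from partition
    }
}
```

The point of the change is visible in the first lookup: a partition whose own properties lack a key still sees the table-level value, instead of failing to initialize its deserializer.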
Re: Branch for HIVE-4660
Created the branch from tip of trunk. Check it out https://svn.apache.org/repos/asf/hive/branches/tez/ Thanks, Ashutosh On Thu, Jun 13, 2013 at 5:43 PM, Ashutosh Chauhan hashut...@apache.orgwrote: Makes sense. I will create the branch soon. Thanks, Ashutosh On Tue, Jun 11, 2013 at 7:44 PM, Gunther Hagleitner ghagleit...@hortonworks.com wrote: Hi, I am starting to work on integrating Tez into Hive (see HIVE-4660, design doc has already been uploaded - any feedback will be much appreciated). This will be a fair amount of work that will take time to stabilize/test. I'd like to propose creating a branch in order to be able to do this incrementally and collaboratively. In order to progress rapidly with this, I would also like to go commit-then-review. Thanks, Gunther.
Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11925/#review22571 --- Can you also run all new tests with ant test -Dhadoop.mr.rev=23 to make sure we are getting the right results. Else, you might need to add more columns in order-by columns. serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java https://reviews.apache.org/r/11925/#comment46291 I think determining schema from table definition should be the default. There are multiple ways of determining the schema. I think the order should be: a) Try table definition. b) Try schema literal in properties. c) Try from hdfs. d) Try from url. serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java https://reviews.apache.org/r/11925/#comment46292 Any particular reason you made this synchronized ? serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java https://reviews.apache.org/r/11925/#comment46293 Have you tested this for both default db as well as non-default db? serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java https://reviews.apache.org/r/11925/#comment46294 Instead of \n, can you use File.separator? serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java https://reviews.apache.org/r/11925/#comment46296 Is this meant to be Array[tinyint] = bytes? serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java https://reviews.apache.org/r/11925/#comment46295 Let's take care of this TODO. Should be straight fwd. - Ashutosh Chauhan On June 18, 2013, 3:26 a.m., Mohammad Islam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11925/ --- (Updated June 18, 2013, 3:26 a.m.) Review request for hive, Ashutosh Chauhan and Jakob Homan. Bugs: HIVE-3159 https://issues.apache.org/jira/browse/HIVE-3159 Repository: hive-git Description --- Problem: Hive doesn't support creating an Avro-based table using an HQL create table command. 
It currently requires specifying an Avro schema literal or schema file name, which is very inconvenient for users in many cases. Some of the unsupported use cases: 1. Create table ... Avro-SERDE etc. as SELECT ... from NON-AVRO FILE 2. Create table ... Avro-SERDE etc. as SELECT from AVRO TABLE 3. Create table without specifying an Avro schema. Diffs - ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION ql/src/test/queries/clientpositive/avro_create_as_select2.q PRE-CREATION ql/src/test/queries/clientpositive/avro_no_schema_test.q PRE-CREATION ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_create_as_select2.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_no_schema_test.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 13848b6 serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 010f614 serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION Diff: https://reviews.apache.org/r/11925/diff/ Testing --- Wrote a new Java test class for a new Java class. Added a new test case to an existing Java test class. In addition, there are 4 .q files for testing multiple use cases. Thanks, Mohammad Islam
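The fallback order suggested in the review comment (table definition, then schema literal, then an HDFS schema file, then a schema URL) amounts to a first-non-null chain. Below is a minimal sketch of that precedence in plain Java; the class, method, and source names are hypothetical stand-ins, not the actual AvroSerdeUtils API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

public class SchemaResolver {
    // Try each schema source in precedence order; the first non-null wins.
    static String resolve(List<Supplier<String>> sources) {
        for (Supplier<String> source : sources) {
            String schema = source.get();
            if (schema != null) {
                return schema;
            }
        }
        throw new IllegalStateException("No Avro schema could be determined");
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the four sources; the real code would
        // read the table definition, the avro.schema.literal property,
        // an HDFS schema file, and a schema URL.
        String schema = resolve(Arrays.asList(
                () -> null,                     // a) table definition (absent here)
                () -> "{\"type\":\"string\"}",  // b) schema literal in properties
                () -> null,                     // c) schema file on HDFS
                () -> null));                   // d) schema URL
        System.out.println(schema);             // prints {"type":"string"}
    }
}
```

Keeping the sources as an ordered list makes it trivial to reorder the precedence, which is exactly what the review comment asks for.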
Re: Review Request 12050: HIVE-3756 (LOAD DATA does not honor permission inheritance)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12050/#review22698 --- ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java https://reviews.apache.org/r/12050/#comment46400 I had quite a discussion on this with Rohini on HIVE-2936 on how to do these fs ops in a performant and robust way. Feel free to follow that. ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java https://reviews.apache.org/r/12050/#comment46401 Why not use the API instead of FsShell? Doesn't FileSystem offer an API for recursively doing chmod and chgrp? ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java https://reviews.apache.org/r/12050/#comment46398 I think the following way of writing this is more robust: if (inheritPerms) { fs.mkdirs(destfp, fs.getFileStatus(destfp.getParent()).getPermission()); } else { fs.mkdirs(destfp); } ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java https://reviews.apache.org/r/12050/#comment46399 Same as the previous comment. A couple of comments on API usage. - Ashutosh Chauhan On July 2, 2013, 4:39 p.m., Chaoyu Tang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12050/ --- (Updated July 2, 2013, 4:39 p.m.) Review request for hive. Bugs: HIVE- and HIVE-3756 https://issues.apache.org/jira/browse/HIVE- https://issues.apache.org/jira/browse/HIVE-3756 Repository: hive-git Description --- Problems: 1. When doing load data or insert overwrite to a table, the data files under the database/table directory do not inherit their parent's permissions (i.e. group) as described in HIVE-3756. 2. Besides the group issue, the read/write permission mode is also not inherited. 3. The same problem affects the partition files (see HIVE-3094). Cause: The task results (from load data or insert overwrite) are initially stored in the scratchdir and then loaded under the warehouse table directory. FileSystem.rename is used in this step (e.g. 
LoadTable/LoadPartition) to move the dirs/files, but it preserves their permissions (including group and mode), which are determined by the scratchdir permission or umask. If the scratchdir has different permissions from those of the warehouse table directories, the problem occurs. Solution: After FileSystem.rename is called, change all renamed (moved) files/dirs to their destination parents' permissions if needed (i.e., if hive.warehouse.subdir.inherit.perms is true). Here I introduced a new method renameFile doing both the rename and the permission change. It replaces the FileSystem.rename used in LoadTable/LoadPartition. I did not replace the rename used to move files/dirs under the same scratchdir in the middle of task processing. That does not look necessary to me, since those are temp files and are also probably access-protected by the top scratchdir mode 700 (HIVE-4487). Diffs - ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 17daaa1 Diff: https://reviews.apache.org/r/12050/diff/ Testing --- The following cases tested that all created subdirs/files inherit their parents' permission mode and group in: 1). create database; 2). create table; 3). load data; 4) insert overwrite; 5) partitions. 
{code} hive dfs -ls -d /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:20 /user/tester1/hive hive create database tester1 COMMENT 'Database for user tester1' LOCATION '/user/tester1/hive/tester1.db'; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:21 /user/tester1/hive/tester1.db hive use tester1; hive create table tester1.tst1(col1 int, col2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:22 /user/tester1/hive/tester1.db drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:22 /user/tester1/hive/tester1.db/tst1 hive load data local inpath '/home/tester1/tst1.input' into table tst1; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:22 /user/tester1/hive/tester1.db drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1 -rw-rw 3 tester1 testgroup123168 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1/tst1
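The renameFile solution described above (rename, then apply the destination parent's permissions) can be sketched with java.nio.file standing in for Hadoop's FileSystem. This is a hypothetical illustration, not the actual Hive.java code; it copies only the permission bits, not the group, and assumes a POSIX filesystem:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class RenameWithInherit {
    // Hypothetical analogue of the proposed renameFile(): move the file,
    // then copy the destination parent's permission bits onto it so the
    // moved file matches the warehouse directory rather than the scratchdir.
    static void renameInheritingPerms(Path src, Path dest) throws IOException {
        Files.move(src, dest, StandardCopyOption.REPLACE_EXISTING);
        Set<PosixFilePermission> parentPerms =
                Files.getPosixFilePermissions(dest.getParent());
        Files.setPosixFilePermissions(dest, parentPerms);
    }

    public static void main(String[] args) throws IOException {
        // Model the warehouse dir with restrictive group perms and a
        // scratch file created with default umask perms.
        Path warehouse = Files.createTempDirectory("warehouse");
        Files.setPosixFilePermissions(warehouse,
                PosixFilePermissions.fromString("rwxrwx---"));
        Path scratch = Files.createTempFile("scratch", ".tmp");
        renameInheritingPerms(scratch, warehouse.resolve("part-00000"));
        System.out.println(PosixFilePermissions.toString(
                Files.getPosixFilePermissions(warehouse.resolve("part-00000"))));
        // prints rwxrwx--- on POSIX filesystems
    }
}
```

A plain rename would have left the file with its scratchdir/umask permissions, which is exactly the bug being fixed here.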
Re: Hive Jenkins Builds
I think we should disable the 0.9 and 0.10 builds. But we should create a 0.11 build, since that is our latest release. Ashutosh On Wed, Jul 3, 2013 at 11:39 AM, Brock Noland br...@cloudera.com wrote: Hive has four builds that currently run: Hive-trunk-h0.21 (trunk on Hadoop 1.X) Hive-trunk-hadoop2 Hive-0.9.1-SNAPSHOT-h0.21 Hive-0.10.0-SNAPSHOT-h0.20.1 See https://builds.apache.org/user/brock/my-views/view/hive/ AFAIK there isn't active work on the 0.9.X and 0.10.X branches. Does anyone have an issue with disabling: Hive-0.9.1-SNAPSHOT-h0.21 Hive-0.10.0-SNAPSHOT-h0.20.1 Brock
Re: Tez branch and tez based patches
On Wed, Jul 17, 2013 at 1:41 PM, Edward Capriolo edlinuxg...@gmail.com wrote: In my opinion we should limit the amount of tez-related optimizations going to trunk. Refactoring that cleans up code is good, but as you have pointed out there won't be a tez release until sometime this fall, and this branch will be open for an extended period of time. Thus code cleanups and other tez-related refactoring do not need to be disruptive to trunk. I agree Tez-specific changes need not go into trunk. But general refactoring and code cleanup needs to happen on trunk as and when someone is willing to work on it. We have to continually improve our code quality. Code maintainability and readability is a priority. Without that, code quality suffers and discourages new contributors from contributing, because the code is unnecessarily complicated. SemanticAnalyzer is an 11K-line class. We need to simplify it. A patch like HIVE-4811 is a welcome change which tackled it. The exec package is all convoluted, mixing up runtime operators and the drivers for the runtime. A patch cleaning that up is welcome because it makes it much easier to read and reason about that piece of code. HIVE-4825 is another example which improves the modularity of the code. For contributors who are exposed to Hive for the first time, it will be easier for them to follow the code. Rather than disruptive to trunk, these patches are constructive for trunk, and I am glad people are choosing to work on them. Tez or no Tez, Hive is better off with these patches. Thanks, Ashutosh On Wed, Jul 17, 2013 at 3:35 PM, Alan Gates ga...@hortonworks.com wrote: Answers to some of your questions inlined. Alan. On Jul 16, 2013, at 10:20 PM, Edward Capriolo wrote: There are some points I want to bring up. First, I am on the PMC. Here is something I find relevant: http://www.apache.org/foundation/how-it-works.html -- The role of the PMC from a Foundation perspective is oversight. 
The main role of the PMC is not code and not coding - but to ensure that all legal issues are addressed, that procedure is followed, and that each and every release is the product of the community as a whole. That is key to our litigation protection mechanisms. Secondly the role of the PMC is to further the long term development and health of the community as a whole, and to ensure that balanced and wide scale peer review and collaboration does happen. Within the ASF we worry about any community which centers around a few individuals who are working virtually uncontested. We believe that this is detrimental to quality, stability, and robustness of both code and long term social structures. https://blogs.apache.org/comdev/entry/what_makes_apache_projects_different - All other decisions happen on the dev list, discussions on the private list are kept to a minimum. If it didn't happen on the dev list, it didn't happen - which leads to: a) Elections of committers and PMC members are published on the dev list once finalized. b) Out-of-band discussions (IRC etc.) are summarized on the dev list as soon as they have impact on the project, code or community. - https://issues.apache.org/jira/browse/HIVE-4660 ironically titled Let there be Tez has not been +1'ed by any committer. It was never discussed on the dev or the user list (as far as I can tell). As all JIRA creations and updates are sent to dev@hive, creating a JIRA is de facto posting to the list. As a PMC member I feel we need more discussion on Tez on the dev list along with a wiki-fied design document. Topics of discussion should include: I talked with Gunther and he's working on posting a design doc on the wiki. He has a PDF on the JIRA but he doesn't have write permissions yet on the wiki. 1) What is tez? In Hadoop 2.0, YARN opens up the ability to have multiple execution frameworks in Hadoop. Hadoop apps are no longer tied to MapReduce as the only execution option. 
Tez is an effort to build an execution engine that is optimized for relational data processing, such as Hive and Pig. The biggest change here is to move away from only Map and Reduce as processing options and to allow alternate combinations of processing, such as map-reduce-reduce, or tasks that take multiple inputs, or shuffles that avoid sorting when it isn't needed. For a good intro to Tez, see Arun's presentation on it at the recent Hadoop summit (video http://www.youtube.com/watch?v=9ZLLzlsz7h8 slides http://www.slideshare.net/Hadoop_Summit/murhty-saha-june26255pmroom212) 2) How is tez different from oozie, http://code.google.com/p/hop/, http://cs.brown.edu/~backman/cmr.html , and other DAG and/or streaming map-reduce tools/frameworks? Why should we use this and
Re: Review Request 12767: [HIVE-4877] In ExecReducer, remove tag from the row which will be passed to the first Operator at the Reduce-side
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12767/#review23531 --- ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java https://reviews.apache.org/r/12767/#comment47421 Should we also add the following to the comment? ... and directly call process on children in the process() method. One more :) - Ashutosh Chauhan On July 19, 2013, 5 p.m., Yin Huai wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12767/ --- (Updated July 19, 2013, 5 p.m.) Review request for hive. Bugs: HIVE-4877 https://issues.apache.org/jira/browse/HIVE-4877 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-4877 Diffs - data/files/kv1kv2.cogroup.txt 6d36e22 ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java 9898495 ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java d4be3d9 ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java ee76917 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java cbda70b ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java d12a53c ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 6a74ae4 Diff: https://reviews.apache.org/r/12767/diff/ Testing --- Tests: 2688, Failures: 2, Errors: 0, Success rate: 99.93%, Time: 43249.945 Two failures are hbase_stats_empty_partition.q and ppd_key_ranges.q in TestHBaseCliDriver. I manually tested these two on my Mac and the tests passed. Thanks, Yin Huai
Re: Review Request 12767: [HIVE-4877] In ExecReducer, remove tag from the row which will be passed to the first Operator at the Reduce-side
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12767/#review23541 --- Ship it! Ship It! - Ashutosh Chauhan On July 19, 2013, 7:04 p.m., Yin Huai wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12767/ --- (Updated July 19, 2013, 7:04 p.m.) Review request for hive. Bugs: HIVE-4877 https://issues.apache.org/jira/browse/HIVE-4877 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-4877 Diffs - data/files/kv1kv2.cogroup.txt 6d36e22 ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java 9898495 ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java d4be3d9 ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java ee76917 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java cbda70b ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java d12a53c ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 6a74ae4 Diff: https://reviews.apache.org/r/12767/diff/ Testing --- Tests: 2688, Failures: 2, Errors: 0, Success rate: 99.93%, Time: 43249.945 Two failures are hbase_stats_empty_partition.q and ppd_key_ranges.q in TestHBaseCliDriver. I manually tested these two on my Mac and the tests passed. Thanks, Yin Huai
Re: Review Request 12767: [HIVE-4877] In ExecReducer, remove tag from the row which will be passed to the first Operator at the Reduce-side
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12767/#review23527 --- Good work, Yin! Some minor comments. ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java https://reviews.apache.org/r/12767/#comment47408 I didn't get why there is an if check here. Can you add a comment explaining in which case we need not update this childOIs map? ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java https://reviews.apache.org/r/12767/#comment47409 I think we should remove this if branch since it is in the inner loop of processing. We should put this check in the initialization of the Demux operator. Even if we cannot put it there, this will result in a runtime exception, which I think is fine. ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java https://reviews.apache.org/r/12767/#comment47410 Is this better wording for this comment: // The Demux operator forwards a row to exactly one child in its children list based on the tag and newTagToChildIndex in the process() method, so we do not need to do anything here. ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java https://reviews.apache.org/r/12767/#comment47414 Can you also add a line to the comment saying that this key-value-tag structure is used by the Join and GroupBy operators to function correctly. ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java https://reviews.apache.org/r/12767/#comment47411 Same comment as in the Demux operator. ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java https://reviews.apache.org/r/12767/#comment47417 I think this should read as: // remove the tag from the key coming out of the reducer and store it in a separate variable. - Ashutosh Chauhan On July 19, 2013, 5 p.m., Yin Huai wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12767/ --- (Updated July 19, 2013, 5 p.m.) Review request for hive. 
Bugs: HIVE-4877 https://issues.apache.org/jira/browse/HIVE-4877 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-4877 Diffs - data/files/kv1kv2.cogroup.txt 6d36e22 ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java 9898495 ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java d4be3d9 ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java ee76917 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java cbda70b ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java d12a53c ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 6a74ae4 Diff: https://reviews.apache.org/r/12767/diff/ Testing --- Tests: 2688, Failures: 2, Errors: 0, Success rate: 99.93%, Time: 43249.945 Two failures are hbase_stats_empty_partition.q and ppd_key_ranges.q in TestHBaseCliDriver. I manually tested these two on my Mac and the tests passed. Thanks, Yin Huai
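The Demux forwarding behavior discussed in these review comments, where each tagged row is sent to exactly one child chosen via newTagToChildIndex, can be modeled with a short sketch. This is a hypothetical plain-Java illustration, not the actual DemuxOperator code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class DemuxSketch {
    // Hypothetical model: tag -> index of the single child that receives the row.
    private final Map<Integer, Integer> newTagToChildIndex;
    private final List<Consumer<String>> children;

    DemuxSketch(Map<Integer, Integer> tagToChild, List<Consumer<String>> children) {
        this.newTagToChildIndex = tagToChild;
        this.children = children;
    }

    // Exactly one child receives the row, selected by its tag; this is why
    // the forward step itself has nothing left to do.
    void process(String row, int tag) {
        children.get(newTagToChildIndex.get(tag)).accept(row);
    }

    public static void main(String[] args) {
        List<String> left = new ArrayList<>();
        List<String> right = new ArrayList<>();
        DemuxSketch demux = new DemuxSketch(
                Map.of(0, 0, 1, 1),
                List.of(left::add, right::add));
        demux.process("a", 0);
        demux.process("b", 1);
        demux.process("c", 0);
        System.out.println(left + " " + right); // prints [a, c] [b]
    }
}
```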
Re: Review Request 12690: HIVE-4870: Explain Extended to show partition info for Fetch Task
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12690/#review23654 --- Ship it! Ship It! - Ashutosh Chauhan On July 17, 2013, 5:14 p.m., John Pullokkaran wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12690/ --- (Updated July 17, 2013, 5:14 p.m.) Review request for hive and Ashutosh Chauhan. Repository: hive-git Description --- Explain extended does not include partition information for Fetch Task (FetchWork). Map Reduce Task (MapredWork) already does this. The patch adds Partition Description info to Fetch Task. Diffs - ql/src/java/org/apache/hadoop/hive/ql/plan/FetchWork.java 65c39d6 ql/src/test/results/clientpositive/auto_sortmerge_join_1.q.out 0e8f96b ql/src/test/results/clientpositive/auto_sortmerge_join_11.q.out 42e25fa ql/src/test/results/clientpositive/auto_sortmerge_join_12.q.out 47a8635 ql/src/test/results/clientpositive/auto_sortmerge_join_2.q.out c39d057 ql/src/test/results/clientpositive/auto_sortmerge_join_3.q.out bd7381f ql/src/test/results/clientpositive/auto_sortmerge_join_4.q.out 6121722 ql/src/test/results/clientpositive/auto_sortmerge_join_7.q.out e0cd848 ql/src/test/results/clientpositive/auto_sortmerge_join_8.q.out 924fbad ql/src/test/results/clientpositive/bucketcontext_1.q.out 62910fb ql/src/test/results/clientpositive/bucketcontext_2.q.out 0857c9d ql/src/test/results/clientpositive/bucketcontext_3.q.out 69dc2b2 ql/src/test/results/clientpositive/bucketcontext_4.q.out 0d79901 ql/src/test/results/clientpositive/bucketcontext_7.q.out 19ea4fa ql/src/test/results/clientpositive/bucketcontext_8.q.out 9a7aaa0 ql/src/test/results/clientpositive/bucketmapjoin1.q.out 9f8552a ql/src/test/results/clientpositive/bucketmapjoin10.q.out 1a6bc06 ql/src/test/results/clientpositive/bucketmapjoin11.q.out bd9b1fe ql/src/test/results/clientpositive/bucketmapjoin12.q.out fc161a9 ql/src/test/results/clientpositive/bucketmapjoin13.q.out 30d8925 
ql/src/test/results/clientpositive/bucketmapjoin2.q.out 7f3fb3e ql/src/test/results/clientpositive/bucketmapjoin3.q.out 913e925 ql/src/test/results/clientpositive/bucketmapjoin7.q.out 8105ba4 ql/src/test/results/clientpositive/bucketmapjoin8.q.out 92c74a9 ql/src/test/results/clientpositive/bucketmapjoin9.q.out b7aec66 ql/src/test/results/clientpositive/bucketmapjoin_negative.q.out 1dd45d2 ql/src/test/results/clientpositive/bucketmapjoin_negative2.q.out 37f4a48 ql/src/test/results/clientpositive/join32.q.out 92d81b9 ql/src/test/results/clientpositive/join32_lessSize.q.out 82b3e4a ql/src/test/results/clientpositive/join33.q.out 92d81b9 ql/src/test/results/clientpositive/sort_merge_join_desc_6.q.out f6aae06 ql/src/test/results/clientpositive/sort_merge_join_desc_7.q.out dbce51a ql/src/test/results/clientpositive/stats11.q.out 57d2f9a ql/src/test/results/clientpositive/union22.q.out bec39f4 Diff: https://reviews.apache.org/r/12690/diff/ Testing --- All the hive unit tests passed. Thanks, John Pullokkaran
Re: Review Request 12705: HIVE-4878: With Dynamic partitioning, some queries would scan default partition even if query is not using it.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12705/#review23657 --- ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java https://reviews.apache.org/r/12705/#comment47555 Why are we restricting this to strict mode? We should skip the default partition in all cases unless it is explicitly requested by the user. The assumption is that the default partition contains rows which were malformed in some way at load time and should be excluded from all further query processing. - Ashutosh Chauhan On July 17, 2013, 10:19 p.m., John Pullokkaran wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12705/ --- (Updated July 17, 2013, 10:19 p.m.) Review request for hive and Ashutosh Chauhan. Repository: hive-git Description --- With dynamic partitioning, Hive would scan default partitions in some cases even if the query excludes them. As part of partition pruning, the predicate is narrowed down to those pieces that involve partition columns only. This predicate is then evaluated with partition values to determine if the scan should include those partitions. But in some cases (like when comparing _HIVE_DEFAULT_PARTITION_ to numeric data types) expression evaluation would fail and return NULL instead of true/false. In such cases the partition is added to the unknown partitions, which are then subsequently scanned. This fix avoids scanning the default partition if all of the following are true: a) Hive dynamic partition mode is strict (hive.exec.dynamic.partition.mode=strict). b) The partition pruning expression failed to evaluate for a given partition. c) At least one of the columns in the partition is the default partition. 
Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 6a4a360 ql/src/test/queries/clientpositive/dynamic_partition_skip_default.q PRE-CREATION ql/src/test/results/clientpositive/dynamic_partition_skip_default.q.out PRE-CREATION Diff: https://reviews.apache.org/r/12705/diff/ Testing --- Hive Unit Tests Passed. Thanks, John Pullokkaran
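The fix's three conditions combine as a simple conjunction: the default partition is skipped only when all of a), b), and c) hold, which is also why the reviewer asks whether the strict-mode restriction is needed at all. A minimal sketch with hypothetical names, not the actual PartitionPruner code:

```java
public class DefaultPartitionSkip {
    // Hypothetical predicate mirroring the three conditions in the
    // description: skip the partition only when all three hold.
    static boolean skipPartition(boolean strictDynamicPartitionMode,
                                 boolean pruningExprFailedToEvaluate,
                                 boolean hasDefaultPartitionValue) {
        return strictDynamicPartitionMode
                && pruningExprFailedToEvaluate
                && hasDefaultPartitionValue;
    }

    public static void main(String[] args) {
        // In non-strict mode the default partition would still be scanned.
        System.out.println(skipPartition(true, true, true));  // prints true
        System.out.println(skipPartition(false, true, true)); // prints false
    }
}
```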
Re: Review Request 12050: HIVE-3756 (LOAD DATA does not honor permission inheritance)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12050/#review23858 --- Ship it! Ship It! - Ashutosh Chauhan On July 19, 2013, 6:55 p.m., Chaoyu Tang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12050/ --- (Updated July 19, 2013, 6:55 p.m.) Review request for hive, Ashutosh Chauhan and Sushanth Sowmyan. Bugs: HIVE- and HIVE-3756 https://issues.apache.org/jira/browse/HIVE- https://issues.apache.org/jira/browse/HIVE-3756 Repository: hive-git Description --- Problems: 1. When doing load data or insert overwrite to a table, the data files under the database/table directory do not inherit their parent's permissions (i.e. group) as described in HIVE-3756. 2. Besides the group issue, the read/write permission mode is also not inherited. 3. The same problem affects the partition files (see HIVE-3094). Cause: The task results (from load data or insert overwrite) are initially stored in the scratchdir and then loaded under the warehouse table directory. FileSystem.rename is used in this step (e.g. LoadTable/LoadPartition) to move the dirs/files, but it preserves their permissions (including group and mode), which are determined by the scratchdir permission or umask. If the scratchdir has different permissions from those of the warehouse table directories, the problem occurs. Solution: After FileSystem.rename is called, change all renamed (moved) files/dirs to their destination parents' permissions if needed (i.e., if hive.warehouse.subdir.inherit.perms is true). Here I introduced a new method renameFile doing both the rename and the permission change. It replaces the FileSystem.rename used in LoadTable/LoadPartition. I did not replace the rename used to move files/dirs under the same scratchdir in the middle of task processing. That does not look necessary to me, since those are temp files and are also probably access-protected by the top scratchdir mode 700 (HIVE-4487). 
Diffs - ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 87a584d Diff: https://reviews.apache.org/r/12050/diff/ Testing --- The following cases tested that all created subdirs/files inherit their parents' permission mode and group in : 1). create database; 2). create table; 3). load data; 4) insert overwrite; 5) partitions. {code} hive dfs -ls -d /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:20 /user/tester1/hive hive create database tester1 COMMENT 'Database for user tester1' LOCATION '/user/tester1/hive/tester1.db'; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:21 /user/tester1/hive/tester1.db hive use tester1; hive create table tester1.tst1(col1 int, col2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:22 /user/tester1/hive/tester1.db drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:22 /user/tester1/hive/tester1.db/tst1 hive load data local inpath '/home/tester1/tst1.input' into table tst1; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:22 /user/tester1/hive/tester1.db drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1 -rw-rw 3 tester1 testgroup123168 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1/tst1.input hive create table tester1.tst2(col1 int, col2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS SEQUENCEFILE; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:24 /user/tester1/hive/tester1.db drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1 -rw-rw 3 tester1 testgroup123168 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1/tst1.input drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:24 /user/tester1/hive/tester1.db/tst2 hive insert overwrite table tst2 select * from tst1; hive dfs -ls -R /user/tester1/hive; 
drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:25 /user/tester1/hive/tester1.db
Re: Adding to the hive contributor list
Done. Welcome to the project, Hari. Thanks, Ashutosh On Wed, Aug 14, 2013 at 10:32 AM, Hari Subramaniyan hsubramani...@hortonworks.com wrote: Hi, I would like to get added to the contributor list. Thanks Hari -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: Adding WebHCat sub component to Hive project in ASF Jira
Done. Looking forward to contributions in that area! Thanks, Ashutosh On Fri, Aug 16, 2013 at 11:44 AM, Eugene Koifman ekoif...@hortonworks.com wrote: Hi, could somebody who has permissions to do so create a WebHCat component under Hive? It will help track things. Thanks, Eugene
Re: Last time request for cwiki update privileges
Hi Sanjay, Really sorry for that. I apologize for the delay. You are added now. Feel free to make changes to make Hive even better! Thanks, Ashutosh On Tue, Aug 20, 2013 at 2:39 PM, Sanjay Subramanian sanjay.subraman...@wizecommerce.com wrote: Hey guys I can only think of two reasons why my request is not yet accepted: 1. The admins don't want to give me access 2. The admins have not seen my mail yet. This is the fourth and the LAST time I am requesting permission to edit wiki docs…Nobody likes being ignored and that includes me. Meanwhile, to show my thankfulness to the Hive community I shall continue to answer questions. There will be no change in that behavior Regards sanjay From: Sanjay Subramanian sanjay.subraman...@wizecommerce.com Date: Wednesday, August 14, 2013 3:52 PM To: u...@hive.apache.org Cc: dev@hive.apache.org Subject: Re: Review Request (wikidoc): LZO Compression in Hive Once again, I am down on my knees humbly calling upon the Hive Jedi Masters to please provide this Padawan with cwiki update privileges May the Force be with u Thanks sanjay From: Sanjay Subramanian sanjay.subraman...@wizecommerce.com Reply-To: u...@hive.apache.org Date: Wednesday, July 31, 2013 9:38 AM To: u...@hive.apache.org Cc: dev@hive.apache.org Subject: Re: Review Request (wikidoc): LZO Compression in Hive Hi guys Any chance I could get cwiki update privileges today? Thanks sanjay From: Sanjay Subramanian sanjay.subraman...@wizecommerce.com Date: Tuesday, July 30, 2013 4:26 PM To: u...@hive.apache.org Cc: dev@hive.apache.org Subject: Review Request (wikidoc): LZO Compression in Hive Hi Met with Lefty this afternoon and she was kind enough to spend time adding my documentation to the site - since I still don't have editing privileges :-) Please review the new wikidoc about LZO compression in the Hive language manual. 
If anything is unclear or needs more information, you can email suggestions to this list or edit the wiki yourself (if you have editing privileges). Here are the links: 1. Language Manualhttps://cwiki.apache.org/confluence/display/Hive/LanguageManual (new bullet under File Formats) 2. LZO Compressionhttps://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO 3. CREATE TABLEhttps://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable (near end of section, pasted in here:) Use STORED AS TEXTFILE if the data needs to be stored as plain text files. Use STORED AS SEQUENCEFILE if the data needs to be compressed. Please read more about CompressedStoragehttps://cwiki.apache.org/confluence/display/Hive/CompressedStorage if you are planning to keep data compressed in your Hive tables. Use INPUTFORMAT and OUTPUTFORMAT to specify the name of a corresponding InputFormat and OutputFormat class as a string literal, e.g., 'org.apache.hadoop.hive.contrib.fileformat.base64.Base64TextInputFormat'. For LZO compression, the values to use are 'INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' (see LZO Compressionhttps://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO ). My cwiki id is https://cwiki.apache.org/confluence/display/~sanjaysubraman...@yahoo.com It will be great if I could get edit privileges Thanks sanjay CONFIDENTIALITY NOTICE == This email message and any attachments are for the exclusive use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message along with any attachments, from your computer system. 
If you are the intended recipient, please be advised that the content of this message is subject to access, review and disclosure by the sender's Email System Administrator.
Re: Last time request for cwiki update privileges
Hey Mikhail, Sure. What's your cwiki id? Thanks, Ashutosh On Wed, Aug 21, 2013 at 1:58 PM, Mikhail Antonov olorinb...@gmail.com wrote: Can I also get the edit privilege for wiki please? I'd like to add some details about LDAP authentication. Mikhail 2013/8/21 Stephen Sprague sprag...@gmail.com Sanjay gets some love after all! :) On Tue, Aug 20, 2013 at 4:00 PM, Sanjay Subramanian sanjay.subraman...@wizecommerce.com wrote: Thanks Ashutosh From: Ashutosh Chauhan hashut...@apache.org Reply-To: u...@hive.apache.org Date: Tuesday, August 20, 2013 3:13 PM To: u...@hive.apache.org Cc: dev@hive.apache.org Subject: Re: Last time request for cwiki update privileges Hi Sanjay, Really sorry for that. I apologize for the delay. You are added now. Feel free to make changes to make Hive even better! Thanks, Ashutosh On Tue, Aug 20, 2013 at 2:39 PM, Sanjay Subramanian sanjay.subraman...@wizecommerce.com wrote: Hey guys I can only think of two reasons why my request is not yet accepted 1. The admins don't want to give me access 2. The admins have not seen my mail yet. This is the fourth and the LAST time I am requesting permission to edit wiki docs… Nobody likes being ignored and that includes me. 
Meanwhile, to show my thankfulness to the Hive community, I shall continue to answer questions. There will be no change in that behavior Regards sanjay From: Sanjay Subramanian sanjay.subraman...@wizecommerce.com Date: Wednesday, August 14, 2013 3:52 PM To: u...@hive.apache.org Cc: dev@hive.apache.org Subject: Re: Review Request (wikidoc): LZO Compression in Hive Once again, I am down on my knees humbly calling upon the Hive Jedi Masters to please provide this paadwaan with cwiki update privileges May the Force be with u Thanks sanjay From: Sanjay Subramanian sanjay.subraman...@wizecommerce.com Reply-To: u...@hive.apache.org Date: Wednesday, July 31, 2013 9:38 AM To: u...@hive.apache.org Cc: dev@hive.apache.org Subject: Re: Review Request (wikidoc): LZO Compression in Hive Hi guys Any chance I could get cwiki update privileges today? Thanks sanjay From: Sanjay Subramanian sanjay.subraman...@wizecommerce.com Date: Tuesday, July 30, 2013 4:26 PM To: u...@hive.apache.org Cc: dev@hive.apache.org Subject: Review Request (wikidoc): LZO Compression in Hive Hi Met with Lefty this afternoon and she was kind enough to spend time to add my documentation to the site - since I still don't have editing privileges :-) Please review the new wikidoc about LZO compression in the Hive language manual. 
Re: Last time request for cwiki update privileges
Not able to find this id in cwiki. Did you create an account on cwiki.apache.org? On Wed, Aug 21, 2013 at 2:59 PM, Mikhail Antonov olorinb...@gmail.com wrote: mantonov
Re: LIKE filter pushdown for tables and partitions
Couple of questions: 1. What about the LIKE operator for Hive itself? Will that continue to work (presumably because there is an alternative path for that)? 2. This will nonetheless break other direct consumers of the metastore client API (like HCatalog). I see your point that we have a buggy implementation, so what's out there is not safe to use. The question then really is: shall we remove this code, thereby breaking people for whom the current buggy implementation is good enough (or, you can say, salvaging them from breaking in the future)? Or shall we try to fix it now? My take is that if there are no users of this anyway, then there is no point fixing it for non-existing users, but if there are, we probably have to. I would suggest you send an email to users@hive to ask if there are users for this. Thanks, Ashutosh On Mon, Aug 26, 2013 at 2:08 PM, Sergey Shelukhin ser...@hortonworks.com wrote: Since there's no response I am assuming nobody cares about this code... Jira is HIVE-5134, I will attach a patch with the removal this week. On Wed, Aug 21, 2013 at 2:28 PM, Sergey Shelukhin ser...@hortonworks.com wrote: Hi. I think there are issues with the way Hive can currently do LIKE operator JDO pushdown, and the code should be removed for partitions and tables. Are there objections to removing LIKE from Filter.g and related areas? If not, I will file a JIRA and do it. Details: There's code in the metastore that is capable of pushing down a LIKE expression into JDO for string partition keys, as well as tables. The code for tables doesn't appear to be used, and the partition code definitely doesn't run in Hive proper because the metastore client doesn't send LIKE expressions to the server. It may be used in e.g. HCat and other places, but after asking some people here, I found out it probably isn't. I was trying to make it run and noticed some problems: 1) For partitions, Hive sends SQL patterns in a filter for LIKE, e.g. %foo%, whereas the metastore passes them into the matches() JDOQL method, which expects a Java regex. 
2) Converting the pattern to a Java regex via the UDFLike method, I found out that not all regexes appear to work in DN (DataNucleus). .*foo seems to work but anything complex (such as escaping the pattern using Pattern.quote, which UDFLike does) breaks and no longer matches properly. 3) I tried to implement common cases using the JDO methods startsWith/endsWith/indexOf (I will file a JIRA), but when I run tests on Derby, they also appear to have problems with some strings. For example, a partition with a backslash in the name cannot be matched by LIKE %\% (single backslash in a string) after being converted to .indexOf(param) where param is \ (escaping the backslash once again doesn't work either, and anyway there's no documented reason why it shouldn't work properly), while other characters match correctly, even e.g. %. For tables, there's no SQL LIKE; it expects a Java regex, but I am not convinced all Java regexes are going to work. So, I think that for future correctness' sake it's better to remove this code. -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
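The mismatch Sergey describes — SQL LIKE wildcards (%, _) on the client versus the Java regex that JDOQL's matches() expects — can be illustrated with a small standalone sketch. This is not Hive's actual UDFLike code; it is only an assumed version of the kind of conversion involved, including the Pattern.quote escaping of literal runs that he mentions breaking in DataNucleus:

```java
import java.util.regex.Pattern;

// Minimal sketch (not Hive's UDFLike): convert a SQL LIKE pattern to an
// equivalent Java regex. Literal runs are wrapped with Pattern.quote (which
// emits \Q...\E) so regex metacharacters in the pattern match literally --
// exactly the kind of "complex" escaped regex that reportedly failed in DN.
public class LikeToRegex {
    public static String convert(String likePattern) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < likePattern.length(); i++) {
            char c = likePattern.charAt(i);
            if (c == '%' || c == '_') {
                // Flush any accumulated literal text, quoted.
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                // SQL '%' matches any sequence, '_' matches one character.
                regex.append(c == '%' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return regex.toString();
    }

    public static void main(String[] args) {
        // "%foo%" becomes ".*\Qfoo\E.*", which is a regex, not a LIKE pattern.
        System.out.println(convert("%foo%"));
        System.out.println("abcfoodef".matches(convert("%foo%"))); // true
    }
}
```

Passing the raw "%foo%" string into matches() would look for literal percent signs, which is problem (1) above; only the converted form behaves like SQL LIKE.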
Re: Review Request 13862: [HIVE-5149] ReduceSinkDeDuplication can pick the wrong partitioning columns
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/13862/#review25818 --- ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java https://reviews.apache.org/r/13862/#comment50365 else { throw new SemanticException("Not able to correctly identify partitioning columns. Hint: Try hive.optimize.reducededuplication=false;"); } Thanks for adding comments! - Ashutosh Chauhan On Aug. 30, 2013, 3:29 p.m., Yin Huai wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/13862/ --- (Updated Aug. 30, 2013, 3:29 p.m.) Review request for hive. Bugs: HIVE-5149 https://issues.apache.org/jira/browse/HIVE-5149 Repository: hive-git Description --- https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java c380a2d ql/src/test/results/clientpositive/groupby2_map_skew.q.out da7a128 ql/src/test/results/clientpositive/groupby_cube1.q.out a52f4eb ql/src/test/results/clientpositive/groupby_rollup1.q.out f120471 ql/src/test/results/clientpositive/reduce_deduplicate_extended.q.out 3297ebb Diff: https://reviews.apache.org/r/13862/diff/ Testing --- Thanks, Yin Huai
Re: Review Request 13862: [HIVE-5149] ReduceSinkDeDuplication can pick the wrong partitioning columns
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/13862/#review25819 --- ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java https://reviews.apache.org/r/13862/#comment50366 In here: if (result[0] <= 0) { throw new SemanticException("Sort columns and order don't match. Try hive.optimize.reducesinkdeduplication=false;"); } Another sanity check. - Ashutosh Chauhan On Aug. 30, 2013, 3:29 p.m., Yin Huai wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/13862/ --- (Updated Aug. 30, 2013, 3:29 p.m.) Review request for hive. Bugs: HIVE-5149 https://issues.apache.org/jira/browse/HIVE-5149 Repository: hive-git Description --- https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java c380a2d ql/src/test/results/clientpositive/groupby2_map_skew.q.out da7a128 ql/src/test/results/clientpositive/groupby_cube1.q.out a52f4eb ql/src/test/results/clientpositive/groupby_rollup1.q.out f120471 ql/src/test/results/clientpositive/reduce_deduplicate_extended.q.out 3297ebb Diff: https://reviews.apache.org/r/13862/diff/ Testing --- Thanks, Yin Huai