[jira] [Created] (HIVE-6613) Control when specific Inputs / Outputs are started
Siddharth Seth created HIVE-6613:

Summary: Control when specific Inputs / Outputs are started
Key: HIVE-6613
URL: https://issues.apache.org/jira/browse/HIVE-6613
Project: Hive
Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Siddharth Seth

When running with Tez, a couple of enhancements are possible:

1) Avoid re-fetching data in the case of MapJoins, since the data is likely to be cached after the first run (container re-use for the same query).
2) Start Outputs only after the required Inputs are ready. This is specifically useful in the case of Reduce, where shuffle requires a large amount of memory and the Output (if it's a sorted output) also requires a fair amount of memory.

--
This message was sent by Atlassian JIRA (v6.2#6252)
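For illustration only, the ordering described in item 2 might be sketched as follows. This is a minimal sketch that does not use the Tez API: `SketchInput`, `SketchOutput`, and `run` are hypothetical names, and the inputs become ready synchronously inside `start`, whereas a real shuffle input would become ready asynchronously.

```java
import java.util.ArrayList;
import java.util.List;

public class StartOrderSketch {

    /** Hypothetical stand-in for a task input (for example, a shuffle input). */
    public static class SketchInput {
        private final String name;
        private boolean ready;

        public SketchInput(String name) {
            this.name = name;
        }

        /** Assumption: starting the input makes it ready immediately. */
        public void start(List<String> log) {
            log.add("start input " + name);
            ready = true;
        }

        public boolean isReady() {
            return ready;
        }
    }

    /** Hypothetical stand-in for a task output (for example, a sorted output). */
    public static class SketchOutput {
        private final String name;

        public SketchOutput(String name) {
            this.name = name;
        }

        public void start(List<String> log) {
            log.add("start output " + name);
        }
    }

    /** Starts every input, checks that all are ready, and only then starts the outputs. */
    public static List<String> run(List<SketchInput> inputs, List<SketchOutput> outputs) {
        List<String> log = new ArrayList<>();
        for (SketchInput in : inputs) {
            in.start(log);
        }
        for (SketchInput in : inputs) {
            if (!in.isReady()) {
                throw new IllegalStateException("input not ready");
            }
        }
        log.add("inputs ready");
        // Outputs allocate their memory only after the inputs no longer need theirs.
        for (SketchOutput out : outputs) {
            out.start(log);
        }
        return log;
    }
}
```

The point of the ordering is that the memory-hungry input phase and the memory-hungry output phase do not overlap.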
Re: Review Request 18179: Support more generic way of using composite key for HBaseHandler
> On March 10, 2014, 10:22 p.m., Swarnim Kulkarni wrote:
> hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseScanRange.java, line 100
> https://reviews.apache.org/r/18179/diff/4/?file=513316#file513316line100
>
> I think we are restricting the capability here by limiting the type of filters that can be provided. This might be difficult to evolve and is less dynamic, as consumers cannot plug in their custom filters. Also, the toByteArray() method on line 60 for filter worries me a little bit: a custom filter implementation might not have it properly implemented. Is there any reason we cannot get rid of this FilterDesc class completely and just have List<Filter> instead of List<FilterDesc> on line 41?

Currently, the ExprNodeDesc for predicates is serialized into a string and conveyed to InputFormat#getSplits, and there is no other way to provide custom objects to the InputFormat except by using a thread local. I didn't want to add one more thread local, so I decided to serialize the Filters. But a Filter is not a serializable instance. I didn't try to serialize it with kryo, as that did not feel like a safe way. I might make this part more generic (serialize the FQCN of the filter and find the static parseFrom() method in the class to instantiate the Filter).

> On March 10, 2014, 10:22 p.m., Swarnim Kulkarni wrote:
> hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseCompositeKey.java, line 116
> https://reviews.apache.org/r/18179/diff/4/?file=513313#file513313line116
>
> One of my primary motivations behind adding this here was to give consumers the capability to plug in any custom filter implementation down to the HBase scan, and tying it to the custom key implementation looked like an apt place to do so. TL;DR: I am fine with removing this as long as the factory implementation can support a similar capability.

Thanks. Then I can do some more refactoring.

- Navis

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18179/#review36711
---

On March 7, 2014, 7:46 a.m., Navis Ryu wrote:

> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18179/
> ---
>
> (Updated March 7, 2014, 7:46 a.m.)
>
> Review request for hive.
>
> Bugs: HIVE-6411
> https://issues.apache.org/jira/browse/HIVE-6411
>
> Repository: hive-git
>
> Description
> ---
>
> HIVE-2599 introduced using a custom object for the row key, but it forces key objects to extend HBaseCompositeKey, which is in turn an extension of LazyStruct. If the user provides a proper Object and OI, we can replace the internal key and keyOI with those.
>
> The initial implementation is based on a factory interface.
> {code}
> public interface HBaseKeyFactory {
>   void init(SerDeParameters parameters, Properties properties) throws SerDeException;
>   ObjectInspector createObjectInspector(TypeInfo type) throws SerDeException;
>   LazyObjectBase createObject(ObjectInspector inspector) throws SerDeException;
> }
> {code}
>
> Diffs
> ---
>
> hbase-handler/pom.xml 132af43
> hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseCompositeKey.java 5008f15
> hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseCompositeKeyFactory.java PRE-CREATION
> hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseKeyFactory.java PRE-CREATION
> hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseLazyObjectFactory.java PRE-CREATION
> hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseScanRange.java PRE-CREATION
> hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 2cd65cb
> hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java 29e5da5
> hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseWritableKeyFactory.java PRE-CREATION
> hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java 704fcb9
> hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java fc40195
> hbase-handler/src/test/org/apache/hadoop/hive/hbase/HBaseTestCompositeKey.java 13c344b
> hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseKeyFactory.java PRE-CREATION
> hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseKeyFactory2.java PRE-CREATION
> hbase-handler/src/test/queries/positive/hbase_custom_key.q PRE-CREATION
> hbase-handler/src/test/queries/positive/hbase_custom_key2.q PRE-CREATION
> hbase-handler/src/test/results/positive/hbase_custom_key.q.out PRE-CREATION
> hbase-handler/src/test/results/positive/hbase_custom_key2.q.out PRE-CREATION
> itests/util/pom.xml e9720df
> ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java b966d33
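The "serialize the FQCN and call a static parseFrom()" idea from the reply above might look roughly like this. It is a minimal sketch under stated assumptions, not code from the patch: `PrefixFilterStandIn` is a hypothetical stand-in for a filter class (it is not an HBase class), and `instantiate` is a hypothetical helper name. The only assumption about the real filter is the one the reply makes, namely that the class exposes `toByteArray()` and a public static `parseFrom(byte[])`.

```java
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;

public class FilterByNameSketch {

    /** Hypothetical stand-in for a filter; not a real HBase class. */
    public static class PrefixFilterStandIn {
        private final String prefix;

        public PrefixFilterStandIn(String prefix) {
            this.prefix = prefix;
        }

        public String getPrefix() {
            return prefix;
        }

        /** Serializes the filter state to bytes. */
        public byte[] toByteArray() {
            return prefix.getBytes(StandardCharsets.UTF_8);
        }

        /** Rebuilds a filter from the bytes produced by toByteArray(). */
        public static PrefixFilterStandIn parseFrom(byte[] bytes) {
            return new PrefixFilterStandIn(new String(bytes, StandardCharsets.UTF_8));
        }
    }

    /** Loads the class by its fully qualified name and invokes its static parseFrom(byte[]). */
    public static Object instantiate(String className, byte[] serialized)
            throws ReflectiveOperationException {
        Class<?> clazz = Class.forName(className);
        Method parseFrom = clazz.getMethod("parseFrom", byte[].class);
        // A static method is invoked with a null receiver.
        return parseFrom.invoke(null, (Object) serialized);
    }

    public static void main(String[] args) throws Exception {
        PrefixFilterStandIn original = new PrefixFilterStandIn("row-");
        // Only the class name and the bytes need to travel to the reading side.
        String className = original.getClass().getName();
        byte[] bytes = original.toByteArray();
        Object restored = instantiate(className, bytes);
        System.out.println(((PrefixFilterStandIn) restored).getPrefix());
    }
}
```

With this shape, only a string and a byte array have to be conveyed to the input format, so no additional thread local is needed. The trade-off is that a filter class lacking a correct `parseFrom` fails at read time rather than at compile time.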
[jira] [Updated] (HIVE-6613) Control when specific Inputs / Outputs are started
[ https://issues.apache.org/jira/browse/HIVE-6613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated HIVE-6613:

Attachment: TEZ-6613.1.txt

Patch to make the changes mentioned in the description.

Control when specific Inputs / Outputs are started

Key: HIVE-6613
URL: https://issues.apache.org/jira/browse/HIVE-6613
Project: Hive
Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Siddharth Seth
Attachments: TEZ-6613.1.txt
[jira] [Updated] (HIVE-6613) Control when specific Inputs / Outputs are started
[ https://issues.apache.org/jira/browse/HIVE-6613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated HIVE-6613:

Status: Patch Available (was: Open)

Control when specific Inputs / Outputs are started

Key: HIVE-6613
URL: https://issues.apache.org/jira/browse/HIVE-6613
Project: Hive
Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Siddharth Seth
Attachments: TEZ-6613.1.txt
[jira] [Commented] (HIVE-5931) SQL std auth - add metastore get_principals_in_role api, support SHOW ROLE PRINCIPALS
[ https://issues.apache.org/jira/browse/HIVE-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13929892#comment-13929892 ]

Ashutosh Chauhan commented on HIVE-5931:

Patch looks good. I have a minor comment on RB. However, I think the following syntax is better:
{code}
SHOW PRINCIPALS role_name;
{code}
Having ROLE there is redundant.

SQL std auth - add metastore get_principals_in_role api, support SHOW ROLE PRINCIPALS

Key: HIVE-5931
URL: https://issues.apache.org/jira/browse/HIVE-5931
Project: Hive
Issue Type: Sub-task
Components: Authorization
Reporter: Thejas M Nair
Attachments: HIVE-5931.1.patch, HIVE-5931.nothrifgen.1.patch, HIVE-5931.thriftapi.2.patch, HIVE-5931.thriftapi.3.patch, HIVE-5931.thriftapi.followup.patch, HIVE-5931.thriftapi.patch
Original Estimate: 24h
Remaining Estimate: 24h

Support a command for listing all members of a role. A new metastore API call also needs to be added for this.
[jira] [Commented] (HIVE-6611) Joining multiple union all outputs fails on Tez
[ https://issues.apache.org/jira/browse/HIVE-6611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13929916#comment-13929916 ]

Vikram Dixit K commented on HIVE-6611:

LGTM +1.

Joining multiple union all outputs fails on Tez

Key: HIVE-6611
URL: https://issues.apache.org/jira/browse/HIVE-6611
Project: Hive
Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
Priority: Critical
Attachments: HIVE-6611.1.patch

Queries like:

with u as (select * from src union all select * from src) select * from u join u;

will fail on Tez because only one union flows into the join reduce phase.
Re: Tez Branch
I've been wondering too.

-- Lefty

On Mon, Mar 10, 2014 at 9:51 PM, Brock Noland br...@cloudera.com wrote:

> I noticed that patches are still being committed to the tez branch. I am just curious what the plans are there? No real reason, just curious, since we did one merge to trunk from that branch, so I assumed development would be occurring on trunk.
>
> Brock
Re: Review Request 18179: Support more generic way of using composite key for HBaseHandler
> On March 10, 2014, 9:25 p.m., Xuefu Zhang wrote:
> hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, line 730
> https://reviews.apache.org/r/18179/diff/5/?file=513392#file513392line730
>
> Can we define a serialize() interface in HBaseKeyFactory and move the existing implementation here to HBaseCompositeKeyFactory? serialize() seems generic enough to expect from all key factories. Doing this will eliminate HBaseWritableKeyFactory and the use of the class to detect what method to call.

If the default serialization could be done by a simple, decent method call, I would have done it like that. But the current implementation needs seven arguments for that (+serdeParams), which made me think twice about it.

    byte[] serialize(
        int i,
        List<ColumnMapping> mapping,
        List<? extends StructField> fields,
        List<Object> list,
        List<? extends StructField> declaredFields,
        boolean useJSONSerialize,
        ByteStream.Output serializeStream) throws IOException;

- Navis

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18179/#review36688
---

On March 7, 2014, 7:46 a.m., Navis Ryu wrote:

> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18179/
> ---
>
> (Updated March 7, 2014, 7:46 a.m.)
>
> Review request for hive.
>
> Bugs: HIVE-6411
> https://issues.apache.org/jira/browse/HIVE-6411
>
> Repository: hive-git
>
> Description
> ---
>
> HIVE-2599 introduced using a custom object for the row key, but it forces key objects to extend HBaseCompositeKey, which is in turn an extension of LazyStruct. If the user provides a proper Object and OI, we can replace the internal key and keyOI with those.
>
> The initial implementation is based on a factory interface.
> {code}
> public interface HBaseKeyFactory {
>   void init(SerDeParameters parameters, Properties properties) throws SerDeException;
>   ObjectInspector createObjectInspector(TypeInfo type) throws SerDeException;
>   LazyObjectBase createObject(ObjectInspector inspector) throws SerDeException;
> }
> {code}
>
> Diffs
> ---
>
> hbase-handler/pom.xml 132af43
> hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseCompositeKey.java 5008f15
> hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseCompositeKeyFactory.java PRE-CREATION
> hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseKeyFactory.java PRE-CREATION
> hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseLazyObjectFactory.java PRE-CREATION
> hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseScanRange.java PRE-CREATION
> hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 2cd65cb
> hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java 29e5da5
> hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseWritableKeyFactory.java PRE-CREATION
> hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java 704fcb9
> hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java fc40195
> hbase-handler/src/test/org/apache/hadoop/hive/hbase/HBaseTestCompositeKey.java 13c344b
> hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseKeyFactory.java PRE-CREATION
> hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseKeyFactory2.java PRE-CREATION
> hbase-handler/src/test/queries/positive/hbase_custom_key.q PRE-CREATION
> hbase-handler/src/test/queries/positive/hbase_custom_key2.q PRE-CREATION
> hbase-handler/src/test/results/positive/hbase_custom_key.q.out PRE-CREATION
> hbase-handler/src/test/results/positive/hbase_custom_key2.q.out PRE-CREATION
> itests/util/pom.xml e9720df
> ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java b966d33
> ql/src/java/org/apache/hadoop/hive/ql/index/IndexPredicateAnalyzer.java d39ee2e
> ql/src/java/org/apache/hadoop/hive/ql/index/IndexSearchCondition.java 5f1329c
> ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 647a9a6
> ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStoragePredicateHandler.java 9f35575
> ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java e50026b
> ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 10bae4d
> ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 40298e1
> serde/src/java/org/apache/hadoop/hive/serde2/StructObject.java PRE-CREATION
> serde/src/java/org/apache/hadoop/hive/serde2/StructObjectBaseInspector.java PRE-CREATION
> serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStructBase.java 1fd6853
> serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObject.java 10f4c05
> serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObjectBase.java
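The reviewer's suggestion in this message, declaring serialize() on the key factory so the SerDe does not have to inspect the factory class, could be sketched as below. All types here are simplified, hypothetical stand-ins rather than the classes in the patch, and the single-argument `serializeKey` deliberately ignores the seven-argument problem raised in the reply; the sketch shows the dispatch shape only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class KeyFactorySerializeSketch {

    /** Hypothetical, simplified key factory that declares its own serialization. */
    public interface KeyFactory {
        byte[] serializeKey(List<Object> keyFields);
    }

    /** Hypothetical composite key: joins all fields with a separator character. */
    public static class CompositeKeyFactory implements KeyFactory {
        private final char separator;

        public CompositeKeyFactory(char separator) {
            this.separator = separator;
        }

        @Override
        public byte[] serializeKey(List<Object> keyFields) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < keyFields.size(); i++) {
                if (i > 0) {
                    sb.append(separator);
                }
                sb.append(keyFields.get(i));
            }
            return sb.toString().getBytes(StandardCharsets.UTF_8);
        }
    }

    /** Hypothetical single-value key: writes only the first field. */
    public static class SingleValueKeyFactory implements KeyFactory {
        @Override
        public byte[] serializeKey(List<Object> keyFields) {
            return String.valueOf(keyFields.get(0)).getBytes(StandardCharsets.UTF_8);
        }
    }

    /** The caller dispatches through the interface; no check on the factory class is needed. */
    public static String serializeToString(KeyFactory factory, List<Object> keyFields) {
        return new String(factory.serializeKey(keyFields), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<Object> fields = Arrays.<Object>asList("user", 42);
        System.out.println(serializeToString(new CompositeKeyFactory('_'), fields));
        System.out.println(serializeToString(new SingleValueKeyFactory(), fields));
    }
}
```

Whether this shape is practical for the real SerDe depends on how the seven arguments listed in the reply would be passed to the factory, which the thread leaves open.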
[jira] [Updated] (HIVE-6447) Bucket map joins in hive-tez
[ https://issues.apache.org/jira/browse/HIVE-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vikram Dixit K updated HIVE-6447:

Status: Open (was: Patch Available)

Bucket map joins in hive-tez

Key: HIVE-6447
URL: https://issues.apache.org/jira/browse/HIVE-6447
Project: Hive
Issue Type: Bug
Components: Tez
Affects Versions: tez-branch
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
Attachments: HIVE-6447.1.patch, HIVE-6447.WIP.patch

Support bucket map joins in tez.
[jira] [Updated] (HIVE-6447) Bucket map joins in hive-tez
[ https://issues.apache.org/jira/browse/HIVE-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vikram Dixit K updated HIVE-6447:

Attachment: HIVE-6447.2.patch

Updated golden test file results and minor changes.

Bucket map joins in hive-tez

Key: HIVE-6447
URL: https://issues.apache.org/jira/browse/HIVE-6447
Project: Hive
Issue Type: Bug
Components: Tez
Affects Versions: tez-branch
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
Attachments: HIVE-6447.1.patch, HIVE-6447.2.patch, HIVE-6447.WIP.patch

Support bucket map joins in tez.
[jira] [Updated] (HIVE-6447) Bucket map joins in hive-tez
[ https://issues.apache.org/jira/browse/HIVE-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vikram Dixit K updated HIVE-6447:

Status: Patch Available (was: Open)

Bucket map joins in hive-tez

Key: HIVE-6447
URL: https://issues.apache.org/jira/browse/HIVE-6447
Project: Hive
Issue Type: Bug
Components: Tez
Affects Versions: tez-branch
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
Attachments: HIVE-6447.1.patch, HIVE-6447.2.patch, HIVE-6447.WIP.patch

Support bucket map joins in tez.