Re: Questions related to HBase general use
+ hive-dev Thanks for your question. We have recently been busy adding quite a few features on top of the Hive/HBase integration to make it more stable and easier to use. We also gave a talk very recently at HBaseCon 2015 showing off the latest improvements. Slides here[1]. Like Jerry mentioned, if you run a regular query from Hive on an HBase table with billions of rows, it is going to be slow, as it would trigger a full table scan. However, Hive has smarts around filter pushdown, where the attributes in a where clause are pushed down and converted to scan ranges and filters to optimize the scan. Plus, with the recent Hive on Spark uplift, I see this integration benefiting from that as well. That said, we use this integration here daily over billions of rows to run hundreds of queries without any issues. Since you mentioned that you are already a big consumer of Hive, I would highly recommend giving this a spin and reporting back with whatever issues you face so we can work on making this more stable. Hope that helps. Swarnim [1] https://docs.google.com/presentation/d/1K2A2NMsNbmKWuG02aUDxsLo0Lal0lhznYy8SB6HjC9U/edit#slide=id.p

On Wed, May 13, 2015 at 6:26 PM, Nick Dimiduk ndimi...@gmail.com wrote: + Swarnim, who's an expert on HBase/Hive integration. Yes, snapshots may be interesting for you. I believe Hive can access HBase timestamps, exposed as a virtual column. It's assumed across the whole row, however, not per cell.

On Sun, May 10, 2015 at 9:14 PM, Jerry He jerry...@gmail.com wrote: Hi, Yong. You have a good understanding of the benefits of HBase already. Generally speaking, HBase is suitable for real-time read/write access to your big data set. Regarding the HBase performance evaluation tool, the 'read' test uses HBase 'get'. For 1m rows, the test would issue 1m 'get' requests (and RPCs) to the server. The 'scan' test scans the table and transfers the rows to the client in batches (e.g.
100 rows at a time), so the whole test takes less time to complete for the same number of rows. The Hive/HBase integration, as you said, needs more consideration. 1) Performance. Hive accesses HBase via the HBase client API, which involves going to the HBase server for all data access. This will slow things down. There are a couple of things you can explore, e.g. Hive/HBase snapshot integration, which would provide direct access to HBase HFiles. 2) In your email, you are interested in HBase's capability of storing multiple versions of data. You need to consider whether Hive supports this HBase feature, i.e. whether it gives you access to multiple versions. As far as I can remember, it does not fully. Jerry

On Thu, May 7, 2015 at 6:18 PM, java8964 java8...@hotmail.com wrote: Hi, I am kind of new to HBase. Currently our production environment runs IBM BigInsight V3, which comes with Hadoop 2.2 and HBase 0.96.0. We are mostly using HDFS and Hive/Pig for our big data project, and it works very well for our big datasets. Right now, we have one dataset that needs to be loaded from MySQL, about 100G, and will have about Gs of changes daily. This is very important slowly changing dimension data that we would like to sync between MySQL and the big data platform. I am thinking of using HBase to store it, instead of refreshing the whole dataset in HDFS, because: 1) HBase makes merging the changes very easy. 2) HBase could store all the changes in the history, as an out-of-the-box function. We will replicate all the changes from MySQL at the binlog level, and we could keep all changes in HBase (or a long history), which can give us some insight that cannot be obtained easily in HDFS. 3) HBase could give us the benefit of fast access to the data by key, for some cases. 4) HBase is available out of the box. What I am not sure about is the Hive/HBase integration. Hive is the top tool in our environment. If one dataset is stored in HBase (even only about 100G as of now), the join between it and the other big datasets in HDFS worries me.
I have read quite a lot about Hive/HBase integration, and feel that it is not really mature, as I can find few use cases online, especially on performance. Quite a few JIRAs related to making Hive utilize HBase for performance in MR jobs are still pending. I want to know about other people's experience of using HBase in this way. I understand HBase is not designed as a storage system for a data warehouse component or analytics engine. But the benefits of using HBase in this case still attract me. If my use of HBase is mostly reads or full scans of the data, how bad is it compared to HDFS in the same cluster? 3x? 5x? To help me understand the read throughput of HBase, I used the HBase performance evaluation tool, but the output is quite confusing. I have 2 clusters: one has 5 nodes with 3 slaves, all running on VMs (each with 24G + 4 cores, so the cluster has 12 mappers + 6 reducers); the other is a real cluster with 5 nodes with 3 slaves with 64G + 24 cores and with (48 mapper
Re: Review Request 34197: HIVE-10706 Make vectorized_timestamp_funcs test more stable
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34197/#review83782 --- Ship it! Ship It! - Swarnim Kulkarni On May 14, 2015, 6:28 a.m., Alexander Pivovarov wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34197/ --- (Updated May 14, 2015, 6:28 a.m.) Review request for hive and Jason Dere. Bugs: HIVE-10706 https://issues.apache.org/jira/browse/HIVE-10706 Repository: hive-git Description --- HIVE-10706 Make vectorized_timestamp_funcs test more stable Diffs - ql/src/test/queries/clientpositive/vectorized_timestamp_funcs.q 8a2d5aaf5fb0396e551bdefdde507d1e9902919b ql/src/test/results/clientpositive/spark/vectorized_timestamp_funcs.q.out 304458215b4dcbc4d49321ba5f14ca5a87f2ec26 ql/src/test/results/clientpositive/tez/vectorized_timestamp_funcs.q.out fa3ed21232004d710b33cadac66680eabaca2c8a ql/src/test/results/clientpositive/vectorized_timestamp_funcs.q.out 31a96c68b22bd5332fb71b52982de71710df65fa Diff: https://reviews.apache.org/r/34197/diff/ Testing --- Thanks, Alexander Pivovarov
[jira] [Created] (HIVE-10709) Update Avro version to 1.7.7
Swarnim Kulkarni created HIVE-10709: --- Summary: Update Avro version to 1.7.7 Key: HIVE-10709 URL: https://issues.apache.org/jira/browse/HIVE-10709 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni We should update the Avro version to 1.7.7 to consume some of the nicer compatibility features. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 34143: Fix stats annotation
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34143/ --- (Updated May 14, 2015, 4:50 p.m.) Review request for hive, Ashutosh Chauhan and John Pullokkaran. Changes --- update the failing q tests. Repository: hive-git Description --- This is a umbrella patch for a bunch of issues: HIVE-8769 Physical optimizer : Incorrect CE results in a shuffle join instead of a Map join (PK/FK pattern not detected) HIVE-9392 JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName HIVE-10107 Union All : Vertex missing stats resulting in OOM and in-efficient plans Diffs (updated) - hbase-handler/src/test/results/positive/external_table_ppd.q.out 6d48edb hbase-handler/src/test/results/positive/hbase_custom_key2.q.out c9b5a84 hbase-handler/src/test/results/positive/hbase_custom_key3.q.out 76848e0 hbase-handler/src/test/results/positive/hbase_ppd_key_range.q.out 6174bfb hbase-handler/src/test/results/positive/hbase_pushdown.q.out 8a979bf hbase-handler/src/test/results/positive/hbase_queries.q.out 7863f69 hbase-handler/src/test/results/positive/hbase_timestamp.q.out 3aae7d0 hbase-handler/src/test/results/positive/ppd_key_ranges.q.out 5936735 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java 0de7488 ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java 44269f0 ql/src/java/org/apache/hadoop/hive/ql/plan/AbstractOperatorDesc.java 0a83440 ql/src/java/org/apache/hadoop/hive/ql/plan/ColStatistics.java c420190 ql/src/java/org/apache/hadoop/hive/ql/plan/Statistics.java f66279f ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java 508d880 ql/src/test/results/clientpositive/annotate_stats_filter.q.out e8cd06d ql/src/test/results/clientpositive/annotate_stats_limit.q.out 5f8b6f8 ql/src/test/results/clientpositive/annotate_stats_part.q.out 241192b ql/src/test/results/clientpositive/annotate_stats_select.q.out 753ab4e 
ql/src/test/results/clientpositive/annotate_stats_table.q.out 9bf82ac ql/src/test/results/clientpositive/auto_join30.q.out b068493 ql/src/test/results/clientpositive/auto_join31.q.out 1e19dd0 ql/src/test/results/clientpositive/auto_join32.q.out bfc8be8 ql/src/test/results/clientpositive/auto_join_stats.q.out 9100762 ql/src/test/results/clientpositive/auto_join_stats2.q.out ed09875 ql/src/test/results/clientpositive/auto_join_without_localtask.q.out ce4ad8a ql/src/test/results/clientpositive/auto_sortmerge_join_1.q.out 383defd ql/src/test/results/clientpositive/auto_sortmerge_join_12.q.out e9fb705 ql/src/test/results/clientpositive/auto_sortmerge_join_14.q.out 43504d8 ql/src/test/results/clientpositive/auto_sortmerge_join_15.q.out afd5518 ql/src/test/results/clientpositive/auto_sortmerge_join_2.q.out c089419 ql/src/test/results/clientpositive/auto_sortmerge_join_3.q.out 6e443fa ql/src/test/results/clientpositive/auto_sortmerge_join_4.q.out feaea04 ql/src/test/results/clientpositive/auto_sortmerge_join_5.q.out f64ecf0 ql/src/test/results/clientpositive/auto_sortmerge_join_6.q.out f039dda ql/src/test/results/clientpositive/auto_sortmerge_join_7.q.out e89f548 ql/src/test/results/clientpositive/auto_sortmerge_join_8.q.out 44c037f ql/src/test/results/clientpositive/auto_sortmerge_join_9.q.out 65aa3ef ql/src/test/results/clientpositive/binarysortable_1.q.out c4ba7e0 ql/src/test/results/clientpositive/bucket_map_join_1.q.out d778203 ql/src/test/results/clientpositive/bucket_map_join_2.q.out aef77aa ql/src/test/results/clientpositive/bucketmapjoin1.q.out 72f2a07 ql/src/test/results/clientpositive/bucketsortoptimize_insert_2.q.out eec099c ql/src/test/results/clientpositive/bucketsortoptimize_insert_4.q.out 1a644a9 ql/src/test/results/clientpositive/bucketsortoptimize_insert_5.q.out e4f90e4 ql/src/test/results/clientpositive/bucketsortoptimize_insert_6.q.out 307c83b ql/src/test/results/clientpositive/column_access_stats.q.out a779564 
ql/src/test/results/clientpositive/complex_alias.q.out 133ce91 ql/src/test/results/clientpositive/correlationoptimizer1.q.out 0eb1596 ql/src/test/results/clientpositive/correlationoptimizer10.q.out 3c3564d ql/src/test/results/clientpositive/correlationoptimizer11.q.out bd86942 ql/src/test/results/clientpositive/correlationoptimizer15.q.out b57203e ql/src/test/results/clientpositive/correlationoptimizer2.q.out 43d209f ql/src/test/results/clientpositive/correlationoptimizer3.q.out 5389647 ql/src/test/results/clientpositive/correlationoptimizer4.q.out b350816 ql/src/test/results/clientpositive/correlationoptimizer5.q.out 6ba3462 ql/src/test/results/clientpositive/correlationoptimizer6.q.out be518dc
[jira] [Created] (HIVE-10710) Delete GenericUDF.getConstantLongValue
Alexander Pivovarov created HIVE-10710: -- Summary: Delete GenericUDF.getConstantLongValue Key: HIVE-10710 URL: https://issues.apache.org/jira/browse/HIVE-10710 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Trivial GenericUDF.getConstantLongValue has a bug. Instead of fixing the bug it was suggested to delete the method because it is not used in hive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [DISCUSS] Supporting Hadoop-1 and experimental features
Hi, +1 on the idea. Having a stable release branch with ongoing fixes where we do not drop major features would be good all around. It lets us accelerate the pace of development, drop major features or rewrite them entirely without dragging everyone else kicking and screaming into that release. Cheers, Gopal

On 5/11/15, 7:17 PM, Sergey Shelukhin ser...@hortonworks.com wrote: That sounds like a good idea. Some features could be backported to branch-1 if viable, but at least new stuff would not be burdened by Hadoop 1/MR code paths. Probably also a good place to enable vectorization and other perf features by default while we make alpha releases. +1

On 15/5/11, 15:38, Alan Gates alanfga...@gmail.com wrote: There is a lot of forward-looking work going on in various branches of Hive: LLAP, the HBase metastore, and the work to drop the CLI. It would be good to have a way to release this code to users so that they can experiment with it. Releasing it will also provide feedback to developers. At the same time there are discussions on whether to keep supporting Hadoop-1. The burden of supporting older, less used functionality such as Hadoop-1 is becoming ever harder as many new features are added. I propose that the best way to deal with this would be to make a branch-1. We could continue to make new feature releases off of this branch (1.3, 1.4, etc.). This branch would not drop old functionality. This provides stability and continuity for users and developers. We could then merge these new feature branches (LLAP, HBase metastore, CLI drop) into the trunk, as well as turn on by default newer features such as vectorization and ACID. We could also drop older, less used features such as support for Hadoop-1 and MapReduce. It will be a while before we are ready to make stable, production-ready releases of this code. But we could start making alpha-quality releases soon.
We would call these releases 2.x, to stress the non-backward compatible changes such as dropping Hadoop-1. This will give users a chance to play with the new code and developers a chance to get feedback. Thoughts?
Re: JIRA notifications
@Swarnim: A patch generated with git diff needs to include the full index for it to be uploaded to Review Board: "git diff --full-index". https://code.google.com/p/reviewboard/issues/detail?id=3115 - Prasanth

On May 14, 2015, at 9:14 AM, Thejas Nair thejas.n...@gmail.com wrote: Now that we have moved to git, you can try using a GitHub pull request instead. It also integrates with JIRA. More git instructions: http://accumulo.apache.org/git.html

On Thu, May 14, 2015 at 8:01 AM, kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com wrote: Also not sure if it's related, but RB seems to have been pretty sluggish lately for me too. It takes forever for a patch to be submitted and a review request to be created (the latest one has been running for the past 30 minutes with no output).

On Wed, May 13, 2015 at 4:26 PM, Lefty Leverenz leftylever...@gmail.com wrote: By the way, we still need to add iss...@hive.apache.org to the website's Mailing Lists http://hive.apache.org/mailing_lists.html page -- see HIVE-10124 https://issues.apache.org/jira/browse/HIVE-10124. -- Lefty

On Wed, May 13, 2015 at 2:16 PM, Lefty Leverenz leftylever...@gmail.com wrote: But some notifications and comments aren't making it onto any Hive mailing list -- see INFRA-9221 https://issues.apache.org/jira/browse/INFRA-9221 (please add your own comments and examples). This means the mail archives don't have a complete record of JIRA activity. -- Lefty

On Wed, May 13, 2015 at 10:03 AM, Thejas Nair thejas.n...@gmail.com wrote: Comments now added go to iss...@hive.apache.org. Emails for JIRAs created should still go to dev@.

On Wed, May 13, 2015 at 9:25 AM, kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com wrote: I noticed that I haven't been getting notifications (or they are really delayed) on any of the new JIRAs created or comments added. Anyone else noticing similar issues as well? -- Swarnim -- Swarnim
Re: [DISCUSS] Hive API passivity
By passivity, do you mean backward compatibility? Not all APIs have the same level of maturity, and the audience for them can also differ. Public APIs are supposed to be marked as Public using the annotations under org.apache.hadoop.hive.common.classification.InterfaceAudience, with the expectations regarding backward compatibility set using the InterfaceStability annotations. For example, the UDF APIs should be marked as @Public and @Stable. However, APIs for new functionality might be marked @Unstable or @Evolving.

On Thu, May 14, 2015 at 9:19 AM, kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com wrote: While reviewing some of the recent patches, I came across a few with non-passive changes and/or discussion around them. I was wondering what kind of passivity guarantees we should provide to our consumers. I understand that the Hive API is probably not as widely used as some of its peers in the ecosystem, like HBase. But is that something we should start thinking about, especially around user-facing interfaces like UDFs, SerDes, StorageHandlers, etc.? More so given that we are 1.0 now? IMO we should avoid making any such changes or, if we have to, do so with a major version bump for the next release. Thoughts? -- Swarnim
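To make the annotation scheme described in the reply above concrete, here is a minimal sketch. It does not use the real Hive classes: the Public, Stable and Evolving annotations below are local stand-ins (an assumption, so that the example compiles on its own), and ExampleUdfApi, ExampleNewFeatureApi and isStablePublicApi are hypothetical names invented for illustration.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class ApiStabilitySketch {

    // Local stand-ins for the audience/stability annotations mentioned in the
    // thread. These are NOT the Hive classes; they only mimic the idea.
    @Documented @Retention(RetentionPolicy.RUNTIME) @interface Public {}
    @Documented @Retention(RetentionPolicy.RUNTIME) @interface Stable {}
    @Documented @Retention(RetentionPolicy.RUNTIME) @interface Evolving {}

    // Hypothetical user-facing API that promises backward compatibility.
    @Public @Stable
    interface ExampleUdfApi {
        Object evaluate(Object input);
    }

    // Hypothetical API for new functionality that may still change.
    @Public @Evolving
    interface ExampleNewFeatureApi {
        void experimental();
    }

    // Returns true only for types marked both Public and Stable, i.e. the
    // ones a reviewer would refuse to change in a non-passive way.
    static boolean isStablePublicApi(Class<?> type) {
        return type.isAnnotationPresent(Public.class)
            && type.isAnnotationPresent(Stable.class);
    }

    public static void main(String[] args) {
        System.out.println("ExampleUdfApi stable: "
            + isStablePublicApi(ExampleUdfApi.class));
        System.out.println("ExampleNewFeatureApi stable: "
            + isStablePublicApi(ExampleNewFeatureApi.class));
    }
}
```

A check like isStablePublicApi could, in principle, back a review checklist or a compatibility test, but whether Hive does anything of the sort is not stated in this thread.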
Review Request 34223: HIVE-10710 Delete GenericUDF.getConstantLongValue
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34223/ --- Review request for hive, Ashutosh Chauhan and Jason Dere. Bugs: HIVE-10710 https://issues.apache.org/jira/browse/HIVE-10710 Repository: hive-git Description --- HIVE-10710 Delete GenericUDF.getConstantLongValue Diffs - ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java b043bdc882af7c0b83787526a5a55c9dc29c6681 Diff: https://reviews.apache.org/r/34223/diff/ Testing --- Thanks, Alexander Pivovarov
Re: JIRA notifications
You can use the following command to create a new review. It takes about 3-5 sec:
$ rbt post -g yes
To update the review you can run:
$ rbt post -u -g yes

On Thu, May 14, 2015 at 10:48 AM, Prasanth Jayachandran pjayachand...@hortonworks.com wrote: @Swarnim: A patch generated with git diff needs to include the full index for it to be uploaded to Review Board: "git diff --full-index". https://code.google.com/p/reviewboard/issues/detail?id=3115 - Prasanth

On May 14, 2015, at 9:14 AM, Thejas Nair thejas.n...@gmail.com wrote: Now that we have moved to git, you can try using a GitHub pull request instead. It also integrates with JIRA. More git instructions: http://accumulo.apache.org/git.html

On Thu, May 14, 2015 at 8:01 AM, kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com wrote: Also not sure if it's related, but RB seems to have been pretty sluggish lately for me too. It takes forever for a patch to be submitted and a review request to be created (the latest one has been running for the past 30 minutes with no output).

On Wed, May 13, 2015 at 4:26 PM, Lefty Leverenz leftylever...@gmail.com wrote: By the way, we still need to add iss...@hive.apache.org to the website's Mailing Lists http://hive.apache.org/mailing_lists.html page -- see HIVE-10124 https://issues.apache.org/jira/browse/HIVE-10124. -- Lefty

On Wed, May 13, 2015 at 2:16 PM, Lefty Leverenz leftylever...@gmail.com wrote: But some notifications and comments aren't making it onto any Hive mailing list -- see INFRA-9221 https://issues.apache.org/jira/browse/INFRA-9221 (please add your own comments and examples). This means the mail archives don't have a complete record of JIRA activity. -- Lefty

On Wed, May 13, 2015 at 10:03 AM, Thejas Nair thejas.n...@gmail.com wrote: Comments now added go to iss...@hive.apache.org. Emails for JIRAs created should still go to dev@.

On Wed, May 13, 2015 at 9:25 AM, kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com wrote: I noticed that I haven't been getting notifications (or they are really delayed) on any of the new JIRAs created or comments added. Anyone else noticing similar issues as well? -- Swarnim -- Swarnim
[jira] [Created] (HIVE-10711) Tez HashTableLoader attempts to allocate more memory than available when HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem
Jason Dere created HIVE-10711: - Summary: Tez HashTableLoader attempts to allocate more memory than available when HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem Key: HIVE-10711 URL: https://issues.apache.org/jira/browse/HIVE-10711 Project: Hive Issue Type: Bug Reporter: Jason Dere Assignee: Jason Dere Tez HashTableLoader bases its memory allocation on HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD. If this value is larger than the process max memory, the HashTableLoader can end up trying to use more memory than is available to the process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
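The failure mode described in HIVE-10711 above can be illustrated with a small sketch. This is not the HashTableLoader code and not necessarily how the issue was fixed; effectiveMemoryBudget and its maxFraction parameter are hypothetical names, and the 0.5 fraction is an arbitrary assumption. The only point is that a configured threshold should be capped by what the JVM can actually provide.

```java
public class HashTableMemorySketch {

    // Hypothetical helper: caps a configured threshold (in bytes) at a
    // fraction of the memory the process can actually use.
    static long effectiveMemoryBudget(long configuredThreshold,
                                      long processMaxMemory,
                                      double maxFraction) {
        if (configuredThreshold < 0 || processMaxMemory <= 0
                || maxFraction <= 0.0 || maxFraction > 1.0) {
            throw new IllegalArgumentException("invalid memory settings");
        }
        // Largest amount we are willing to hand to the hash table.
        long cap = (long) (processMaxMemory * maxFraction);
        // Never allocate more than the cap, even if the config asks for it.
        return Math.min(configuredThreshold, cap);
    }

    public static void main(String[] args) {
        // Runtime.maxMemory() reports the maximum heap the JVM will attempt to use.
        long processMax = Runtime.getRuntime().maxMemory();
        long configured = 10L * 1024 * 1024 * 1024; // e.g. a 10 GB threshold
        long budget = effectiveMemoryBudget(configured, processMax, 0.5);
        System.out.println("configured=" + configured
            + " processMax=" + processMax + " budget=" + budget);
    }
}
```

With a threshold larger than the heap, the budget falls back to the capped value instead of the configured one, which is the behavior the bug report says was missing.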
Re: JIRA notifications
Also not sure if it's related, but RB seems to have been pretty sluggish lately for me too. It takes forever for a patch to be submitted and a review request to be created (the latest one has been running for the past 30 minutes with no output).

On Wed, May 13, 2015 at 4:26 PM, Lefty Leverenz leftylever...@gmail.com wrote: By the way, we still need to add iss...@hive.apache.org to the website's Mailing Lists http://hive.apache.org/mailing_lists.html page -- see HIVE-10124 https://issues.apache.org/jira/browse/HIVE-10124. -- Lefty

On Wed, May 13, 2015 at 2:16 PM, Lefty Leverenz leftylever...@gmail.com wrote: But some notifications and comments aren't making it onto any Hive mailing list -- see INFRA-9221 https://issues.apache.org/jira/browse/INFRA-9221 (please add your own comments and examples). This means the mail archives don't have a complete record of JIRA activity. -- Lefty

On Wed, May 13, 2015 at 10:03 AM, Thejas Nair thejas.n...@gmail.com wrote: Comments now added go to iss...@hive.apache.org. Emails for JIRAs created should still go to dev@.

On Wed, May 13, 2015 at 9:25 AM, kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com wrote: I noticed that I haven't been getting notifications (or they are really delayed) on any of the new JIRAs created or comments added. Anyone else noticing similar issues as well? -- Swarnim -- Swarnim
Re: Review Request 33881: HIVE-10623 Implement hive cli options using beeline functionality
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33881/ --- (Updated May 14, 2015, 3:28 p.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-10623 https://issues.apache.org/jira/browse/HIVE-10623 Repository: hive-git Description --- Changes: 1. Support the hive cli options including database, e, !, H, f. 2. Add error handler for using f and e together 3. Add error handler for invalid option Diffs (updated) - beeline/src/java/org/apache/hive/beeline/BeeLine.java 0da15f6 beeline/src/java/org/apache/hive/beeline/cli/CliOptionsProcessor.java PRE-CREATION beeline/src/java/org/apache/hive/beeline/cli/HiveCli.java PRE-CREATION beeline/src/test/org/apache/hive/beeline/cli/TestHiveCli.java PRE-CREATION beeline/src/test/resources/hive-site.xml PRE-CREATION Diff: https://reviews.apache.org/r/33881/diff/ Testing --- Newly add unit test passed locally. Thanks, cheng xu
[DISCUSS] Hive API passivity
While reviewing some of the recent patches, I came across a few with non-passive changes and/or discussion around them. I was wondering what kind of passivity guarantees we should provide to our consumers. I understand that the Hive API is probably not as widely used as some of its peers in the ecosystem, like HBase. But is that something we should start thinking about, especially around user-facing interfaces like UDFs, SerDes, StorageHandlers, etc.? More so given that we are 1.0 now? IMO we should avoid making any such changes or, if we have to, do so with a major version bump for the next release. Thoughts? -- Swarnim
[jira] [Created] (HIVE-10708) Add SchemaCompatibility check to AvroDeserializer
Swarnim Kulkarni created HIVE-10708: --- Summary: Add SchemaCompatibility check to AvroDeserializer Key: HIVE-10708 URL: https://issues.apache.org/jira/browse/HIVE-10708 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Avro provides a nice API[1] to check whether a given reader schema can be used to deserialize data written with a given writer schema. I think it would be super nice to integrate this into the AvroDeserializer so that we can fail fast and gracefully if the schemas are incompatible. [1] https://avro.apache.org/docs/1.7.7/api/java/org/apache/avro/SchemaCompatibility.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: JIRA notifications
Now that we have moved to git, you can try using a GitHub pull request instead. It also integrates with JIRA. More git instructions: http://accumulo.apache.org/git.html

On Thu, May 14, 2015 at 8:01 AM, kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com wrote: Also not sure if it's related, but RB seems to have been pretty sluggish lately for me too. It takes forever for a patch to be submitted and a review request to be created (the latest one has been running for the past 30 minutes with no output).

On Wed, May 13, 2015 at 4:26 PM, Lefty Leverenz leftylever...@gmail.com wrote: By the way, we still need to add iss...@hive.apache.org to the website's Mailing Lists http://hive.apache.org/mailing_lists.html page -- see HIVE-10124 https://issues.apache.org/jira/browse/HIVE-10124. -- Lefty

On Wed, May 13, 2015 at 2:16 PM, Lefty Leverenz leftylever...@gmail.com wrote: But some notifications and comments aren't making it onto any Hive mailing list -- see INFRA-9221 https://issues.apache.org/jira/browse/INFRA-9221 (please add your own comments and examples). This means the mail archives don't have a complete record of JIRA activity. -- Lefty

On Wed, May 13, 2015 at 10:03 AM, Thejas Nair thejas.n...@gmail.com wrote: Comments now added go to iss...@hive.apache.org. Emails for JIRAs created should still go to dev@.

On Wed, May 13, 2015 at 9:25 AM, kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com wrote: I noticed that I haven't been getting notifications (or they are really delayed) on any of the new JIRAs created or comments added. Anyone else noticing similar issues as well? -- Swarnim -- Swarnim
[jira] [Created] (HIVE-10712) Hive on Apache Flink
Greg Senia created HIVE-10712: - Summary: Hive on Apache Flink Key: HIVE-10712 URL: https://issues.apache.org/jira/browse/HIVE-10712 Project: Hive Issue Type: Wish Reporter: Greg Senia Flink, as an open-source data analytics cluster computing framework, has gained some momentum recently. This initiative will provide users with a new alternative so that they can consolidate their backend. Secondly, providing such an alternative further increases Hive's adoption, as it exposes Flink users to a viable, feature-rich, de facto standard SQL tool on Hadoop. Finally, allowing Hive to run on Flink also has performance benefits. Hive queries, especially those involving multiple reducer stages, will run faster, thus improving user experience as Tez/Spark does. This is an umbrella JIRA which will cover many upcoming subtasks. Feedback from the community is greatly appreciated! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10713) Update to HBase 1.0
Swarnim Kulkarni created HIVE-10713: --- Summary: Update to HBase 1.0 Key: HIVE-10713 URL: https://issues.apache.org/jira/browse/HIVE-10713 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni HBase is now 1.0. We should look into upgrading the HBase deps in Hive to 1.0 as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10714) Bloom filter column names specification should be case insensitive
Prasanth Jayachandran created HIVE-10714: Summary: Bloom filter column names specification should be case insensitive Key: HIVE-10714 URL: https://issues.apache.org/jira/browse/HIVE-10714 Project: Hive Issue Type: Bug Affects Versions: 1.3.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Column names specified for orc bloom filter creation should be case insensitive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
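As a rough illustration of the behavior requested in HIVE-10714 above, the sketch below matches a user-supplied, comma-separated list of bloom filter column names against schema column names without regard to case. It is not the ORC writer code; selectBloomFilterColumns and the sample column names are hypothetical, and the comma-separated format of the specification is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class BloomFilterColumnsSketch {

    // Hypothetical helper: returns one flag per schema column, true when the
    // column is named in the specification, ignoring case and whitespace.
    static boolean[] selectBloomFilterColumns(List<String> schemaColumns,
                                              String columnSpec) {
        Set<String> wanted = new HashSet<>();
        for (String name : columnSpec.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                // Normalize once so lookups are case insensitive.
                wanted.add(trimmed.toLowerCase(Locale.ROOT));
            }
        }
        boolean[] selected = new boolean[schemaColumns.size()];
        for (int i = 0; i < schemaColumns.size(); i++) {
            String normalized = schemaColumns.get(i).toLowerCase(Locale.ROOT);
            selected[i] = wanted.contains(normalized);
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("id", "Name", "city");
        boolean[] flags = selectBloomFilterColumns(schema, "ID, name");
        System.out.println(Arrays.toString(flags));
    }
}
```

Normalizing with Locale.ROOT avoids locale-specific surprises when lowercasing, which matters if column names contain characters such as the Turkish dotted I.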
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
On May 12, 2015, 6:04 a.m., Alexander Pivovarov wrote:
ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java, line 33
https://reviews.apache.org/r/34059/diff/1/?file=955664#file955664line33
booleans in java are false by default

I find this provides better readability. Are there any negatives to having the initial value set here?

On May 12, 2015, 6:04 a.m., Alexander Pivovarov wrote:
ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java, line 50
https://reviews.apache.org/r/34059/diff/1/?file=955664#file955664line50
It is not necessary but I do not see a reason why the visibility of this method should be reduced. Should it be public as all others?

The public functionality we need from that class is provided by the Iterator/Iterable interfaces, I didn't think it would be necessary to expose reset() since it is only really being used by the outer class.

- Jason

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/#review83359 ---

On May 11, 2015, 9:48 p.m., Jason Dere wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ --- (Updated May 11, 2015, 9:48 p.m.) Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy. Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673 Repository: hive-git Description --- Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted.
Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java eff4d30 itests/src/test/resources/testconfiguration.properties eeb46cc ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java b1352f3 ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 545d7c6 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java cdabe3a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java 15c747e ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java a9082eb ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java d42b643 ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java f7e1dbc ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java adc31ae ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 241e9d7 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6db8220 ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java a342738 ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3 ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100 ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out 
PRE-CREATION Diff: https://reviews.apache.org/r/34059/diff/ Testing --- q-file tests added Thanks, Jason Dere
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
On May 12, 2015, 6:42 a.m., Alexander Pivovarov wrote:
ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java, line 439
https://reviews.apache.org/r/34059/diff/1/?file=955675#file955675line439
trailing space

will fix

On May 12, 2015, 6:42 a.m., Alexander Pivovarov wrote:
ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java, line 315
https://reviews.apache.org/r/34059/diff/1/?file=955677#file955677line315
Remove this line and add String type declaration 3 lines below. Do not confuse GC.

will fix

- Jason

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/#review83371 ---

On May 11, 2015, 9:48 p.m., Jason Dere wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ --- (Updated May 11, 2015, 9:48 p.m.) Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy. Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673 Repository: hive-git Description --- Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted.
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
On May 12, 2015, 5:51 a.m., Alexander Pivovarov wrote:
ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java, line 674
https://reviews.apache.org/r/34059/diff/1/?file=955661#file955661line674
I think it's better to use Map.Entry here to avoid the unnecessary get(pos) lookup. Map.Entry provides getKey, getValue and setValue methods.

will fix

On May 12, 2015, 5:51 a.m., Alexander Pivovarov wrote:
ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java, line 679
https://reviews.apache.org/r/34059/diff/1/?file=955661#file955661line679
The same recommendation as above.

will fix

On May 12, 2015, 5:51 a.m., Alexander Pivovarov wrote:
ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java, line 715
https://reviews.apache.org/r/34059/diff/1/?file=955661#file955661line715
Using replace(char, char) is faster than replace(CharSequence target, CharSequence replacement) because it does not go through the Pattern.compile().matcher().replaceAll API. Can you use replace('.', '_') instead of replace(".", "_")?

will fix

- Jason

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/#review83356
---

On May 11, 2015, 9:48 p.m., Jason Dere wrote:
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/
---
(Updated May 11, 2015, 9:48 p.m.)
Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy.
Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673
Repository: hive-git
Description
---
Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted.
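The two suggestions in the MapJoinOperator review above (iterate with Map.Entry instead of calling get(pos) for each key, and prefer replace(char, char)) can be sketched roughly as follows. This is only an illustration under assumptions: the class name, the method name and the Byte-to-String map are hypothetical and are not taken from MapJoinOperator.java.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EntryIterationSketch {

    // Returns a copy of the input in which every '.' in a value becomes '_'.
    // Iterating over entrySet() gives key and value together, so no extra
    // get(key) lookup is needed inside the loop.
    // The Byte-to-String map is a hypothetical stand-in, not Hive's real data.
    static Map<Byte, String> sanitizeValues(Map<Byte, String> aliases) {
        Map<Byte, String> copy = new LinkedHashMap<>(aliases);
        for (Map.Entry<Byte, String> entry : copy.entrySet()) {
            // replace(char, char) is a plain character substitution.
            entry.setValue(entry.getValue().replace('.', '_'));
        }
        return copy;
    }

    public static void main(String[] args) {
        Map<Byte, String> aliases = new LinkedHashMap<>();
        aliases.put((byte) 0, "db.table_a");
        aliases.put((byte) 1, "db.table_b");
        System.out.println(sanitizeValues(aliases));
    }
}
```

Entry.setValue writes through to the backing map, which is why the loop can update values in place without a put(key, ...) call.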
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
On May 12, 2015, 6:26 a.m., Alexander Pivovarov wrote:
ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java, line 95
https://reviews.apache.org/r/34059/diff/1/?file=955671#file955671line95
Usually a static Log should be private, because superclass static methods should use their own static Log to avoid confusion.

will change to private

On May 12, 2015, 6:26 a.m., Alexander Pivovarov wrote:
ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java, line 1094
https://reviews.apache.org/r/34059/diff/1/?file=955671#file955671line1094
Can you use Map.Entry to avoid the unnecessary lookup 3 lines below?

will fix

On May 12, 2015, 6:26 a.m., Alexander Pivovarov wrote:
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java, line 107
https://reviews.apache.org/r/34059/diff/1/?file=955672#file955672line107
ReduceSinkOperator uses the Object.hashCode() and equals() methods. The HashSet algorithm relies on the hashCode/equals methods.

So that means equals() only works if it is the exact same ReduceSinkOperator object. This should be ok for our usage: if we are referring to the same ReduceSinkOperator, we should be using that exact same object.

- Jason

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/#review83362
---

On May 11, 2015, 9:48 p.m., Jason Dere wrote:
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/
---
(Updated May 11, 2015, 9:48 p.m.)
Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy.
Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673
Repository: hive-git
Description
---
Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted.
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
On May 12, 2015, 6:35 a.m., Alexander Pivovarov wrote:
ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java, line 109
https://reviews.apache.org/r/34059/diff/1/?file=955673#file955673line109
trailing space

will fix

On May 12, 2015, 6:35 a.m., Alexander Pivovarov wrote:
ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java, line 423
https://reviews.apache.org/r/34059/diff/1/?file=955675#file955675line423
Why call getEntry(key) two times in a row? containsKey() and get() both call getEntry internally. Just call get(rs) once, check that the result is not null, and remove the second get(rs).

will fix

- Jason

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/#review83367
---

On May 11, 2015, 9:48 p.m., Jason Dere wrote:
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/
---
(Updated May 11, 2015, 9:48 p.m.)
Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy.
Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673
Repository: hive-git
Description
---
Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted.
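The GenTezWork comment above (replace containsKey() followed by get() with a single get() and a null check) might look like the sketch below. The class, the method names and the String-to-String map are hypothetical placeholders, not the actual GenTezWork code, and the rewrite is only equivalent under the stated assumption that the map never stores null values.

```java
import java.util.HashMap;
import java.util.Map;

public class SingleLookupSketch {

    // Two hash lookups: containsKey(key) and then get(key).
    static String describeTwoLookups(Map<String, String> work, String key) {
        if (work.containsKey(key)) {
            return "found " + work.get(key);
        }
        return "missing";
    }

    // One hash lookup: get(key) followed by a null check.
    // Assumption: the map never holds null values, otherwise a present key
    // with a null value would be reported as missing.
    static String describeOneLookup(Map<String, String> work, String key) {
        String value = work.get(key);
        if (value != null) {
            return "found " + value;
        }
        return "missing";
    }

    public static void main(String[] args) {
        Map<String, String> work = new HashMap<>();
        work.put("rs1", "reduceWorkA");
        System.out.println(describeOneLookup(work, "rs1"));
        System.out.println(describeOneLookup(work, "rs2"));
    }
}
```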
Review Request 34238: HIVE-10709 Update avro dependency to 1.7.7
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34238/ --- Review request for hive and Alexander Pivovarov. Bugs: HIVE-10709 https://issues.apache.org/jira/browse/HIVE-10709 Repository: hive-git Description --- HIVE-10709 Update avro dependency to 1.7.7 Diffs - pom.xml 2e4ca36f31f2bbe89f9c0bb90ab9b4203085e773 Diff: https://reviews.apache.org/r/34238/diff/ Testing --- Thanks, Swarnim Kulkarni
fixed couple q tests which failed in recent builds. Need committer review
HIVE-10665 https://issues.apache.org/jira/browse/HIVE-10665 udaf_percentile_approx_23.q HIVE-10706 https://issues.apache.org/jira/browse/HIVE-10706 vectorized_timestamp_funcs.q
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
On May 12, 2015, 6:26 a.m., Alexander Pivovarov wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java, line 107 https://reviews.apache.org/r/34059/diff/1/?file=955672#file955672line107 ReduceSinkOperator uses Object.hashCode() and equals() methods. HashSet algo relies on hashCode/equals methods Jason Dere wrote: So that means equals() only works if it is the exact same ReduceSinkOperator object. This should be ok for our usage, if we are referring to the same ReduceSinkOperator, we should be using that exact same object. Do you want to use IdentityHashMap then? This class implements the Map interface with a hash table, using reference-equality in place of object-equality when comparing keys (and values). In other words, in an IdentityHashMap, two keys k1 and k2 are considered equal if and only if (k1==k2) - Alexander --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/#review83362 --- On May 11, 2015, 9:48 p.m., Jason Dere wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ --- (Updated May 11, 2015, 9:48 p.m.) Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy. Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673 Repository: hive-git Description --- Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. 
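To make the reference-equality point from the IdentityHashMap discussion above concrete, here is a small sketch comparing HashMap and IdentityHashMap on two keys that are equal by equals() but are distinct objects. Strings are used only because they override equals(); the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

public class IdentityKeySketch {

    // HashMap compares keys with equals()/hashCode().
    static int sizeWithEquals(String k1, String k2) {
        Map<String, Integer> map = new HashMap<>();
        map.put(k1, 1);
        map.put(k2, 2);
        return map.size();
    }

    // IdentityHashMap compares keys with ==, so two distinct objects are
    // two distinct keys even when equals() says they are the same.
    static int sizeWithIdentity(String k1, String k2) {
        Map<String, Integer> map = new IdentityHashMap<>();
        map.put(k1, 1);
        map.put(k2, 2);
        return map.size();
    }

    public static void main(String[] args) {
        // new String(...) forces two separate objects with equal contents.
        String k1 = new String("rs");
        String k2 = new String("rs");
        System.out.println("HashMap size: " + sizeWithEquals(k1, k2));
        System.out.println("IdentityHashMap size: " + sizeWithIdentity(k1, k2));
    }
}
```

For a class that does not override equals() and hashCode(), as the thread says of ReduceSinkOperator, both maps would treat distinct objects as distinct keys; the difference only shows up for classes that define value equality.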
Re: [VOTE] Apache Hive 1.2.0 release candidate 4
+1 ditto. Same checks as last time.

From: Alan Gates alanfga...@gmail.com
Sent: Wednesday, May 13, 2015 1:35 PM
To: dev@hive.apache.org
Subject: Re: [VOTE] Apache Hive 1.2.0 release candidate 4

+1, same checks as last vote.

Alan.

Sushanth Sowmyan khorg...@gmail.com
May 13, 2015 at 11:50

Hi Folks,

We've cleared all the blockers listed for the 1.2.0 release, either committing them or deferring them to an eventual 1.2.1 stabilization release. (Any deferrals were a result of discussion between myself and the committer responsible for the issue.) More details are available here:
https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status

Apache Hive 1.2.0 Release Candidate 4 is available here:
https://people.apache.org/~khorgath/releases/1.2.0_RC4/artifacts/

My public key used for signing is available from the Hive committers key list:
http://www.apache.org/dist/hive/KEYS

Maven artifacts are available here:
https://repository.apache.org/content/repositories/orgapachehive-1035

Source tag for RC4 is up on the Apache git repo as tag release-1.2.0-rc4 (browseable view over at https://git-wip-us.apache.org/repos/asf?p=hive.git;a=tag;h=38c3daef84bafb13bf911ec6c69d7640430fba70 )

Since this has minimal changes from the previous RC, I would further request that this vote conclude in 30 hours (which is past the 72 hr time from the previous RC announcement) if we have enough +1s in the meanwhile.

Hive PMC Members: Please test and vote.

Thanks,
-Sushanth
Re: [VOTE] Apache Hive 1.2.0 release candidate 4
+1
Verified the signature and checksum.
Built the src.tar.gz and ran queries from both the newly built package and bin.tar.gz.
Ran hive cli and beeline queries in local mode.
Checked RELEASE_NOTES.txt, README.txt, LICENSE, NOTICE.

On Wed, May 13, 2015 at 1:35 PM, Alan Gates alanfga...@gmail.com wrote:

+1, same checks as last vote.

Alan.

Sushanth Sowmyan khorg...@gmail.com
May 13, 2015 at 11:50

Hi Folks,

We've cleared all the blockers listed for the 1.2.0 release, either committing them or deferring them to an eventual 1.2.1 stabilization release. (Any deferrals were a result of discussion between myself and the committer responsible for the issue.) More details are available here:
https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status

Apache Hive 1.2.0 Release Candidate 4 is available here:
https://people.apache.org/~khorgath/releases/1.2.0_RC4/artifacts/

My public key used for signing is available from the Hive committers key list:
http://www.apache.org/dist/hive/KEYS

Maven artifacts are available here:
https://repository.apache.org/content/repositories/orgapachehive-1035

Source tag for RC4 is up on the Apache git repo as tag release-1.2.0-rc4 (browseable view over at https://git-wip-us.apache.org/repos/asf?p=hive.git;a=tag;h=38c3daef84bafb13bf911ec6c69d7640430fba70 )

Since this has minimal changes from the previous RC, I would further request that this vote conclude in 30 hours (which is past the 72 hr time from the previous RC announcement) if we have enough +1s in the meanwhile.

Hive PMC Members: Please test and vote.

Thanks,
-Sushanth
Review Request 34235: HIVE-10687 Fix avro deserialization issues for evolved unions
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34235/ --- Review request for hive and Brock Noland. Bugs: HIVE-10687 https://issues.apache.org/jira/browse/HIVE-10687 Repository: hive-git Description --- HIVE-10687 Fix avro deserialization issues for evolved unions Diffs - serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java e94cd83c064199ba719cc2de222edd0e12401c8c serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java eb495b4e1fc5874b30936f646b5bdb5aa8734130 serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroObjectInspectorGenerator.java c9e7d68b211ebc8c66af243fe85f4f89c6fd6cf3 Diff: https://reviews.apache.org/r/34235/diff/ Testing --- Thanks, Swarnim Kulkarni
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
On May 12, 2015, 6:26 a.m., Alexander Pivovarov wrote:
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java, line 107
https://reviews.apache.org/r/34059/diff/1/?file=955672#file955672line107
ReduceSinkOperator uses the Object.hashCode() and equals() methods. The HashSet algorithm relies on the hashCode/equals methods.

Jason Dere wrote:
So that means equals() only works if it is the exact same ReduceSinkOperator object. This should be ok for our usage: if we are referring to the same ReduceSinkOperator, we should be using that exact same object.

Alexander Pivovarov wrote:
Do you want to use IdentityHashMap then? This class implements the Map interface with a hash table, using reference-equality in place of object-equality when comparing keys (and values). In other words, in an IdentityHashMap, two keys k1 and k2 are considered equal if and only if (k1==k2).

Jason Dere wrote:
We're using a Set here as opposed to a Map. I'll change to use Sets.newIdentityHashSet() from Guava.

IdentityHashMap already contains a private KeySet class; to get an instance of it you can call the keySet() method, e.g.

IdentityHashMap<Integer, Object> rsMap = new IdentityHashMap<Integer, Object>();
rsMap.put(1, null);
rsMap.put(2, null);
rsMap.put(3, null);
Set<Integer> rsSet = rsMap.keySet();
System.out.println(rsSet);

[3, 1, 2]

- Alexander

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/#review83362
---

On May 15, 2015, 1:02 a.m., Jason Dere wrote:
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/
---
(Updated May 15, 2015, 1:02 a.m.)
Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy.
Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673
Repository: hive-git
Description
---
Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted.
Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java eff4d30 itests/src/test/resources/testconfiguration.properties f9c9351 ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java b1352f3 ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 545d7c6 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java cdabe3a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java e9bd44a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java a9082eb ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java d42b643 ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java f7e1dbc ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java adc31ae ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 241e9d7 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6db8220 ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java a342738 ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3 ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100 ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out 
PRE-CREATION Diff: https://reviews.apache.org/r/34059/diff/ Testing --- q-file tests added Thanks, Jason Dere
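For readers comparing the two options raised above (an IdentityHashMap key set versus an identity-based Set), the following is a hedged sketch of building such a Set with only the JDK. It uses Collections.newSetFromMap over an IdentityHashMap; this is offered as a stand-in for Guava's Sets.newIdentityHashSet(), not as the code that was committed for HIVE-10673, and the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class IdentitySetSketch {

    // Builds a Set whose membership test is reference equality (==).
    // Unlike IdentityHashMap.keySet(), the returned Set supports add().
    static <T> Set<T> newIdentitySet() {
        return Collections.newSetFromMap(new IdentityHashMap<T, Boolean>());
    }

    public static void main(String[] args) {
        Set<String> seen = IdentitySetSketch.<String>newIdentitySet();
        // Two distinct objects with equal contents.
        String first = new String("RS_1");
        String second = new String("RS_1");
        seen.add(first);
        seen.add(second);
        seen.add(first); // same object again, ignored
        System.out.println("size = " + seen.size());
        System.out.println("contains first = " + seen.contains(first));
    }
}
```

A practical difference from the keySet() approach is that the key set view of a Map does not support add(), so elements would have to be inserted through the map with a dummy value.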
[jira] [Created] (HIVE-10715) RAT failures - many files do not have ASF licenses
Sushanth Sowmyan created HIVE-10715: --- Summary: RAT failures - many files do not have ASF licenses Key: HIVE-10715 URL: https://issues.apache.org/jira/browse/HIVE-10715 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Lots of files do not have proper ASF headers included in. We should add them in. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
On May 12, 2015, 6:26 a.m., Alexander Pivovarov wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java, line 107 https://reviews.apache.org/r/34059/diff/1/?file=955672#file955672line107 ReduceSinkOperator uses Object.hashCode() and equals() methods. HashSet algo relies on hashCode/equals methods Jason Dere wrote: So that means equals() only works if it is the exact same ReduceSinkOperator object. This should be ok for our usage, if we are referring to the same ReduceSinkOperator, we should be using that exact same object. Alexander Pivovarov wrote: Do you want to use IdentityHashMap then? This class implements the Map interface with a hash table, using reference-equality in place of object-equality when comparing keys (and values). In other words, in an IdentityHashMap, two keys k1 and k2 are considered equal if and only if (k1==k2) We're using a Set here as opposed to a Map. I'll change to use Sets.newIdentityHashSet() from Guava. - Jason --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/#review83362 --- On May 11, 2015, 9:48 p.m., Jason Dere wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ --- (Updated May 11, 2015, 9:48 p.m.) Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy. Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673 Repository: hive-git Description --- Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. 
Review Request 34248: HIVE-10684 Fix the unit test failures for HIVE-7553 after HIVE-10674 removed the binary jar files
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34248/ --- Review request for hive and Sushanth Sowmyan. Bugs: HIVE-10684 https://issues.apache.org/jira/browse/HIVE-10684 Repository: hive-git Description --- Remove binaries from source and fix the failed cases Diffs - ql/pom.xml f1a6f7d ql/src/test/org/apache/hadoop/hive/ql/session/TestSessionState.java 45ba07e ql/src/test/resources/RefreshedJarClassV1.txt PRE-CREATION ql/src/test/resources/RefreshedJarClassV2.txt PRE-CREATION Diff: https://reviews.apache.org/r/34248/diff/ Testing --- UT passed Thanks, cheng xu
[jira] [Created] (HIVE-10717) Fix failed qtest encryption_insert_partition_static test in Jenkin
Ferdinand Xu created HIVE-10717: --- Summary: Fix failed qtest encryption_insert_partition_static test in Jenkin Key: HIVE-10717 URL: https://issues.apache.org/jira/browse/HIVE-10717 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu It can be reproduced in Jenkins. See http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3898/testReport/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: JIRA notifications
Yeah, I was having issues with both the manual method and with rbt. But it seems like things are back to normal now. Thanks guys!

On May 14, 2015 12:51 PM, Alexander Pivovarov apivova...@gmail.com wrote:

You can use the following command to create a new review. It takes about 3-5 sec.

$ rbt post -g yes

To update the review you can run:

$ rbt post -u -g yes

On Thu, May 14, 2015 at 10:48 AM, Prasanth Jayachandran pjayachand...@hortonworks.com wrote:

@Swarnim.. A patch generated with git diff needs to include the full index for it to be uploaded to Review Board: "git diff --full-index".
https://code.google.com/p/reviewboard/issues/detail?id=3115

- Prasanth

On May 14, 2015, at 9:14 AM, Thejas Nair thejas.n...@gmail.com wrote:

Now that we have moved to git, you can try using a github pull request instead. It also integrates with jira.
More git instructions - http://accumulo.apache.org/git.html

On Thu, May 14, 2015 at 8:01 AM, kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com wrote:

Also, not sure if it's related, but RB has been pretty sluggish lately for me too. It takes forever for a patch to be submitted and a review request created (the latest one has been running for the past 30 minutes with no output).

On Wed, May 13, 2015 at 4:26 PM, Lefty Leverenz leftylever...@gmail.com wrote:

By the way, we still need to add iss...@hive.apache.org to the website's Mailing Lists http://hive.apache.org/mailing_lists.html page -- see HIVE-10124 https://issues.apache.org/jira/browse/HIVE-10124.

-- Lefty

On Wed, May 13, 2015 at 2:16 PM, Lefty Leverenz leftylever...@gmail.com wrote:

But some notifications and comments aren't making it onto any Hive mailing list -- see INFRA-9221 https://issues.apache.org/jira/browse/INFRA-9221 (please add your own comments and examples). This means the mail archives don't have a complete record of JIRA activity.

-- Lefty

On Wed, May 13, 2015 at 10:03 AM, Thejas Nair thejas.n...@gmail.com wrote:

Comments now added go to iss...@hive.apache.org. Emails for JIRAs created should still go to dev@.

On Wed, May 13, 2015 at 9:25 AM, kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com wrote:

I noticed that I haven't been getting notifications (or they are really delayed) on any of the new JIRAs created or comments added. Is anyone else noticing similar issues?

-- Swarnim

-- Swarnim
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ --- (Updated May 15, 2015, 1:02 a.m.) Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy. Changes --- Addressing RB feedback from apivovarov Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673 Repository: hive-git Description --- Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java eff4d30 itests/src/test/resources/testconfiguration.properties f9c9351 ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java b1352f3 ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 545d7c6 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java cdabe3a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java e9bd44a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java a9082eb ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java d42b643 ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java f7e1dbc ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java adc31ae ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 241e9d7 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6db8220 ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java a342738 ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3 ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java 
cee9100 ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34059/diff/ Testing --- q-file tests added Thanks, Jason Dere
Review Request 34249: Case folding with nulls in expression with filter operator
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34249/ --- Review request for hive and Gopal V. Bugs: HIVE-10716 https://issues.apache.org/jira/browse/HIVE-10716 Repository: hive-git Description --- Case folding with nulls in expression with filter operator Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcFactory.java 209f717 ql/src/test/queries/clientpositive/fold_case.q 3f9e3a3 ql/src/test/results/clientpositive/fold_case.q.out de6c43e ql/src/test/results/clientpositive/fold_eq_with_case_when.q.out 45a0cb1 ql/src/test/results/clientpositive/fold_when.q.out 51d4767 Diff: https://reviews.apache.org/r/34249/diff/ Testing --- New tests. Thanks, Ashutosh Chauhan
Re: [VOTE] Apache Hive 1.2.0 release candidate 4
Sorry folks, we discovered one more issue with the RC - ASF headers were missing in a couple of files. I'm in the process of spinning out an RC5 with the fix. And since we already have functional testing of the RC done, and these are trivial changes from the previous RC, I again propose a shortened RC time for this RC as well. Those that have tested so far, I thank you for your efforts and your patience, and request another testing of the next RC. Thanks, -Sushanth On Thu, May 14, 2015 at 3:14 PM, Thejas Nair thejas.n...@gmail.com wrote: +1 Verified the signature and checksum. Built the src.tar.gz, ran queries from both the newly built package and bin.tar.gz. Ran hive cli and beeline queries in local mode. Checked RELEASE_NOTES.txt, README.txt, LICENSE, NOTICE On Wed, May 13, 2015 at 1:35 PM, Alan Gates alanfga...@gmail.com wrote: +1, same checks as last vote. Alan. Sushanth Sowmyan khorg...@gmail.com May 13, 2015 at 11:50 Hi Folks, We've cleared all the blockers listed for 1.2.0 release, either committing them, or deferring out to an eventual 1.2.1 stabilization release. (Any deferrals were a result of discussion between myself and the committer responsible for the issue.) 
More details are available here : https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status Apache Hive 1.2.0 Release Candidate 4 is available here: https://people.apache.org/~khorgath/releases/1.2.0_RC4/artifacts/ My public key used for signing is as available from the hive committers key list : http://www.apache.org/dist/hive/KEYS Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-1035 Source tag for RC4 is up on the apache git repo as tag release-1.2.0-rc4 (Browseable view over at https://git-wip-us.apache.org/repos/asf?p=hive.git;a=tag;h=38c3daef84bafb13bf911ec6c69d7640430fba70 ) Since this has minimal changes from the previous RC, I would further request that this vote conclude in 30 hours(which is past the 72 hr time from the previous RC announcement) if we have enough +1s in the meanwhile. Hive PMC Members: Please test and vote. Thanks, -Sushanth
Re: [VOTE] Apache Hive 1.2.0 release candidate 5
I built against hadoop1 and hadoop2 and ran the rat tool as well. Ran a couple of queries. +1 Thanks Vikram. On Thu, May 14, 2015 at 6:30 PM, Sushanth Sowmyan khorg...@gmail.com wrote: Hi Folks, We've cleared all the blockers listed for 1.2.0 release, either committing them, or deferring out to an eventual 1.2.1 stabilization release. (Any deferrals were a result of discussion between myself and the committer responsible for the issue.) More details are available here : https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status Apache Hive 1.2.0 Release Candidate 5 is available here: https://people.apache.org/~khorgath/releases/1.2.0_RC5/artifacts/ My public key used for signing is as available from the hive committers key list : http://www.apache.org/dist/hive/KEYS Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-1039 Source tag for RC5 is up on the apache git repo as tag release-1.2.0-rc5 (Browseable view over at https://git-wip-us.apache.org/repos/asf?p=hive.git;a=tag;h=76b90268084f529852396302884297b3c22fcf00 ) Since this has minimal changes from the previous RC, I would further request that this vote conclude in 20 hours(which is past the 72 hr time from the previous RC announcement) if we have enough +1s in the meanwhile. Hive PMC Members: Please test and vote. Thanks, -Sushanth -- Nothing better than when appreciated for hard work. -Mark
Re: Review Request 34235: HIVE-10687 Fix avro deserialization issues for evolved unions
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34235/#review83883 --- serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java https://reviews.apache.org/r/34235/#comment134958 trailing space serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java https://reviews.apache.org/r/34235/#comment134959 trailing space serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java https://reviews.apache.org/r/34235/#comment134964 Do you need to cover the null value case? serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java https://reviews.apache.org/r/34235/#comment134960 trailing spaces serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java https://reviews.apache.org/r/34235/#comment134962 remove space pls Some minor issues and a question - cheng xu On May 14, 2015, 10:07 p.m., Swarnim Kulkarni wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34235/ --- (Updated May 14, 2015, 10:07 p.m.) Review request for hive and Brock Noland. Bugs: HIVE-10687 https://issues.apache.org/jira/browse/HIVE-10687 Repository: hive-git Description --- HIVE-10687 Fix avro deserialization issues for evolved unions Diffs - serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java e94cd83c064199ba719cc2de222edd0e12401c8c serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java eb495b4e1fc5874b30936f646b5bdb5aa8734130 serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroObjectInspectorGenerator.java c9e7d68b211ebc8c66af243fe85f4f89c6fd6cf3 Diff: https://reviews.apache.org/r/34235/diff/ Testing --- Thanks, Swarnim Kulkarni
[jira] [Created] (HIVE-10716) Fold case/when udf for expression involving nulls in filter operator.
Ashutosh Chauhan created HIVE-10716: --- Summary: Fold case/when udf for expression involving nulls in filter operator. Key: HIVE-10716 URL: https://issues.apache.org/jira/browse/HIVE-10716 Project: Hive Issue Type: New Feature Components: Logical Optimizer Affects Versions: 1.3.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan From HIVE-10636 comments, more folding is possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[VOTE] Apache Hive 1.2.0 release candidate 5
Hi Folks, We've cleared all the blockers listed for 1.2.0 release, either committing them, or deferring out to an eventual 1.2.1 stabilization release. (Any deferrals were a result of discussion between myself and the committer responsible for the issue.) More details are available here : https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status Apache Hive 1.2.0 Release Candidate 5 is available here: https://people.apache.org/~khorgath/releases/1.2.0_RC5/artifacts/ My public key used for signing is as available from the hive committers key list : http://www.apache.org/dist/hive/KEYS Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-1039 Source tag for RC5 is up on the apache git repo as tag release-1.2.0-rc5 (Browseable view over at https://git-wip-us.apache.org/repos/asf?p=hive.git;a=tag;h=76b90268084f529852396302884297b3c22fcf00 ) Since this has minimal changes from the previous RC, I would further request that this vote conclude in 20 hours(which is past the 72 hr time from the previous RC announcement) if we have enough +1s in the meanwhile. Hive PMC Members: Please test and vote. Thanks, -Sushanth
[jira] [Created] (HIVE-10719) Hive metastore failure when alter table rename is attempted.
Vikram Dixit K created HIVE-10719: - Summary: Hive metastore failure when alter table rename is attempted. Key: HIVE-10719 URL: https://issues.apache.org/jira/browse/HIVE-10719 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.0.0, 1.2.0, 1.1.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K
{code}
create database newDB location '/tmp/';
describe database extended newDB;
use newDB;
create table tab (name string);
alter table tab rename to newName;
{code}
Fails:
{code}
InvalidOperationException(message:Unable to access old location hdfs://localhost:8020/tmp/tab for table x.tab)
{code}
Review Request 34197: HIVE-10706 Make vectorized_timestamp_funcs test more stable
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34197/ --- Review request for hive and Jason Dere. Bugs: HIVE-10706 https://issues.apache.org/jira/browse/HIVE-10706 Repository: hive-git Description --- HIVE-10706 Make vectorized_timestamp_funcs test more stable Diffs - ql/src/test/queries/clientpositive/vectorized_timestamp_funcs.q 8a2d5aaf5fb0396e551bdefdde507d1e9902919b ql/src/test/results/clientpositive/spark/vectorized_timestamp_funcs.q.out 304458215b4dcbc4d49321ba5f14ca5a87f2ec26 ql/src/test/results/clientpositive/tez/vectorized_timestamp_funcs.q.out fa3ed21232004d710b33cadac66680eabaca2c8a ql/src/test/results/clientpositive/vectorized_timestamp_funcs.q.out 31a96c68b22bd5332fb71b52982de71710df65fa Diff: https://reviews.apache.org/r/34197/diff/ Testing --- Thanks, Alexander Pivovarov
[jira] [Created] (HIVE-10707) CBO: debug logging OOMs
Gopal V created HIVE-10707: -- Summary: CBO: debug logging OOMs Key: HIVE-10707 URL: https://issues.apache.org/jira/browse/HIVE-10707 Project: Hive Issue Type: Bug Components: CBO Reporter: Gopal V Priority: Trivial
{code}
hive> source xcross.sql;
OK
Time taken: 0.837 seconds
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421)
at java.lang.StringBuilder.append(StringBuilder.java:136)
at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:111)
at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:119)
at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:119)
at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:119)
at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:119)
{code}
The query contains 360 join clauses, wrapped in a UNION ALL. Looks like {{genOpTree}} does
{code}
this.ctx.setCboInfo("Plan optimized by CBO.");
this.ctx.setCboSucceeded(true);
LOG.debug(newAST.dump());
}
{code}
and the debug logging OOMs, because {{newAST.dump()}} builds the full dump string before {{LOG.debug}} is called, whether or not debug logging is enabled.
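Editorial sketch of the usual remedy for this kind of problem: check the log level before building an expensive message. This is not the patch for HIVE-10707. The class GuardedDebugLogging and its dump method are hypothetical stand-ins for ASTNode.dump(), and java.util.logging is used only to keep the example self-contained; Hive's own logging API is different.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration; none of these names come from Hive.
class GuardedDebugLogging {
    static final Logger LOG = Logger.getLogger("GuardedDebugLogging");

    // Counts how often the expensive dump string was actually built.
    static int dumpCalls = 0;

    // Stand-in for an expensive recursive dump of a large tree.
    static String dump(int nodes) {
        dumpCalls++;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < nodes; i++) {
            sb.append("(node ").append(i).append(')');
        }
        return sb.toString();
    }

    // Unguarded: the argument is evaluated before fine() is called,
    // so the dump is built even when FINE messages are discarded.
    static void logUnguarded(int nodes) {
        LOG.fine(dump(nodes));
    }

    // Guarded: the dump is built only if FINE messages would be logged.
    static void logGuarded(int nodes) {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(dump(nodes));
        }
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO); // debug-level (FINE) output disabled

        logUnguarded(1000);
        System.out.println("unguarded dump calls: " + dumpCalls);

        dumpCalls = 0;
        logGuarded(1000);
        System.out.println("guarded dump calls: " + dumpCalls);
    }
}
```

With the level set to INFO, the unguarded call still builds the string once while the guarded call never does. The guard only helps when debug logging is off; with it on, a plan of this size would still need a bounded or streamed dump.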
Re: Review Request 33881: HIVE-10623 Implement hive cli options using beeline functionality
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33881/#review83724 --- beeline/src/test/org/apache/hive/beeline/cli/TestHiveCli.java https://reviews.apache.org/r/33881/#comment134770 You can use IOUtils.closeQuietly(bw). I do not think we need to log an error when closing the buffer fails. - Alexander Pivovarov On May 14, 2015, 5:51 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33881/ --- (Updated May 14, 2015, 5:51 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-10623 https://issues.apache.org/jira/browse/HIVE-10623 Repository: hive-git Description --- Changes: 1. Support the hive cli options including database, e, !, H, f. 2. Add error handler for using f and e together 3. Add error handler for invalid option Diffs - beeline/src/java/org/apache/hive/beeline/BeeLine.java 0da15f6 beeline/src/java/org/apache/hive/beeline/cli/CliOptionsProcessor.java PRE-CREATION beeline/src/java/org/apache/hive/beeline/cli/HiveCli.java PRE-CREATION beeline/src/test/org/apache/hive/beeline/cli/TestHiveCli.java PRE-CREATION beeline/src/test/resources/hive-site.xml PRE-CREATION Diff: https://reviews.apache.org/r/33881/diff/ Testing --- Newly added unit tests passed locally. Thanks, cheng xu
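Editorial sketch of the pattern the reviewer suggests: close the writer in a finally block and swallow any close failure instead of logging it. To stay self-contained, the example defines a local closeQuietly helper that mimics the behavior of the Commons IO method named in the review; the class CloseQuietlySketch and the method writeScript are hypothetical and are not taken from TestHiveCli.

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

// Hypothetical illustration; not code from the review request.
class CloseQuietlySketch {

    // Local stand-in for a "close quietly" helper: tolerate null
    // and ignore any IOException raised by close().
    static void closeQuietly(Closeable c) {
        if (c == null) {
            return;
        }
        try {
            c.close();
        } catch (IOException ignored) {
            // Intentionally ignored: a failed close is not worth logging here.
        }
    }

    // Writes the content through a BufferedWriter and returns what was written.
    static String writeScript(String content) {
        StringWriter sink = new StringWriter();
        BufferedWriter bw = null;
        try {
            bw = new BufferedWriter(sink);
            bw.write(content);
            bw.flush();
        } catch (IOException e) {
            // Write failures still matter, so they are surfaced.
            throw new UncheckedIOException(e);
        } finally {
            closeQuietly(bw);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeScript("show tables;"));
    }
}
```

In new code, try-with-resources is often the simpler choice, though it propagates close failures rather than swallowing them.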
[jira] [Created] (HIVE-10718) Update committer list - Add Ferdinand Xu
Ferdinand Xu created HIVE-10718: --- Summary: Update committer list - Add Ferdinand Xu Key: HIVE-10718 URL: https://issues.apache.org/jira/browse/HIVE-10718 Project: Hive Issue Type: Bug Reporter: Ferdinand Xu Assignee: Ferdinand Xu Priority: Minor NO PRECOMMIT TESTS add myself to committer list
Re: [VOTE] Apache Hive 1.2.0 release candidate 5
One more time, with feeling :-) +1, same verification as last time. From: Vikram Dixit K vikram.di...@gmail.com Sent: Thursday, May 14, 2015 6:51 PM To: dev@hive.apache.org Cc: hive-...@hadoop.apache.org Subject: Re: [VOTE] Apache Hive 1.2.0 release candidate 5 I built against hadoop1 and hadoop2 and ran the rat tool as well. Ran a couple of queries. +1 Thanks Vikram. On Thu, May 14, 2015 at 6:30 PM, Sushanth Sowmyan khorg...@gmail.com wrote: Hi Folks, We've cleared all the blockers listed for 1.2.0 release, either committing them, or deferring out to an eventual 1.2.1 stabilization release. (Any deferrals were a result of discussion between myself and the committer responsible for the issue.) More details are available here : https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status Apache Hive 1.2.0 Release Candidate 5 is available here: https://people.apache.org/~khorgath/releases/1.2.0_RC5/artifacts/ My public key used for signing is as available from the hive committers key list : http://www.apache.org/dist/hive/KEYS Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-1039 Source tag for RC5 is up on the apache git repo as tag release-1.2.0-rc5 (Browseable view over at https://git-wip-us.apache.org/repos/asf?p=hive.git;a=tag;h=76b90268084f529852396302884297b3c22fcf00 ) Since this has minimal changes from the previous RC, I would further request that this vote conclude in 20 hours(which is past the 72 hr time from the previous RC announcement) if we have enough +1s in the meanwhile. Hive PMC Members: Please test and vote. Thanks, -Sushanth -- Nothing better than when appreciated for hard work. -Mark
FAILED: IndexOutOfBoundsException Index: 3, Size: 3
Hello, when I execute an HQL statement such as this one:
FROM INPUT
INSERT OVERWRITE TABLE temp2 partition(dt='2015-05-15',type='type1')
SELECT key,ip, count(distinct(value)) as uv, count(distinct(val1)) as pv
where ip='2015-05-13' and key='dstdomain1'
GROUP BY key,ip
INSERT OVERWRITE TABLE temp2 partition(dt='2015-05-15',type='type2')
SELECT key,ip, count(distinct(value)) as uv, count(distinct(val1)) as pv
where ip='2015-05-13' and key='dstdomain'
GROUP BY key,ip;
it throws an exception:
FAILED: IndexOutOfBoundsException Index: 3, Size: 3
15/05/15 09:50:39 [main]: ERROR ql.Driver: FAILED: IndexOutOfBoundsException Index: 3, Size: 3
java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at org.apache.hadoop.hive.ql.optimizer.lineage.OpProcFactory$ReduceSinkLineage.process(OpProcFactory.java:477)
at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:56)
at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
at org.apache.hadoop.hive.ql.optimizer.lineage.Generator.transform(Generator.java:95)
at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:182)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10216)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:192)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1110)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1158)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1047)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1037)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:756)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Review Request 34261: query on view results fails with table not found error if view is created with subquery alias (CTE).
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34261/ --- Review request for hive, Ashutosh Chauhan and John Pullokkaran. Repository: hive-git Description --- 1. When a fully qualified identifier (db.tablename) is specified in the FROM clause, we seem to resolve it against CTE aliases. This is wrong: if the table doesn't exist in the catalog, we should fail. 2. If a fully qualified name is not used in the FROM clause, then a) we should first resolve the identifier against CTE aliases; b) if the identifier is not found in the CTE list, we should then try to resolve it against the catalog. 3. Views: in UnparseTranslator we treat a CTE name as a catalog table; this is a bug. Diffs - ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 4bb256d ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 30c87ad ql/src/test/queries/clientpositive/cteViews.q PRE-CREATION ql/src/test/results/clientpositive/cteViews.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34261/diff/ Testing --- Thanks, pengcheng xiong
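Editorial sketch of the resolution order described in points 1 and 2 of this review request. It is a simplified model, not the SemanticAnalyzer change itself: the class CteNameResolution, the Source enum and the resolve method are hypothetical, the catalog is modeled as a set of "db.table" strings, and identifier case-insensitivity is ignored.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical illustration; none of these names come from Hive.
class CteNameResolution {
    enum Source { CTE, CATALOG }

    // Resolves a FROM-clause identifier. An empty result means
    // "table not found", i.e. the query should fail.
    static Optional<Source> resolve(String name,
                                    Set<String> cteAliases,
                                    Set<String> catalogTables,
                                    String currentDb) {
        if (name.contains(".")) {
            // Point 1: a qualified name is looked up in the catalog only,
            // never against CTE aliases.
            return catalogTables.contains(name)
                    ? Optional.of(Source.CATALOG)
                    : Optional.empty();
        }
        // Point 2a: an unqualified name is matched against CTE aliases first.
        if (cteAliases.contains(name)) {
            return Optional.of(Source.CTE);
        }
        // Point 2b: otherwise fall back to the catalog, in the current database.
        return catalogTables.contains(currentDb + "." + name)
                ? Optional.of(Source.CATALOG)
                : Optional.empty();
    }

    public static void main(String[] args) {
        Set<String> ctes = new HashSet<>(Arrays.asList("src1"));
        Set<String> catalog = new HashSet<>(Arrays.asList("default.src", "default.src1"));

        System.out.println(resolve("src1", ctes, catalog, "default"));         // CTE shadows the table
        System.out.println(resolve("default.src1", ctes, catalog, "default")); // catalog only
        System.out.println(resolve("bar.src1", ctes, catalog, "default"));     // not found
        System.out.println(resolve("src", ctes, catalog, "default"));          // catalog fallback
    }
}
```

Point 3 (views) is not modeled here; it concerns how the view definition is rewritten, which depends on Hive internals not shown in this message.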