Re: Reading 2 table data in MapReduce for Performing Join
This is solved. I used Writable instead of LongWritable or NullWritable as the Mapper input key type.
Thanks
Suraj Nayak

On 19-Mar-2015 9:48 PM, Suraj Nayak snay...@gmail.com wrote:
Is this related to https://issues.apache.org/jira/browse/HIVE-4329 ? Is there a workaround?

On Thu, Mar 19, 2015 at 9:47 PM, Suraj Nayak snay...@gmail.com wrote:
Hi All,
I was able to integrate HCatMultipleInputs successfully with the patch for tables created as TEXTFILE. But I get an error when I read a table created as ORC. The error is below:

15/03/19 10:51:32 INFO mapreduce.Job: Task Id : attempt_1425012118520_9756_m_00_0, Status : FAILED
Error: java.lang.ClassCastException: org.apache.hadoop.io.NullWritable cannot be cast to org.apache.hadoop.io.LongWritable
    at com.abccompany.mapreduce.MyMapper.map(MyMapper.java:15)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

Can anyone help? Thanks in advance!

On Wed, Mar 18, 2015 at 11:00 PM, Suraj Nayak snay...@gmail.com wrote:
Hi All,
The https://issues.apache.org/jira/browse/HIVE-4997 patch helped!

On Tue, Mar 17, 2015 at 1:05 AM, Suraj Nayak snay...@gmail.com wrote:
Hi,
I tried reading data from one Hive table via HCatalog in MapReduce, using something similar to https://cwiki.apache.org/confluence/display/Hive/HCatalog+InputOutput#HCatalogInputOutput-RunningMapReducewithHCatalog. I was able to read it successfully. Now I am trying to read 2 tables, as the requirement is to join them. I did not find an API similar to *FileInputFormat.addInputPaths* in *HCatInputFormat*. What is the equivalent in HCat?

I had performed a join using FileInputFormat on HDFS (by getting the split information in the mapper). This article helped me code the join: http://www.codingjunkie.com/mapreduce-reduce-joins/

Can someone suggest how I can perform a join operation using HCatalog? Briefly, the aim is to:
- Read 2 tables (almost similar schema).
- If a key exists in both tables, send it to the same reducer.
- Do some processing on the records in the reducer.
- Save the output into a file/Hive table.

*P.S.: The reason for using MapReduce to perform the join is a complex requirement which can't be solved via Hive/Pig directly.*

Any help will be greatly appreciated :)
--
Thanks
Suraj Nayak M
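The join Suraj describes (read both tables, route rows with the same key to one reducer, process them there) is the classic reduce-side join. Below is a minimal, Hadoop-free sketch in plain Java that simulates the map, shuffle and reduce phases in memory. It assumes each row is a String array whose first column is the join key and whose second column is a payload; TaggedRecord, mapTable, reduceJoin and join are hypothetical names, not HCatalog or Hadoop APIs. The map step types its input key as Object and ignores it, mirroring the fix reported at the top of this thread (declaring the Mapper input key as the general Writable type, since the ORC path delivered NullWritable keys rather than LongWritable ones).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical wrapper: tags each row with its source table so the
// reducer can tell the two sides of the join apart.
final class TaggedRecord {
    final String table;
    final String[] row;

    TaggedRecord(String table, String[] row) {
        this.table = table;
        this.row = row;
    }
}

final class ReduceSideJoinSketch {

    // "Map" phase: emit (joinKey, taggedRecord) for every row of one table.
    // The input key is typed as Object and ignored, analogous to declaring
    // the Mapper input key as Writable so NullWritable or LongWritable both work.
    static void mapTable(Object ignoredInputKey, String table, List<String[]> rows,
                         Map<String, List<TaggedRecord>> shuffle) {
        for (String[] row : rows) {
            String joinKey = row[0]; // assumption: first column is the join key
            shuffle.computeIfAbsent(joinKey, k -> new ArrayList<>())
                   .add(new TaggedRecord(table, row));
        }
    }

    // "Reduce" phase: for each key, pair every left row with every right row.
    // Keys present in only one table produce no output (an inner join).
    static List<String> reduceJoin(Map<String, List<TaggedRecord>> shuffle,
                                   String leftTable, String rightTable) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<TaggedRecord>> e : shuffle.entrySet()) {
            List<String[]> left = new ArrayList<>();
            List<String[]> right = new ArrayList<>();
            for (TaggedRecord r : e.getValue()) {
                if (r.table.equals(leftTable)) {
                    left.add(r.row);
                } else if (r.table.equals(rightTable)) {
                    right.add(r.row);
                }
            }
            for (String[] l : left) {
                for (String[] r : right) {
                    // assumption: column 1 holds the payload to carry through
                    out.add(e.getKey() + "\t" + l[1] + "\t" + r[1]);
                }
            }
        }
        return out;
    }

    static List<String> join(List<String[]> a, List<String[]> b) {
        // TreeMap stands in for the shuffle's group-and-sort by key.
        Map<String, List<TaggedRecord>> shuffle = new TreeMap<>();
        mapTable(null, "a", a, shuffle);            // key could be NullWritable-like
        mapTable(Long.valueOf(0L), "b", b, shuffle); // or LongWritable-like
        return reduceJoin(shuffle, "a", "b");
    }
}
```

In a real job the tag would typically travel in the map output value (or a composite key), and the partitioner on the join key is what guarantees both sides meet in the same reducer.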
Hive-0.14 - Build # 906 - Fixed
Changes for Build #905 Changes for Build #906 No tests ran. The Apache Jenkins build system has built Hive-0.14 (build #906) Status: Fixed Check console output at https://builds.apache.org/job/Hive-0.14/906/ to view the results.
Re: Review Request 32370: HIVE-10040
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32370/ --- (Updated March 27, 2015, 4:15 p.m.) Review request for hive and John Pullokkaran. Changes --- Override annotation. Bugs: HIVE-10040 https://issues.apache.org/jira/browse/HIVE-10040 Repository: hive-git Description --- CBO (Calcite Return Path): Pluggable cost modules [CBO branch] Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveDefaultRelMetadataProvider.java 977313a5a632329fc963daf7ff276ccdd59ce7c5 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCost.java 41604cd0af68e7f90296fa271c42debc5aaf743a ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCostModel.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveDefaultCostModel.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveOnTezCostModel.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveRelMdCost.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveAggregate.java 9a8a5da81b92c7c1f33d1af8072b1fb94e237290 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveFilter.java 3e45a3fbed3265b126a3ff9b6ffe44bee24453ef ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveJoin.java e2b010b641d48ea1bf04750ddf5eb24fb3a7fcbe ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveLimit.java 5fc64f3e8c97fc8988bc35be39dbabf78dd7de24 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveProject.java 6c215c96190f0fcebe063b15c2763c49ebf1faaf ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveTableScan.java f2c5408d913bfe2648c4e1e1e43b1bbc5f43a549 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdCollation.java 4984683c3c8c6c0378a22e21fd6d961f3901f25c 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdDistribution.java f846dd19899af51194f3407ef913fcb9bcc24977 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdRowCount.java dabbe280278dc80f00f0240a0c615fe6c7b8533a ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdUniqueKeys.java 95515b23e409d73d5c61e107931727add3f992a6 Diff: https://reviews.apache.org/r/32370/diff/ Testing --- Thanks, Jesús Camacho Rodríguez
[jira] [Created] (HIVE-10116) CBO (Calcite Return Path): RelMdSize throws an Exception when Join is actually a Semijoin [CBO branch]
Jesus Camacho Rodriguez created HIVE-10116: -- Summary: CBO (Calcite Return Path): RelMdSize throws an Exception when Join is actually a Semijoin [CBO branch] Key: HIVE-10116 URL: https://issues.apache.org/jira/browse/HIVE-10116 Project: Hive Issue Type: Sub-task Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez {{cbo_semijoin.q}} reproduces the error. Stacktrace:
{noformat}
2015-03-26 09:55:20,652 ERROR [main]: parse.CalcitePlanner (CalcitePlanner.java:genOPTree(269)) - CBO failed, skipping CBO.
java.lang.ArrayIndexOutOfBoundsException: 3
    at org.apache.calcite.rel.metadata.RelMdSize.averageColumnSizes(RelMdSize.java:193)
    at sun.reflect.GeneratedMethodAccessor134.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider$2$1.invoke(ReflectiveRelMetadataProvider.java:194)
    at com.sun.proxy.$Proxy30.averageColumnSizes(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor64.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 32370: HIVE-10040
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32370/ --- (Updated March 27, 2015, 4:12 p.m.) Review request for hive and John Pullokkaran. Changes --- New patch with different hierarchy among classes/methods. Bugs: HIVE-10040 https://issues.apache.org/jira/browse/HIVE-10040 Repository: hive-git Description --- CBO (Calcite Return Path): Pluggable cost modules [CBO branch] Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveDefaultRelMetadataProvider.java 977313a5a632329fc963daf7ff276ccdd59ce7c5 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCost.java 41604cd0af68e7f90296fa271c42debc5aaf743a ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCostModel.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveDefaultCostModel.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveOnTezCostModel.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveRelMdCost.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveAggregate.java 9a8a5da81b92c7c1f33d1af8072b1fb94e237290 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveFilter.java 3e45a3fbed3265b126a3ff9b6ffe44bee24453ef ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveJoin.java e2b010b641d48ea1bf04750ddf5eb24fb3a7fcbe ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveLimit.java 5fc64f3e8c97fc8988bc35be39dbabf78dd7de24 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveProject.java 6c215c96190f0fcebe063b15c2763c49ebf1faaf ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveTableScan.java f2c5408d913bfe2648c4e1e1e43b1bbc5f43a549 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdCollation.java 4984683c3c8c6c0378a22e21fd6d961f3901f25c 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdDistribution.java f846dd19899af51194f3407ef913fcb9bcc24977 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdRowCount.java dabbe280278dc80f00f0240a0c615fe6c7b8533a ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdUniqueKeys.java 95515b23e409d73d5c61e107931727add3f992a6 Diff: https://reviews.apache.org/r/32370/diff/ Testing --- Thanks, Jesús Camacho Rodríguez
Re: Review Request 32370: HIVE-10040
On March 25, 2015, 8:51 p.m., John Pullokkaran wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveDefaultCostModel.java, line 42 https://reviews.apache.org/r/32370/diff/1/?file=902177#file902177line42 How about modifying this to: 1. Get the join algorithms. 2. Walk through the join algorithms: a. get the cost for the join; b. find the cheapest. This way, walking the algorithms and finding the cost would be generic. John, I have been refactoring the code to do it the way you proposed. However, I am not convinced by the result: in the end it seems more difficult to extend the hierarchy if a new cost model is added, plus null values need to be given for some parameters that are needed to decide which algorithms are available (e.g. maxMemory). Another idea would be for HiveOnTezCostModel to extend HiveDefaultCostModel instead of implementing HiveCostModel directly; I have done this and it looks clean. - Jesús --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32370/#review77792 --- On March 21, 2015, 5:49 p.m., Jesús Camacho Rodríguez wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32370/ --- (Updated March 21, 2015, 5:49 p.m.) Review request for hive and John Pullokkaran.
Bugs: HIVE-10040 https://issues.apache.org/jira/browse/HIVE-10040 Repository: hive-git Description --- CBO (Calcite Return Path): Pluggable cost modules [CBO branch] Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveDefaultRelMetadataProvider.java 977313a5a632329fc963daf7ff276ccdd59ce7c5 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCost.java 41604cd0af68e7f90296fa271c42debc5aaf743a ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCostModel.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveDefaultCostModel.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveOnTezCostModel.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveRelMdCost.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveAggregate.java 9a8a5da81b92c7c1f33d1af8072b1fb94e237290 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveFilter.java 3e45a3fbed3265b126a3ff9b6ffe44bee24453ef ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveJoin.java e2b010b641d48ea1bf04750ddf5eb24fb3a7fcbe ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveLimit.java 5fc64f3e8c97fc8988bc35be39dbabf78dd7de24 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveProject.java 6c215c96190f0fcebe063b15c2763c49ebf1faaf ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveTableScan.java c53d6ae80fe9c8111f609bfebd8530eca67d27b7 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdCollation.java 4984683c3c8c6c0378a22e21fd6d961f3901f25c ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdDistribution.java f846dd19899af51194f3407ef913fcb9bcc24977 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdRowCount.java dabbe280278dc80f00f0240a0c615fe6c7b8533a 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdUniqueKeys.java 95515b23e409d73d5c61e107931727add3f992a6 Diff: https://reviews.apache.org/r/32370/diff/ Testing --- Thanks, Jesús Camacho Rodríguez
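John's proposal (get the join algorithms, cost each one, keep the cheapest) and Jesús's alternative (HiveOnTezCostModel extending HiveDefaultCostModel) can be combined, as the plain-Java sketch below shows. It is not the code in the patch: JoinAlgorithm, DefaultCostModel, TezCostModel, the toy cost formulas and the maxMemory check are all hypothetical. The base class owns the generic "walk the algorithms and pick the cheapest" loop; a subclass only overrides which algorithms are available and how each one is costed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical join algorithms; the real patch's set may differ.
enum JoinAlgorithm { COMMON_JOIN, MAP_JOIN }

// Hypothetical base cost model: the selection loop is generic and final,
// subclasses only decide which algorithms apply and what each one costs.
class DefaultCostModel {

    // Default engine: only the shuffle-based common join is considered,
    // so maxMemory is simply ignored here.
    List<JoinAlgorithm> availableAlgorithms(double smallSideBytes, double maxMemory) {
        List<JoinAlgorithm> algos = new ArrayList<>();
        algos.add(JoinAlgorithm.COMMON_JOIN);
        return algos;
    }

    // Toy cost: a common join pays to shuffle both inputs.
    double cost(JoinAlgorithm algo, double leftBytes, double rightBytes) {
        return leftBytes + rightBytes;
    }

    // 1. get the join algorithms, 2. cost each one, 3. keep the cheapest.
    final JoinAlgorithm cheapest(double leftBytes, double rightBytes, double maxMemory) {
        JoinAlgorithm best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        double smallSide = Math.min(leftBytes, rightBytes);
        for (JoinAlgorithm algo : availableAlgorithms(smallSide, maxMemory)) {
            double c = cost(algo, leftBytes, rightBytes);
            if (c < bestCost) {
                bestCost = c;
                best = algo;
            }
        }
        return best;
    }
}

// Hypothetical Tez model extending the default one, as Jesús suggests.
class TezCostModel extends DefaultCostModel {

    @Override
    List<JoinAlgorithm> availableAlgorithms(double smallSideBytes, double maxMemory) {
        List<JoinAlgorithm> algos = super.availableAlgorithms(smallSideBytes, maxMemory);
        // A map join is only possible if the small side fits in memory.
        if (smallSideBytes <= maxMemory) {
            algos.add(JoinAlgorithm.MAP_JOIN);
        }
        return algos;
    }

    @Override
    double cost(JoinAlgorithm algo, double leftBytes, double rightBytes) {
        if (algo == JoinAlgorithm.MAP_JOIN) {
            // Toy cost: stream the big side once, plus a broadcast penalty.
            return Math.max(leftBytes, rightBytes) + 0.1 * Math.min(leftBytes, rightBytes);
        }
        return super.cost(algo, leftBytes, rightBytes);
    }
}
```

Because maxMemory is passed to every model, a model that does not need it can ignore it rather than receive placeholder nulls, which was one of Jesús's concerns; this is only one way to arrange the hierarchy.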
Re: Review Request 32499: HIVE-10086: Hive throws error when accessing Parquet file schema using field name match
On March 26, 2015, 10:36 p.m., Szehon Ho wrote: Looks good to me; Ryan is more of an expert. All I contributed were some minor syntax comments. Thanks, Szehon. I added the changes to the patch and uploaded it to the JIRA. - Sergio --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32499/#review77960 --- On March 26, 2015, 8:51 p.m., Sergio Pena wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32499/ --- (Updated March 26, 2015, 8:51 p.m.) Review request for hive. Bugs: HIVE-10086 https://issues.apache.org/jira/browse/HIVE-10086 Repository: hive-git Description --- Attached is the patch that handles schemas that do not match between Parquet and Hive. In this case, Parquet data is accessed by name matching: the table columns may be in a different order, but if a column name matches the Parquet column name, the value is retrieved. Also, if the Hive schema has columns or struct elements that do not match the Parquet schema, NULL values are returned for them instead. Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/DataWritableRecordConverter.java a43661eb54ba29692c07c264584b5aecf648ef99 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java 57ae7a9740d55b407cadfc8bc030593b29f90700 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java a26199612cf338e336f210f29acb0398c536e1f9 ql/src/test/queries/clientpositive/parquet_schema_evolution.q PRE-CREATION ql/src/test/queries/clientpositive/parquet_table_with_subschema.q PRE-CREATION ql/src/test/results/clientpositive/parquet_schema_evolution.q.out PRE-CREATION ql/src/test/results/clientpositive/parquet_table_with_subschema.q.out PRE-CREATION Diff: https://reviews.apache.org/r/32499/diff/ Testing --- Thanks, Sergio Pena
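The behaviour described for HIVE-10086 (match Hive columns to Parquet fields by name rather than by position, and return NULL for Hive columns that have no Parquet counterpart) is sketched below in plain Java. It models one Parquet row as a Map from field name to value; ParquetNameProjection and projectByName are hypothetical, not the DataWritableReadSupport API. Matching case-insensitively is an assumption (Hive stores column names in lower case), and nested struct elements are not handled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class ParquetNameProjection {

    // Returns one value per Hive column, in Hive column order.
    // Values are looked up by field name, so the column order in the Parquet
    // file does not matter; Hive columns missing from the file yield null.
    static List<Object> projectByName(List<String> hiveColumns,
                                      Map<String, Object> parquetRow) {
        // Assumption: compare names case-insensitively.
        Map<String, Object> byLowerName = new HashMap<>();
        for (Map.Entry<String, Object> e : parquetRow.entrySet()) {
            byLowerName.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
        }
        List<Object> out = new ArrayList<>(hiveColumns.size());
        for (String col : hiveColumns) {
            out.add(byLowerName.get(col.toLowerCase(Locale.ROOT))); // null if absent
        }
        return out;
    }
}
```

For a file row {Name=bob, id=7} and Hive columns (id, name, email), the sketch yields [7, bob, null].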
[jira] [Created] (HIVE-10118) CBO :
Mostafa Mokhtar created HIVE-10118: -- Summary: CBO : Key: HIVE-10118 URL: https://issues.apache.org/jira/browse/HIVE-10118 Project: Hive Issue Type: Sub-task Reporter: Mostafa Mokhtar -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10119) Allow Log verbosity to be set in hiveserver2 session
Hari Sankar Sivarama Subramaniyan created HIVE-10119: Summary: Allow Log verbosity to be set in hiveserver2 session Key: HIVE-10119 URL: https://issues.apache.org/jira/browse/HIVE-10119 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan We need to be able to set logging per HS2 session. Clients often use the map-reduce completion matrix (Execution) that shows up in Beeline to debug performance. Users might not want the verbose log view all the time, since it obscures the Execution information. Hence the client should be able to change the verbosity level. Also, HS2 logging currently has 2 levels of verbosity, not 3. Users might want Execution + Performance counters only, so that level needs to be added. So for logs, the user should be able to set one of the following verbosity levels in the session, overriding the default verbosity specified in the hive-site.xml file:
0. None - IGNORE
1. Execution - Just shows the map-reduce tasks completing
2. Performance - Execution + Performance counters dumped at the end
3. Verbose - All logs
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
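The four proposed settings map naturally onto an ordered enum in which each level includes everything below it, with a per-session value overriding the server default from hive-site.xml. A minimal sketch, assuming hypothetical names (LogVerbosity, SessionLogSettings, shouldShow); the JIRA does not specify the actual configuration property or API. Log messages are assumed to be tagged EXECUTION, PERFORMANCE or VERBOSE.

```java
import java.util.Optional;

// Ordered levels: a higher level shows everything a lower level shows.
enum LogVerbosity {
    NONE,        // 0: ignore operation logs
    EXECUTION,   // 1: just the map-reduce tasks completing
    PERFORMANCE, // 2: execution + performance counters at the end
    VERBOSE;     // 3: all logs

    // A message tagged with messageLevel is shown if this (session) level
    // is at least as high; NONE hides everything.
    boolean shows(LogVerbosity messageLevel) {
        return this != NONE && this.compareTo(messageLevel) >= 0;
    }
}

final class SessionLogSettings {
    private final LogVerbosity serverDefault; // e.g. read from hive-site.xml
    private Optional<LogVerbosity> sessionOverride = Optional.empty();

    SessionLogSettings(LogVerbosity serverDefault) {
        this.serverDefault = serverDefault;
    }

    // Called when a client changes the level for its own session only.
    void setSessionLevel(LogVerbosity level) {
        sessionOverride = Optional.of(level);
    }

    LogVerbosity effectiveLevel() {
        return sessionOverride.orElse(serverDefault);
    }

    boolean shouldShow(LogVerbosity messageLevel) {
        return effectiveLevel().shows(messageLevel);
    }
}
```

Relying on declaration order keeps the "each level includes the previous one" rule in a single comparison.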
[jira] [Created] (HIVE-10120) Disallow create table with dot/colon in column name
Pengcheng Xiong created HIVE-10120: -- Summary: Disallow create table with dot/colon in column name Key: HIVE-10120 URL: https://issues.apache.org/jira/browse/HIVE-10120 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Since we don't allow users to query column names with a dot in the middle, such as emp.no, we should not allow users to create tables with such columns, since they cannot be queried. Fix the documentation to reflect this fix. Here is an example. Consider this table: {code} CREATE TABLE a (`emp.no` string); {code} The query {code} select `emp.no` from a; {code} fails with this message: {noformat} FAILED: RuntimeException java.lang.RuntimeException: cannot find field emp from [0:emp.no] {noformat} The Hive documentation needs to be fixed: {quote} (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL) seems to indicate that any Unicode character can go between the backticks in the select statement, but it doesn’t like the dot/colon or even select * when there is a column that has a dot/colon. {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
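The proposed restriction amounts to a check at CREATE TABLE time. Below is a minimal plain-Java sketch of such a validation; ColumnNameValidator, validateColumnName and the error wording are hypothetical, and the actual fix may place the check elsewhere (for example in the DDL semantic analyzer). Judging from the "cannot find field emp" error above, the dot appears to be read as a field separator when the column is referenced.

```java
import java.util.List;

final class ColumnNameValidator {

    // Characters that make a column impossible to reference in a query,
    // per HIVE-10120: '.' and ':'.
    private static final String FORBIDDEN = ".:";

    static void validateColumnName(String name) {
        for (char c : FORBIDDEN.toCharArray()) {
            if (name.indexOf(c) >= 0) {
                throw new IllegalArgumentException(
                    "Invalid column name '" + name + "': contains '" + c + "'");
            }
        }
    }

    // Validate every column of a CREATE TABLE before anything is persisted.
    static void validateColumns(List<String> names) {
        for (String n : names) {
            validateColumnName(n);
        }
    }
}
```

Failing at creation time turns a table whose column can never be selected into an immediate, explicit error.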
Re: Review Request 32489: HIVE-9518 Implement MONTHS_BETWEEN aligned with Oracle one
On March 27, 2015, 6:08 p.m., Mohit Sabharwal wrote: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMonthsBetween.java, line 129 https://reviews.apache.org/r/32489/diff/2-3/?file=906532#file906532line129 static? IMHO in OOP it's recommended not to use static methods, because they cannot be overridden by child classes. On March 27, 2015, 6:08 p.m., Mohit Sabharwal wrote: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java, line 507 https://reviews.apache.org/r/32489/diff/3/?file=907259#file907259line507 19...a comment on where this comes from would be great. Added comment: // let's be strict here and // support only exact 10 char string for short date format - Alexander --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32489/#review78088 --- On March 27, 2015, 12:58 a.m., Alexander Pivovarov wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32489/ --- (Updated March 27, 2015, 12:58 a.m.) Review request for hive and Jason Dere. Bugs: HIVE-9518 https://issues.apache.org/jira/browse/HIVE-9518 Repository: hive-git Description --- HIVE-9518 Implement MONTHS_BETWEEN aligned with Oracle one Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 2476e832b8b7101971ea2226368aa82633b7e7d1 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java ce981232382e993c7c9d640efe9b2d21f70a0ed4 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMonthsBetween.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFMonthsBetween.java PRE-CREATION ql/src/test/queries/clientpositive/udf_months_between.q PRE-CREATION ql/src/test/results/clientpositive/show_functions.q.out 22091d06241218a5c0ee21d6ee6be00a71706971 ql/src/test/results/clientpositive/udf_months_between.q.out PRE-CREATION Diff: https://reviews.apache.org/r/32489/diff/ Testing --- Thanks, Alexander Pivovarov
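Alexander's point about static methods can be illustrated in plain Java: a static method redeclared in a subclass only hides the parent's method, so a call resolved from the parent class still runs the parent version, whereas an instance method is overridden and dispatched on the runtime type. The class and method names below are hypothetical, not taken from the patch.

```java
class BaseUdf {
    static String staticLabel() { return "base"; }

    String instanceLabel() { return "base"; }

    // Both calls are made from the base class, as shared helper code would be.
    String describe() {
        return staticLabel() + "/" + instanceLabel();
    }
}

class ChildUdf extends BaseUdf {
    // Hides, does not override: BaseUdf.describe() still calls BaseUdf.staticLabel().
    static String staticLabel() { return "child"; }

    @Override
    String instanceLabel() { return "child"; }
}
```

new ChildUdf().describe() returns "base/child": only the instance method picked up the subclass behaviour.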
Re: Review Request 32489: HIVE-9518 Implement MONTHS_BETWEEN aligned with Oracle one
On March 27, 2015, 6:08 p.m., Mohit Sabharwal wrote: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java, lines 504-505 https://reviews.apache.org/r/32489/diff/3/?file=907259#file907259line504 same as PrimitiveGrouping.getPrimitiveGroup(inputTypes[i]) == STRING_GROUP ? great suggestion. Thank you! - Alexander --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32489/#review78088 --- On March 27, 2015, 6:35 p.m., Alexander Pivovarov wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32489/ --- (Updated March 27, 2015, 6:35 p.m.) Review request for hive and Jason Dere. Bugs: HIVE-9518 https://issues.apache.org/jira/browse/HIVE-9518 Repository: hive-git Description --- HIVE-9518 Implement MONTHS_BETWEEN aligned with Oracle one Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 2476e832b8b7101971ea2226368aa82633b7e7d1 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java ce981232382e993c7c9d640efe9b2d21f70a0ed4 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMonthsBetween.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFMonthsBetween.java PRE-CREATION ql/src/test/queries/clientpositive/udf_months_between.q PRE-CREATION ql/src/test/results/clientpositive/show_functions.q.out 22091d06241218a5c0ee21d6ee6be00a71706971 ql/src/test/results/clientpositive/udf_months_between.q.out PRE-CREATION Diff: https://reviews.apache.org/r/32489/diff/ Testing --- Thanks, Alexander Pivovarov
Re: Review Request 32489: HIVE-9518 Implement MONTHS_BETWEEN aligned with Oracle one
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32489/#review78088 --- LGTM. Some nits. ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMonthsBetween.java https://reviews.apache.org/r/32489/#comment126512 static? ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java https://reviews.apache.org/r/32489/#comment126504 same as PrimitiveGrouping.getPrimitiveGroup(inputTypes[i]) == STRING_GROUP ? ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java https://reviews.apache.org/r/32489/#comment126505 19...a comment on where this comes from would be great - Mohit Sabharwal On March 27, 2015, 12:58 a.m., Alexander Pivovarov wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32489/ --- (Updated March 27, 2015, 12:58 a.m.) Review request for hive and Jason Dere. Bugs: HIVE-9518 https://issues.apache.org/jira/browse/HIVE-9518 Repository: hive-git Description --- HIVE-9518 Implement MONTHS_BETWEEN aligned with Oracle one Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 2476e832b8b7101971ea2226368aa82633b7e7d1 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java ce981232382e993c7c9d640efe9b2d21f70a0ed4 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMonthsBetween.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFMonthsBetween.java PRE-CREATION ql/src/test/queries/clientpositive/udf_months_between.q PRE-CREATION ql/src/test/results/clientpositive/show_functions.q.out 22091d06241218a5c0ee21d6ee6be00a71706971 ql/src/test/results/clientpositive/udf_months_between.q.out PRE-CREATION Diff: https://reviews.apache.org/r/32489/diff/ Testing --- Thanks, Alexander Pivovarov
Re: Review Request 32489: HIVE-9518 Implement MONTHS_BETWEEN aligned with Oracle one
On March 27, 2015, 6:08 p.m., Mohit Sabharwal wrote: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java, line 507 https://reviews.apache.org/r/32489/diff/3/?file=907259#file907259line507 19...a comment on where this comes from would be great Alexander Pivovarov wrote: added comment // let's be strict here and // support only exact 10 char string for short date format The getTimestampValue method was added recently as part of the effort to add common methods to GenericUDF to validate/extract values. getTimestampValue uses java.sql.Timestamp.valueOf(str) internally to parse input strings, and Timestamp.valueOf supports only strings that are 19 chars or longer. BTW, getTimestampValue was not used by any UDF before; GenericUDFMonthsBetween is the first function that uses it. I think it's quite a common situation for a UDF to need to support both the short date (yyyy-MM-dd) and the long date+time string formats, without skipping the time part. On the other hand, we need to decide what to do with incorrect strings that contain just hours and minutes, for example 2015-03-27 10:30. The original getTimestampValue returns null for them. The change I added yesterday (if (dateStr.length() 19)) returns a non-null value, but the time part is skipped. I think that might confuse end users: they might expect that if the function returns a non-null value, it has parsed the partial time part of the string. This is why today I decided to return null for such incorrect strings; I think that is less confusing for end users. So I only kept support for the exact short date format, which is 10 chars (yyyy-MM-dd). - Alexander --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32489/#review78088 --- On March 27, 2015, 6:35 p.m., Alexander Pivovarov wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32489/ --- (Updated March 27, 2015, 6:35 p.m.) Review request for hive and Jason Dere.
Bugs: HIVE-9518 https://issues.apache.org/jira/browse/HIVE-9518 Repository: hive-git Description --- HIVE-9518 Implement MONTHS_BETWEEN aligned with Oracle one Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 2476e832b8b7101971ea2226368aa82633b7e7d1 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java ce981232382e993c7c9d640efe9b2d21f70a0ed4 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMonthsBetween.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFMonthsBetween.java PRE-CREATION ql/src/test/queries/clientpositive/udf_months_between.q PRE-CREATION ql/src/test/results/clientpositive/show_functions.q.out 22091d06241218a5c0ee21d6ee6be00a71706971 ql/src/test/results/clientpositive/udf_months_between.q.out PRE-CREATION Diff: https://reviews.apache.org/r/32489/diff/ Testing --- Thanks, Alexander Pivovarov
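The parsing rule Alexander settled on can be sketched with the JDK's own parsers: java.sql.Date.valueOf for an exact 10-character yyyy-MM-dd string, java.sql.Timestamp.valueOf for full "yyyy-MM-dd HH:mm:ss[.fffffffff]" strings (19 characters or more), and null for anything in between, such as "2015-03-27 10:30". ShortOrLongDateParser and parseDateOrTimestamp are hypothetical names; this is not the GenericUDF.getTimestampValue code.

```java
import java.sql.Date;
import java.sql.Timestamp;

final class ShortOrLongDateParser {

    // Returns null for strings that are neither a short date nor a full
    // timestamp, rather than silently dropping a partial time such as "10:30".
    static Timestamp parseDateOrTimestamp(String s) {
        if (s == null) {
            return null;
        }
        String str = s.trim();
        try {
            if (str.length() == 10) {
                // Exact short date format yyyy-MM-dd; time part is local midnight.
                Date d = Date.valueOf(str);
                return new Timestamp(d.getTime());
            }
            if (str.length() >= 19) {
                // Full format yyyy-MM-dd HH:mm:ss[.fffffffff]
                return Timestamp.valueOf(str);
            }
        } catch (IllegalArgumentException e) {
            // Malformed input (NumberFormatException is a subclass): return null below.
        }
        return null;
    }
}
```

Returning null for the 11 to 18 character range is the "be strict" choice from the review: a non-null result always means the whole string was understood.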
[jira] [Created] (HIVE-10117) LLAP: Use task number, attempt number to cache plans
Siddharth Seth created HIVE-10117: - Summary: LLAP: Use task number, attempt number to cache plans Key: HIVE-10117 URL: https://issues.apache.org/jira/browse/HIVE-10117 Project: Hive Issue Type: Sub-task Reporter: Siddharth Seth Use the task number and attempt number to cache plans, instead of relying only on thread locals. This can be used to share the work between Inputs / Processor / Outputs in Tez. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
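The idea in HIVE-10117 is to key cached plans by task number and attempt number rather than relying only on thread locals, so that the Inputs, Processor and Outputs of one task attempt, possibly running on different threads, can share the same deserialized work. A minimal sketch with a ConcurrentHashMap; TaskAttemptKey, PlanCache, getOrLoad and release are hypothetical names, and the plan type is left generic.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical cache key: one entry per (task number, attempt number).
final class TaskAttemptKey {
    final int taskNumber;
    final int attemptNumber;

    TaskAttemptKey(int taskNumber, int attemptNumber) {
        this.taskNumber = taskNumber;
        this.attemptNumber = attemptNumber;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TaskAttemptKey)) return false;
        TaskAttemptKey k = (TaskAttemptKey) o;
        return taskNumber == k.taskNumber && attemptNumber == k.attemptNumber;
    }

    @Override
    public int hashCode() {
        return Objects.hash(taskNumber, attemptNumber);
    }
}

final class PlanCache<P> {
    private final ConcurrentMap<TaskAttemptKey, P> plans = new ConcurrentHashMap<>();

    // Loads the plan at most once per (task, attempt), whichever component
    // (input, processor, output) asks first, on whatever thread it runs.
    P getOrLoad(int task, int attempt, Supplier<P> loader) {
        return plans.computeIfAbsent(new TaskAttemptKey(task, attempt), k -> loader.get());
    }

    // Drop the entry when the attempt finishes so the cache does not grow unbounded.
    void release(int task, int attempt) {
        plans.remove(new TaskAttemptKey(task, attempt));
    }
}
```

Including the attempt number keeps a retried attempt from reusing state left over from a failed one.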
Re: Review Request 32489: HIVE-9518 Implement MONTHS_BETWEEN aligned with Oracle one
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32489/ --- (Updated March 27, 2015, 6:35 p.m.) Review request for hive and Jason Dere. Changes --- 2 improvements in GenericUDF.getTimestampValue related to short date format support Bugs: HIVE-9518 https://issues.apache.org/jira/browse/HIVE-9518 Repository: hive-git Description --- HIVE-9518 Implement MONTHS_BETWEEN aligned with Oracle one Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 2476e832b8b7101971ea2226368aa82633b7e7d1 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java ce981232382e993c7c9d640efe9b2d21f70a0ed4 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMonthsBetween.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFMonthsBetween.java PRE-CREATION ql/src/test/queries/clientpositive/udf_months_between.q PRE-CREATION ql/src/test/results/clientpositive/show_functions.q.out 22091d06241218a5c0ee21d6ee6be00a71706971 ql/src/test/results/clientpositive/udf_months_between.q.out PRE-CREATION Diff: https://reviews.apache.org/r/32489/diff/ Testing --- Thanks, Alexander Pivovarov
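For reference when reviewing "aligned with Oracle one": Oracle documents MONTHS_BETWEEN(date1, date2) as positive when date1 is later, an integer when both dates fall on the same day of the month or are both last days of their months, and otherwise a value whose fractional part is based on a 31-day month (also accounting for the time-of-day difference). The sketch below applies those rules to dates only, with java.time; MonthsBetweenSketch and monthsBetween are hypothetical, it ignores the time component and does no rounding, so it is not the GenericUDFMonthsBetween code.

```java
import java.time.LocalDate;

final class MonthsBetweenSketch {

    static boolean isLastDayOfMonth(LocalDate d) {
        return d.getDayOfMonth() == d.lengthOfMonth();
    }

    // Oracle-style MONTHS_BETWEEN for dates without a time part.
    static double monthsBetween(LocalDate date1, LocalDate date2) {
        int wholeMonths = (date1.getYear() - date2.getYear()) * 12
                + (date1.getMonthValue() - date2.getMonthValue());
        boolean sameDay = date1.getDayOfMonth() == date2.getDayOfMonth();
        if (sameDay || (isLastDayOfMonth(date1) && isLastDayOfMonth(date2))) {
            return wholeMonths; // integer result
        }
        // Fractional part based on a 31-day month.
        return wholeMonths + (date1.getDayOfMonth() - date2.getDayOfMonth()) / 31.0;
    }
}
```

For 1995-02-02 versus 1995-01-01 the sketch returns 1 + 1/31 (about 1.0323), and for 2015-02-28 versus 2015-01-31 it returns exactly 1 because both are month ends.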
[jira] [Created] (HIVE-10121) Implement a hive --service udflint command to check UDF jars for common shading mistakes
Gopal V created HIVE-10121: -- Summary: Implement a hive --service udflint command to check UDF jars for common shading mistakes Key: HIVE-10121 URL: https://issues.apache.org/jira/browse/HIVE-10121 Project: Hive Issue Type: New Feature Components: UDF Reporter: Gopal V Several SerDe and UDF jars tend to shade in various parts of their dependencies, including hadoop-common or Guava, without relocation. Implement a simple udflint tool that automates part of the classpath and shaded-resources audit required when upgrading a Hive install from an old version to a new one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
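One check such a udflint tool could perform is scanning a UDF jar for classes that sit under well-known dependency packages, which indicates they were shaded in without relocation. A minimal sketch using java.util.jar; UdfLintSketch, findUnrelocated and scanJar are hypothetical, and the prefix list is an illustrative assumption based on the two examples named in the JIRA (hadoop-common and Guava), not an official list.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

final class UdfLintSketch {

    // Illustrative prefixes of dependencies that should not be bundled
    // un-relocated inside a UDF/SerDe jar (assumption, not an official list).
    static final List<String> SUSPECT_PREFIXES = List.of(
        "org/apache/hadoop/",
        "com/google/common/");

    // Pure check over entry names, so it can be tested without a jar on disk.
    static List<String> findUnrelocated(List<String> entryNames) {
        List<String> hits = new ArrayList<>();
        for (String name : entryNames) {
            if (!name.endsWith(".class")) {
                continue;
            }
            for (String prefix : SUSPECT_PREFIXES) {
                if (name.startsWith(prefix)) {
                    hits.add(name);
                    break;
                }
            }
        }
        return hits;
    }

    // Reads all entry names from a jar file and runs the check on them.
    static List<String> scanJar(String path) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(path)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return findUnrelocated(names);
    }
}
```

Relocated copies (for example under shaded/com/google/common/) are not flagged, since only the original package paths can clash with the classes Hive itself ships.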
Re: Not getting some JIRA resolution messages
This is still a problem. I'm working around it, but the archives are incomplete, and that's not good. I updated INFRA-9221 https://issues.apache.org/jira/browse/INFRA-9221 with a few more examples. Can someone please corroborate (so INFRA doesn't think I'm a nut-case)? Thanks. -- Lefty On Tue, Mar 3, 2015 at 4:12 AM, Lefty Leverenz leftylever...@gmail.com wrote: Good call, Alan. The messages are also missing from the archives, so I filed INFRA-9221 https://issues.apache.org/jira/browse/INFRA-9221. By the way, we need to add issues@hive to the Mailing Lists http://hive.apache.org/mailing_lists.html page. I'll file a Hive JIRA for that. -- Lefty On Thu, Feb 26, 2015 at 6:40 PM, Alan Gates alanfga...@gmail.com wrote: The issues list should be getting all JIRA notifications except CREATE; you shouldn't need to watch it. You can look at the mail archives to see whether the issue is with the list or on your end. If they aren't all showing up in the list, we should file an INFRA JIRA. Alan. Lefty Leverenz leftylever...@gmail.com February 26, 2015 at 14:03 Being a watcher could explain why I got the last two messages for HIVE-9731 https://issues.apache.org/jira/browse/HIVE-9731 (TODOC1.2, doc comment), but then why did I get all the messages before the commit, when I wasn't a watcher yet? And I'm not a watcher for HIVE-9509 https://issues.apache.org/jira/browse/HIVE-9509, although I got all the messages up to Ashutosh's +1. This is something I can work around, but if others have the same problem we might need to figure it out and get it fixed. -- Lefty Xuefu Zhang xzh...@cloudera.com February 26, 2015 at 13:12 I think you get subsequent messages only if you're a watcher. You become a watcher after you comment on or make changes to a JIRA. Is this your case?
On Thu, Feb 26, 2015 at 1:05 PM, Lefty Leverenz leftylever...@gmail.com wrote: Is it just me, or has JIRA email been flaky since the issues mailing list went into effect? For example, although I got a commit message for HIVE-9731 https://issues.apache.org/jira/browse/HIVE-9731 on the commits@hive mailing list, I didn't get either the change of Resolution or the comment about the commit, neither on dev@hive nor on issues@hive. However, I did get two subsequent messages on issues@hive when I added a TODOC1.2 label and doc comment. Another example: for HIVE-9509 https://issues.apache.org/jira/browse/HIVE-9509 I got the commit comment and Fix Version change for branch-1.0.1 on the issues@hive list, but not the prior commit comment and Resolved - Fix Version for 1.2.0. Does anyone else have this problem? -- Lefty
[jira] [Created] (HIVE-10122) Hive metastore filter-by-expression is broken for non-partition expressions
Sergey Shelukhin created HIVE-10122: --- Summary: Hive metastore filter-by-expression is broken for non-partition expressions Key: HIVE-10122 URL: https://issues.apache.org/jira/browse/HIVE-10122 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin See https://issues.apache.org/jira/browse/HIVE-10091?focusedCommentId=14382413&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14382413 These two lines of code
{noformat}
// Replace virtual columns with nulls. See javadoc for details.
prunerExpr = removeNonPartCols(prunerExpr, extractPartColNames(tab), partColsUsedInFilter);
// Remove all parts that are not partition columns. See javadoc for details.
ExprNodeDesc compactExpr = compactExpr(prunerExpr.clone());
{noformat}
are supposed to take care of this; I see there were a bunch of changes to this code over time, and now it appears to be broken. Thanks to [~thejas] for the info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
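For context, the two calls quoted in HIVE-10122 implement a two-step idea in the partition pruner: references to non-partition columns are replaced by null ("unknown"), and the expression is then compacted so that an unknown conjunct of an AND is dropped, while an OR with an unknown branch becomes unknown as a whole and can no longer be used for pruning. The toy expression tree below is a simplified sketch of that idea, not Hive's ExprNodeDesc code: Expr, Pred, And, Or, PrunerSketch and toPartitionExpr are hypothetical, and both steps are folded into one recursive pass.

```java
import java.util.Set;

// Toy boolean expression tree; a null result means "unknown for pruning".
interface Expr {}

final class Pred implements Expr {            // e.g. ds = '2015-03-27'
    final String column;
    final String text;
    Pred(String column, String text) { this.column = column; this.text = text; }
    @Override public String toString() { return text; }
}

final class And implements Expr {
    final Expr left, right;
    And(Expr left, Expr right) { this.left = left; this.right = right; }
    @Override public String toString() { return "(" + left + " AND " + right + ")"; }
}

final class Or implements Expr {
    final Expr left, right;
    Or(Expr left, Expr right) { this.left = left; this.right = right; }
    @Override public String toString() { return "(" + left + " OR " + right + ")"; }
}

final class PrunerSketch {

    // Predicates on non-partition columns become null; then an AND keeps its
    // known side, and an OR with an unknown side becomes unknown.
    static Expr toPartitionExpr(Expr e, Set<String> partCols) {
        if (e instanceof Pred) {
            return partCols.contains(((Pred) e).column) ? e : null;
        }
        boolean isAnd = e instanceof And;
        Expr l = toPartitionExpr(isAnd ? ((And) e).left : ((Or) e).left, partCols);
        Expr r = toPartitionExpr(isAnd ? ((And) e).right : ((Or) e).right, partCols);
        if (l == null && r == null) return null;
        if (l == null) return isAnd ? r : null;
        if (r == null) return isAnd ? l : null;
        return isAnd ? new And(l, r) : new Or(l, r);
    }
}
```

Dropping an unknown AND branch is safe because it can only widen the set of partitions kept, whereas an OR with an unknown branch might match any partition, so it must not prune at all.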
[jira] [Created] (HIVE-10124) Add iss...@hive.apache.org to Mailing Lists page
Lefty Leverenz created HIVE-10124:
---
Summary: Add iss...@hive.apache.org to Mailing Lists page
Key: HIVE-10124
URL: https://issues.apache.org/jira/browse/HIVE-10124
Project: Hive
Issue Type: Bug
Reporter: Lefty Leverenz

Now that Hive has a separate mailing list for issue comments and QA messages, it needs to be added to the Mailing Lists page on the website (http://hive.apache.org/mailing_lists.html).
[jira] [Created] (HIVE-10123) Hybrid grace Hash join : Use estimate key count from stats to initialize BytesBytesMultiHashMap
Mostafa Mokhtar created HIVE-10123:
---
Summary: Hybrid grace Hash join : Use estimate key count from stats to initialize BytesBytesMultiHashMap
Key: HIVE-10123
URL: https://issues.apache.org/jira/browse/HIVE-10123
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 1.2.0
Reporter: Mostafa Mokhtar
Assignee: Mostafa Mokhtar
Fix For: 1.2.0

The hybrid grace hash join does not use the estimated number of rows from the statistics to initialize BytesBytesMultiHashMap. Also add some logging to BytesBytesMultiHashMap to track get probes, and time expandAndRehash in milliseconds, since the microsecond value overflows.
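The idea of sizing the table from statistics can be sketched as follows. This is a minimal illustration, assuming the table uses a power-of-two slot count and a load factor; `initialCapacity` is a hypothetical helper, and the issue does not show how BytesBytesMultiHashMap actually receives its initial capacity. Pre-sizing from the estimated key count is meant to avoid repeated expandAndRehash cycles while the small table is loaded.

```java
/** Hypothetical helper for pre-sizing a hash table from a statistics estimate. */
public class CapacitySketch {
    /**
     * Returns a power-of-two slot count large enough to hold estimatedKeys at the given
     * load factor, so loading the table does not trigger expand-and-rehash cycles.
     * An estimate <= 0 (no usable statistics) falls back to minCapacity (must be >= 1).
     */
    static int initialCapacity(long estimatedKeys, float loadFactor, int minCapacity) {
        long needed = estimatedKeys > 0 ? (long) Math.ceil(estimatedKeys / (double) loadFactor) : 0;
        long target = Math.max(needed, minCapacity);
        long cap = Long.highestOneBit(target);
        if (cap < target) {
            cap <<= 1;                          // round up to the next power of two
        }
        return (int) Math.min(cap, 1L << 30);   // clamp to a safe array size
    }

    public static void main(String[] args) {
        System.out.println(initialCapacity(1_000, 0.75f, 16)); // 2048: ceil(1000 / 0.75) = 1334, rounded up
        System.out.println(initialCapacity(0, 0.75f, 16));     // 16: no statistics, use the minimum
    }
}
```

Rounding up to a power of two keeps slot indexing a cheap bit mask; the trade-off is that a badly overestimated row count wastes memory, so a real implementation would likely also cap the size by the memory budget.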
[jira] [Created] (HIVE-10125) LLAP: Print execution modes in tez in-place UI
Prasanth Jayachandran created HIVE-10125:
---
Summary: LLAP: Print execution modes in tez in-place UI
Key: HIVE-10125
URL: https://issues.apache.org/jira/browse/HIVE-10125
Project: Hive
Issue Type: Sub-task
Affects Versions: llap
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran

There are different execution modes: container, llap and uber. Print the execution mode of each work in the in-place UI.
[jira] [Created] (HIVE-10130) Merge from Spark branch to trunk 03/27/2015
Xuefu Zhang created HIVE-10130:
---
Summary: Merge from Spark branch to trunk 03/27/2015
Key: HIVE-10130
URL: https://issues.apache.org/jira/browse/HIVE-10130
Project: Hive
Issue Type: Sub-task
Components: Spark
Affects Versions: spark-branch
Reporter: Xuefu Zhang
[jira] [Created] (HIVE-10128) LLAP: BytesBytesMultiHashMap does not allow concurrent read-only access
Gopal V created HIVE-10128:
---
Summary: LLAP: BytesBytesMultiHashMap does not allow concurrent read-only access
Key: HIVE-10128
URL: https://issues.apache.org/jira/browse/HIVE-10128
Project: Hive
Issue Type: Sub-task
Reporter: Gopal V
Assignee: Sergey Shelukhin

Multi-threaded performance takes a serious hit when LLAP shares hashtables between probe threads running in parallel.

!hashmap-sync.png!

This is an explicit synchronized block inside ReusableRowContainer, which triggers this particular pattern.

!hashmap-sync-source.png!

Looking deeper into the code, the synchronization appears to be required because WriteBuffers.setReadPoint modifies the otherwise read-only hashtable.

To reproduce this result, run LLAP at WARN log level, to avoid the logging synchronization that would otherwise affect the thread-sync measurements.
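One common way to make such lookups lock-free is to keep the read position out of the shared structure. The sketch below assumes a buffer that is immutable after loading; `SharedBuffer`, `Cursor` and `sumRange` are hypothetical names, not Hive's `WriteBuffers` API, and this is not necessarily how the issue was resolved. Each probe thread gets its own cursor, so setting the read point no longer mutates shared state and no `synchronized` block is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch (not Hive's WriteBuffers API): read position lives in a per-reader cursor. */
public class ReadOnlySketch {
    /** Immutable after construction, so probe threads can share it without locks. */
    static final class SharedBuffer {
        private final byte[] data;

        SharedBuffer(byte[] data) {
            this.data = data.clone();
        }

        /** Per-reader state: setReadPoint only touches this cursor, never the shared buffer. */
        final class Cursor {
            private int pos;

            void setReadPoint(int offset) { pos = offset; }

            byte readByte() { return data[pos++]; }
        }

        Cursor newCursor() { return new Cursor(); }
    }

    /** Sums len bytes starting at offset, using a cursor private to the calling thread. */
    static long sumRange(SharedBuffer buf, int offset, int len) {
        SharedBuffer.Cursor cursor = buf.newCursor();
        cursor.setReadPoint(offset);
        long sum = 0;
        for (int i = 0; i < len; i++) {
            sum += cursor.readByte();
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = new byte[100];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) i;
        }
        SharedBuffer buf = new SharedBuffer(bytes);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Long>> results = new ArrayList<>();
        for (int t = 0; t < 10; t++) {
            final int offset = t * 10;
            Callable<Long> probe = () -> sumRange(buf, offset, 10);
            results.add(pool.submit(probe));
        }
        long total = 0;
        for (Future<Long> f : results) {
            total += f.get();
        }
        pool.shutdown();
        System.out.println(total); // 4950, the sum of 0..99
    }
}
```

The design choice is to trade one small cursor allocation per reader for the removal of a shared lock; with many parallel probe threads that is usually the cheaper side of the trade.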
[jira] [Created] (HIVE-10129) LLAP: Fix ordering of execution modes
Prasanth Jayachandran created HIVE-10129:
---
Summary: LLAP: Fix ordering of execution modes
Key: HIVE-10129
URL: https://issues.apache.org/jira/browse/HIVE-10129
Project: Hive
Issue Type: Sub-task
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
Attachments: HIVE-10129.1.patch

Fix the ordering of the uber, llap and container execution modes in the in-place update UI.
[jira] [Created] (HIVE-10126) upgrade Tez dependency to the latest released version
Na Yang created HIVE-10126:
---
Summary: upgrade Tez dependency to the latest released version
Key: HIVE-10126
URL: https://issues.apache.org/jira/browse/HIVE-10126
Project: Hive
Issue Type: Bug
Reporter: Na Yang

Tez 0.6 has been released. It would be nice to upgrade the Tez dependency to the latest released version.
[jira] [Created] (HIVE-10127) LLAP: Port changes to timestamp stream reader after timezone fix in trunk
Prasanth Jayachandran created HIVE-10127:
---
Summary: LLAP: Port changes to timestamp stream reader after timezone fix in trunk
Key: HIVE-10127
URL: https://issues.apache.org/jira/browse/HIVE-10127
Project: Hive
Issue Type: Sub-task
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran

The timezone fix in trunk (HIVE-8746) requires corresponding changes to the LLAP stream readers.
[jira] [Created] (HIVE-10131) LLAP: BytesBytesMultiHashMap and mapjoin container should reuse refs
Sergey Shelukhin created HIVE-10131:
---
Summary: LLAP: BytesBytesMultiHashMap and mapjoin container should reuse refs
Key: HIVE-10131
URL: https://issues.apache.org/jira/browse/HIVE-10131
Project: Hive
Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

Refs are currently always allocated and then cleared; they should be reused instead.
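The reuse pattern the issue asks for can be illustrated with a caller-owned result holder. This is a minimal sketch, assuming a lookup returns an (offset, length) reference into shared storage; `Ref`, `lookup` and the array-backed toy table are hypothetical and are not the actual BytesBytesMultiHashMap or mapjoin container code. The caller allocates one `Ref`, and each lookup resets and refills it in place instead of allocating a new object per probe.

```java
/** Hypothetical sketch of reusing lookup refs (not the actual Hive hashtable code). */
public class RefReuseSketch {
    /** Mutable lookup result: where the value lives in some shared storage. */
    static final class Ref {
        long offset = -1;
        int length = 0;

        void reset() {
            offset = -1;
            length = 0;
        }
    }

    // Toy "table": key i maps to (offsets[i], lengths[i]).
    private final long[] offsets;
    private final int[] lengths;

    RefReuseSketch(long[] offsets, int[] lengths) {
        this.offsets = offsets;
        this.lengths = lengths;
    }

    /** Fills out in place; returns false and leaves out reset when the key is absent. */
    boolean lookup(int key, Ref out) {
        out.reset();
        if (key < 0 || key >= offsets.length) {
            return false;
        }
        out.offset = offsets[key];
        out.length = lengths[key];
        return true;
    }

    public static void main(String[] args) {
        RefReuseSketch table = new RefReuseSketch(new long[] {0, 8, 20}, new int[] {8, 12, 4});
        Ref ref = new Ref(); // allocated once, reused for every probe
        long totalBytes = 0;
        for (int key : new int[] {0, 2, 5, 1}) {
            if (table.lookup(key, ref)) {
                totalBytes += ref.length;
            }
        }
        System.out.println(totalBytes); // 24 = 8 + 4 + 12 (key 5 is absent)
    }
}
```

A reused `Ref` is mutable and not thread-safe, so in an LLAP setting each probe thread would need its own instance rather than one shared across threads.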
Review Request 32595: Add Calcite's project merge rule.
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32595/
---

Review request for hive and John Pullokkaran.

Bugs: HIVE-10038
    https://issues.apache.org/jira/browse/HIVE-10038

Repository: hive

Description
---
Add Calcite's project merge rule.

Diffs
---
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveCalciteUtil.java 1669497
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/PlanModifierForASTConv.java 1669497
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 1669497
  trunk/ql/src/test/queries/clientpositive/leadlag.q 1669497
  trunk/ql/src/test/results/clientpositive/alter_partition_coltype.q.out 1669497
  trunk/ql/src/test/results/clientpositive/annotate_stats_select.q.out 1669675
  trunk/ql/src/test/results/clientpositive/auto_join1.q.out 1669497
  trunk/ql/src/test/results/clientpositive/auto_join10.q.out 1669497
  trunk/ql/src/test/results/clientpositive/auto_join11.q.out 1669497
  trunk/ql/src/test/results/clientpositive/auto_join12.q.out 1669497
  trunk/ql/src/test/results/clientpositive/auto_join13.q.out 1669497
  trunk/ql/src/test/results/clientpositive/auto_join14.q.out 1669497
  trunk/ql/src/test/results/clientpositive/auto_join22.q.out 1669497
  trunk/ql/src/test/results/clientpositive/auto_join26.q.out 1669497
  trunk/ql/src/test/results/clientpositive/auto_join_nulls.q.out 1669497
  trunk/ql/src/test/results/clientpositive/auto_join_without_localtask.q.out 1669497
  trunk/ql/src/test/results/clientpositive/combine2.q.out 1669497
  trunk/ql/src/test/results/clientpositive/correlationoptimizer1.q.out 1669497
  trunk/ql/src/test/results/clientpositive/correlationoptimizer12.q.out 1669497
  trunk/ql/src/test/results/clientpositive/correlationoptimizer3.q.out 1669497
  trunk/ql/src/test/results/clientpositive/ctas_colname.q.out 1669497
  trunk/ql/src/test/results/clientpositive/encrypted/encryption_insert_partition_dynamic.q.out 1669497
  trunk/ql/src/test/results/clientpositive/explain_logical.q.out 1669497
  trunk/ql/src/test/results/clientpositive/groupby_map_ppr.q.out 1669497
  trunk/ql/src/test/results/clientpositive/groupby_map_ppr_multi_distinct.q.out 1669497
  trunk/ql/src/test/results/clientpositive/groupby_ppr.q.out 1669497
  trunk/ql/src/test/results/clientpositive/groupby_ppr_multi_distinct.q.out 1669497
  trunk/ql/src/test/results/clientpositive/groupby_sort_1_23.q.out 1669497
  trunk/ql/src/test/results/clientpositive/groupby_sort_6.q.out 1669497
  trunk/ql/src/test/results/clientpositive/groupby_sort_skew_1_23.q.out 1669497
  trunk/ql/src/test/results/clientpositive/input_part1.q.out 1669497
  trunk/ql/src/test/results/clientpositive/join28.q.out 1669497
  trunk/ql/src/test/results/clientpositive/join29.q.out 1669497
  trunk/ql/src/test/results/clientpositive/join31.q.out 1669497
  trunk/ql/src/test/results/clientpositive/join32.q.out 1669497
  trunk/ql/src/test/results/clientpositive/join32_lessSize.q.out 1669497
  trunk/ql/src/test/results/clientpositive/join33.q.out 1669497
  trunk/ql/src/test/results/clientpositive/join35.q.out 1669497
  trunk/ql/src/test/results/clientpositive/leadlag.q.out 1669497
  trunk/ql/src/test/results/clientpositive/list_bucket_dml_1.q.out 1669497
  trunk/ql/src/test/results/clientpositive/list_bucket_dml_11.q.java1.7.out 1669497
  trunk/ql/src/test/results/clientpositive/list_bucket_dml_12.q.java1.7.out 1669497
  trunk/ql/src/test/results/clientpositive/list_bucket_dml_13.q.java1.7.out 1669497
  trunk/ql/src/test/results/clientpositive/list_bucket_dml_2.q.java1.7.out 1669497
  trunk/ql/src/test/results/clientpositive/list_bucket_dml_3.q.out 1669497
  trunk/ql/src/test/results/clientpositive/list_bucket_dml_4.q.java1.7.out 1669497
  trunk/ql/src/test/results/clientpositive/list_bucket_dml_5.q.java1.7.out 1669497
  trunk/ql/src/test/results/clientpositive/list_bucket_dml_6.q.java1.7.out 1669497
  trunk/ql/src/test/results/clientpositive/list_bucket_dml_7.q.out 1669497
  trunk/ql/src/test/results/clientpositive/list_bucket_dml_8.q.java1.7.out 1669497
  trunk/ql/src/test/results/clientpositive/list_bucket_dml_9.q.java1.7.out 1669497
  trunk/ql/src/test/results/clientpositive/list_bucket_query_multiskew_1.q.out 1669497
  trunk/ql/src/test/results/clientpositive/list_bucket_query_multiskew_2.q.out 1669497
  trunk/ql/src/test/results/clientpositive/list_bucket_query_multiskew_3.q.out 1669497
  trunk/ql/src/test/results/clientpositive/list_bucket_query_oneskew_1.q.out 1669497
  trunk/ql/src/test/results/clientpositive/list_bucket_query_oneskew_2.q.out 1669497
  trunk/ql/src/test/results/clientpositive/list_bucket_query_oneskew_3.q.out 1669497
  trunk/ql/src/test/results/clientpositive/load_dyn_part14.q.out 1669497
[jira] [Created] (HIVE-10132) LLAP: Tez heartbeats are delayed by ~500+ ms due to Hadoop IPC client
Gopal V created HIVE-10132:
---
Summary: LLAP: Tez heartbeats are delayed by ~500+ ms due to Hadoop IPC client
Key: HIVE-10132
URL: https://issues.apache.org/jira/browse/HIVE-10132
Project: Hive
Issue Type: Sub-task
Reporter: Gopal V

HADOOP-11772 has a clearer bug report of the core issue inside hadoop-common.

Because the heartbeats reach the AM late, the reducers lose up to a couple of seconds on a query with a 60 ms mapper (x10 parallel) and a 300 ms reducer, which would otherwise finish in under a second.