Re: Review Request 38663: HIVE-11878: ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/38663/ --- (Updated Nov. 18, 2015, 5:21 a.m.) Review request for hive. Changes --- Using the system classloader as parent of the per-session classloader Summary (updated) - HIVE-11878: ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive Bugs: HIVE-11878 https://issues.apache.org/jira/browse/HIVE-11878 Repository: hive-git Description (updated) --- HIVE-11878: ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive Diffs (updated) - conf/ivysettings.xml bda842a89bb07710fdcd7180a00833a7388ada8f itests/custom-udfs/pom.xml PRE-CREATION itests/custom-udfs/udf-classloader-udf1/pom.xml PRE-CREATION itests/custom-udfs/udf-classloader-udf1/src/main/java/hive/it/custom/udfs/UDF1.java PRE-CREATION itests/custom-udfs/udf-classloader-udf2/pom.xml PRE-CREATION itests/custom-udfs/udf-classloader-udf2/src/main/java/hive/it/custom/udfs/UDF2.java PRE-CREATION itests/custom-udfs/udf-classloader-util/pom.xml PRE-CREATION itests/custom-udfs/udf-classloader-util/src/main/java/hive/it/custom/udfs/Util.java PRE-CREATION itests/pom.xml 0686f1fd58c2be26b2ee645c4e244159aec565e5 itests/qtest/pom.xml 8db6fb04d0a5d4600bc23543a0215d31c1cd0648 ql/src/java/org/apache/hadoop/hive/ql/exec/UDFClassLoader.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java de2eb984159526048e8dacf71d3ff8b0647394a3 ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java ff875df98e1dd64a8af3ad22f4b38dbc1d6a1923 ql/src/test/queries/clientpositive/udf_classloader.q PRE-CREATION ql/src/test/queries/clientpositive/udf_classloader_dynamic_dependency_resolution.q PRE-CREATION ql/src/test/results/clientpositive/udf_classloader.q.out PRE-CREATION ql/src/test/results/clientpositive/udf_classloader_dynamic_dependency_resolution.q.out PRE-CREATION Diff: https://reviews.apache.org/r/38663/diff/ Testing --- Thanks, Ratandeep Ratti
Re: Review Request 38663: HIVE-11878
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/38663/ --- (Updated Oct. 1, 2015, 6:24 p.m.) Review request for hive. Summary (updated) - HIVE-11878 Bugs: HIVE-11878 https://issues.apache.org/jira/browse/HIVE-11878 Repository: hive-git Description (updated) --- HIVE-11878 Diffs (updated) - .reviewboardrc abc33f91a44b76573cbba334c33417307c63956f conf/ivysettings.xml bda842a89bb07710fdcd7180a00833a7388ada8f itests/custom-udfs/pom.xml PRE-CREATION itests/custom-udfs/udf-classloader-udf1/pom.xml PRE-CREATION itests/custom-udfs/udf-classloader-udf1/src/main/java/hive/it/custom/udfs/UDF1.java PRE-CREATION itests/custom-udfs/udf-classloader-udf2/pom.xml PRE-CREATION itests/custom-udfs/udf-classloader-udf2/src/main/java/hive/it/custom/udfs/UDF2.java PRE-CREATION itests/custom-udfs/udf-classloader-util/pom.xml PRE-CREATION itests/custom-udfs/udf-classloader-util/src/main/java/hive/it/custom/udfs/Util.java PRE-CREATION itests/pom.xml acce7131948edd5aeab34af6879d781daa12ba30 itests/qtest/pom.xml 74ca88f586b7ff9ddc38e8bc22a9445c85790c87 ql/src/java/org/apache/hadoop/hive/ql/exec/UDFClassLoader.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java bcf85a471c421dd0de62a894fd6c90024dff5691 ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java dc8c336c01921c8e4ae251f3627a80ce95a66182 ql/src/test/queries/clientpositive/udf_classloader.q PRE-CREATION ql/src/test/queries/clientpositive/udf_classloader_dynamic_dependency_resolution.q PRE-CREATION ql/src/test/results/clientpositive/udf_classloader.q.out PRE-CREATION ql/src/test/results/clientpositive/udf_classloader_dynamic_dependency_resolution.q.out PRE-CREATION Diff: https://reviews.apache.org/r/38663/diff/ Testing --- Thanks, Ratandeep Ratti
Review Request 38663: HIVE-11878: ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/38663/ --- Review request for hive. Bugs: HIVE-11878 https://issues.apache.org/jira/browse/HIVE-11878 Repository: hive-git Description --- HIVE-11878: ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive Diffs - contrib/src/java/org/apache/hadoop/hive/contrib/classloader/ClassA.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/classloader/UDF1.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/classloader/UDF2.java PRE-CREATION itests/pom.xml acce7131948edd5aeab34af6879d781daa12ba30 itests/qtest/pom.xml 0588233b250f7c78f594bb36554a80990e907550 ql/src/java/org/apache/hadoop/hive/ql/exec/UDFClassLoader.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java ca863019f3347c94852dcad2a21c43758aed30a7 ql/src/test/queries/clientpositive/test_classloader.q PRE-CREATION ql/src/test/results/clientpositive/test_classloader.q.out PRE-CREATION Diff: https://reviews.apache.org/r/38663/diff/ Testing --- Thanks, Ratandeep Ratti
Re: Review Request 38663: HIVE-11878: ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/38663/ --- (Updated Dec. 9, 2015, 8:33 a.m.) Review request for hive. Changes --- Addressed failing tests Bugs: HIVE-11878 https://issues.apache.org/jira/browse/HIVE-11878 Repository: hive-git Description --- HIVE-11878: ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive Diffs (updated) - conf/ivysettings.xml bda842a itests/custom-udfs/pom.xml PRE-CREATION itests/custom-udfs/udf-classloader-udf1/pom.xml PRE-CREATION itests/custom-udfs/udf-classloader-udf1/src/main/java/hive/it/custom/udfs/UDF1.java PRE-CREATION itests/custom-udfs/udf-classloader-udf2/pom.xml PRE-CREATION itests/custom-udfs/udf-classloader-udf2/src/main/java/hive/it/custom/udfs/UDF2.java PRE-CREATION itests/custom-udfs/udf-classloader-util/pom.xml PRE-CREATION itests/custom-udfs/udf-classloader-util/src/main/java/hive/it/custom/udfs/Util.java PRE-CREATION itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/authorization/plugin/TestHiveAuthorizerShowFilters.java 0c03a00 itests/pom.xml 5d8249f itests/qtest/pom.xml 8f6807a ql/src/java/org/apache/hadoop/hive/ql/exec/UDFClassLoader.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java c01994f ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 5c69fb6 ql/src/test/queries/clientpositive/udf_classloader.q PRE-CREATION ql/src/test/queries/clientpositive/udf_classloader_dynamic_dependency_resolution.q PRE-CREATION ql/src/test/results/clientpositive/udf_classloader.q.out PRE-CREATION ql/src/test/results/clientpositive/udf_classloader_dynamic_dependency_resolution.q.out PRE-CREATION Diff: https://reviews.apache.org/r/38663/diff/ Testing --- Thanks, Ratandeep Ratti
Re: Review Request 38663: HIVE-11878: ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/38663/#review107745 --- ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java (line 371) <https://reviews.apache.org/r/38663/#comment167030> Makes sense. The ContextClassLoader will be set whenever SessionState.start() is called (which internally calls attach). I'll remove it from the constructor. - Ratandeep Ratti On Nov. 18, 2015, 5:21 a.m., Ratandeep Ratti wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/38663/ > --- > > (Updated Nov. 18, 2015, 5:21 a.m.) > > > Review request for hive. > > > Bugs: HIVE-11878 > https://issues.apache.org/jira/browse/HIVE-11878 > > > Repository: hive-git > > > Description > --- > > HIVE-11878: ClassNotFoundException can possibly occur if multiple jars are > registered one at a time in Hive > > > Diffs > - > > conf/ivysettings.xml bda842a89bb07710fdcd7180a00833a7388ada8f > itests/custom-udfs/pom.xml PRE-CREATION > itests/custom-udfs/udf-classloader-udf1/pom.xml PRE-CREATION > > itests/custom-udfs/udf-classloader-udf1/src/main/java/hive/it/custom/udfs/UDF1.java > PRE-CREATION > itests/custom-udfs/udf-classloader-udf2/pom.xml PRE-CREATION > > itests/custom-udfs/udf-classloader-udf2/src/main/java/hive/it/custom/udfs/UDF2.java > PRE-CREATION > itests/custom-udfs/udf-classloader-util/pom.xml PRE-CREATION > > itests/custom-udfs/udf-classloader-util/src/main/java/hive/it/custom/udfs/Util.java > PRE-CREATION > itests/pom.xml 0686f1fd58c2be26b2ee645c4e244159aec565e5 > itests/qtest/pom.xml 8db6fb04d0a5d4600bc23543a0215d31c1cd0648 > ql/src/java/org/apache/hadoop/hive/ql/exec/UDFClassLoader.java PRE-CREATION > ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java > de2eb984159526048e8dacf71d3ff8b0647394a3 > ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java > ff875df98e1dd64a8af3ad22f4b38dbc1d6a1923 > ql/src/test/queries/clientpositive/udf_classloader.q 
PRE-CREATION > > ql/src/test/queries/clientpositive/udf_classloader_dynamic_dependency_resolution.q > PRE-CREATION > ql/src/test/results/clientpositive/udf_classloader.q.out PRE-CREATION > > ql/src/test/results/clientpositive/udf_classloader_dynamic_dependency_resolution.q.out > PRE-CREATION > > Diff: https://reviews.apache.org/r/38663/diff/ > > > Testing > --- > > > Thanks, > > Ratandeep Ratti > >
Re: Review Request 38663: HIVE-11878: ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/38663/ --- (Updated Nov. 24, 2015, 9:56 a.m.) Review request for hive. Changes --- Addressed Jason's comments Bugs: HIVE-11878 https://issues.apache.org/jira/browse/HIVE-11878 Repository: hive-git Description --- HIVE-11878: ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive Diffs (updated) - conf/ivysettings.xml bda842a89bb07710fdcd7180a00833a7388ada8f itests/custom-udfs/pom.xml PRE-CREATION itests/custom-udfs/udf-classloader-udf1/pom.xml PRE-CREATION itests/custom-udfs/udf-classloader-udf1/src/main/java/hive/it/custom/udfs/UDF1.java PRE-CREATION itests/custom-udfs/udf-classloader-udf2/pom.xml PRE-CREATION itests/custom-udfs/udf-classloader-udf2/src/main/java/hive/it/custom/udfs/UDF2.java PRE-CREATION itests/custom-udfs/udf-classloader-util/pom.xml PRE-CREATION itests/custom-udfs/udf-classloader-util/src/main/java/hive/it/custom/udfs/Util.java PRE-CREATION itests/pom.xml 0686f1fd58c2be26b2ee645c4e244159aec565e5 itests/qtest/pom.xml 8db6fb04d0a5d4600bc23543a0215d31c1cd0648 ql/src/java/org/apache/hadoop/hive/ql/exec/UDFClassLoader.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java de2eb984159526048e8dacf71d3ff8b0647394a3 ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java ff875df98e1dd64a8af3ad22f4b38dbc1d6a1923 ql/src/test/queries/clientpositive/udf_classloader.q PRE-CREATION ql/src/test/queries/clientpositive/udf_classloader_dynamic_dependency_resolution.q PRE-CREATION ql/src/test/results/clientpositive/udf_classloader.q.out PRE-CREATION ql/src/test/results/clientpositive/udf_classloader_dynamic_dependency_resolution.q.out PRE-CREATION Diff: https://reviews.apache.org/r/38663/diff/ Testing --- Thanks, Ratandeep Ratti
Re: Review Request 45348: HIVE-13363: Add hive.metastore.token.signature property to HiveConf
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/45348/#review131711 --- Ship it! Looks good to me! - Ratandeep Ratti On May 4, 2016, 1:30 a.m., Anthony Hsu wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/45348/ > --- > > (Updated May 4, 2016, 1:30 a.m.) > > > Review request for hive, Carl Steinbach and Ratandeep Ratti. > > > Bugs: HIVE-13363 > https://issues.apache.org/jira/browse/HIVE-13363 > > > Repository: hive-git > > > Description > --- > > No logic changes, just added METASTORE_TOKEN_SIGNATURE property to HiveConf > and replaced all instances of `hive.metastore.token.signature` with > references to `HiveConf.ConfVars.METASTORE_TOKEN_SIGNATURE`. > > > Diffs > - > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java > 06a6906ef1f5e0b7d941c042c74d257089f46f96 > hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java > 3ee30edef50940b4d9da21230177d6fb2a796819 > > hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/SecureProxySupport.java > 13f3c9bd5e523e770dd8ccfd75a442bbbf93b680 > > itests/hive-unit-hadoop2/src/test/java/org/apache/hadoop/hive/thrift/TestHadoopAuthBridge23.java > d07162bd46f8bea88d8c856552a2b4a2d83caf8d > > metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java > 7d37d0706d5f0269b89c4c6486adecf4bb3d85b8 > > service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java > 025b0b810b040ba6ea72b900ccd0802e207033a8 > > Diff: https://reviews.apache.org/r/45348/diff/ > > > Testing > --- > > Ran `grep -r hive.metastore.token.signature --include=*.java *` and saw that > the only occurrences of this string are in HiveConf.java and a comment in > Security.java. > > > Thanks, > > Anthony Hsu > >
Re: Review Request 57632: HIVE-16206: Provide wrapper classes for current metrics reporters to allow uniform instantiation through reflection
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/57632/#review169572 --- common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/JsonFileMetricsReporter.java Lines 83 (patched) <https://reviews.apache.org/r/57632/#comment241986> Can some of the logic be moved out of the run method? 1. jsonMapper.writerWithDefaultPrettyPrinter can be cached outside of run. 2. Determining tmpPath. 3. Determining the right fs? - Ratandeep Ratti On March 21, 2017, 4:05 p.m., Sunitha Beeram wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/57632/ > --- > > (Updated March 21, 2017, 4:05 p.m.) > > > Review request for hive, Carl Steinbach and Ratandeep Ratti. > > > Bugs: Hive-16206 > https://issues.apache.org/jira/browse/Hive-16206 > > > Repository: hive-git > > > Description > --- > > HIVE-16206: Address review comments > > > Diffs > - > > > common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/CodahaleMetrics.java > e8abf6cf06afc9fa590af3a447eacc67735a69e6 > > common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/CodahaleReporter.java > PRE-CREATION > > common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/ConsoleMetricsReporter.java > PRE-CREATION > > common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/JmxMetricsReporter.java > PRE-CREATION > > common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/JsonFileMetricsReporter.java > PRE-CREATION > > common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/Metrics2Reporter.java > PRE-CREATION > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java > 1fb32533d58af4ec622feb320bf9315da5db6e76 > > common/src/test/org/apache/hadoop/hive/common/metrics/metrics2/TestCodahaleMetrics.java > aa4e75f9f8160d1b54b14c1a23ea42e156bd45ca > > > Diff: https://reviews.apache.org/r/57632/diff/4/ > > > Testing > --- > > Updated unit tests and all unit tests passed locally. 
> > > Thanks, > > Sunitha Beeram > >
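The first two points of the review comment above (cache what does not change, and determine the temporary path once) can be illustrated with a minimal sketch. It is not the JsonFileMetricsReporter from the patch: the class PeriodicFileWriter, the Supplier-based snapshot, the ".tmp" suffix and the demo helper are all hypothetical, and only the JDK is used so the example stays self-contained. The third point (choosing the file system) is not covered here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical periodic task; everything that never changes is resolved in the constructor. */
final class PeriodicFileWriter implements Runnable {
    private final Supplier<String> snapshot; // produces the text to report on each run
    private final Path target;               // final report location, resolved once
    private final Path tmp;                  // temporary file next to the target, resolved once

    PeriodicFileWriter(Path target, Supplier<String> snapshot) {
        this.snapshot = snapshot;
        this.target = target.toAbsolutePath();
        this.tmp = this.target.resolveSibling(this.target.getFileName() + ".tmp");
    }

    @Override
    public void run() {
        // Only the per-run work remains here: write the snapshot, then swap it into place.
        try {
            Files.write(tmp, snapshot.get().getBytes(StandardCharsets.UTF_8));
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs the task twice in a fresh temp directory and returns the file's final content. */
    static String demo() {
        try {
            Path out = Files.createTempDirectory("metrics").resolve("metrics.json");
            AtomicInteger runs = new AtomicInteger();
            Runnable task = new PeriodicFileWriter(out, () -> "tick " + runs.incrementAndGet());
            task.run();
            task.run();
            return new String(Files.readAllBytes(out), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Keeping run limited to the write and the move also narrows the set of exceptions that can surface on each scheduled invocation.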
Re: Review Request 57632: HIVE-16206: Provide wrapper classes for current metrics reporters to allow uniform instantiation through reflection
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/57632/#review169570 --- common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/JsonFileMetricsReporter.java Lines 74 (patched) <https://reviews.apache.org/r/57632/#comment241985> What all exception types are thrown here? Is it possible to act on some of these? - Ratandeep Ratti On March 21, 2017, 4:05 p.m., Sunitha Beeram wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/57632/ > --- > > (Updated March 21, 2017, 4:05 p.m.) > > > Review request for hive, Carl Steinbach and Ratandeep Ratti. > > > Bugs: Hive-16206 > https://issues.apache.org/jira/browse/Hive-16206 > > > Repository: hive-git > > > Description > --- > > HIVE-16206: Address review comments > > > Diffs > - > > > common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/CodahaleMetrics.java > e8abf6cf06afc9fa590af3a447eacc67735a69e6 > > common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/CodahaleReporter.java > PRE-CREATION > > common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/ConsoleMetricsReporter.java > PRE-CREATION > > common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/JmxMetricsReporter.java > PRE-CREATION > > common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/JsonFileMetricsReporter.java > PRE-CREATION > > common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/Metrics2Reporter.java > PRE-CREATION > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java > 1fb32533d58af4ec622feb320bf9315da5db6e76 > > common/src/test/org/apache/hadoop/hive/common/metrics/metrics2/TestCodahaleMetrics.java > aa4e75f9f8160d1b54b14c1a23ea42e156bd45ca > > > Diff: https://reviews.apache.org/r/57632/diff/4/ > > > Testing > --- > > Updated unit tests and all unit tests passed locally. > > > Thanks, > > Sunitha Beeram > >
Re: Review Request 62321: HIVE-17530: ClassCastException when converting uniontype
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/62321/#review185485 --- Ship it! LGTM - Ratandeep Ratti On Sept. 15, 2017, 1:52 a.m., Anthony Hsu wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/62321/ > --- > > (Updated Sept. 15, 2017, 1:52 a.m.) > > > Review request for hive, Carl Steinbach and Ratandeep Ratti. > > > Bugs: HIVE-17530 > https://issues.apache.org/jira/browse/HIVE-17530 > > > Repository: hive-git > > > Description > --- > > Previously, StandardUnionObjectInspector was creating an ArrayList instead of > a StandardUnion, causing the exception > > ``` > java.lang.ClassCastException: java.util.ArrayList cannot be cast to > org.apache.hadoop.hive.serde2.objectinspector.UnionObject > ``` > > This patch fixes this. > > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorDeserializeRow.java > 2ad06fc12869e74e14aae7b7a36685482c4a1ade > ql/src/test/queries/clientpositive/orc_avro_partition_uniontype.q > PRE-CREATION > ql/src/test/results/clientpositive/orc_avro_partition_uniontype.q.out > PRE-CREATION > > serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorConverters.java > 7921de8d9c4a56af715de5498954794aaba32fff > > serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/SettableUnionObjectInspector.java > 564d8d60451d9756eca1f1edcc84248e4f559828 > > serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/StandardUnionObjectInspector.java > 7b2868233f127899c7dca07d4f899b24ae2cbc1b > > serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestObjectInspectorConverters.java > 2e1bb22cea715501749ee5e169ce34f7dc789e64 > > > Diff: https://reviews.apache.org/r/62321/diff/2/ > > > Testing > --- > > Added qtest. > > > Thanks, > > Anthony Hsu > >
Re: Review Request 62247: HIVE-17394: AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/62247/#review185210 --- Ship it! LGTM - Ratandeep Ratti On Sept. 12, 2017, 3:04 p.m., Anthony Hsu wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/62247/ > --- > > (Updated Sept. 12, 2017, 3:04 p.m.) > > > Review request for hive, Carl Steinbach and Ratandeep Ratti. > > > Bugs: HIVE-17394 > https://issues.apache.org/jira/browse/HIVE-17394 > > > Repository: hive-git > > > Description > --- > > Previously, when Avro found a nullable union in the reader schema, it would > regenerate the TypeInfo for the field for every record. This patch reuses the > same TypeInfo that only needs to be calculated once. > > In our testing, we found this improved count() queries by 2x. > > > Diffs > - > > serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java > ecfe15f59dac04bda3f8f1275babebf736608a6b > > > Diff: https://reviews.apache.org/r/62247/diff/1/ > > > Testing > --- > > `mvn clean package -DskipTests -Dmaven.javadoc.skip=true` succeeded. > > > Thanks, > > Anthony Hsu > >
Re: Review Request 62247: HIVE-17394: AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/62247/#review185212 --- serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java Line 305 (original), 305 (patched) <https://reviews.apache.org/r/62247/#comment261498> This comment is misleading now and can be removed. - Ratandeep Ratti On Sept. 12, 2017, 3:04 p.m., Anthony Hsu wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/62247/ > --- > > (Updated Sept. 12, 2017, 3:04 p.m.) > > > Review request for hive, Carl Steinbach and Ratandeep Ratti. > > > Bugs: HIVE-17394 > https://issues.apache.org/jira/browse/HIVE-17394 > > > Repository: hive-git > > > Description > --- > > Previously, when Avro found a nullable union in the reader schema, it would > regenerate the TypeInfo for the field for every record. This patch reuses the > same TypeInfo that only needs to be calculated once. > > In our testing, we found this improved count() queries by 2x. > > > Diffs > - > > serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java > ecfe15f59dac04bda3f8f1275babebf736608a6b > > > Diff: https://reviews.apache.org/r/62247/diff/1/ > > > Testing > --- > > `mvn clean package -DskipTests -Dmaven.javadoc.skip=true` succeeded. > > > Thanks, > > Anthony Hsu > >
[jira] [Created] (HIVE-9851) org.apache.hadoop.hive.serde2.avro.AvroSerializer should use org.apache.avro.generic.GenericData.Array when serializing a list
Ratandeep Ratti created HIVE-9851: - Summary: org.apache.hadoop.hive.serde2.avro.AvroSerializer should use org.apache.avro.generic.GenericData.Array when serializing a list Key: HIVE-9851 URL: https://issues.apache.org/jira/browse/HIVE-9851 Project: Hive Issue Type: Bug Components: Hive, Serializers/Deserializers Reporter: Ratandeep Ratti Currently AvroSerializer uses java.util.ArrayList for serializing a list in Hive. This causes problems when we need to convert the Avro object into some other representation, say a tuple in Pig. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11639) hive-exec jar contains within itself other jars
Ratandeep Ratti created HIVE-11639: -- Summary: hive-exec jar contains within itself other jars Key: HIVE-11639 URL: https://issues.apache.org/jira/browse/HIVE-11639 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.1 Reporter: Ratandeep Ratti Looking at hive-exec-1.2.1.jar, I see that it contains the following other jars: {code} jar -tf lib/hive-exec-1.2.1.jar | grep .jar minlog-1.2.jar objenesis-1.2.jar reflectasm-1.07-shaded.jar {code} The classes in these jars cannot be used unless we mess around with custom classloaders. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
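The same check as the `jar -tf ... | grep .jar` command above can be done programmatically. The following is a rough sketch using only java.util.jar; the class NestedJarScan and its helpers are hypothetical, and sampleJar builds a tiny stand-in archive rather than reading the real hive-exec jar. Note that it matches entry names ending in ".jar", which is slightly stricter than the grep pattern.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.stream.Collectors;

/** Hypothetical helper that reports jars packaged inside another jar. */
final class NestedJarScan {

    /** Returns the names of all entries ending in ".jar", sorted alphabetically. */
    static List<String> nestedJars(Path jar) {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            return jarFile.stream()
                    .map(JarEntry::getName)
                    .filter(name -> name.endsWith(".jar"))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a small demo archive with two nested "jars" (empty entries) and one class entry. */
    static Path sampleJar() {
        try {
            Path path = Files.createTempFile("demo", ".jar");
            try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(path))) {
                String[] names = {"objenesis-1.2.jar", "minlog-1.2.jar", "org/example/A.class"};
                for (String name : names) {
                    out.putNextEntry(new JarEntry(name));
                    out.closeEntry();
                }
            }
            return path;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(nestedJars(sampleJar()));
    }
}
```

A scan like this only detects the nested archives; as the report says, loading classes from them would still need a custom classloader.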
[jira] [Created] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered in Hive
Ratandeep Ratti created HIVE-11878: -- Summary: ClassNotFoundException can possibly occur if multiple jars are registered in Hive Key: HIVE-11878 URL: https://issues.apache.org/jira/browse/HIVE-11878 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.1 Reporter: Ratandeep Ratti Assignee: Ratandeep Ratti When we register a jar on the Hive console, Hive creates a fresh URLClassLoader which includes the path of the jar being registered and all the jar paths of the parent classloader. The parent classloader is the current thread context classloader. Once the URLClassLoader is created, Hive sets it as the current thread context classloader. So if we register multiple jars in Hive, multiple URLClassLoaders are created, each including the jars from its parent plus the one extra jar being registered. The last URLClassLoader created ends up as the current thread context classloader. (For details see: org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) Here is an example in which this strategy can lead to a ClassNotFoundException. We register 2 jars *j1* and *j2* in the Hive console. *j1* contains the UDF class *c1*, which internally relies on class *c2* in jar *j2*. We register *j1* first; the URLClassLoader *u1* is created and set as the thread context classloader. We register *j2* next; the new URLClassLoader created is *u2*, with *u1* as its parent, and *u2* becomes the new thread context classloader. Note that *u2* includes paths to both jars *j1* and *j2*, whereas *u1* only has the path to *j1*. Now when we register class *c1* under a temporary function in Hive, we load the class using {code} Class.forName("c1", true, Thread.currentThread().getContextClassLoader()) {code}. The current thread context classloader is *u2*, and it has the path to the class *c1*, but note that classloaders work by delegating to the parent classloader first. In this case class *c1* will be found and *defined* by classloader *u1*. Now *c1* from jar *j1* has *u1* as its classloader. If a method (say initialize) is called in *c1* which references the class *c2*, *c2* will not be found, since the classloader used to search for *c2* will be *u1* (the caller's defining classloader is used to load the classes it references). I've added a qtest to explain the problem. Please see the attached patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
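One way to avoid the loader chain described above is to keep a single per-session classloader and append every registered jar to it, so that the loader which defines c1 can also see j2. The review updates for this issue mention a per-session classloader with the system classloader as its parent; the sketch below only illustrates that idea. The class SessionJarLoader and the helper jarUrl are hypothetical and are not taken from the patch, and the jar paths are placeholders that do not need to exist for the example to run.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;

/** Hypothetical per-session loader: one instance accumulates every registered jar. */
final class SessionJarLoader extends URLClassLoader {

    SessionJarLoader(ClassLoader parent) {
        super(new URL[0], parent);
    }

    /** URLClassLoader.addURL is protected; widen it so jars can be appended after creation. */
    @Override
    public void addURL(URL url) {
        super.addURL(url);
    }

    /** Converts a local path to a file: URL without leaking the checked exception. */
    static URL jarUrl(String path) {
        try {
            return Paths.get(path).toUri().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(path, e);
        }
    }

    public static void main(String[] args) {
        SessionJarLoader loader = new SessionJarLoader(ClassLoader.getSystemClassLoader());
        loader.addURL(jarUrl("j1.jar")); // registering j1
        loader.addURL(jarUrl("j2.jar")); // registering j2 reuses the same loader, no new child
        // Both jars are on the search path of the one loader that would define c1 and c2.
        System.out.println(loader.getURLs().length); // prints 2
    }
}
```

Because c1 and c2 would both be defined by the same loader, a reference from c1 to c2 no longer depends on the order in which the jars were registered.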
[jira] [Created] (HIVE-12714) Document and make explicit GenericUDF state serialization features.
Ratandeep Ratti created HIVE-12714: -- Summary: Document and make explicit GenericUDF state serialization features. Key: HIVE-12714 URL: https://issues.apache.org/jira/browse/HIVE-12714 Project: Hive Issue Type: New Feature Reporter: Ratandeep Ratti Assignee: Ratandeep Ratti Hi, GenericUDF has a somewhat hidden feature which is not publicized on any official Hive wiki. A GenericUDF's state is serialized on the client side and reconstructed on the slave nodes, using Kryo, with the required state intact. This seems like a nice feature. We should document it, with its shortcomings if any, and add an explicit test case in the source code to make the contract explicit. Thoughts welcome. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13115) MetaStore Direct SQL calls fail when the columns schema for a partition is null
Ratandeep Ratti created HIVE-13115: -- Summary: MetaStore Direct SQL calls fail when the columns schema for a partition is null Key: HIVE-13115 URL: https://issues.apache.org/jira/browse/HIVE-13115 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.1 Reporter: Ratandeep Ratti We are seeing the following exception in our MetaStore logs {noformat} 2016-02-11 00:00:19,002 DEBUG metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:timingTrace(602)) - Direct SQL query in 5.842372ms + 1.066728ms, the query is [select "PARTITIONS"."PART_ID" from "PARTITIONS" inner join "TBLS" on "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = ? inner join "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID" and "DBS"."NAME" = ? order by "PART_NAME" asc] 2016-02-11 00:00:19,021 ERROR metastore.ObjectStore (ObjectStore.java:handleDirectSqlError(2243)) - Direct SQL failed, falling back to ORM MetaException(message:Unexpected null for one of the IDs, SD 6437, column null, serde 6437 for a non-view) at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:360) at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitions(MetaStoreDirectSql.java:224) at org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:1563) at org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:1559) at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2208) at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsInternal(ObjectStore.java:1570) at org.apache.hadoop.hive.metastore.ObjectStore.getPartitions(ObjectStore.java:1553) at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:108) at com.sun.proxy.$Proxy5.getPartitions(Unknown 
Source) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions(HiveMetaStore.java:2526) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions.getResult(ThriftHiveMetastore.java:8747) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions.getResult(ThriftHiveMetastore.java:8731) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:617) at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:613) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1591) at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:613) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} This direct SQL call fails for every {{getPartitions}} call and then falls back to ORM. The query which fails is {code} select PARTITIONS.PART_ID, SDS.SD_ID, SDS.CD_ID, SERDES.SERDE_ID, PARTITIONS.CREATE_TIME, PARTITIONS.LAST_ACCESS_TIME, SDS.INPUT_FORMAT, SDS.IS_COMPRESSED, SDS.IS_STOREDASSUBDIRECTORIES, SDS.LOCATION, SDS.NUM_BUCKETS, SDS.OUTPUT_FORMAT, SERDES.NAME, SERDES.SLIB from PARTITIONS left outer join SDS on PARTITIONS.SD_ID = SDS.SD_ID left outer join SERDES on SDS.SERDE_ID = SERDES.SERDE_ID where PART_ID in ( ? 
) order by PART_NAME asc; {code} According to the source {{MetaStoreDirectSql.java}}, the third column in the query (SDS.CD_ID), the column descriptor ID, is null, which triggers the exception. The ORM layer does not throw this exception because it is more forgiving of a null column descriptor. See ObjectStore.java:1197 {code} List mFieldSchemas = msd.getCD() == null ? null : msd.getCD().getCols(); {code} I verified that this exception gets trigger
[jira] [Created] (HIVE-14351) Minor improvement in genUnionPlan method
Ratandeep Ratti created HIVE-14351: -- Summary: Minor improvement in genUnionPlan method Key: HIVE-14351 URL: https://issues.apache.org/jira/browse/HIVE-14351 Project: Hive Issue Type: Improvement Affects Versions: 2.1.0 Reporter: Ratandeep Ratti Assignee: Ratandeep Ratti The {{org.apache.hadoop.hive.ql.parse.SemanticAnalyzer#genUnionPlan}} method can trip up new readers of the code, specifically on line 8979: {code} HashMap<String, ColumnInfo> leftmap = leftRR.getFieldMap(leftalias); HashMap<String, ColumnInfo> rightmap = rightRR.getFieldMap(rightalias); {code} These column maps are actually LinkedHashMaps, and the code relies on this fact when iterating over the two union branches in order. This was not immediately clear and left me wondering how the traversal order stays consistent. I've updated the code with this simple fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
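To show why the declared type matters, here is a small sketch that pairs up the columns of two branches by position. It assumes nothing about SemanticAnalyzer itself: the class OrderedFieldMaps, the method pairByPosition and the String values standing in for ColumnInfo are all hypothetical. Declaring the parameters as LinkedHashMap states the insertion-order requirement in the signature instead of leaving it implicit.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;

/** Hypothetical illustration of positional pairing over insertion-ordered maps. */
final class OrderedFieldMaps {

    /**
     * Pairs the n-th column of the left branch with the n-th column of the right branch.
     * This is only meaningful because LinkedHashMap iterates in insertion order.
     */
    static List<String> pairByPosition(LinkedHashMap<String, String> left,
                                       LinkedHashMap<String, String> right) {
        List<String> pairs = new ArrayList<>();
        Iterator<String> l = left.keySet().iterator();
        Iterator<String> r = right.keySet().iterator();
        while (l.hasNext() && r.hasNext()) {
            pairs.add(l.next() + "=" + r.next());
        }
        return pairs;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> left = new LinkedHashMap<>();
        left.put("id", "int");
        left.put("name", "string");
        LinkedHashMap<String, String> right = new LinkedHashMap<>();
        right.put("key", "int");
        right.put("label", "string");
        System.out.println(pairByPosition(left, right)); // prints [id=key, name=label]
    }
}
```

With plain HashMap parameters the same loop would compile, but the pairing would depend on hash iteration order, which is the confusion the issue describes.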
[jira] [Created] (HIVE-15107) HiveLexer can throw NPE in allowQuoteId
Ratandeep Ratti created HIVE-15107:
--
Summary: HiveLexer can throw NPE in allowQuoteId
Key: HIVE-15107
URL: https://issues.apache.org/jira/browse/HIVE-15107
Project: Hive
Issue Type: Bug
Affects Versions: 1.1.1
Reporter: Ratandeep Ratti
Assignee: Ratandeep Ratti

In HiveLexer.allowQuoteId we reference the HiveConf field, which may be null. The configuration field is set in ParseDriver only if the hive.ql.Context variable is not null. ParseDriver exposes APIs such as org.apache.hadoop.hive.ql.parse.ParseDriver#parse(java.lang.String), which can leave the hive.ql.Context variable null and therefore the configuration field unset.
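One way such a guard could look is sketched below. This is not the HiveLexer source: the class, the {{Map}} standing in for HiveConf, the configuration key, and the choice of default when no configuration is present are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a lexer whose configuration may never be set.
class NullSafeLexerSketch {

    // Plays the role of the HiveConf field; null when the parser was
    // invoked without a context.
    private final Map<String, String> conf;

    NullSafeLexerSketch(Map<String, String> conf) {
        this.conf = conf;
    }

    // Returns whether quoted identifiers are allowed, without dereferencing
    // a null configuration.
    boolean allowQuotedId() {
        if (conf == null) {
            // Assumed fallback: behave as if the setting were at its default.
            return true;
        }
        // "quoted.identifiers" is a made-up key for this sketch.
        return !"none".equals(conf.get("quoted.identifiers"));
    }

    public static void main(String[] args) {
        Map<String, String> disabled = new HashMap<>();
        disabled.put("quoted.identifiers", "none");
        System.out.println(new NullSafeLexerSketch(null).allowQuotedId());
        System.out.println(new NullSafeLexerSketch(disabled).allowQuotedId());
    }
}
```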
[jira] [Created] (HIVE-17394) AvroSerde is regenerating TypeInfo objects for each nullable Avro field in a row
Ratandeep Ratti created HIVE-17394:
--
Summary: AvroSerde is regenerating TypeInfo objects for each nullable Avro field in a row
Key: HIVE-17394
URL: https://issues.apache.org/jira/browse/HIVE-17394
Project: Hive
Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Ratandeep Ratti

The following methods in {{AvroDeserializer}} keep regenerating TypeInfo objects for every nullable field in a row.
{code}
private Object deserializeNullableUnion(Object datum, Schema fileSchema, Schema recordSchema) throws AvroSerdeException {
  // elided
  line 312: return worker(datum, fileSchema, newRecordSchema, SchemaToTypeInfo.generateTypeInfo(newRecordSchema, null));
}
..
private Object deserializeSingleItemNullableUnion(Object datum, Schema Schema recordSchema)
  // elided
  line 357: return worker(datum, currentFileSchema, schema, SchemaToTypeInfo.generateTypeInfo(schema, null));
{code}
This is really bad for performance. I'm not sure why we don't use the TypeInfo we already have instead of generating it again for each nullable field. If you look at the {{worker}} method, which calls {{deserializeNullableUnion}}, the typeInfo corresponding to the nullable field's column is already determined, so I'm not sure why we have to determine that information again.

Moreover, the cache in SchemaToTypeInfo does not help in the nullable Avro record case, because checking whether an Avro record schema object already exists in the cache requires traversing all the fields in the record schema. I've attached a profiling snapshot which shows that the maximum time is being spent in the cache.

One way of fixing this, IMO, is to make use of the column TypeInfo which is already passed in to the worker method.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
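The difference between the two call patterns can be sketched in isolation. The code below does not use the Hive or Avro classes; the string stand-ins for schemas and TypeInfo objects, the method names, and the counter are all hypothetical and exist only to show how often the expensive step runs in each variant.

```java
// Hypothetical stand-ins: Strings replace Avro Schema and Hive TypeInfo.
class TypeInfoReuseSketch {

    // Counts how often the (assumed expensive) generation step runs.
    static int generated = 0;

    // Stand-in for a schema-to-TypeInfo conversion.
    static String generateTypeInfo(String schema) {
        generated++;
        return "typeinfo(" + schema + ")";
    }

    // Stand-in for the recursive worker that consumes a TypeInfo.
    static String worker(String datum, String typeInfo) {
        return datum + ":" + typeInfo;
    }

    // Pattern described in the issue: regenerate for every nullable value.
    static String deserializeRegenerating(String datum, String schema) {
        return worker(datum, generateTypeInfo(schema));
    }

    // Suggested direction: pass along the TypeInfo the caller already holds.
    static String deserializeReusing(String datum, String columnTypeInfo) {
        return worker(datum, columnTypeInfo);
    }

    public static void main(String[] args) {
        String columnTypeInfo = generateTypeInfo("string"); // computed once per column
        for (int row = 0; row < 1000; row++) {
            deserializeReusing("v" + row, columnTypeInfo);
        }
        System.out.println("generations with reuse: " + generated);
    }
}
```

With reuse, the generation count stays at one per column regardless of the number of rows, whereas the regenerating variant grows linearly with the number of nullable values read.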
[jira] [Created] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive
Ratandeep Ratti created HIVE-18410:
--
Summary: [Performance][Avro] Reading flat Avro tables is very expensive in Hive
Key: HIVE-18410
URL: https://issues.apache.org/jira/browse/HIVE-18410
Project: Hive
Issue Type: Improvement
Reporter: Ratandeep Ratti
Assignee: Ratandeep Ratti

There's a performance penalty when reading flat (no nested fields) Avro tables. Reading the same flat dataset in Pig takes half the time. On profiling, a lot of time is spent in {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of that time is spent in GenericData.get().resolveUnion(), which calls GenericData.getSchemaName(Object datum), which does a lot of instanceof checks. This could be simplified with performance benefits. An approach is described in this patch which almost halves the runtime.
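The patch itself is not reproduced in this message, so the following is only one possible shortcut and not necessarily what the patch does. It assumes a union with exactly two branches, one of which is null. The sketch is self-contained and does not call Avro: branch type names are plain strings, and {{resolveByProbing}} is a hypothetical stand-in for a general resolution routine that tests each branch in turn.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; type names as Strings stand in for Avro schemas.
class NullableUnionShortcutSketch {

    // General approach: test the datum against each branch until one matches.
    static int resolveByProbing(List<String> branches, Object datum) {
        for (int i = 0; i < branches.size(); i++) {
            if (matches(branches.get(i), datum)) {
                return i;
            }
        }
        throw new IllegalArgumentException("datum matches no union branch");
    }

    static boolean matches(String type, Object datum) {
        switch (type) {
            case "null":   return datum == null;
            case "string": return datum instanceof CharSequence;
            case "int":    return datum instanceof Integer;
            case "long":   return datum instanceof Long;
            case "double": return datum instanceof Double;
            default:       return false;
        }
    }

    // Shortcut for a two-branch union containing null: a single null check
    // decides the branch, with no type probing.
    static int resolveNullable(List<String> branches, Object datum) {
        int nullIndex = branches.indexOf("null");
        if (branches.size() != 2 || nullIndex < 0) {
            throw new IllegalArgumentException("not a single-item nullable union");
        }
        return datum == null ? nullIndex : 1 - nullIndex;
    }

    public static void main(String[] args) {
        List<String> union = Arrays.asList("null", "string");
        System.out.println(resolveByProbing(union, "abc") == resolveNullable(union, "abc"));
    }
}
```

The shortcut trades generality for speed: it is valid only because the union shape is known in advance, which is the case the method name in the profile suggests.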
[jira] [Created] (HIVE-19256) UDF which shapes the input data according to the specified schema
Ratandeep Ratti created HIVE-19256:
--
Summary: UDF which shapes the input data according to the specified schema
Key: HIVE-19256
URL: https://issues.apache.org/jira/browse/HIVE-19256
Project: Hive
Issue Type: New Feature
Reporter: Ratandeep Ratti
Assignee: Ratandeep Ratti

We use this UDF a lot in our org. It takes an object and a Hive schema and makes sure the output object matches the schema completely. In some respects it is similar to the {{named_struct}} UDF, which can be used to select columns from a struct, but it is more general since it works not only on structs but on all Hive data types (except union). The schema can also request certain valid type conversions (int -> double, etc.).

One scenario where this is quite useful is making sure that a Hive view created with a specific schema will always have columns which match that schema. In Hive today, when a view is created, new nested columns from the underlying table can leak out of the view, even though the user never wanted this behavior. Note that this leaking happens only for nested columns and not for top-level columns, so in that regard this behavior of Hive is inconsistent.

Sample usage of the UDF:
{code}
generic_project(col, "struct<a:array<struct<c:int,d:string>>>")
// Returns data which matches the specified schema. Extra columns which are not part of the schema are removed.
generic_project(col, "struct")
// If the input column had a struct with col a as int, it would type cast 'a' to double.
{code}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
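The projection idea can be sketched outside Hive. The code below is not the UDF's implementation: it models structs as maps, represents the schema as a nested map of field name to either a primitive type name or a nested struct schema, and handles only structs and the int -> double conversion. All names are hypothetical, and filling a missing field with null is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of schema-driven projection over nested maps.
class GenericProjectSketch {

    // Reshapes value to match schema: fields absent from the schema are
    // dropped, and numeric values are widened where the schema says "double".
    @SuppressWarnings("unchecked")
    static Object project(Object value, Object schema) {
        if (value == null) {
            return null;
        }
        if (schema instanceof Map) {
            Map<String, Object> fields = (Map<String, Object>) schema;
            Map<String, Object> in = (Map<String, Object>) value;
            Map<String, Object> out = new LinkedHashMap<>();
            for (Map.Entry<String, Object> field : fields.entrySet()) {
                // Assumption: a field missing from the input becomes null.
                out.put(field.getKey(), project(in.get(field.getKey()), field.getValue()));
            }
            return out;
        }
        if ("double".equals(schema) && value instanceof Number) {
            return ((Number) value).doubleValue();
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, Object> inner = new LinkedHashMap<>();
        inner.put("c", "x");
        inner.put("d", "extra"); // nested column not in the schema
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("a", 1);
        row.put("b", inner);

        Map<String, Object> innerSchema = new LinkedHashMap<>();
        innerSchema.put("c", "string");
        Map<String, Object> schema = new LinkedHashMap<>();
        schema.put("a", "double");
        schema.put("b", innerSchema);

        // prints {a=1.0, b={c=x}}
        System.out.println(project(row, schema));
    }
}
```

In the sketch the nested column {{d}} is dropped and {{a}} is widened, mirroring the two behaviors described above: stopping nested columns from leaking and applying a valid type conversion.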