[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive
[ https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16445339#comment-16445339 ]

Vineet Garg commented on HIVE-18410:

Pushed to branch-3

> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> ----------------------------------------------------------------------
>
>                 Key: HIVE-18410
>                 URL: https://issues.apache.org/jira/browse/HIVE-18410
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 1.2.1, 2.1.0, 3.0.0, 2.3.2
>            Reporter: Ratandeep Ratti
>            Assignee: Ratandeep Ratti
>            Priority: Major
>             Fix For: 3.1.0
>
>         Attachments: HIVE-18410.patch, HIVE-18410_1.patch,
> HIVE-18410_2.patch, HIVE-18410_3.patch, profiling_with_patch.nps,
> profiling_with_patch.png, profiling_without_patch.nps,
> profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro
> tables. When reading the same flat dataset in Pig, it takes half the time.
> On profiling, a lot of time is spent in
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the
> time is spent in GenericData.get().resolveUnion(), which calls
> GenericData.getSchemaName(Object datum), which does a lot of instanceof
> checks. This could be simplified with performance benefits. An approach is
> described in this patch which almost halves the runtime.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
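The issue description attributes the cost to resolving a nullable union through the generic path, where every datum goes through a chain of instanceof checks. The following is a rough, library-free sketch of that idea, not the code from the attached patches. Every class and method name in it is hypothetical, and the toy string type names only stand in for Avro schemas. It contrasts a generic per-datum resolver with a fast path for a two-branch [null, T] union that needs only a null check and a branch index computed once per schema.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Illustrative sketch only: toy stand-ins for union resolution.
 * None of these names come from Hive or Avro.
 */
public class NullableUnionSketch {

    /** Generic path: map a datum to a type name via a chain of instanceof checks. */
    static String typeNameOf(Object datum) {
        if (datum == null) return "null";
        if (datum instanceof CharSequence) return "string";
        if (datum instanceof Integer) return "int";
        if (datum instanceof Long) return "long";
        if (datum instanceof Float) return "float";
        if (datum instanceof Double) return "double";
        if (datum instanceof Boolean) return "boolean";
        throw new IllegalArgumentException("Unknown datum type: " + datum.getClass());
    }

    /** Generic path: look up the union branch matching the datum, repeated for every value. */
    static int resolveGeneric(List<String> branches, Object datum) {
        int index = branches.indexOf(typeNameOf(datum));
        if (index < 0) {
            throw new IllegalArgumentException("Datum not in union: " + datum);
        }
        return index;
    }

    /** Computed once per schema: index of the single non-null branch of a [null, T] union. */
    static int nonNullBranch(List<String> branches) {
        if (branches.size() != 2 || !branches.contains("null")) {
            throw new IllegalArgumentException("Not a single-item nullable union: " + branches);
        }
        return "null".equals(branches.get(0)) ? 1 : 0;
    }

    /** Fast path: one null check plus a precomputed index, with no instanceof chain. */
    static int resolveNullable(int nullBranch, int nonNullBranch, Object datum) {
        return datum == null ? nullBranch : nonNullBranch;
    }

    public static void main(String[] args) {
        List<String> union = Arrays.asList("null", "string");
        int nonNull = nonNullBranch(union);   // done once, outside the per-row loop
        int nullIdx = 1 - nonNull;
        for (Object datum : new Object[] {"hive", null}) {
            int generic = resolveGeneric(union, datum);
            int fast = resolveNullable(nullIdx, nonNull, datum);
            System.out.println(datum + " -> generic=" + generic + ", fast=" + fast);
        }
    }
}
```

The fast path trades validation for speed: it assumes each datum was produced by a reader for the same schema, so a non-null value can only belong to the one non-null branch. Whether the actual patch makes the same trade-off is not stated in this thread.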
[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive
[ https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441754#comment-16441754 ]

Hive QA commented on HIVE-18410:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12909083/HIVE-18410_3.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 39 failed/errored test(s), 14239 tests executed
*Failed tests:*
{noformat}
TestDbNotificationListener - did not produce a TEST-*.xml file (likely timed out) (batchId=247)
TestHCatHiveCompatibility - did not produce a TEST-*.xml file (likely timed out) (batchId=247)
TestNonCatCallsWithCatalog - did not produce a TEST-*.xml file (likely timed out) (batchId=217)
TestSequenceFileReadWrite - did not produce a TEST-*.xml file (likely timed out) (batchId=247)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_smb] (batchId=92)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_0] (batchId=17)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[results_cache_invalidation2] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[tez_join_hash] (batchId=54)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1] (batchId=175)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[results_cache_invalidation2] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_1] (batchId=171)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] (batchId=105)
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver[infer_bucket_sort_dyn_part] (batchId=93)
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver[infer_bucket_sort_map_operators] (batchId=93)
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver[infer_bucket_sort_num_buckets] (batchId=93)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[alter_notnull_constraint_violation] (batchId=96)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[default_constraint_invalid_default_value_type] (batchId=96)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[insert_into_acid_notnull] (batchId=95)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[insert_into_notnull_constraint] (batchId=95)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[insert_multi_into_notnull] (batchId=96)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[insert_overwrite_notnull_constraint] (batchId=96)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[insertsel_fail] (batchId=95)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[update_notnull_constraint] (batchId=95)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=225)
org.apache.hadoop.hive.druid.TestDruidStorageHandler.testCommitMultiInsertOverwriteTable (batchId=261)
org.apache.hadoop.hive.ql.TestAcidOnTez.testAcidInsertWithRemoveUnion (batchId=228)
org.apache.hadoop.hive.ql.TestAcidOnTez.testCtasTezUnion (batchId=228)
org.apache.hadoop.hive.ql.TestAcidOnTez.testNonStandardConversion01 (batchId=228)
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1 (batchId=232)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testCancelRenewTokenFlow (batchId=254)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testConnection (batchId=254)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testIsValid (batchId=254)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testIsValidNeg (batchId=254)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testNegativeProxyAuth (batchId=254)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testNegativeTokenAuth (batchId=254)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testProxyAuth (batchId=254)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testRenewDelegationToken (batchId=254)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testTokenAuth (batchId=254)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/10282/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/10282/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-10282/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 39 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12909083 - PreCommit-HIVE-Build

> [Performance][Avro] Reading flat Avro
[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive
[ https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441733#comment-16441733 ]

Hive QA commented on HIVE-18410:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 35s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 54s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 7m 6s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 7m 2s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 14s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 48m 12s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-10282/dev-support/hive-personality.sh |
| git revision | master / 4cfec3e |
| Default Java | 1.8.0_111 |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-10282/yetus/patch-asflicense-problems.txt |
| modules | C: . U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-10282/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.

> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> ----------------------------------------------------------------------
>
>                 Key: HIVE-18410
>                 URL: https://issues.apache.org/jira/browse/HIVE-18410
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 1.2.1, 2.1.0, 3.0.0, 2.3.2
>            Reporter: Ratandeep Ratti
>            Assignee: Ratandeep Ratti
>            Priority: Major
>             Fix For: 2.3.2, 3.1.0
>
>         Attachments: HIVE-18410.patch, HIVE-18410_1.patch,
> HIVE-18410_2.patch, HIVE-18410_3.patch, profiling_with_patch.nps,
> profiling_with_patch.png, profiling_without_patch.nps,
> profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro
> tables. When reading the same flat dataset in Pig, it takes half the time.
> On profiling, a lot of time is spent in
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the
> time is spent in GenericData.get().resolveUnion(), which calls
> GenericData.getSchemaName(Object datum), which does a lot of instanceof
> checks. This could be simplified with performance benefits. An approach is
> described in this patch which almost halves the runtime.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive
[ https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440167#comment-16440167 ]

Ashutosh Chauhan commented on HIVE-18410:

+1

> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> ----------------------------------------------------------------------
>
>                 Key: HIVE-18410
>                 URL: https://issues.apache.org/jira/browse/HIVE-18410
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 1.2.1, 2.1.0, 3.0.0, 2.3.2
>            Reporter: Ratandeep Ratti
>            Assignee: Ratandeep Ratti
>            Priority: Major
>             Fix For: 2.3.2, 3.1.0
>
>         Attachments: HIVE-18410.patch, HIVE-18410_1.patch,
> HIVE-18410_2.patch, HIVE-18410_3.patch, profiling_with_patch.nps,
> profiling_with_patch.png, profiling_without_patch.nps,
> profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro
> tables. When reading the same flat dataset in Pig, it takes half the time.
> On profiling, a lot of time is spent in
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the
> time is spent in GenericData.get().resolveUnion(), which calls
> GenericData.getSchemaName(Object datum), which does a lot of instanceof
> checks. This could be simplified with performance benefits. An approach is
> described in this patch which almost halves the runtime.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive
[ https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431272#comment-16431272 ]

Vineet Garg commented on HIVE-18410:

Deferring this to 3.1.0 since the branch for 3.0.0 has been cut off. Please update the JIRA if you would like to get your patch in 3.0.0.

> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> ----------------------------------------------------------------------
>
>                 Key: HIVE-18410
>                 URL: https://issues.apache.org/jira/browse/HIVE-18410
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 1.2.1, 2.1.0, 3.0.0, 2.3.2
>            Reporter: Ratandeep Ratti
>            Assignee: Ratandeep Ratti
>            Priority: Major
>             Fix For: 2.3.2, 3.1.0
>
>         Attachments: HIVE-18410.patch, HIVE-18410_1.patch,
> HIVE-18410_2.patch, HIVE-18410_3.patch, profiling_with_patch.nps,
> profiling_with_patch.png, profiling_without_patch.nps,
> profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro
> tables. When reading the same flat dataset in Pig, it takes half the time.
> On profiling, a lot of time is spent in
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the
> time is spent in GenericData.get().resolveUnion(), which calls
> GenericData.getSchemaName(Object datum), which does a lot of instanceof
> checks. This could be simplified with performance benefits. An approach is
> described in this patch which almost halves the runtime.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive
[ https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351368#comment-16351368 ]

Hive QA commented on HIVE-18410:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12909083/HIVE-18410_3.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 26 failed/errored test(s), 12972 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries] (batchId=240)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] (batchId=49)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] (batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mm_cttas] (batchId=47)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=36)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_move_tbl] (batchId=175)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[mm_cttas] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1] (batchId=172)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=167)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=171)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[resourceplan] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_input_format_excludes] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[bucketizedhiveinputformat] (batchId=180)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] (batchId=122)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=221)
org.apache.hadoop.hive.ql.TestTxnNoBuckets.testCTAS (batchId=280)
org.apache.hadoop.hive.ql.TestTxnNoBucketsVectorized.testCTAS (batchId=280)
org.apache.hadoop.hive.ql.exec.TestOperators.testNoConditionalTaskSizeForLlap (batchId=282)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=256)
org.apache.hive.beeline.cli.TestHiveCli.testNoErrorDB (batchId=188)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=234)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=234)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/9000/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/9000/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-9000/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 26 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12909083 - PreCommit-HIVE-Build

> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> ----------------------------------------------------------------------
>
>                 Key: HIVE-18410
>                 URL: https://issues.apache.org/jira/browse/HIVE-18410
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 3.0.0, 2.3.2
>            Reporter: Ratandeep Ratti
>            Assignee: Ratandeep Ratti
>            Priority: Major
>             Fix For: 3.0.0, 2.3.2
>
>         Attachments: HIVE-18410.patch, HIVE-18410_1.patch,
> HIVE-18410_2.patch, HIVE-18410_3.patch, profiling_with_patch.nps,
> profiling_with_patch.png, profiling_without_patch.nps,
> profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro
> tables. When reading the same flat dataset in Pig, it takes half the time.
> On profiling, a lot of time is spent in
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the
> time is spent in GenericData.get().resolveUnion(), which calls
> GenericData.getSchemaName(Object datum), which does a lot of instanceof
> checks. This could be simplified with performance benefits. An approach is
> described in this patch which almost halves the runtime.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive
[ https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351363#comment-16351363 ]

Hive QA commented on HIVE-18410:

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 24s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 20s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 38s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 57s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 2s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 12s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 34m 11s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 4a33ec8 |
| Default Java | 1.8.0_111 |
| modules | C: . U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-9000/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.

> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> ----------------------------------------------------------------------
>
>                 Key: HIVE-18410
>                 URL: https://issues.apache.org/jira/browse/HIVE-18410
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 3.0.0, 2.3.2
>            Reporter: Ratandeep Ratti
>            Assignee: Ratandeep Ratti
>            Priority: Major
>             Fix For: 3.0.0, 2.3.2
>
>         Attachments: HIVE-18410.patch, HIVE-18410_1.patch,
> HIVE-18410_2.patch, HIVE-18410_3.patch, profiling_with_patch.nps,
> profiling_with_patch.png, profiling_without_patch.nps,
> profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro
> tables. When reading the same flat dataset in Pig, it takes half the time.
> On profiling, a lot of time is spent in
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the
> time is spent in GenericData.get().resolveUnion(), which calls
> GenericData.getSchemaName(Object datum), which does a lot of instanceof
> checks. This could be simplified with performance benefits. An approach is
> described in this patch which almost halves the runtime.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive
[ https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349813#comment-16349813 ]

Hive QA commented on HIVE-18410:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12908852/HIVE-18410_2.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 21 failed/errored test(s), 12967 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries] (batchId=240)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_gby_empty] (batchId=82)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] (batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=36)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_move_tbl] (batchId=175)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1] (batchId=172)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=167)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=171)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[resourceplan] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_input_format_excludes] (batchId=163)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] (batchId=122)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=221)
org.apache.hadoop.hive.ql.exec.TestOperators.testNoConditionalTaskSizeForLlap (batchId=282)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=256)
org.apache.hive.beeline.cli.TestHiveCli.testNoErrorDB (batchId=188)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=234)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=234)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8981/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8981/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8981/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 21 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12908852 - PreCommit-HIVE-Build

> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> ----------------------------------------------------------------------
>
>                 Key: HIVE-18410
>                 URL: https://issues.apache.org/jira/browse/HIVE-18410
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 3.0.0, 2.3.2
>            Reporter: Ratandeep Ratti
>            Assignee: Ratandeep Ratti
>            Priority: Major
>             Fix For: 3.0.0, 2.3.2
>
>         Attachments: HIVE-18410.patch, HIVE-18410_1.patch,
> HIVE-18410_2.patch, profiling_with_patch.nps, profiling_with_patch.png,
> profiling_without_patch.nps, profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro
> tables. When reading the same flat dataset in Pig, it takes half the time.
> On profiling, a lot of time is spent in
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the
> time is spent in GenericData.get().resolveUnion(), which calls
> GenericData.getSchemaName(Object datum), which does a lot of instanceof
> checks. This could be simplified with performance benefits. An approach is
> described in this patch which almost halves the runtime.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive
[ https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349791#comment-16349791 ]

Hive QA commented on HIVE-18410:

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 37s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 49s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 35s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 48s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 12s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 37m 8s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 32b8994 |
| Default Java | 1.8.0_111 |
| modules | C: . U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8981/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.

> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> ----------------------------------------------------------------------
>
>                 Key: HIVE-18410
>                 URL: https://issues.apache.org/jira/browse/HIVE-18410
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 3.0.0, 2.3.2
>            Reporter: Ratandeep Ratti
>            Assignee: Ratandeep Ratti
>            Priority: Major
>             Fix For: 3.0.0, 2.3.2
>
>         Attachments: HIVE-18410.patch, HIVE-18410_1.patch,
> HIVE-18410_2.patch, profiling_with_patch.nps, profiling_with_patch.png,
> profiling_without_patch.nps, profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro
> tables. When reading the same flat dataset in Pig, it takes half the time.
> On profiling, a lot of time is spent in
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the
> time is spent in GenericData.get().resolveUnion(), which calls
> GenericData.getSchemaName(Object datum), which does a lot of instanceof
> checks. This could be simplified with performance benefits. An approach is
> described in this patch which almost halves the runtime.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive
[ https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324915#comment-16324915 ]

Hive QA commented on HIVE-18410:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12905898/HIVE-18410_1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 19 failed/errored test(s), 11568 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join25] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=35)
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_binary_storage_queries] (batchId=99)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketsortoptimize_insert_2] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[hybridgrace_hashjoin_2] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[mergejoin] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=159)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[authorization_part] (batchId=93)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[stats_aggregator_error_1] (batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] (batchId=120)
org.apache.hadoop.hive.metastore.TestEmbeddedHiveMetaStore.testTransactionalValidation (batchId=213)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=253)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=225)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=231)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=231)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=231)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8603/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8603/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8603/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 19 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12905898 - PreCommit-HIVE-Build

> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> --
>
> Key: HIVE-18410
> URL: https://issues.apache.org/jira/browse/HIVE-18410
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 3.0.0, 2.3.2
> Reporter: Ratandeep Ratti
> Assignee: Ratandeep Ratti
> Fix For: 2.3.2
>
> Attachments: HIVE-18410.patch, HIVE-18410_1.patch,
> profiling_with_patch.nps, profiling_with_patch.png,
> profiling_without_patch.nps, profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro
> tables. When reading the same flat dataset in Pig, it takes half the time.
> On profiling, a lot of time is spent in
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the
> time is spent in GenericData.get().resolveUnion(), which calls
> GenericData.getSchemaName(Object datum), which does a lot of instanceof
> checks. This could be simplified with performance benefits. An approach is
> described in this patch which almost halves the runtime.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
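The bottleneck described in the issue can be illustrated with a minimal sketch. It is plain Java and does not use the Avro API: the `Type` enum and both `resolve...` methods are hypothetical stand-ins, and the fast path shown is one plausible simplification, not necessarily what the attached patch does. The idea is that for a two-branch union with exactly one null branch, a non-null datum can only belong to the other branch, so the chain of instanceof checks can be skipped.

```java
import java.util.Arrays;
import java.util.List;

public class NullableUnionSketch {
    // Hypothetical stand-in for Avro schema types; not org.apache.avro.Schema.Type.
    public enum Type { NULL, INT, LONG, STRING, BOOLEAN, DOUBLE }

    // General resolution: name the datum's type through a chain of instanceof
    // checks, then search the union for a matching branch. Run per value.
    public static int resolveUnionSlow(List<Type> union, Object datum) {
        Type t;
        if (datum == null) t = Type.NULL;
        else if (datum instanceof Integer) t = Type.INT;
        else if (datum instanceof Long) t = Type.LONG;
        else if (datum instanceof CharSequence) t = Type.STRING;
        else if (datum instanceof Boolean) t = Type.BOOLEAN;
        else if (datum instanceof Double) t = Type.DOUBLE;
        else throw new IllegalArgumentException("Unknown datum type: " + datum.getClass());
        int idx = union.indexOf(t);
        if (idx < 0) throw new IllegalArgumentException("Not in union: " + t);
        return idx;
    }

    // Fast path. Precondition (assumed, not checked): the union has exactly
    // two branches and exactly one of them is NULL.
    public static int resolveNullableUnionFast(List<Type> union, Object datum) {
        int nullIdx = union.get(0) == Type.NULL ? 0 : 1;
        return datum == null ? nullIdx : 1 - nullIdx;
    }

    public static void main(String[] args) {
        List<Type> union = Arrays.asList(Type.NULL, Type.STRING);
        // Both paths should pick the same branch for null and non-null values.
        System.out.println(resolveUnionSlow(union, "abc") == resolveNullableUnionFast(union, "abc"));
        System.out.println(resolveUnionSlow(union, null) == resolveNullableUnionFast(union, null));
    }
}
```

The fast path trades generality for speed: it is only valid for the single-item nullable unions that flat tables typically use, and wider unions would still need the general resolution.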
[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive
[ https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324886#comment-16324886 ]

Hive QA commented on HIVE-18410:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 12s{color} | {color:red} serde: The patch generated 2 new + 58 unchanged - 2 fixed = 60 total (was 60) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 11s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 8m 20s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 0a62507 |
| Default Java | 1.8.0_111 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-8603/yetus/diff-checkstyle-serde.txt |
| modules | C: serde U: serde |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8603/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.

> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> --
>
> Key: HIVE-18410
> URL: https://issues.apache.org/jira/browse/HIVE-18410
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 3.0.0, 2.3.2
> Reporter: Ratandeep Ratti
> Assignee: Ratandeep Ratti
> Fix For: 2.3.2
>
> Attachments: HIVE-18410.patch, HIVE-18410_1.patch,
> profiling_with_patch.nps, profiling_with_patch.png,
> profiling_without_patch.nps, profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro
> tables. When reading the same flat dataset in Pig, it takes half the time.
> On profiling, a lot of time is spent in
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the
> time is spent in GenericData.get().resolveUnion(), which calls
> GenericData.getSchemaName(Object datum), which does a lot of instanceof
> checks. This could be simplified with performance benefits. An approach is
> described in this patch which almost halves the runtime.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive
[ https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324342#comment-16324342 ]

Ratandeep Ratti commented on HIVE-18410:

https://reviews.apache.org/r/65036

> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> --
>
> Key: HIVE-18410
> URL: https://issues.apache.org/jira/browse/HIVE-18410
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 3.0.0, 2.3.2
> Reporter: Ratandeep Ratti
> Assignee: Ratandeep Ratti
> Fix For: 2.3.2
>
> Attachments: HIVE-18410.patch, HIVE-18410_1.patch,
> profiling_with_patch.nps, profiling_with_patch.png,
> profiling_without_patch.nps, profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro
> tables. When reading the same flat dataset in Pig, it takes half the time.
> On profiling, a lot of time is spent in
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the
> time is spent in GenericData.get().resolveUnion(), which calls
> GenericData.getSchemaName(Object datum), which does a lot of instanceof
> checks. This could be simplified with performance benefits. An approach is
> described in this patch which almost halves the runtime.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive
[ https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16318173#comment-16318173 ]

Hive QA commented on HIVE-18410:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12905197/HIVE-18410.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8518/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8518/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8518/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2018-01-09 10:02:07.208
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-8518/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2018-01-09 10:02:07.211
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 8412748 HIVE-18330 : Fix TestMsgBusConnection - doesn't test tests the original intention (Zoltan Haindrich via Ashutosh Chauhan)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 8412748 HIVE-18330 : Fix TestMsgBusConnection - doesn't test tests the original intention (Zoltan Haindrich via Ashutosh Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2018-01-09 10:02:10.579
+ rm -rf ../yetus
+ mkdir ../yetus
+ cp -R . ../yetus
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-8518/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
error: patch failed: serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java:198
Falling back to three-way merge...
Applied patch to 'serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java' with conflicts.
Going to apply patch with: git apply -p0
error: patch failed: serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java:198
Falling back to three-way merge...
Applied patch to 'serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java' with conflicts.
U serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12905197 - PreCommit-HIVE-Build

> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> --
>
> Key: HIVE-18410
> URL: https://issues.apache.org/jira/browse/HIVE-18410
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 2.3.2
> Reporter: Ratandeep Ratti
> Assignee: Ratandeep Ratti
> Fix For: 2.3.2
>
> Attachments: HIVE-18410.patch, profiling_with_patch.nps,
> profiling_with_patch.png, profiling_without_patch.nps,
> profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro
> tables. When reading the same flat dataset in Pig, it takes half the time.
> On profiling, a lot of time is spent in
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the
> time is spent in GenericData.get().resolveUnion(), which calls
> GenericData.getSchemaName(Object datum), which does a lot of instanceof
> checks. This could be simplified with performance benefits. An approach is
> described in this patch which almost halves the runtime.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive
[ https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317521#comment-16317521 ]

Ratandeep Ratti commented on HIVE-18410:

Attached profiling data

> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> --
>
> Key: HIVE-18410
> URL: https://issues.apache.org/jira/browse/HIVE-18410
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 2.3.2
> Reporter: Ratandeep Ratti
> Assignee: Ratandeep Ratti
> Fix For: 2.3.2
>
> Attachments: profiling_with_patch.nps, profiling_with_patch.png,
> profiling_without_patch.nps, profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro
> tables. When reading the same flat dataset in Pig, it takes half the time.
> On profiling, a lot of time is spent in
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the
> time is spent in GenericData.get().resolveUnion(), which calls
> GenericData.getSchemaName(Object datum), which does a lot of instanceof
> checks. This could be simplified with performance benefits. An approach is
> described in this patch which almost halves the runtime.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)