subject:"\[jira\] \[Commented\] \(HIVE\-18410\) \[Performance\]\[Avro\] Reading flat Avro tables is very expensive in Hive"

[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

2018-04-19 Thread Vineet Garg (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16445339#comment-16445339
 ] 

Vineet Garg commented on HIVE-18410:


Pushed to branch-3

> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> --
>
> Key: HIVE-18410
> URL: https://issues.apache.org/jira/browse/HIVE-18410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 1.2.1, 2.1.0, 3.0.0, 2.3.2
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HIVE-18410.patch, HIVE-18410_1.patch, 
> HIVE-18410_2.patch, HIVE-18410_3.patch, profiling_with_patch.nps, 
> profiling_with_patch.png, profiling_without_patch.nps, 
> profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro 
> tables. When reading the same flat dataset in Pig, it takes half the time.  
> On profiling, a lot of time is spent in 
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the 
> time is spent in GenericData.get().resolveUnion(), which calls 
> GenericData.getSchemaName(Object datum), which does a lot of instanceof 
> checks.  This could be simplified with performance benefits. A approach is 
> described in this patch which almost halves the runtime.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

2018-04-17 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441754#comment-16441754
 ] 

Hive QA commented on HIVE-18410:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12909083/HIVE-18410_3.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 39 failed/errored test(s), 14239 tests 
executed
*Failed tests:*
{noformat}
TestDbNotificationListener - did not produce a TEST-*.xml file (likely timed 
out) (batchId=247)
TestHCatHiveCompatibility - did not produce a TEST-*.xml file (likely timed 
out) (batchId=247)
TestNonCatCallsWithCatalog - did not produce a TEST-*.xml file (likely timed 
out) (batchId=217)
TestSequenceFileReadWrite - did not produce a TEST-*.xml file (likely timed 
out) (batchId=247)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_smb] (batchId=92)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_0] 
(batchId=17)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[results_cache_invalidation2]
 (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[tez_join_hash] 
(batchId=54)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1]
 (batchId=175)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[results_cache_invalidation2]
 (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_1] 
(batchId=171)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] 
(batchId=105)
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver[infer_bucket_sort_dyn_part]
 (batchId=93)
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver[infer_bucket_sort_map_operators]
 (batchId=93)
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver[infer_bucket_sort_num_buckets]
 (batchId=93)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[alter_notnull_constraint_violation]
 (batchId=96)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[default_constraint_invalid_default_value_type]
 (batchId=96)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[insert_into_acid_notnull]
 (batchId=95)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[insert_into_notnull_constraint]
 (batchId=95)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[insert_multi_into_notnull]
 (batchId=96)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[insert_overwrite_notnull_constraint]
 (batchId=96)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[insertsel_fail] 
(batchId=95)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[update_notnull_constraint]
 (batchId=95)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut 
(batchId=225)
org.apache.hadoop.hive.druid.TestDruidStorageHandler.testCommitMultiInsertOverwriteTable
 (batchId=261)
org.apache.hadoop.hive.ql.TestAcidOnTez.testAcidInsertWithRemoveUnion 
(batchId=228)
org.apache.hadoop.hive.ql.TestAcidOnTez.testCtasTezUnion (batchId=228)
org.apache.hadoop.hive.ql.TestAcidOnTez.testNonStandardConversion01 
(batchId=228)
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1 (batchId=232)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testCancelRenewTokenFlow 
(batchId=254)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testConnection 
(batchId=254)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testIsValid (batchId=254)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testIsValidNeg 
(batchId=254)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testNegativeProxyAuth 
(batchId=254)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testNegativeTokenAuth 
(batchId=254)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testProxyAuth 
(batchId=254)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testRenewDelegationToken 
(batchId=254)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testTokenAuth 
(batchId=254)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/10282/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/10282/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-10282/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 39 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12909083 - PreCommit-HIVE-Build

> [Performance][Avro] Reading flat Avro

[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

2018-04-17 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441733#comment-16441733
 ] 

Hive QA commented on HIVE-18410:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
35s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 
54s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
 7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m  
6s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m  
2s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
14s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 48m 12s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-10282/dev-support/hive-personality.sh
 |
| git revision | master / 4cfec3e |
| Default Java | 1.8.0_111 |
| asflicense | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-10282/yetus/patch-asflicense-problems.txt
 |
| modules | C: . U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-10282/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> --
>
> Key: HIVE-18410
> URL: https://issues.apache.org/jira/browse/HIVE-18410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 1.2.1, 2.1.0, 3.0.0, 2.3.2
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
>Priority: Major
> Fix For: 2.3.2, 3.1.0
>
> Attachments: HIVE-18410.patch, HIVE-18410_1.patch, 
> HIVE-18410_2.patch, HIVE-18410_3.patch, profiling_with_patch.nps, 
> profiling_with_patch.png, profiling_without_patch.nps, 
> profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro 
> tables. When reading the same flat dataset in Pig, it takes half the time.  
> On profiling, a lot of time is spent in 
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the 
> time is spent in GenericData.get().resolveUnion(), which calls 
> GenericData.getSchemaName(Object datum), which does a lot of instanceof 
> checks.  This could be simplified with performance benefits. A approach is 
> described in this patch which almost halves the runtime.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

2018-04-16 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440167#comment-16440167
 ] 

Ashutosh Chauhan commented on HIVE-18410:
-

+1

> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> --
>
> Key: HIVE-18410
> URL: https://issues.apache.org/jira/browse/HIVE-18410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 1.2.1, 2.1.0, 3.0.0, 2.3.2
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
>Priority: Major
> Fix For: 2.3.2, 3.1.0
>
> Attachments: HIVE-18410.patch, HIVE-18410_1.patch, 
> HIVE-18410_2.patch, HIVE-18410_3.patch, profiling_with_patch.nps, 
> profiling_with_patch.png, profiling_without_patch.nps, 
> profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro 
> tables. When reading the same flat dataset in Pig, it takes half the time.  
> On profiling, a lot of time is spent in 
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the 
> time is spent in GenericData.get().resolveUnion(), which calls 
> GenericData.getSchemaName(Object datum), which does a lot of instanceof 
> checks.  This could be simplified with performance benefits. A approach is 
> described in this patch which almost halves the runtime.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

2018-04-09 Thread Vineet Garg (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431272#comment-16431272
 ] 

Vineet Garg commented on HIVE-18410:


Deferring this to 3.1.0 since the branch for 3.0.0 has been cut off. Please 
update the JIRA if you would like to get your patch in 3.0.0.

> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> --
>
> Key: HIVE-18410
> URL: https://issues.apache.org/jira/browse/HIVE-18410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 1.2.1, 2.1.0, 3.0.0, 2.3.2
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
>Priority: Major
> Fix For: 2.3.2, 3.1.0
>
> Attachments: HIVE-18410.patch, HIVE-18410_1.patch, 
> HIVE-18410_2.patch, HIVE-18410_3.patch, profiling_with_patch.nps, 
> profiling_with_patch.png, profiling_without_patch.nps, 
> profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro 
> tables. When reading the same flat dataset in Pig, it takes half the time.  
> On profiling, a lot of time is spent in 
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the 
> time is spent in GenericData.get().resolveUnion(), which calls 
> GenericData.getSchemaName(Object datum), which does a lot of instanceof 
> checks.  This could be simplified with performance benefits. A approach is 
> described in this patch which almost halves the runtime.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

2018-02-03 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351368#comment-16351368
 ] 

Hive QA commented on HIVE-18410:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12909083/HIVE-18410_3.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 26 failed/errored test(s), 12972 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries]
 (batchId=240)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=49)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] 
(batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mm_cttas] (batchId=47)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=36)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_move_tbl]
 (batchId=175)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[mm_cttas] 
(batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1]
 (batchId=172)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=167)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] 
(batchId=171)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast]
 (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[resourceplan]
 (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_input_format_excludes]
 (batchId=163)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[bucketizedhiveinputformat]
 (batchId=180)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] 
(batchId=122)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut 
(batchId=221)
org.apache.hadoop.hive.ql.TestTxnNoBuckets.testCTAS (batchId=280)
org.apache.hadoop.hive.ql.TestTxnNoBucketsVectorized.testCTAS (batchId=280)
org.apache.hadoop.hive.ql.exec.TestOperators.testNoConditionalTaskSizeForLlap 
(batchId=282)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=256)
org.apache.hive.beeline.cli.TestHiveCli.testNoErrorDB (batchId=188)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=234)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=234)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/9000/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/9000/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-9000/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 26 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12909083 - PreCommit-HIVE-Build

> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> --
>
> Key: HIVE-18410
> URL: https://issues.apache.org/jira/browse/HIVE-18410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.2
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
>Priority: Major
> Fix For: 3.0.0, 2.3.2
>
> Attachments: HIVE-18410.patch, HIVE-18410_1.patch, 
> HIVE-18410_2.patch, HIVE-18410_3.patch, profiling_with_patch.nps, 
> profiling_with_patch.png, profiling_without_patch.nps, 
> profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro 
> tables. When reading the same flat dataset in Pig, it takes half the time.  
> On profiling, a lot of time is spent in 
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the 
> time is spent in GenericData.get().resolveUnion(), which calls 
> GenericData.getSchemaName(Object datum), which does a lot of instanceof 
> checks.  This could be simplified with performance benefits. A approach is 
> described in this patch which almost halves the runtime.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

2018-02-03 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351363#comment-16351363
 ] 

Hive QA commented on HIVE-18410:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
24s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  4m 
20s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
38s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  4m 
57s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  4m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  4m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m  
2s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
12s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 34m 11s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 4a33ec8 |
| Default Java | 1.8.0_111 |
| modules | C: . U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-9000/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> --
>
> Key: HIVE-18410
> URL: https://issues.apache.org/jira/browse/HIVE-18410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.2
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
>Priority: Major
> Fix For: 3.0.0, 2.3.2
>
> Attachments: HIVE-18410.patch, HIVE-18410_1.patch, 
> HIVE-18410_2.patch, HIVE-18410_3.patch, profiling_with_patch.nps, 
> profiling_with_patch.png, profiling_without_patch.nps, 
> profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro 
> tables. When reading the same flat dataset in Pig, it takes half the time.  
> On profiling, a lot of time is spent in 
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the 
> time is spent in GenericData.get().resolveUnion(), which calls 
> GenericData.getSchemaName(Object datum), which does a lot of instanceof 
> checks.  This could be simplified with performance benefits. A approach is 
> described in this patch which almost halves the runtime.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

2018-02-01 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349813#comment-16349813
 ] 

Hive QA commented on HIVE-18410:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12908852/HIVE-18410_2.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 21 failed/errored test(s), 12967 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries]
 (batchId=240)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_gby_empty] 
(batchId=82)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] 
(batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=36)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_move_tbl]
 (batchId=175)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1]
 (batchId=172)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=167)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] 
(batchId=171)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast]
 (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[resourceplan]
 (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_input_format_excludes]
 (batchId=163)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] 
(batchId=122)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut 
(batchId=221)
org.apache.hadoop.hive.ql.exec.TestOperators.testNoConditionalTaskSizeForLlap 
(batchId=282)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=256)
org.apache.hive.beeline.cli.TestHiveCli.testNoErrorDB (batchId=188)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=234)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=234)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8981/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8981/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8981/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 21 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12908852 - PreCommit-HIVE-Build

> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> --
>
> Key: HIVE-18410
> URL: https://issues.apache.org/jira/browse/HIVE-18410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.2
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
>Priority: Major
> Fix For: 3.0.0, 2.3.2
>
> Attachments: HIVE-18410.patch, HIVE-18410_1.patch, 
> HIVE-18410_2.patch, profiling_with_patch.nps, profiling_with_patch.png, 
> profiling_without_patch.nps, profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro 
> tables. When reading the same flat dataset in Pig, it takes half the time.  
> On profiling, a lot of time is spent in 
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the 
> time is spent in GenericData.get().resolveUnion(), which calls 
> GenericData.getSchemaName(Object datum), which does a lot of instanceof 
> checks.  This could be simplified with performance benefits. A approach is 
> described in this patch which almost halves the runtime.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

2018-02-01 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349791#comment-16349791
 ] 

Hive QA commented on HIVE-18410:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  4m 
37s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
49s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
35s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  4m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  4m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
48s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
12s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 37m  8s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 32b8994 |
| Default Java | 1.8.0_111 |
| modules | C: . U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8981/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> --
>
> Key: HIVE-18410
> URL: https://issues.apache.org/jira/browse/HIVE-18410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.2
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
>Priority: Major
> Fix For: 3.0.0, 2.3.2
>
> Attachments: HIVE-18410.patch, HIVE-18410_1.patch, 
> HIVE-18410_2.patch, profiling_with_patch.nps, profiling_with_patch.png, 
> profiling_without_patch.nps, profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro 
> tables. When reading the same flat dataset in Pig, it takes half the time.  
> On profiling, a lot of time is spent in 
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the 
> time is spent in GenericData.get().resolveUnion(), which calls 
> GenericData.getSchemaName(Object datum), which does a lot of instanceof 
> checks.  This could be simplified with performance benefits. A approach is 
> described in this patch which almost halves the runtime.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

2018-01-12 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324915#comment-16324915
 ] 

Hive QA commented on HIVE-18410:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12905898/HIVE-18410_1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 19 failed/errored test(s), 11568 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join25] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=35)
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_binary_storage_queries]
 (batchId=99)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketsortoptimize_insert_2]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[hybridgrace_hashjoin_2]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] 
(batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[mergejoin] 
(batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=159)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[authorization_part]
 (batchId=93)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[stats_aggregator_error_1]
 (batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] 
(batchId=120)
org.apache.hadoop.hive.metastore.TestEmbeddedHiveMetaStore.testTransactionalValidation
 (batchId=213)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=253)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints 
(batchId=225)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=231)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=231)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=231)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8603/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8603/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8603/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 19 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12905898 - PreCommit-HIVE-Build

> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> --
>
> Key: HIVE-18410
> URL: https://issues.apache.org/jira/browse/HIVE-18410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.2
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
> Fix For: 2.3.2
>
> Attachments: HIVE-18410.patch, HIVE-18410_1.patch, 
> profiling_with_patch.nps, profiling_with_patch.png, 
> profiling_without_patch.nps, profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro 
> tables. When reading the same flat dataset in Pig, it takes half the time.  
> On profiling, a lot of time is spent in 
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the 
> time is spent in GenericData.get().resolveUnion(), which calls 
> GenericData.getSchemaName(Object datum), which does a lot of instanceof 
> checks.  This could be simplified with performance benefits. A approach is 
> described in this patch which almost halves the runtime.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

2018-01-12 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324886#comment-16324886
 ] 

Hive QA commented on HIVE-18410:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
16s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
12s{color} | {color:red} serde: The patch generated 2 new + 58 unchanged - 2 
fixed = 60 total (was 60) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
11s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}  8m 20s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 0a62507 |
| Default Java | 1.8.0_111 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8603/yetus/diff-checkstyle-serde.txt
 |
| modules | C: serde U: serde |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8603/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> --
>
> Key: HIVE-18410
> URL: https://issues.apache.org/jira/browse/HIVE-18410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.2
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
> Fix For: 2.3.2
>
> Attachments: HIVE-18410.patch, HIVE-18410_1.patch, 
> profiling_with_patch.nps, profiling_with_patch.png, 
> profiling_without_patch.nps, profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro 
> tables. When reading the same flat dataset in Pig, it takes half the time.  
> On profiling, a lot of time is spent in 
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the 
> time is spent in GenericData.get().resolveUnion(), which calls 
> GenericData.getSchemaName(Object datum), which does a lot of instanceof 
> checks.  This could be simplified with performance benefits. A approach is 
> described in this patch which almost halves the runtime.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

2018-01-12 Thread Ratandeep Ratti (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324342#comment-16324342
 ] 

Ratandeep Ratti commented on HIVE-18410:


https://reviews.apache.org/r/65036

> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> --
>
> Key: HIVE-18410
> URL: https://issues.apache.org/jira/browse/HIVE-18410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.2
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
> Fix For: 2.3.2
>
> Attachments: HIVE-18410.patch, HIVE-18410_1.patch, 
> profiling_with_patch.nps, profiling_with_patch.png, 
> profiling_without_patch.nps, profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro 
> tables. When reading the same flat dataset in Pig, it takes half the time.  
> On profiling, a lot of time is spent in 
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the 
> time is spent in GenericData.get().resolveUnion(), which calls 
> GenericData.getSchemaName(Object datum), which does a lot of instanceof 
> checks.  This could be simplified with performance benefits. A approach is 
> described in this patch which almost halves the runtime.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

2018-01-09 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16318173#comment-16318173
 ] 

Hive QA commented on HIVE-18410:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12905197/HIVE-18410.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8518/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8518/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8518/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2018-01-09 10:02:07.208
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-8518/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2018-01-09 10:02:07.211
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 8412748 HIVE-18330 : Fix TestMsgBusConnection - doesn't test 
tests the original intention (Zoltan Haindrich via Ashutosh Chauhan)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 8412748 HIVE-18330 : Fix TestMsgBusConnection - doesn't test 
tests the original intention (Zoltan Haindrich via Ashutosh Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2018-01-09 10:02:10.579
+ rm -rf ../yetus
+ mkdir ../yetus
+ cp -R . ../yetus
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-8518/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: patch failed: 
serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java:198
Falling back to three-way merge...
Applied patch to 
'serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java' with 
conflicts.
Going to apply patch with: git apply -p0
error: patch failed: 
serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java:198
Falling back to three-way merge...
Applied patch to 
'serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java' with 
conflicts.
U serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12905197 - PreCommit-HIVE-Build

> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> --
>
> Key: HIVE-18410
> URL: https://issues.apache.org/jira/browse/HIVE-18410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.3.2
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
> Fix For: 2.3.2
>
> Attachments: HIVE-18410.patch, profiling_with_patch.nps, 
> profiling_with_patch.png, profiling_without_patch.nps, 
> profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro 
> tables. When reading the same flat dataset in Pig, it takes half the time.  
> On profiling, a lot of time is spent in 
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the 
> time is spent in GenericData.get().resolveUnion(), which calls 
> GenericData.getSchemaName(Object datum), which does a lot of instanceof 
> checks.  This could be simplified with performance benefits. A approach is 
> described in this patch which almost halves the runtime.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

2018-01-08 Thread Ratandeep Ratti (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317521#comment-16317521
 ] 

Ratandeep Ratti commented on HIVE-18410:


Attached profiling data

> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> --
>
> Key: HIVE-18410
> URL: https://issues.apache.org/jira/browse/HIVE-18410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.3.2
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
> Fix For: 2.3.2
>
> Attachments: profiling_with_patch.nps, profiling_with_patch.png, 
> profiling_without_patch.nps, profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro 
> tables. When reading the same flat dataset in Pig, it takes half the time.  
> On profiling, a lot of time is spent in 
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the 
> time is spent in GenericData.get().resolveUnion(), which calls 
> GenericData.getSchemaName(Object datum), which does a lot of instanceof 
> checks.  This could be simplified with performance benefits. A approach is 
> described in this patch which almost halves the runtime.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

14 matches

Site Navigation

Mail list logo

Footer information