[jira] [Commented] (HIVE-12362) Hive's Parquet SerDe ignores 'serialization.null.format' property
[ https://issues.apache.org/jira/browse/HIVE-12362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15453014#comment-15453014 ]

Lenni Kuff commented on HIVE-12362:
-----------------------------------

I don't have a test case available to confirm this; the concern comes only from reading the code, so it is unconfirmed. It seems that extra work is happening for each column value in each row, which could have a performance impact.

> Hive's Parquet SerDe ignores 'serialization.null.format' property
> -----------------------------------------------------------------
>
> Key: HIVE-12362
> URL: https://issues.apache.org/jira/browse/HIVE-12362
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.1.0
> Reporter: Naveen Gangam
> Assignee: Naveen Gangam
> Attachments: HIVE-12362.2.patch, HIVE-12362.patch
>
>
> {code}
> create table src (a string);
> insert into table src values (NULL), (''), ('');
> 0: jdbc:hive2://localhost:1/default> select * from src;
> +---+--+
> | src.a |
> +---+--+
> | NULL |
> ||
> ||
> +---+--+
> create table dest (a string) row format serde
> 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' stored as
> INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
> alter table dest set SERDEPROPERTIES ('serialization.null.format' = '');
> alter table dest set TBLPROPERTIES ('serialization.null.format' = '');
> insert overwrite table dest select * from src;
> 0: jdbc:hive2://localhost:1/default> select * from test11;
> +---+--+
> | test11.a |
> +---+--+
> | NULL |
> ||
> ||
> +---+--+
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
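The per-value cost raised in the comment above can be pictured with a small sketch. This is not the code from HIVE-12362.2.patch, which is not shown in this thread; the function name `apply_null_format` is hypothetical, and the sketch assumes the property is meant to behave as it does for text-format tables, where a value equal to the configured null format is read back as NULL.

```python
from typing import Iterable, List, Optional


def apply_null_format(rows: Iterable[List[Optional[str]]],
                      null_format: str) -> List[List[Optional[str]]]:
    """Return rows with every value equal to null_format mapped to None.

    Hypothetical helper for illustration only; it is not Hive code.
    """
    out = []
    for row in rows:
        # One string comparison per column value per row: this is the
        # kind of per-value work the comment is concerned about.
        out.append([None if v == null_format else v for v in row])
    return out


# Rows mirroring the 'src' table in the report: NULL, '', ''.
src = [[None], [""], [""]]
print(apply_null_format(src, ""))
```

The comparison runs once per value, so its total cost scales with rows times columns, which is why a measurement on a wide table would be the natural way to confirm or dismiss the concern.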
[jira] [Commented] (HIVE-12362) Hive's Parquet SerDe ignores 'serialization.null.format' property
[ https://issues.apache.org/jira/browse/HIVE-12362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15452968#comment-15452968 ]

Hive QA commented on HIVE-12362:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12771364/HIVE-12362.2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1058/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1058/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-1058/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.8.0_25 ]]
+ export JAVA_HOME=/usr/java/jdk1.8.0_25
+ JAVA_HOME=/usr/java/jdk1.8.0_25
+ export PATH=/usr/java/jdk1.8.0_25/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/java/jdk1.8.0_25/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-MASTER-Build-1058/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 1f6949f HIVE-14233 - Improve vectorization for ACID by eliminating row-by-row stitching (Saket Saurabh via Eugene Koifman)
+ git clean -f -d
Removing common/src/java/org/apache/hadoop/hive/conf/HiveConf.java.orig
Removing common/src/test/org/apache/hadoop/hive/common/TestLogUtils.java
Removing ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java.orig
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 1f6949f HIVE-14233 - Improve vectorization for ACID by eliminating row-by-row stitching (Saket Saurabh via Eugene Koifman)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12771364 - PreCommit-HIVE-MASTER-Build
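The build failure above ("does not appear to apply with p0, p1, or p2") refers to the strip level of `patch -pN`, which removes N leading components from each path named in the patch. A rough sketch of that idea follows. The internals of smart-apply-patch.sh are not shown in this thread, so this is an assumption-laden illustration rather than its logic: the helper names and the example path are hypothetical, and a real tool would attempt to apply the hunks instead of only checking that the paths exist.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional, Set


def strip_components(path: str, level: int) -> Optional[str]:
    """Drop `level` leading components, as `patch -pN` does to file names."""
    parts = PurePosixPath(path).parts
    if len(parts) <= level:
        return None  # nothing would be left of the path
    return "/".join(parts[level:])


def find_strip_level(patch_paths: Iterable[str],
                     tree_files: Set[str],
                     levels=(0, 1, 2)) -> Optional[int]:
    """Return the first strip level at which every patched path exists."""
    paths = list(patch_paths)
    for level in levels:
        stripped = [strip_components(p, level) for p in paths]
        if all(s is not None and s in tree_files for s in stripped):
            return level
    return None  # corresponds to "does not apply with p0, p1, or p2"


# Hypothetical source tree containing a single file.
tree = {"ql/src/java/Example.java"}
print(find_strip_level(["a/ql/src/java/Example.java"], tree))
```

A patch can also fail at every level for a different reason: the paths resolve, but the hunks no longer match the current source. That is plausible here, since the patch was attached long before this build ran against master, but the log alone does not say which case occurred.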
[jira] [Commented] (HIVE-12362) Hive's Parquet SerDe ignores 'serialization.null.format' property
[ https://issues.apache.org/jira/browse/HIVE-12362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15452648#comment-15452648 ]

Lenni Kuff commented on HIVE-12362:
-----------------------------------

[~ngangam] - Looking at the patch it appears there may be some significant performance impact with this change. Have you done any performance testing with this patch?
[jira] [Commented] (HIVE-12362) Hive's Parquet SerDe ignores 'serialization.null.format' property
[ https://issues.apache.org/jira/browse/HIVE-12362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996531#comment-14996531 ]

Naveen Gangam commented on HIVE-12362:
--------------------------------------

The test failures are unrelated to the attached patch.
[jira] [Commented] (HIVE-12362) Hive's Parquet SerDe ignores 'serialization.null.format' property
[ https://issues.apache.org/jira/browse/HIVE-12362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997027#comment-14997027 ]

Hive QA commented on HIVE-12362:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12771364/HIVE-12362.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 9777 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_null_format
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5972/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5972/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5972/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12771364 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-12362) Hive's Parquet SerDe ignores 'serialization.null.format' property
[ https://issues.apache.org/jira/browse/HIVE-12362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995053#comment-14995053 ]

Hive QA commented on HIVE-12362:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12771125/HIVE-12362.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 9762 tests executed

*Failed tests:*
{noformat}
TestMiniTezCliDriver-auto_sortmerge_join_13.q-tez_self_join.q-orc_vectorization_ppd.q-and-12-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5958/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5958/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5958/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12771125 - PreCommit-HIVE-TRUNK-Build