[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773949#comment-13773949 ]

Hudson commented on HIVE-4732:

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #110 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/110/])
HIVE-4732 : Reduce or eliminate the expensive Schema equals() check for AvroSerde (Mohammad Kamrul Islam via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1525290)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestGenericAvroRecordWritable.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestSchemaReEncoder.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java

Reduce or eliminate the expensive Schema equals() check for AvroSerde

Key: HIVE-4732
URL: https://issues.apache.org/jira/browse/HIVE-4732
Project: Hive
Issue Type: Improvement
Components: Serializers/Deserializers
Reporter: Mark Wagner
Assignee: Mohammad Kamrul Islam
Fix For: 0.13.0
Attachments: HIVE-4732.1.patch, HIVE-4732.4.patch, HIVE-4732.5.patch, HIVE-4732.6.patch, HIVE-4732.7.patch, HIVE-4732.v1.patch, HIVE-4732.v4.patch

The AvroSerde spends a significant amount of time checking schema equality. Changing to compare hashcodes (which can be computed once then reused) will improve performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773959#comment-13773959 ]

Hudson commented on HIVE-4732:

FAILURE: Integrated in Hive-trunk-hadoop1-ptest #178 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/178/])
HIVE-4732 : Reduce or eliminate the expensive Schema equals() check for AvroSerde (Mohammad Kamrul Islam via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1525290)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestGenericAvroRecordWritable.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestSchemaReEncoder.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java
[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773964#comment-13773964 ]

Hudson commented on HIVE-4732:

FAILURE: Integrated in Hive-trunk-hadoop2 #449 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/449/])
HIVE-4732 : Reduce or eliminate the expensive Schema equals() check for AvroSerde (Mohammad Kamrul Islam via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1525290)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestGenericAvroRecordWritable.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestSchemaReEncoder.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java
[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772844#comment-13772844 ]

Hive QA commented on HIVE-4732:

{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12604184/HIVE-4732.7.patch

{color:green}SUCCESS:{color} +1 3128 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/836/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/836/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.
[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772337#comment-13772337 ]

Hive QA commented on HIVE-4732:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12604084/HIVE-4732.6.patch

{color:red}ERROR:{color} -1 due to 174 failed/errored test(s), 1242 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_external_table_ppd
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_external_table_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries_prefix
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_storage_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_joins
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_key_range
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_pushdown
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_scan_params
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_single_sourced_multi_insert
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats2
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats3
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats_empty_partition
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_ppd_key_ranges
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver_cascade_dbdrop_hadoop20
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket4
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket5
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_groupby2
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_bucketed_table
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_dyn_part
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_merge
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_num_buckets
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_join1
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_list_bucket_dml_10
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_load_fs2
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_parallel_orderby
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testConversionsBaseResultSet
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDataTypes
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDatabaseMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDescribeTable
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDriverProperties
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testErrorMessages
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testExplainStmt
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetCatalogs
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumns
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumnsMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetSchemas
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTableTypes
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTables
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testNullType
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testPrepareStatement
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testResultSetMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAll
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllFetchSize
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllMaxRows
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllPartioned
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSetCommand
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testShowTables
org.apache.hadoop.hive.ql.TestLocationQueries.testAlterTablePartitionLocation_alter5
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapPlan1
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapPlan2
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapRedPlan1
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapRedPlan2
[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769275#comment-13769275 ]

Hive QA commented on HIVE-4732:

{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12603500/HIVE-4732.5.patch

{color:green}SUCCESS:{color} +1 3126 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/774/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/774/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.
[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770092#comment-13770092 ]

Ashutosh Chauhan commented on HIVE-4732:

I agree with [~kamrul]'s analysis. Let's not complicate the code for a highly obscure case. [~appodictic] Let us know if you disagree.
[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770288#comment-13770288 ]

Edward Capriolo commented on HIVE-4732:

I do disagree, because it is not complex to generate a GUID that will never collide.

http://www.javapractices.com/topic/TopicAction.do?Id=56

An implementation would likely replace 1 line of code with between 2 and 4. It is not a complex task and there are probably hundreds of references on how to do this on the internet.

{code}
import java.rmi.server.UID;

public class UniqueId {

  /**
   * Build and display some UID objects.
   */
  public static void main (String... arguments) {
    for (int idx=0; idx<10; ++idx){
      UID userId = new UID();
      System.out.println("User Id: " + userId);
    }
  }
}
{code}

Would you rather have:
1) a parachute that very very rarely does not work
2) a parachute that always works
:)
[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770291#comment-13770291 ]

Edward Capriolo commented on HIVE-4732:

If you do not want to do it, just file another Jira issue and assign it to me and I'll do it.
[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769049#comment-13769049 ]

Mohammad Kamrul Islam commented on HIVE-4732:

[~appodictic]: I can see your point. Indeed a very informative link. As the link mentioned, ID collisions are very, very rare.

Pasted from Wikipedia:

To put these numbers into perspective, the annual risk of someone being hit by a meteorite is estimated to be one chance in 17 billion, which means the probability is about 0.00000000006 (6 × 10^-11), equivalent to the odds of creating a few tens of trillions of UUIDs in a year and having one duplicate. In other words, only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%. The probability of one duplicate would be about 50% if every person on earth owns 600 million UUIDs.

With these probabilities, is it necessary to make things complex? Moreover, there are usually only a few of these IDs in one Hive session.
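As a rough check on the odds discussed in this comment, the usual birthday-bound approximation p ≈ 1 - exp(-n^2 / (2 * 2^122)) can be evaluated directly. This is only a sketch: it assumes 122 random bits per ID (as in the version 4 UUIDs produced by UUID.randomUUID()), and the class and method names are made up for illustration.

```java
/**
 * Hypothetical helper (not part of Hive): estimates the chance that at least
 * two of n random version-4 UUIDs collide, using the birthday approximation.
 */
class UuidCollisionOdds {

    /** A version-4 UUID has 128 bits, 6 of which are fixed (version and variant). */
    static final int RANDOM_BITS = 122;

    /** p ~= 1 - exp(-n^2 / (2 * 2^122)); expm1 keeps precision when p is tiny. */
    static double collisionProbability(double n) {
        double space = Math.pow(2.0, RANDOM_BITS);
        return -Math.expm1(-(n * n) / (2.0 * space));
    }

    public static void main(String[] args) {
        // A generous guess at the number of record readers in one session.
        System.out.println(collisionProbability(1_000));
        // Roughly the number of IDs at which a duplicate becomes a coin flip.
        System.out.println(collisionProbability(2.71e18));
    }
}
```

With only a handful of record readers per session, n is many orders of magnitude below the point where the approximation yields a noticeable probability.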
[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13766543#comment-13766543 ]

Ashutosh Chauhan commented on HIVE-4732:

Thanks, [~kamrul], for addressing the concern. Overall looks good. One quick question: I see in AvroGenericRecordWritable::write(DataOutput out) this line

out.writeUTF(recordReaderID.toString());

Doesn't this mean the id is now persisted on disk? I thought the id is generated by the reader at read time and then added to the record, but I don't get why we need to write it while writing the record. Seems like I am missing something obvious.
[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13766816#comment-13766816 ]

Mark Wagner commented on HIVE-4732:

The write and readFields methods are used when serializing the writable, but not when persisting to disk. We'll still want to maintain that id if the record is serialized and deserialized so we can do the equality comparison on the other side. I don't believe that those methods are ever actually used in Hive (the ORC equivalent of AvroGenericRecordWritable doesn't even implement it), but for completeness they are included.
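The point about keeping the id across a serialize/deserialize round trip can be illustrated with a small stand-in class. This is a sketch only: TaggedRecord, its payload field, and roundTrip are hypothetical names, and it uses plain java.io streams rather than Hadoop's Writable interface or the actual AvroGenericRecordWritable code. Only the out.writeUTF(recordReaderID.toString()) call is taken from the thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.UUID;

/** Hypothetical stand-in for a writable that carries its record reader's ID. */
class TaggedRecord {
    private UUID recordReaderID;
    private String payload; // stands in for the encoded Avro record

    TaggedRecord() { }

    TaggedRecord(UUID recordReaderID, String payload) {
        this.recordReaderID = recordReaderID;
        this.payload = payload;
    }

    /** Writes the reader ID ahead of the payload, as in the line quoted above. */
    void write(DataOutput out) throws IOException {
        out.writeUTF(recordReaderID.toString());
        out.writeUTF(payload);
    }

    /** Restores the reader ID so the receiving side can still compare IDs. */
    void readFields(DataInput in) throws IOException {
        recordReaderID = UUID.fromString(in.readUTF());
        payload = in.readUTF();
    }

    UUID getRecordReaderID() { return recordReaderID; }

    String getPayload() { return payload; }

    /** Serializes to an in-memory buffer and reads the bytes back into a new object. */
    static TaggedRecord roundTrip(TaggedRecord original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.write(new DataOutputStream(buffer));
            TaggedRecord copy = new TaggedRecord();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Nothing here touches the on-disk file format; the ID travels only with the in-flight serialized form of the record.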
[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13766986#comment-13766986 ]

Ashutosh Chauhan commented on HIVE-4732:

Thanks, [~mwagner], for the explanation. +1
[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767007#comment-13767007 ]

Ashutosh Chauhan commented on HIVE-4732:

Can one of you upload the patch on JIRA, so that Hive QA can run on it?
[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767043#comment-13767043 ]

Edward Capriolo commented on HIVE-4732:

Patch is not ready for commit.

{quote}
+ /**
+  * A unique ID for each record reader.
+  */
+ final private UUID recordReaderID;
+ this.recordReaderID = UUID.randomUUID();
}
{quote}

Your comment conflicts with your code: randomUUID() is not unique. See the link above.
[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13728931#comment-13728931 ]

Edward Capriolo commented on HIVE-4732:

See: http://en.wikipedia.org/wiki/Universally_unique_identifier#Random_UUID_probability_of_duplicates
[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706299#comment-13706299 ]

Mohammad Kamrul Islam commented on HIVE-4732:

Thanks Edward for the comments. We are now trying to take a different approach to address the same issue. A new patch is coming soon.
[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706360#comment-13706360 ]

Mohammad Kamrul Islam commented on HIVE-4732:

New patch is uploaded in RB: https://reviews.apache.org/r/12480/

Description copied from RB:

From our performance analysis, we found that AvroSerde's schema.equals() call consumed a substantial amount (nearly 40%) of time. This patch intends to minimize the number of schema.equals() calls by deferring the check as late as possible and performing it as rarely as possible.

At first, we added a unique ID for each record reader, which is then included in every AvroGenericRecordWritable. Then we introduced two new data structures (one HashSet and one HashMap) to store intermediate data and avoid duplicate checks. The HashSet contains the IDs of all record readers that don't need any re-encoding. The HashMap, on the other hand, contains the re-encoders already in use; it works as a cache and allows re-encoders to be reused.

With this change, our test shows a nearly 40% reduction in Avro record reading time.
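The bookkeeping described in this comment (one set of reader IDs that need no re-encoding, one map of cached re-encoders) might look roughly like the sketch below. It is not the patch itself: ReEncodeCache, ReEncoder, lookup, and the equalsCalls counter are hypothetical names, schemas are plain strings to avoid the Avro dependency, and keying the map by reader ID is an assumption, since the thread does not say what the HashMap key is.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/**
 * Hypothetical sketch: run the expensive schema comparison once per record
 * reader ID instead of once per record.
 */
class ReEncodeCache {

    /** Stand-in for a schema re-encoder; the real class is not shown in the thread. */
    static final class ReEncoder {
        final String writerSchema;
        final String readerSchema;

        ReEncoder(String writerSchema, String readerSchema) {
            this.writerSchema = writerSchema;
            this.readerSchema = readerSchema;
        }
    }

    private final String readerSchema;
    /** IDs of record readers whose records need no re-encoding. */
    private final Set<UUID> noEncodingNeeded = new HashSet<>();
    /** Re-encoders already built, keyed by reader ID (an assumption of this sketch). */
    private final Map<UUID, ReEncoder> reEncoders = new HashMap<>();
    /** Counts how often the expensive comparison actually ran. */
    int equalsCalls = 0;

    ReEncodeCache(String readerSchema) {
        this.readerSchema = readerSchema;
    }

    /** Returns null when the record can be used as-is, otherwise a cached re-encoder. */
    ReEncoder lookup(UUID recordReaderID, String writerSchema) {
        if (noEncodingNeeded.contains(recordReaderID)) {
            return null; // cheap path: this reader was already checked
        }
        ReEncoder cached = reEncoders.get(recordReaderID);
        if (cached != null) {
            return cached; // cheap path: reuse the existing re-encoder
        }
        equalsCalls++; // expensive path, taken once per reader ID
        if (readerSchema.equals(writerSchema)) {
            noEncodingNeeded.add(recordReaderID);
            return null;
        }
        ReEncoder created = new ReEncoder(writerSchema, readerSchema);
        reEncoders.put(recordReaderID, created);
        return created;
    }
}
```

In this arrangement the cost of equals() is paid once per record reader rather than once per record, which is the effect the description attributes to the patch.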