[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-09-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773949#comment-13773949
 ] 

Hudson commented on HIVE-4732:
--

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #110 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/110/])
HIVE-4732 : Reduce or eliminate the expensive Schema equals() check for 
AvroSerde (Mohammad Kamrul Islam via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1525290)
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java
* 
/hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java
* 
/hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestGenericAvroRecordWritable.java
* 
/hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestSchemaReEncoder.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java


 Reduce or eliminate the expensive Schema equals() check for AvroSerde
 -

 Key: HIVE-4732
 URL: https://issues.apache.org/jira/browse/HIVE-4732
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Mark Wagner
Assignee: Mohammad Kamrul Islam
 Fix For: 0.13.0

 Attachments: HIVE-4732.1.patch, HIVE-4732.4.patch, HIVE-4732.5.patch, 
 HIVE-4732.6.patch, HIVE-4732.7.patch, HIVE-4732.v1.patch, HIVE-4732.v4.patch


 The AvroSerde spends a significant amount of time checking schema equality. 
 Changing to compare hashcodes (which can be computed once then reused) will 
 improve performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-09-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773959#comment-13773959
 ] 

Hudson commented on HIVE-4732:
--

FAILURE: Integrated in Hive-trunk-hadoop1-ptest #178 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/178/])
HIVE-4732 : Reduce or eliminate the expensive Schema equals() check for 
AvroSerde (Mohammad Kamrul Islam via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1525290)
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java
* 
/hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java
* 
/hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestGenericAvroRecordWritable.java
* 
/hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestSchemaReEncoder.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java


 Reduce or eliminate the expensive Schema equals() check for AvroSerde
 -

 Key: HIVE-4732
 URL: https://issues.apache.org/jira/browse/HIVE-4732
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Mark Wagner
Assignee: Mohammad Kamrul Islam
 Fix For: 0.13.0

 Attachments: HIVE-4732.1.patch, HIVE-4732.4.patch, HIVE-4732.5.patch, 
 HIVE-4732.6.patch, HIVE-4732.7.patch, HIVE-4732.v1.patch, HIVE-4732.v4.patch


 The AvroSerde spends a significant amount of time checking schema equality. 
 Changing to compare hashcodes (which can be computed once then reused) will 
 improve performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-09-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773964#comment-13773964
 ] 

Hudson commented on HIVE-4732:
--

FAILURE: Integrated in Hive-trunk-hadoop2 #449 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/449/])
HIVE-4732 : Reduce or eliminate the expensive Schema equals() check for 
AvroSerde (Mohammad Kamrul Islam via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1525290)
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java
* 
/hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java
* 
/hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestGenericAvroRecordWritable.java
* 
/hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestSchemaReEncoder.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java


 Reduce or eliminate the expensive Schema equals() check for AvroSerde
 -

 Key: HIVE-4732
 URL: https://issues.apache.org/jira/browse/HIVE-4732
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Mark Wagner
Assignee: Mohammad Kamrul Islam
 Fix For: 0.13.0

 Attachments: HIVE-4732.1.patch, HIVE-4732.4.patch, HIVE-4732.5.patch, 
 HIVE-4732.6.patch, HIVE-4732.7.patch, HIVE-4732.v1.patch, HIVE-4732.v4.patch


 The AvroSerde spends a significant amount of time checking schema equality. 
 Changing to compare hashcodes (which can be computed once then reused) will 
 improve performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-09-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772844#comment-13772844
 ] 

Hive QA commented on HIVE-4732:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12604184/HIVE-4732.7.patch

{color:green}SUCCESS:{color} +1 3128 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/836/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/836/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

 Reduce or eliminate the expensive Schema equals() check for AvroSerde
 -

 Key: HIVE-4732
 URL: https://issues.apache.org/jira/browse/HIVE-4732
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Mark Wagner
Assignee: Mohammad Kamrul Islam
 Attachments: HIVE-4732.1.patch, HIVE-4732.4.patch, HIVE-4732.5.patch, 
 HIVE-4732.6.patch, HIVE-4732.7.patch, HIVE-4732.v1.patch, HIVE-4732.v4.patch


 The AvroSerde spends a significant amount of time checking schema equality. 
 Changing to compare hashcodes (which can be computed once then reused) will 
 improve performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-09-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772337#comment-13772337
 ] 

Hive QA commented on HIVE-4732:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12604084/HIVE-4732.6.patch

{color:red}ERROR:{color} -1 due to 174 failed/errored test(s), 1242 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_external_table_ppd
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_external_table_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries_prefix
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_storage_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_joins
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_key_range
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_pushdown
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_scan_params
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_single_sourced_multi_insert
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats2
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats3
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats_empty_partition
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_ppd_key_ranges
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver_cascade_dbdrop_hadoop20
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket4
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket5
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_groupby2
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_bucketed_table
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_dyn_part
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_merge
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_num_buckets
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_join1
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_list_bucket_dml_10
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_load_fs2
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_parallel_orderby
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testConversionsBaseResultSet
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDataTypes
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDatabaseMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDescribeTable
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDriverProperties
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testErrorMessages
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testExplainStmt
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetCatalogs
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumns
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumnsMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetSchemas
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTableTypes
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTables
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testNullType
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testPrepareStatement
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testResultSetMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAll
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllFetchSize
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllMaxRows
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllPartioned
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSetCommand
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testShowTables
org.apache.hadoop.hive.ql.TestLocationQueries.testAlterTablePartitionLocation_alter5
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapPlan1
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapPlan2
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapRedPlan1
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapRedPlan2

[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-09-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769275#comment-13769275
 ] 

Hive QA commented on HIVE-4732:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12603500/HIVE-4732.5.patch

{color:green}SUCCESS:{color} +1 3126 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/774/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/774/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

 Reduce or eliminate the expensive Schema equals() check for AvroSerde
 -

 Key: HIVE-4732
 URL: https://issues.apache.org/jira/browse/HIVE-4732
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Mark Wagner
Assignee: Mohammad Kamrul Islam
 Attachments: HIVE-4732.1.patch, HIVE-4732.4.patch, HIVE-4732.5.patch, 
 HIVE-4732.v1.patch, HIVE-4732.v4.patch


 The AvroSerde spends a significant amount of time checking schema equality. 
 Changing to compare hashcodes (which can be computed once then reused) will 
 improve performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-09-17 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13770092#comment-13770092
 ] 

Ashutosh Chauhan commented on HIVE-4732:


I agree with [~kamrul] analysis. Lets not complicate the code for highly 
obscure case. [~appodictic] Let us know if you disagree.

 Reduce or eliminate the expensive Schema equals() check for AvroSerde
 -

 Key: HIVE-4732
 URL: https://issues.apache.org/jira/browse/HIVE-4732
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Mark Wagner
Assignee: Mohammad Kamrul Islam
 Attachments: HIVE-4732.1.patch, HIVE-4732.4.patch, HIVE-4732.5.patch, 
 HIVE-4732.v1.patch, HIVE-4732.v4.patch


 The AvroSerde spends a significant amount of time checking schema equality. 
 Changing to compare hashcodes (which can be computed once then reused) will 
 improve performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-09-17 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13770288#comment-13770288
 ] 

Edward Capriolo commented on HIVE-4732:
---

I do disagree, because it is not complex to generate a GUID that will never 
collide. 

http://www.javapractices.com/topic/TopicAction.do?Id=56

An implementation would likely replace 1 line of code with between 2 to 4. It 
is not a complex task and there are probably hundreds of references on how to 
do this on the internet.

{code}
import java.rmi.server.UID;

public class UniqueId {

  /**
  * Build and display some UID objects.
  */
  public static void main (String... arguments) {
for (int idx=0; idx10; ++idx){
  UID userId = new UID();
  System.out.println(User Id:  + userId);
}
  }
} 
{code}

Would you rather have: 
1) a parachute that very very rarely does not work
2) a parachute that always works
:)

 Reduce or eliminate the expensive Schema equals() check for AvroSerde
 -

 Key: HIVE-4732
 URL: https://issues.apache.org/jira/browse/HIVE-4732
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Mark Wagner
Assignee: Mohammad Kamrul Islam
 Attachments: HIVE-4732.1.patch, HIVE-4732.4.patch, HIVE-4732.5.patch, 
 HIVE-4732.v1.patch, HIVE-4732.v4.patch


 The AvroSerde spends a significant amount of time checking schema equality. 
 Changing to compare hashcodes (which can be computed once then reused) will 
 improve performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-09-17 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13770291#comment-13770291
 ] 

Edward Capriolo commented on HIVE-4732:
---

If you do not want to do it just file another Jira issue and assign it to me 
and Ill do it.

 Reduce or eliminate the expensive Schema equals() check for AvroSerde
 -

 Key: HIVE-4732
 URL: https://issues.apache.org/jira/browse/HIVE-4732
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Mark Wagner
Assignee: Mohammad Kamrul Islam
 Attachments: HIVE-4732.1.patch, HIVE-4732.4.patch, HIVE-4732.5.patch, 
 HIVE-4732.v1.patch, HIVE-4732.v4.patch


 The AvroSerde spends a significant amount of time checking schema equality. 
 Changing to compare hashcodes (which can be computed once then reused) will 
 improve performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-09-16 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769049#comment-13769049
 ] 

Mohammad Kamrul Islam commented on HIVE-4732:
-

[~appodictic]: I can see your point. Indeed a very informative link.
As the link mentioned, the probability of ID collisions are very very rare. 
Pasted from wikipedia:
To put these numbers into perspective, the annual risk of someone being hit by 
a meteorite is estimated to be one chance in 17 billion,[38] which means the 
probability is about 0.006 (6 × 10−11), equivalent to the odds of 
creating a few tens of trillions of UUIDs in a year and having one duplicate. 
In other words, only after generating 1 billion UUIDs every second for the next 
100 years, the probability of creating just one duplicate would be about 50%. 
The probability of one duplicate would be about 50% if every person on earth 
owns 600 million UUIDs.

With these probability, will it be necessary to make thing complex. Moreover, 
these IDs are often few in one hive session.




 

 

 Reduce or eliminate the expensive Schema equals() check for AvroSerde
 -

 Key: HIVE-4732
 URL: https://issues.apache.org/jira/browse/HIVE-4732
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Mark Wagner
Assignee: Mohammad Kamrul Islam
 Attachments: HIVE-4732.1.patch, HIVE-4732.4.patch, 
 HIVE-4732.v1.patch, HIVE-4732.v4.patch


 The AvroSerde spends a significant amount of time checking schema equality. 
 Changing to compare hashcodes (which can be computed once then reused) will 
 improve performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-09-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766543#comment-13766543
 ] 

Ashutosh Chauhan commented on HIVE-4732:


Thanks, [~kamrul] for addressing concern. Overall looks good. One quick 
question: I see in AvroGenericRecordWritable::write(Dataoutput out) this line 
out.writeUTF(recordReaderID.toString()); Doesn't this mean id is now persisted 
on-disk? I thought id is generated by reader at read time and than added to 
record, but I don't get why while writing record we need to write it. Seems 
like I am missing something obvious.

 Reduce or eliminate the expensive Schema equals() check for AvroSerde
 -

 Key: HIVE-4732
 URL: https://issues.apache.org/jira/browse/HIVE-4732
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Mark Wagner
Assignee: Mohammad Kamrul Islam
 Attachments: HIVE-4732.1.patch, HIVE-4732.v1.patch


 The AvroSerde spends a significant amount of time checking schema equality. 
 Changing to compare hashcodes (which can be computed once then reused) will 
 improve performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-09-13 Thread Mark Wagner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766816#comment-13766816
 ] 

Mark Wagner commented on HIVE-4732:
---

The write and readFields methods are used when serializing the writable, but 
not when persisting to disk. We'll still want to maintain that id if the record 
is serialized and deserialized so we can do the equality comparison on the 
other side. I don't believe that those methods are ever actually used in Hive 
(the ORC equivalent of AvroGenericRecordWritable doesn't even implement it), 
but for completeness they are included.

 Reduce or eliminate the expensive Schema equals() check for AvroSerde
 -

 Key: HIVE-4732
 URL: https://issues.apache.org/jira/browse/HIVE-4732
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Mark Wagner
Assignee: Mohammad Kamrul Islam
 Attachments: HIVE-4732.1.patch, HIVE-4732.v1.patch


 The AvroSerde spends a significant amount of time checking schema equality. 
 Changing to compare hashcodes (which can be computed once then reused) will 
 improve performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-09-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766986#comment-13766986
 ] 

Ashutosh Chauhan commented on HIVE-4732:


Thanks, [~mwagner] for explanation. +1

 Reduce or eliminate the expensive Schema equals() check for AvroSerde
 -

 Key: HIVE-4732
 URL: https://issues.apache.org/jira/browse/HIVE-4732
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Mark Wagner
Assignee: Mohammad Kamrul Islam
 Attachments: HIVE-4732.1.patch, HIVE-4732.v1.patch


 The AvroSerde spends a significant amount of time checking schema equality. 
 Changing to compare hashcodes (which can be computed once then reused) will 
 improve performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-09-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767007#comment-13767007
 ] 

Ashutosh Chauhan commented on HIVE-4732:


Can one of you upload the patch on jira, so that HIVE QA have a run on it?

 Reduce or eliminate the expensive Schema equals() check for AvroSerde
 -

 Key: HIVE-4732
 URL: https://issues.apache.org/jira/browse/HIVE-4732
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Mark Wagner
Assignee: Mohammad Kamrul Islam
 Attachments: HIVE-4732.1.patch, HIVE-4732.v1.patch


 The AvroSerde spends a significant amount of time checking schema equality. 
 Changing to compare hashcodes (which can be computed once then reused) will 
 improve performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-09-13 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767043#comment-13767043
 ] 

Edward Capriolo commented on HIVE-4732:
---

Patch is not ready for commit.

{quote}
+  /**
+   * A unique ID for each record reader.
+   */
+  final private UUID recordReaderID;

+this.recordReaderID = UUID.randomUUID();
   }
{quote}

Your comment conflicts your code randomUUID() is not unique. See the link above.

 Reduce or eliminate the expensive Schema equals() check for AvroSerde
 -

 Key: HIVE-4732
 URL: https://issues.apache.org/jira/browse/HIVE-4732
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Mark Wagner
Assignee: Mohammad Kamrul Islam
 Attachments: HIVE-4732.1.patch, HIVE-4732.4.patch, 
 HIVE-4732.v1.patch, HIVE-4732.v4.patch


 The AvroSerde spends a significant amount of time checking schema equality. 
 Changing to compare hashcodes (which can be computed once then reused) will 
 improve performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-08-04 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13728931#comment-13728931
 ] 

Edward Capriolo commented on HIVE-4732:
---

See:
http://en.wikipedia.org/wiki/Universally_unique_identifier#Random_UUID_probability_of_duplicates

 Reduce or eliminate the expensive Schema equals() check for AvroSerde
 -

 Key: HIVE-4732
 URL: https://issues.apache.org/jira/browse/HIVE-4732
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Mark Wagner
Assignee: Mohammad Kamrul Islam
 Attachments: HIVE-4732.1.patch, HIVE-4732.v1.patch


 The AvroSerde spends a significant amount of time checking schema equality. 
 Changing to compare hashcodes (which can be computed once then reused) will 
 improve performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-07-11 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706299#comment-13706299
 ] 

Mohammad Kamrul Islam commented on HIVE-4732:
-

Thanks Edward for the comments.
We are now trying to take a different approach to address the same issue.
A new patch is coming soon.

 Reduce or eliminate the expensive Schema equals() check for AvroSerde
 -

 Key: HIVE-4732
 URL: https://issues.apache.org/jira/browse/HIVE-4732
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Mark Wagner
Assignee: Mohammad Kamrul Islam
 Attachments: HIVE-4732.1.patch


 The AvroSerde spends a significant amount of time checking schema equality. 
 Changing to compare hashcodes (which can be computed once then reused) will 
 improve performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-07-11 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706360#comment-13706360
 ] 

Mohammad Kamrul Islam commented on HIVE-4732:
-

New patch is uploaded in RB: https://reviews.apache.org/r/12480/

Description copied from RB:
From our performance analysis, we found AvroSerde's schema.equals() call 
consumed a substantial amount ( nearly 40%) of time. This patch intends to 
minimize the number schema.equals() calls by pushing the check as late/fewer 
as possible.

At first, we added a unique id for each record reader which is then included in 
every AvroGenericRecordWritable. Then, we introduce two new data structures 
(one hashset and one hashmap) to store intermediate data to avoid duplicates 
checkings. Hashset contains all the record readers' IDs that don't need any 
re-encoding. On the other hand, HashMap contains the already used re-encoders. 
It works as cache and allows re-encoders reuse. With this change, our test 
shows nearly 40% reduction in Avro record reading time.

   

 Reduce or eliminate the expensive Schema equals() check for AvroSerde
 -

 Key: HIVE-4732
 URL: https://issues.apache.org/jira/browse/HIVE-4732
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Mark Wagner
Assignee: Mohammad Kamrul Islam
 Attachments: HIVE-4732.1.patch, HIVE-4732.v1.patch


 The AvroSerde spends a significant amount of time checking schema equality. 
 Changing to compare hashcodes (which can be computed once then reused) will 
 improve performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira