[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.10.0
[ https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243250#comment-17243250 ]

David Mollitor commented on HIVE-21737:
---------------------------------------

Also, some of the work I've done:

# AVRO-2335: Drop dependency on JODA Time
# AVRO-2333: Drop commons-codec dependency
# AVRO-2333: Drop commons-logging dependency
# AVRO-2061: Better error messages
# AVRO-2056: Better performance with Double types
# AVRO-2696: Better performance for Doubles and Floats
# AVRO-2801: Better performance when using Strings in Maps
# Lots of other small improvements

In particular, AVRO-2335, AVRO-2333, and AVRO-2061 were based on my experience with Hive and Avro integration.

> Upgrade Avro to version 1.10.0
> ------------------------------
>
> Key: HIVE-21737
> URL: https://issues.apache.org/jira/browse/HIVE-21737
> Project: Hive
> Issue Type: Improvement
> Components: Hive
> Reporter: Ismaël Mejía
> Assignee: Fokko Driesprong
> Priority: Major
> Labels: pull-request-available
> Attachments: 0001-HIVE-21737-Make-Avro-use-in-Hive-compatible-with-Avr.patch
>
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> Avro >= 1.9.x brings a lot of fixes, including a leaner version of Avro without
> Jackson in the public API and without Guava as a dependency. Worth the update.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.10.0
[ https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223855#comment-17223855 ]

Ismaël Mejía commented on HIVE-21737:
-------------------------------------

Thanks for the ref, it was my local branch that was bonkers. 2.3.8-SNAPSHOT looks right, I am going to try to test the Spark upgrade from it in the meantime. (It should work even without the patch).
[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.10.0
[ https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223853#comment-17223853 ]

Chao Sun commented on HIVE-21737:
---------------------------------

[~iemejia] Yes, we'd need a new release (perhaps 2.3.8) with at least these changes: 1) upgrade Avro to 1.8.2, and 2) replace the deprecated APIs. We can start preparing the release once these are ready.

{quote}
but branch-2.3 points to 4.0.0 version
{quote}

Hmm, what do you mean? branch-2.3 [is using 2.3.8|https://github.com/apache/hive/blob/branch-2.3/pom.xml#L24]
[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.10.0
[ https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223850#comment-17223850 ]

Ismaël Mejía commented on HIVE-21737:
-------------------------------------

Since things are never easy: I wanted to reproduce locally an upgrade of Hive in Spark to an eventual 2.3.8-SNAPSHOT, but branch-2.3 points to a 4.0.0 version and `branch-2` is still on Avro 1.7.7 :S

[https://github.com/apache/hive/compare/branch-2.3..branch-2]

It seems branch-2.3 is where the action is, but I still don't get why it has the 4.0.0 name; maybe it is a mistake.
[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.10.0
[ https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223828#comment-17223828 ]

Ismaël Mejía commented on HIVE-21737:
-------------------------------------

It seems we will still need to do a Hive release even with the proposed solution, because `hive-exec` in Hive 2.3.7 depends on Avro 1.7.7 instead of 1.8.2 as in branch-2.3 :S

The question is now on your side: how long would it take to get 2.3.8 out? And in that case, wouldn't it make sense to try to backport the full update? (Both seem like they could work if we get the patch in.)
[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.10.0
[ https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223747#comment-17223747 ]

Ismaël Mejía commented on HIVE-21737:
-------------------------------------

[~csun] The current patch already contains the `getObjectProp()` change. I tweaked it a bit to follow your idea of making it source compatible with Avro 1.8.x-1.10.x. Could you please check it?

I think we can get this into master, and even backported to 2.3, pretty easily before getting the version bumped (which can happen once Avro 1.10.1 is out and the tests no longer break because of the defaults verification in AVRO-2817).
[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.10.0
[ https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17220351#comment-17220351 ]

Chao Sun commented on HIVE-21737:
---------------------------------

[~fokko] I'm not proposing to restore the API. Instead, I'm proposing to replace the API {{JsonProperties#getJsonProp}} with {{JsonProperties#getObjectProp}} (which has been available since Avro 1.8) and then cast the returned object to the desired type in Hive. There are only 7 usages of {{getJsonProp}} in Hive, and they are only used to retrieve scale/precision/maxLength for Decimal/Char types.
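
The "cast the returned object to the desired type" step described above could look roughly like the sketch below. It is only an illustration and is not taken from the attached patch: the class name {{AvroPropCasts}} and the method {{toInt}} are hypothetical, and it assumes the value handed back by {{JsonProperties#getObjectProp}} arrives as a {{java.lang.Number}} or a {{String}} (anything else is treated as absent). It deliberately has no Avro dependency, so the Avro call itself is only mentioned in comments.

```java
import java.util.Optional;

/** Hypothetical helper for illustration; not part of Hive or Avro. */
final class AvroPropCasts {
    private AvroPropCasts() {}

    /**
     * Converts a schema property value (for example scale, precision or
     * maxLength, as it might be returned by JsonProperties#getObjectProp)
     * to an int. Assumption: the value is a Number or a numeric String.
     * Returns an empty Optional for null, non-numeric or other values.
     */
    static Optional<Integer> toInt(Object prop) {
        if (prop instanceof Number) {
            // Integer, Long, etc.; a Long outside the int range would be truncated.
            return Optional.of(((Number) prop).intValue());
        }
        if (prop instanceof String) {
            try {
                return Optional.of(Integer.parseInt(((String) prop).trim()));
            } catch (NumberFormatException e) {
                return Optional.empty();
            }
        }
        // null and unexpected types end up here.
        return Optional.empty();
    }
}
```

For example, {{AvroPropCasts.toInt(10)}} yields a present value of 10, while {{AvroPropCasts.toInt("abc")}} and {{AvroPropCasts.toInt(null)}} yield an empty Optional. Returning an Optional rather than throwing leaves it to the caller to decide whether a missing scale/precision/maxLength is an error.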
[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.10.0
[ https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17220294#comment-17220294 ]

Fokko Driesprong commented on HIVE-21737:
-----------------------------------------

Hi Chao, unfortunately, that's not possible. The getJsonProps would return a JsonNode:

[https://github.com/apache/avro/pull/135/files#diff-e86ec7c2ab127130c9faf2786059caad4b257aecbee571c3f9ad0b136935c43cR151]

This JsonNode is from Jackson 1.x: org.codehaus.jackson.JsonNode

That library has been replaced by Jackson 2.x, so we can't restore the function. The API shouldn't expose third-party classes in the first place.
[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.10.0
[ https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219858#comment-17219858 ]

Chao Sun commented on HIVE-21737:
---------------------------------

[~iemejia] Instead of upgrading Avro in Hive, I think we could alternatively replace the usage of the API that was removed in Avro 1.9 by [AVRO-1605|https://issues.apache.org/jira/browse/AVRO-1605] (and had been marked as deprecated since Avro 1.8), in particular {{JsonProperties#getJsonProp}}. This could be an easier approach.
[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.10.0
[ https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213206#comment-17213206 ]

Chao Sun commented on HIVE-21737:
---------------------------------

Sounds good to me [~iemejia]. Thanks!
[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.10.0
[ https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213014#comment-17213014 ]

Ismaël Mejía commented on HIVE-21737:
-------------------------------------

[~csun] I think the first thing I will do is update the Hive patch to refer temporarily to the Avro SNAPSHOT that fixes the validation issue, to be sure it passes the full Hive suite before doing the Avro release. WDYT? I hope the Hive tests let us use the SNAPSHOT deps just for the tests; I will update the patch once the release is out.
[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.10.0
[ https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212317#comment-17212317 ]

Ismaël Mejía commented on HIVE-21737:
-------------------------------------

Thanks [~chinnalalam] for re-running the patch. It is strange; where did you disable validate defaults? (I did not do that in my patch yet.) Any chance you can share the detailed failure logs for both errors? I have struggled to reproduce the CI runs locally, so any help is greatly appreciated.
[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.10.0
[ https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211061#comment-17211061 ]

Chinna Rao Lalam commented on HIVE-21737:
-----------------------------------------

Hi [~iemejia],

I verified this patch and found these 2 test failures with the exception below:

{quote}avro_deserialize_map_null.q
parquet_map_null.q
{quote}

{quote}Failed with exception java.io.IOException:org.apache.avro.AvroTypeException: Invalid default for field avreau_col_1: null not a []
{quote}

It looks like these exceptions are caused by the Avro version breaking backward compatibility:

https://issues.apache.org/jira/browse/AVRO-2817

We tried setting *Schema.Parser.setValidateDefaults(false)* to turn off defaults validation (e.g. in org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils#getSchemaFor(java.io.File)), but it did not work.

[~iemejia], any idea/workaround for this issue?