[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.10.0

2020-12-03 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243250#comment-17243250
 ] 

David Mollitor commented on HIVE-21737:
---

Also, some of the work I've done:

 
 # AVRO-2335: Drop dependency on JODA Time
 # AVRO-2333: Drop commons-codec dependency
 # AVRO-2333: Drop commons-logging dependency
 # AVRO-2061: Better error messages
 # AVRO-2056: Better performance with Double types
 # AVRO-2696: Better performance for Doubles and Floats
 # AVRO-2801: Better performance when using Strings in Maps
 # Lots of other small improvements

 

In particular, AVRO-2335, AVRO-2333, and AVRO-2061 were based on my experience with 
Hive and Avro integration.

> Upgrade Avro to version 1.10.0
> --
>
> Key: HIVE-21737
> URL: https://issues.apache.org/jira/browse/HIVE-21737
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Ismaël Mejía
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
> Attachments: 
> 0001-HIVE-21737-Make-Avro-use-in-Hive-compatible-with-Avr.patch
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Avro >= 1.9.x brings a lot of fixes, including a leaner version of Avro without 
> Jackson in the public API and without Guava as a dependency. Worth the update.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.10.0

2020-10-30 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223855#comment-17223855
 ] 

Ismaël Mejía commented on HIVE-21737:
-

Thanks for the ref; it was my local branch that was bonkers. 2.3.8-SNAPSHOT 
looks right, so in the meantime I am going to try testing the Spark upgrade 
against it (it should work even without the patch).

> Upgrade Avro to version 1.10.0
> --
>
> Key: HIVE-21737
> URL: https://issues.apache.org/jira/browse/HIVE-21737
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Ismaël Mejía
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
> Attachments: 
> 0001-HIVE-21737-Make-Avro-use-in-Hive-compatible-with-Avr.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Avro >= 1.9.x brings a lot of fixes, including a leaner version of Avro without 
> Jackson in the public API and without Guava as a dependency. Worth the update.





[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.10.0

2020-10-30 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223853#comment-17223853
 ] 

Chao Sun commented on HIVE-21737:
-

[~iemejia] Yes, we'd need a new release (perhaps 2.3.8) with at least these 
changes: 1) upgrade Avro to 1.8.2, and 2) replace the deprecated APIs. We can 
start preparing the release once these are ready.

{quote}
but branch-2.3 points to 4.0.0 version
{quote}

Hmm, what do you mean? branch-2.3 [is using 
2.3.8|https://github.com/apache/hive/blob/branch-2.3/pom.xml#L24].





[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.10.0

2020-10-30 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223850#comment-17223850
 ] 

Ismaël Mejía commented on HIVE-21737:
-

Since things are never easy: I wanted to reproduce locally an upgrade of Spark's 
Hive dependency to a prospective 2.3.8-SNAPSHOT, but branch-2.3 points to version 
4.0.0 and `branch-2` is still on Avro 1.7.7 :S
[https://github.com/apache/hive/compare/branch-2.3..branch-2]
It seems branch-2.3 is where the action is, but I still don't understand why it 
uses the 4.0.0 version name; maybe it's a mistake.



[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.10.0

2020-10-30 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223828#comment-17223828
 ] 

Ismaël Mejía commented on HIVE-21737:
-

It seems we will still need a Hive release even with the proposed solution, 
because `hive-exec` in Hive 2.3.7 depends on Avro 1.7.7 instead of 1.8.2 as in 
branch-2.3 :S.

The question now is on your side: how long would it take to get 2.3.8 out? And 
in that case, wouldn't it make sense to try to backport the full update? 
(Both options seem workable if we get the patch in.)
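The version constraints in this discussion can be sketched as a small Java gate. This is a hypothetical helper (`AvroVersionGate` and its methods are invented for illustration and are not part of Hive or Avro); it assumes plain dotted versions from the 1.7.x line onward and encodes only what the thread states: `getObjectProp` is available from Avro 1.8, and `getJsonProp` was removed in 1.9.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not part of Hive or Avro): gates API usage on a dotted Avro version.
 * Assumes versions from the 1.7.x line onward, e.g. "1.7.7", "1.8.2", "1.10.1-SNAPSHOT".
 */
public final class AvroVersionGate {

  private AvroVersionGate() {}

  /** Negative, zero or positive as version a is older than, equal to or newer than b. */
  static int compare(String a, String b) {
    int[] x = parse(a);
    int[] y = parse(b);
    for (int i = 0; i < Math.max(x.length, y.length); i++) {
      // Missing components count as 0, so "1.8" equals "1.8.0".
      int xi = i < x.length ? x[i] : 0;
      int yi = i < y.length ? y[i] : 0;
      if (xi != yi) {
        return Integer.compare(xi, yi);
      }
    }
    return 0;
  }

  /** Keeps the numeric part and drops qualifiers such as "-SNAPSHOT". */
  private static int[] parse(String version) {
    String core = version.split("-", 2)[0];
    return Arrays.stream(core.split("\\.")).mapToInt(Integer::parseInt).toArray();
  }

  /** getObjectProp is available from Avro 1.8 onwards. */
  static boolean hasGetObjectProp(String avroVersion) {
    return compare(avroVersion, "1.8.0") >= 0;
  }

  /** getJsonProp was removed in Avro 1.9 (AVRO-1605). */
  static boolean hasGetJsonProp(String avroVersion) {
    return compare(avroVersion, "1.9.0") < 0;
  }

  public static void main(String[] args) {
    for (String v : new String[] {"1.7.7", "1.8.2", "1.10.0"}) {
      System.out.println(v + ": getObjectProp=" + hasGetObjectProp(v)
          + ", getJsonProp=" + hasGetJsonProp(v));
    }
  }
}
```

Under these assumptions only the 1.8.x line offers both methods, so code written against `getObjectProp` alone can target 1.8.x through 1.10.x, while Avro 1.7.7 (the version `hive-exec` 2.3.7 depends on) lacks `getObjectProp`.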



[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.10.0

2020-10-30 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223747#comment-17223747
 ] 

Ismaël Mejía commented on HIVE-21737:
-

[~csun] The current patch already contains the `getObjectProp()` change. I tweaked 
it a bit to follow your idea of making it source-compatible with Avro 1.8.x-1.10.x. 
Can you please check it? I think we can get this into master, and even backport 
it to 2.3, pretty easily before the version gets bumped (once Avro 1.10.1 is out, 
the tests won't break because of the defaults validation in AVRO-2817).



[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.10.0

2020-10-25 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17220351#comment-17220351
 ] 

Chao Sun commented on HIVE-21737:
-

[~fokko] I'm not proposing to restore the API. Instead, I'm proposing to 
replace the API {{JsonProperties#getJsonProp}} with 
{{JsonProperties#getObjectProp}} (which is available since Avro 1.8) and then 
cast the returned object to the desired type in Hive. There are only 7 usages 
of {{getJsonProp}} in Hive, and they are only used to retrieve 
scale/precision/maxLength for Decimal/Char types.
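A minimal sketch of the proposed replacement, assuming `getObjectProp` hands back an ordinary Java object (for example an `Integer` for a JSON number) that Hive then converts itself. To keep it runnable without Avro, a `Map` stands in for the schema's properties; with Avro 1.8+ on the classpath the lookup would be `schema.getObjectProp(name)`. The `SchemaPropReader` class and its `intProp` helper are hypothetical, and accepting numeric strings is an extra assumption about hand-written schemas.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of "read the property as Object, then cast in Hive".
 * A plain Map stands in for an Avro schema's properties; with Avro 1.8+ the
 * lookup would be schema.getObjectProp(name) instead of props.get(name).
 */
public final class SchemaPropReader {

  private SchemaPropReader() {}

  /**
   * Converts a property value returned as Object into an int.
   * Any Number is narrowed with intValue(); numeric strings are parsed
   * (an assumption); anything else, including a missing key, yields the fallback.
   */
  static int intProp(Map<String, Object> props, String name, int fallback) {
    Object value = props.get(name);
    if (value instanceof Number) {
      return ((Number) value).intValue();
    }
    if (value instanceof String) {
      try {
        return Integer.parseInt(((String) value).trim());
      } catch (NumberFormatException e) {
        return fallback;
      }
    }
    return fallback;
  }

  public static void main(String[] args) {
    // Hypothetical properties of a decimal column, e.g. {"type":"bytes","logicalType":"decimal",...}
    Map<String, Object> decimalProps = new HashMap<>();
    decimalProps.put("precision", 10);
    decimalProps.put("scale", "2");

    int precision = intProp(decimalProps, "precision", 0);
    int scale = intProp(decimalProps, "scale", 0);
    int maxLength = intProp(decimalProps, "maxLength", -1); // absent for decimals
    System.out.println(precision + " " + scale + " " + maxLength);
  }
}
```

Going through `Number` rather than casting straight to `Integer` keeps the helper working if a value arrives as a `Long`; running `main` prints `10 2 -1`.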

> Upgrade Avro to version 1.10.0
> --
>
> Key: HIVE-21737
> URL: https://issues.apache.org/jira/browse/HIVE-21737
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Ismaël Mejía
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
> Attachments: 0001-HIVE-21737-Bump-Apache-Avro-to-1.9.2.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Avro >= 1.9.x brings a lot of fixes, including a leaner version of Avro without 
> Jackson in the public API and without Guava as a dependency. Worth the update.





[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.10.0

2020-10-25 Thread Fokko Driesprong (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17220294#comment-17220294
 ] 

Fokko Driesprong commented on HIVE-21737:
-

Hi Chao, unfortunately that's not possible. getJsonProp would return a 
JsonNode: 
[https://github.com/apache/avro/pull/135/files#diff-e86ec7c2ab127130c9faf2786059caad4b257aecbee571c3f9ad0b136935c43cR151]

This JsonNode comes from Jackson 1.x (org.codehaus.jackson.JsonNode), and that 
library has been replaced by Jackson 2.x. Therefore we can't restore the 
function. The API shouldn't have exposed third-party classes in the first place.



[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.10.0

2020-10-23 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219858#comment-17219858
 ] 

Chao Sun commented on HIVE-21737:
-

[~iemejia] Instead of upgrading Avro in Hive, I think we could alternatively 
replace usages of the API that was removed in Avro 1.9 by 
[AVRO-1605|https://issues.apache.org/jira/browse/AVRO-1605] (and deprecated 
since Avro 1.8), in particular {{JsonProperties#getJsonProp}}. This could be an 
easier approach.
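Because `getJsonProp` and `getObjectProp` live in different Avro versions, it can help to check at runtime which one the classpath actually provides (for example when `hive-exec` pulls in an older Avro). The sketch below uses only `java.lang.reflect`; the `ApiProbe` class is hypothetical, and the Avro class and method names are the ones mentioned in this thread.

```java
/**
 * Hypothetical runtime probe (not part of Hive or Avro): reports whether a public method
 * exists on the current classpath, e.g. to see which Avro API generation is actually loaded.
 */
public final class ApiProbe {

  private ApiProbe() {}

  /** True if className can be loaded and exposes a public method with this exact signature. */
  static boolean hasMethod(String className, String methodName, Class<?>... paramTypes) {
    try {
      // initialize=false: load the class without running its static initializers.
      Class<?> cls = Class.forName(className, false, ApiProbe.class.getClassLoader());
      cls.getMethod(methodName, paramTypes);
      return true;
    } catch (ClassNotFoundException | NoSuchMethodException e) {
      return false;
    } catch (LinkageError e) {
      // A type in some method signature could not be resolved (e.g. a missing dependency).
      return false;
    }
  }

  public static void main(String[] args) {
    String jsonProperties = "org.apache.avro.JsonProperties";
    // Both lines print false when Avro is not on the classpath at all.
    System.out.println("getJsonProp:   " + hasMethod(jsonProperties, "getJsonProp", String.class));
    System.out.println("getObjectProp: " + hasMethod(jsonProperties, "getObjectProp", String.class));
  }
}
```

Catching `LinkageError` is a defensive choice: reflecting over a class whose method signatures mention a missing type (such as Jackson 1.x classes) can fail with `NoClassDefFoundError`, and in that case the probe simply reports `false`.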



[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.10.0

2020-10-13 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213206#comment-17213206
 ] 

Chao Sun commented on HIVE-21737:
-

Sounds good to me [~iemejia]. Thanks!



[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.10.0

2020-10-13 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213014#comment-17213014
 ] 

Ismaël Mejía commented on HIVE-21737:
-

[~csun] I think the first thing I will do is update the Hive patch to 
temporarily point to the Avro SNAPSHOT that fixes the validation issue, to make 
sure it passes the full Hive test suite before we release Avro. WDYT? I hope the 
Hive tests let us use the SNAPSHOT dependencies only for testing; I will update 
the patch once the release is out.



[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.10.0

2020-10-12 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212317#comment-17212317
 ] 

Ismaël Mejía commented on HIVE-21737:
-

Thanks [~chinnalalam] for re-running the patch.
It is strange: where did you disable default validation? (I have not done that 
in my patch yet.)
Any chance you can share the detailed failure logs for both errors?
I have struggled to reproduce the CI runs locally, so any help is greatly 
appreciated.



[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.10.0

2020-10-09 Thread Chinna Rao Lalam (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211061#comment-17211061
 ] 

Chinna Rao Lalam commented on HIVE-21737:
-

Hi [~iemejia],

I verified this patch and found these 2 test failures, with the exception below:
{quote}avro_deserialize_map_null.q
 parquet_map_null.q
{quote}
{quote}Failed with exception 
java.io.IOException:org.apache.avro.AvroTypeException: Invalid default for 
field avreau_col_1: null not a []
{quote}
It looks like these exceptions are caused by a backward-incompatible change in 
Avro's default validation: https://issues.apache.org/jira/browse/AVRO-2817

We tried setting *Schema.Parser.setValidateDefaults(false)* to turn off default 
validation (e.g. in 
org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils#getSchemaFor(java.io.File)), but 
it did not work.

[~iemejia] any idea/workaround for this issue?
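The failing check can be illustrated with a simplified model. The Avro specification (at least for the versions discussed here) says a union field's default value corresponds to the first schema in the union, so a `null` default is only acceptable when the field type is `null` or a union whose first branch is `null`. The sketch below encodes just that rule for a null default, modelling types as plain names; `NullDefaultCheck` is hypothetical and is not how Avro implements validation.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, simplified model of the check behind "Invalid default for field ...: null not a ...".
 * Types are plain names; a union is the list of its branches in declared order.
 */
public final class NullDefaultCheck {

  private NullDefaultCheck() {}

  /**
   * A union's default must match its first branch, so a null default is valid only when
   * the (first) type is "null". A non-union type is passed as a one-element list.
   */
  static boolean nullDefaultIsValid(List<String> typeBranches) {
    return !typeBranches.isEmpty() && "null".equals(typeBranches.get(0));
  }

  public static void main(String[] args) {
    // A plain map field with "default": null
    System.out.println(nullDefaultIsValid(Arrays.asList("map")));
    // ["map", "null"]: null is allowed as a value, but it is not the first branch
    System.out.println(nullDefaultIsValid(Arrays.asList("map", "null")));
    // ["null", "map"]: the usual nullable-field pattern
    System.out.println(nullDefaultIsValid(Arrays.asList("null", "map")));
  }
}
```

Regarding the switch mentioned above: `new Schema.Parser().setValidateDefaults(false)` configures only the `Parser` instance it is called on, so one possible reason it did not help is that the failing schema was parsed by a different parser elsewhere (an assumption, not confirmed in this thread).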
