Github user busbey commented on the issue:

    https://github.com/apache/spark/pull/17163
  
    > > AVRO-1502 requires recompilation of some generated classes.
    > This sounds bad, because it basically means it's not backwards compatible 
(which is the concern  @steveloughran raised). Spark also has modules that 
depend on Flume, which is called out explicitly in AVRO-1502 as being affected. 
So at this point, let me ask: is this upgrade necessary for parquet, as in, 
what breaks if we don't upgrade?
    
    No, it is definitely not backwards compatible. Avro 1.8 is, generally 
speaking, not backwards compatible with Avro 1.7 (for runtime libraries / API 
stuff). Avro version numbers are %data format% dot %major version% dot %minor 
version%. [There were a total of 4 changes in Avro 1.8.0 marked as 
incompatible](https://issues.apache.org/jira/issues/?jql=project%20%3D%20AVRO%20AND%20fixVersion%20%3D%201.8.0%20AND%20%22Hadoop%20Flags%22%20%3D%20%22Incompatible%20change%22);
 these two are just the ones I think are most likely to impact folks.
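    
    To make the version scheme concrete, here is a minimal sketch that reads a 
version string as data format, major, minor and flags upgrades that may break 
the runtime API. The names `AvroVersion`, `parse_avro_version`, and 
`may_break_api` are hypothetical helpers for illustration, not part of Avro.

```python
from typing import NamedTuple


class AvroVersion(NamedTuple):
    """Hypothetical holder for the three positions described above."""
    data_format: int  # first position: data format version
    major: int        # second position: major (API) version
    minor: int        # third position: minor version


def parse_avro_version(text: str) -> AvroVersion:
    """Split a string such as '1.8.0' into its three numeric positions."""
    parts = text.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected three dot-separated numbers, got {text!r}")
    return AvroVersion(*(int(p) for p in parts))


def may_break_api(old: str, new: str) -> bool:
    """Assumption: only a change in the first two positions may break the API."""
    a, b = parse_avro_version(old), parse_avro_version(new)
    return (a.data_format, a.major) != (b.data_format, b.major)
```

    Under this reading, going from 1.7.7 to 1.8.0 is a major version bump even 
though the first number stays at 1.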
    
    I don't know if it's necessary for the parquet upgrade. If there's some 
feature of Avro 1.8 that Parquet needs we could also talk about backporting 
that to Avro 1.7, presuming it doesn't require some non-backwards compatible 
change. AFAIK the Avro community is still making (or at least trying to make) 
releases for the 1.7 line.
    
    > oh, this sucks. Find anyone who experienced "The great protobuf update of 
2012" and ask them if they want to do it again.
    
    I agree that it's unfortunate, but libraries need the ability to make 
non-backwards compatible changes. Avro at least makes the effort to do it on 
major versions and flag when it happens, so I don't think comparisons to 
protobuf are fair here. FWIW, I made the case for backwards compatibility on 
both of these issues.
    
    > Looking at the issues, AVRO-997 catches out "wrong" use of an API: if 
that can be identified at compile time, it could be corrected in uses of 
Spark's dependencies, then that sounds like something they should do 
pre-emptively.
    
    Yes, this is something folks should do proactively. I don't know of any 
static analysis that will flag it for us, unfortunately.
    
    > Returning to Hadoop, its use is in bits of the YARN API, as far as I 
can see.
    > 
    > 1. Bumping up to a later avro version for Hadoop 3 will help sync things 
up, at the cost of "yet another piece of pain for everyone who upgrades".
    
    I am a big +1 on bumping to Avro 1.8 in Hadoop 3. IMHO it's been so long 
since the Hadoop 2 release that saving this one bit of pain is minor in 
comparison to the risk of sticking to the 1.7 line through to Hadoop 4.


