Re: Upgrade Avro dependency in Spark to 1.9.2

2020-02-17 Thread Driesprong, Fokko
Awesome work Ismaël, let me know if I can help somewhere.

Cheers, Fokko

On Mon, Feb 17, 2020 at 16:00, Ismaël Mejía wrote:



Re: Upgrade Avro dependency in Spark to 1.9.2

2020-02-17 Thread Ismaël Mejía
I had not tried the Spark upgrade because I assumed it would fail due to
the transitive dependencies of Hive.

But I decided to give it a shot today. Luckily, the Spark code base is quite
Avro-friendly, so code-wise it was 'easy'.

Of course, it is still failing, but you can reference it from the other
PR. And if you can find fixes for any of the pending issues, that would be
great.

https://github.com/apache/spark/pull/27609

Regards,
Ismaël


On Fri, Feb 14, 2020 at 5:42 PM Michael Heuer wrote:



Re: Upgrade Avro dependency in Spark to 1.9.2

2020-02-14 Thread Michael Heuer
Hello Ismaël,

Might you be able to share a link to your patch for Spark?  I would like to try 
to apply it on top of

https://github.com/apache/spark/pull/26804

which attempts to upgrade the Parquet dependency for Spark to 1.11.0.

Thank you,

   michael


> On Feb 14, 2020, at 10:30 AM, Ismaël Mejía wrote:



Re: Upgrade Avro dependency in Spark to 1.9.2

2020-02-14 Thread Ismaël Mejía
Ah, lovely question.

TL;DR version:
Spark depends on Hive, so Hive should be upgraded first. Spark depends on
two versions of Hive: a Spark fork of Hive 1.x and upstream Hive 2.x.
Upgrading the first is not even being discussed at the moment. For the
second, I added a patch that passes all tests when run against Spark
2.4/master, but Hive uses a forked version of Spark 2.3 to run its tests
(YES, CIRCULAR DEPENDENCY!!!).
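To make the ordering argument concrete, here is a minimal Python sketch using Kahn's topological sort. The project names and edges are a hypothetical, heavily simplified graph (not the real Maven dependency tree): an edge means "consumes", so every dependency must be upgraded before the project that uses it. When Hive's test dependency on a Spark fork is collapsed into Spark itself, the graph has a cycle and no valid upgrade order exists.

```python
from collections import deque

# Hypothetical, simplified graphs: key -> list of projects it depends on.
ACYCLIC = {"spark": ["hive", "avro"], "hive": ["avro"], "avro": []}
# Treating Hive's forked Spark 2.3 as "spark" introduces a cycle.
CIRCULAR = {"spark": ["hive", "avro"], "hive": ["avro", "spark"], "avro": []}


def upgrade_order(deps):
    """Return an order in which each project comes after everything it
    depends on (Kahn's algorithm), or None if the graph has a cycle."""
    nodes = set(deps) | {d for ds in deps.values() for d in ds}
    # remaining[n] = number of n's dependencies not yet upgraded
    remaining = {n: len(set(deps.get(n, []))) for n in nodes}
    dependents = {n: [] for n in nodes}
    for n, ds in deps.items():
        for d in set(ds):
            dependents[d].append(n)
    # Start with projects that depend on nothing (sorted for determinism).
    ready = deque(sorted(n for n in nodes if remaining[n] == 0))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in sorted(dependents[n]):
            remaining[m] -= 1
            if remaining[m] == 0:
                ready.append(m)
    # Nodes left unvisited are stuck on a cycle.
    return order if len(order) == len(nodes) else None


print(upgrade_order(ACYCLIC))
print(upgrade_order(CIRCULAR))
```

With the acyclic graph the order is Avro, then Hive, then Spark; with the circular one the function returns `None`, because neither Spark nor Hive can go first.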

One extra point pushing things in the right direction is that Parquet and
Iceberg have already moved to Avro 1.9.x, so the pressure to move is
growing. It is still a mess, but we want to put up a fight. One thing is
sure: it won't make Spark 3.0.0; best case is 3.1.x, and even that depends
on the goodwill of the Hive contributors, who have ignored my emails and
patches for some time.
https://lists.apache.org/thread.html/rc6c672ad4a5e255957d54d80ff83bf48eacece2828a86bc6cedd9c4c%40%3Cdev.hive.apache.org%3E

For the full details of the saga:
https://issues.apache.org/jira/browse/SPARK-27733
https://issues.apache.org/jira/browse/HIVE-21737


On Fri, Feb 14, 2020 at 5:04 PM Michael Heuer wrote:



Upgrade Avro dependency in Spark to 1.9.2

2020-02-14 Thread Michael Heuer
Hello,

I wonder if any Avro devs might be willing to help push a PR for Apache Spark 
to update the Avro dependency from 1.8.2 to 1.9.2?

I foresee some trouble with binary-incompatible code changes and dependency
version conflicts, and could use some additional support.
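One way to picture the version-conflict problem is a small Python sketch that groups artifact requests and flags any artifact requested at more than one version. The requester names and the version pairs below are hypothetical example inputs, and the selection rule (highest version wins) is only one possible policy; Maven, for instance, mediates by nearest definition in the dependency tree instead. Whichever version is picked, a library compiled against the other one can fail at runtime (e.g. with `NoSuchMethodError`) if the two are binary-incompatible.

```python
from collections import defaultdict


def parse_version(v):
    """Turn '1.9.2' into (1, 9, 2) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))


def find_conflicts(requests):
    """Group (requester, artifact, version) requests by artifact and report
    each artifact wanted at more than one version, together with the version
    a naive highest-version-wins policy would select."""
    by_artifact = defaultdict(dict)
    for requester, artifact, version in requests:
        by_artifact[artifact][requester] = version
    conflicts = {}
    for artifact, wanted in by_artifact.items():
        versions = set(wanted.values())
        if len(versions) > 1:
            chosen = max(versions, key=parse_version)
            conflicts[artifact] = (chosen, dict(wanted))
    return conflicts


# Hypothetical example inputs, not taken from any real build.
REQUESTS = [
    ("spark", "org.apache.avro:avro", "1.8.2"),
    ("parquet", "org.apache.avro:avro", "1.9.2"),
    ("spark", "org.apache.parquet:parquet-avro", "1.11.0"),
]

for artifact, (chosen, wanted) in find_conflicts(REQUESTS).items():
    print(artifact, "requested as", wanted, "-> picking", chosen)
```

Comparing parsed tuples rather than raw strings matters: as strings, "1.10.0" would sort before "1.9.2".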

Thank you in advance,

   michael