Re: [DISCUSS] Dropping Spark 2.4 support

2023-04-20 Thread Ryan Blue
Great question. I don't have a good idea of who is on JDK 8 still. Maybe we
should start another thread?

On Thu, Apr 20, 2023 at 1:05 PM Anton Okolnychyi
 wrote:

> What about JDK 8? If I remember Spark 2 was holding us, do we want to
> consider switching to JDK 11 for releases?
>
> - Anton
>
> On Apr 20, 2023, at 2:10 AM, Driesprong, Fokko 
> wrote:
>
> Thanks all for the response, much appreciated.
>
>> That said, I'd love to hear from more people on this. I think it would be
>> great to drop support, but I don't know how many people still use it. Is
>> upgrading Hadoop a good reason to drop support for an engine? Hadoop seems
>> like a minor concern to me unless it is blocking something.
>
>
> I noticed that we needed to bump Hadoop when we wanted to upgrade to
> Parquet 1.13.0. It would be nice to get this in since it allows for
> removing a workaround from the Iceberg codebase (see PR for details).
>
>> Netflix is still on Spark-2.4.4 with Iceberg-0.9. We are
>> actively migrating to Spark-3.x and Iceberg 1.1 (or later). I do not
>> anticipate us using Spark-2.4.4 with newer versions of Iceberg (>0.9).
>
>
> For Spark 2.4 Iceberg up to 1.2.1 is available:
> https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-spark-2.4
>
>> As for the Hadoop upgrade, I think that could be problematic for us if
>> there's any non-backwards compatible API change required at compile time
>> since we're still running a 2.8.x version.
>
>
> Thanks for raising this. I took some time today to dig into this. There is
> an effort to upgrade Hadoop in Iceberg, but that's stuck on
> incompatibilities with Tez. Unfortunately, Parquet 1.13.0 doesn't compile
> against Hadoop 2.8.5, and bringing back support for Hadoop 2.8.x is going
> to be hard. For Parquet, I've created a PR to run the CI against Hadoop
> 2.9.2 so we know when we're breaking compatibility.
>
> TLDR: It looks like if we want to upgrade Parquet, and other libraries in
> the future, we need to drop Hadoop 2. I'm hesitant to do that right now
> because we might exclude users that are still on older versions of Hadoop
> (such as Airbnb). Spark has announced that Hadoop 2 will be dropped in
> Spark 3.5. I'll create a PR for removing Spark 2.4 shortly because I see a
> consensus for removing that.
>
> Kind regards,
> Fokko
>
> On Wed, Apr 19, 2023 at 7:02 PM Anton Okolnychyi wrote:
>
>> Yes, yes, yes!
>>
>> - Anton
>>
>> On Apr 19, 2023, at 8:17 AM, Ryan Blue  wrote:
>>
>> Sounds like we have consensus for removing Spark 2.4.
>>
>> Thanks, everyone!
>>
>> On Wed, Apr 19, 2023 at 12:36 AM Ajantha Bhat 
>> wrote:
>>
>>> +1,
>>> Spark-2.4 has reached EOL (
>>> https://lists.apache.org/thread/tdk7r5gx3nwrds3fg7qmp5h2jnqgc6tb and
>>> https://spark.apache.org/versioning-policy.html)
>>>
>>> Thanks,
>>> Ajantha
>>>
>>> On Wed, Apr 19, 2023 at 3:52 AM Edgar Rodriguez wrote:
>>>
>>>> I'm generally +1 on dropping Spark 2.4 - mostly everyone is moving to
>>>> Spark 3.x, if not already moved.
>>>>
>>>> As for the Hadoop upgrade, I think that could be problematic for us if
>>>> there's any non-backwards compatible API change required at compile time
>>>> since we're still running a 2.8.x version.
>>>>
>>>> Cheers,
>>>>
>>>> On Mon, Apr 17, 2023 at 3:50 PM Steve Zhang wrote:
>>>>
>>>>> +1 for dropping Spark 2.4 support and we can clean up doc as well such
>>>>> as https://iceberg.apache.org/docs/latest/spark-queries/#spark-24
>>>>>
>>>>> Thanks,
>>>>> Steve Zhang
>>>>>
>>>>> On Apr 13, 2023, at 12:53 PM, Jack Ye wrote:
>>>>>
>>>>> +1 for dropping 2.4 support
>>>>>
>>>>
>>>> --
>>>> Edgar R
>>>> Data Warehouse Infrastructure
>>>>
>>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>>
>>
>

-- 
Ryan Blue
Tabular


Re: [DISCUSS] Dropping Spark 2.4 support

2023-04-20 Thread Anton Okolnychyi
What about JDK 8? If I remember correctly, Spark 2 was holding us back. Do we
want to consider switching to JDK 11 for releases?

- Anton

> On Apr 20, 2023, at 2:10 AM, Driesprong, Fokko  wrote:
> 
> Thanks all for the response, much appreciated.
> 
>> That said, I'd love to hear from more people on this. I think it would be
>> great to drop support, but I don't know how many people still use it. Is
>> upgrading Hadoop a good reason to drop support for an engine? Hadoop seems
>> like a minor concern to me unless it is blocking something.
> 
> I noticed that we needed to bump Hadoop when we wanted to upgrade to Parquet
> 1.13.0. It would be nice to get this in since it allows for removing a
> workaround from the Iceberg codebase (see PR for details).
> 
>> Netflix is still on Spark-2.4.4 with Iceberg-0.9. We are actively migrating
>> to Spark-3.x and Iceberg 1.1 (or later). I do not anticipate us using
>> Spark-2.4.4 with newer versions of Iceberg (>0.9).
> 
> For Spark 2.4 Iceberg up to 1.2.1 is available: 
> https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-spark-2.4 
> 
> 
>> As for the Hadoop upgrade, I think that could be problematic for us if
>> there's any non-backwards compatible API change required at compile time
>> since we're still running a 2.8.x version.
> 
> Thanks for raising this. I took some time today to dig into this. There is an
> effort to upgrade Hadoop in Iceberg, but that's stuck on incompatibilities
> with Tez. Unfortunately, Parquet 1.13.0 doesn't compile against Hadoop 2.8.5,
> and bringing back support for Hadoop 2.8.x is going to be hard. For Parquet,
> I've created a PR to run the CI against Hadoop 2.9.2 so we know when we're
> breaking compatibility.
> 
> TLDR: It looks like if we want to upgrade Parquet, and other libraries in the
> future, we need to drop Hadoop 2. I'm hesitant to do that right now because
> we might exclude users that are still on older versions of Hadoop (such as
> Airbnb). Spark has announced that Hadoop 2 will be dropped in Spark 3.5. I'll
> create a PR for removing Spark 2.4 shortly because I see a consensus for
> removing that.
> 
> Kind regards,
> Fokko
> 
> On Wed, Apr 19, 2023 at 7:02 PM Anton Okolnychyi wrote:
> Yes, yes, yes!
> 
> - Anton
> 
>> On Apr 19, 2023, at 8:17 AM, Ryan Blue wrote:
>> 
>> Sounds like we have consensus for removing Spark 2.4.
>> 
>> Thanks, everyone!
>> 
>> On Wed, Apr 19, 2023 at 12:36 AM Ajantha Bhat wrote:
>> +1,
>> Spark-2.4 has reached EOL
>> (https://lists.apache.org/thread/tdk7r5gx3nwrds3fg7qmp5h2jnqgc6tb and
>> https://spark.apache.org/versioning-policy.html)
>> 
>> Thanks, 
>> Ajantha
>> 
>> On Wed, Apr 19, 2023 at 3:52 AM Edgar Rodriguez wrote:
>> I'm generally +1 on dropping Spark 2.4 - mostly everyone is moving to Spark 
>> 3.x, if not already moved.
>> 
>> As for the Hadoop upgrade, I think that could be problematic for us if 
>> there's any non-backwards compatible API change required at compile time 
>> since we're still running a 2.8.x version.
>> 
>> Cheers,
>> 
>> On Mon, Apr 17, 2023 at 3:50 PM Steve Zhang wrote:
>> +1 for dropping Spark 2.4 support and we can clean up doc as well such as 
>> https://iceberg.apache.org/docs/latest/spark-queries/#spark-24 
>> 
>> 
>> Thanks,
>> Steve Zhang
>> 
>> 
>> 
>>> On Apr 13, 2023, at 12:53 PM, Jack Ye wrote:
>>> 
>>> +1 for dropping 2.4 support
>>> 
>> 
>> 
>> 
>> -- 
>> Edgar R
>> Data Warehouse Infrastructure
>> 
>> 
>> -- 
>> Ryan Blue
>> Tabular
> 



Re: [DISCUSS] Dropping Spark 2.4 support

2023-04-20 Thread Driesprong, Fokko
Thanks all for the response, much appreciated.

> That said, I'd love to hear from more people on this. I think it would be
> great to drop support, but I don't know how many people still use it. Is
> upgrading Hadoop a good reason to drop support for an engine? Hadoop seems
> like a minor concern to me unless it is blocking something.


I noticed that we needed to bump Hadoop when we wanted to upgrade to
Parquet 1.13.0. It would be nice to get this in since it allows for
removing a workaround from the Iceberg codebase (see PR for details).

> Netflix is still on Spark-2.4.4 with Iceberg-0.9. We are actively migrating
> to Spark-3.x and Iceberg 1.1 (or later). I do not anticipate us
> using Spark-2.4.4 with newer versions of Iceberg (>0.9).


For Spark 2.4 Iceberg up to 1.2.1 is available:
https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-spark-2.4

> As for the Hadoop upgrade, I think that could be problematic for us if
> there's any non-backwards compatible API change required at compile time
> since we're still running a 2.8.x version.


Thanks for raising this. I took some time today to dig into this. There is
an effort to upgrade Hadoop in Iceberg, but that's stuck on
incompatibilities with Tez. Unfortunately, Parquet 1.13.0 doesn't compile
against Hadoop 2.8.5, and bringing back support for Hadoop 2.8.x is going
to be hard. For Parquet, I've created a PR to run the CI against Hadoop
2.9.2 so we know when we're breaking compatibility.

TLDR: It looks like if we want to upgrade Parquet, and other libraries in
the future, we need to drop Hadoop 2. I'm hesitant to do that right now
because we might exclude users that are still on older versions of Hadoop
(such as Airbnb). Spark has announced that Hadoop 2 will be dropped in
Spark 3.5. I'll create a PR for removing Spark 2.4 shortly because I see a
consensus for removing that.

Kind regards,
Fokko

On Wed, Apr 19, 2023 at 7:02 PM Anton Okolnychyi wrote:

> Yes, yes, yes!
>
> - Anton
>
> On Apr 19, 2023, at 8:17 AM, Ryan Blue  wrote:
>
> Sounds like we have consensus for removing Spark 2.4.
>
> Thanks, everyone!
>
> On Wed, Apr 19, 2023 at 12:36 AM Ajantha Bhat 
> wrote:
>
>> +1,
>> Spark-2.4 has reached EOL (
>> https://lists.apache.org/thread/tdk7r5gx3nwrds3fg7qmp5h2jnqgc6tb and
>> https://spark.apache.org/versioning-policy.html)
>>
>> Thanks,
>> Ajantha
>>
>> On Wed, Apr 19, 2023 at 3:52 AM Edgar Rodriguez wrote:
>>
>>> I'm generally +1 on dropping Spark 2.4 - mostly everyone is moving to
>>> Spark 3.x, if not already moved.
>>>
>>> As for the Hadoop upgrade, I think that could be problematic for us if
>>> there's any non-backwards compatible API change required at compile time
>>> since we're still running a 2.8.x version.
>>>
>>> Cheers,
>>>
>>> On Mon, Apr 17, 2023 at 3:50 PM Steve Zhang wrote:
>>>
>>>> +1 for dropping Spark 2.4 support and we can clean up doc as well such
>>>> as https://iceberg.apache.org/docs/latest/spark-queries/#spark-24
>>>>
>>>> Thanks,
>>>> Steve Zhang
>>>>
>>>> On Apr 13, 2023, at 12:53 PM, Jack Ye wrote:
>>>>
>>>> +1 for dropping 2.4 support
>>>>
>>>
>>> --
>>> Edgar R
>>> Data Warehouse Infrastructure
>>>
>>
>
> --
> Ryan Blue
> Tabular
>
>
>


Re: [DISCUSS] Dropping Spark 2.4 support

2023-04-19 Thread Anton Okolnychyi
Yes, yes, yes!

- Anton

> On Apr 19, 2023, at 8:17 AM, Ryan Blue  wrote:
> 
> Sounds like we have consensus for removing Spark 2.4.
> 
> Thanks, everyone!
> 
> On Wed, Apr 19, 2023 at 12:36 AM Ajantha Bhat wrote:
> +1,
> Spark-2.4 has reached EOL
> (https://lists.apache.org/thread/tdk7r5gx3nwrds3fg7qmp5h2jnqgc6tb and
> https://spark.apache.org/versioning-policy.html)
> 
> Thanks, 
> Ajantha
> 
> On Wed, Apr 19, 2023 at 3:52 AM Edgar Rodriguez 
>  wrote:
> I'm generally +1 on dropping Spark 2.4 - mostly everyone is moving to Spark 
> 3.x, if not already moved.
> 
> As for the Hadoop upgrade, I think that could be problematic for us if 
> there's any non-backwards compatible API change required at compile time 
> since we're still running a 2.8.x version.
> 
> Cheers,
> 
> On Mon, Apr 17, 2023 at 3:50 PM Steve Zhang  
> wrote:
> +1 for dropping Spark 2.4 support and we can clean up doc as well such as 
> https://iceberg.apache.org/docs/latest/spark-queries/#spark-24 
> 
> 
> Thanks,
> Steve Zhang
> 
> 
> 
>> On Apr 13, 2023, at 12:53 PM, Jack Ye wrote:
>> 
>> +1 for dropping 2.4 support
>> 
> 
> 
> 
> -- 
> Edgar R
> Data Warehouse Infrastructure
> 
> 
> -- 
> Ryan Blue
> Tabular



Re: [DISCUSS] Dropping Spark 2.4 support

2023-04-19 Thread Ryan Blue
Sounds like we have consensus for removing Spark 2.4.

Thanks, everyone!

On Wed, Apr 19, 2023 at 12:36 AM Ajantha Bhat  wrote:

> +1,
> Spark-2.4 has reached EOL (
> https://lists.apache.org/thread/tdk7r5gx3nwrds3fg7qmp5h2jnqgc6tb and
> https://spark.apache.org/versioning-policy.html)
>
> Thanks,
> Ajantha
>
> On Wed, Apr 19, 2023 at 3:52 AM Edgar Rodriguez
>  wrote:
>
>> I'm generally +1 on dropping Spark 2.4 - mostly everyone is moving to
>> Spark 3.x, if not already moved.
>>
>> As for the Hadoop upgrade, I think that could be problematic for us if
>> there's any non-backwards compatible API change required at compile time
>> since we're still running a 2.8.x version.
>>
>> Cheers,
>>
>> On Mon, Apr 17, 2023 at 3:50 PM Steve Zhang
>>  wrote:
>>
>>> +1 for dropping Spark 2.4 support and we can clean up doc as well such
>>> as https://iceberg.apache.org/docs/latest/spark-queries/#spark-24
>>>
>>> Thanks,
>>> Steve Zhang
>>>
>>>
>>>
>>> On Apr 13, 2023, at 12:53 PM, Jack Ye  wrote:
>>>
>>> +1 for dropping 2.4 support
>>>
>>>
>>>
>>
>> --
>> Edgar R
>> Data Warehouse Infrastructure
>>
>

-- 
Ryan Blue
Tabular


Re: [DISCUSS] Dropping Spark 2.4 support

2023-04-19 Thread Ajantha Bhat
+1,
Spark-2.4 has reached EOL (
https://lists.apache.org/thread/tdk7r5gx3nwrds3fg7qmp5h2jnqgc6tb and
https://spark.apache.org/versioning-policy.html)

Thanks,
Ajantha

On Wed, Apr 19, 2023 at 3:52 AM Edgar Rodriguez
 wrote:

> I'm generally +1 on dropping Spark 2.4 - mostly everyone is moving to
> Spark 3.x, if not already moved.
>
> As for the Hadoop upgrade, I think that could be problematic for us if
> there's any non-backwards compatible API change required at compile time
> since we're still running a 2.8.x version.
>
> Cheers,
>
> On Mon, Apr 17, 2023 at 3:50 PM Steve Zhang
>  wrote:
>
>> +1 for dropping Spark 2.4 support and we can clean up doc as well such as
>> https://iceberg.apache.org/docs/latest/spark-queries/#spark-24
>>
>> Thanks,
>> Steve Zhang
>>
>>
>>
>> On Apr 13, 2023, at 12:53 PM, Jack Ye  wrote:
>>
>> +1 for dropping 2.4 support
>>
>>
>>
>
> --
> Edgar R
> Data Warehouse Infrastructure
>


Re: [DISCUSS] Dropping Spark 2.4 support

2023-04-18 Thread Edgar Rodriguez
I'm generally +1 on dropping Spark 2.4. Almost everyone is moving to Spark
3.x, if they haven't moved already.

As for the Hadoop upgrade, I think that could be problematic for us if
it requires any backwards-incompatible API change at compile time,
since we're still running a 2.8.x version.

Cheers,

On Mon, Apr 17, 2023 at 3:50 PM Steve Zhang 
wrote:

> +1 for dropping Spark 2.4 support and we can clean up doc as well such as
> https://iceberg.apache.org/docs/latest/spark-queries/#spark-24
>
> Thanks,
> Steve Zhang
>
>
>
> On Apr 13, 2023, at 12:53 PM, Jack Ye  wrote:
>
> +1 for dropping 2.4 support
>
>
>

-- 
Edgar R
Data Warehouse Infrastructure


Re: [DISCUSS] Dropping Spark 2.4 support

2023-04-17 Thread Steve Zhang
+1 for dropping Spark 2.4 support. We can also clean up the docs, such as
https://iceberg.apache.org/docs/latest/spark-queries/#spark-24

Thanks,
Steve Zhang



> On Apr 13, 2023, at 12:53 PM, Jack Ye  wrote:
> 
> +1 for dropping 2.4 support
> 



Re: [DISCUSS] Dropping Spark 2.4 support

2023-04-14 Thread John Zhuge
+1 on removing 2.4 support

On Fri, Apr 14, 2023 at 5:31 PM John Zhuge  wrote:

> Netflix internal Spark 2.4 is different from OSS. It is closer to OSS 3.0
> or 3.1 because it has DataSourceV2 and catalog support. So we don't rely on
> Iceberg Spark 2.4 code.
>
> On Fri, Apr 14, 2023 at 3:12 PM Russell Spitzer 
> wrote:
>
>> +1, Spark 2.4 is very out of sync with current developments as noted
>> above. It's almost impossible for us to get any newer features to be
>> compatible with it.
>>
>> On Fri, Apr 14, 2023 at 4:52 PM Anjali Norwood
>>  wrote:
>>
>>> Hi Fokko, Ryan,
>>>
>>> Netflix is still on Spark-2.4.4 with Iceberg-0.9. We are
>>> actively migrating to Spark-3.x and Iceberg 1.1 (or later). I do not
>>> anticipate us using Spark-2.4.4 with newer versions of Iceberg (>0.9).
>>> If the plan is to not support Spark-2.4.4 with Iceberg >= 1.X, that
>>> should be ok.
>>> @John Zhuge  can you please chime in?
>>>
>>> thanks,
>>> Anjali
>>>
>>>
>>> On Fri, Apr 14, 2023 at 10:56 AM Ryan Blue  wrote:
>>>
>>>> Overall I'm +1, but could be convinced otherwise.
>>>>
>>>> Spark 2.4 is old and doesn't really function properly because the Spark
>>>> Catalog API was missing at the time. And people can still use older
>>>> versions of Iceberg that support Spark 2.4 if they need it because the
>>>> Iceberg spec guarantees forward compatibility.
>>>>
>>>> That said, I'd love to hear from more people on this. I think it
>>>> would be great to drop support, but I don't know how many people still use
>>>> it. Is upgrading Hadoop a good reason to drop support for an engine? Hadoop
>>>> seems like a minor concern to me unless it is blocking something.
>>>>
>>>> Ryan
>>>>
>>>> On Thu, Apr 13, 2023 at 12:54 PM Jack Ye wrote:
>>>>
>>>>> +1 for dropping 2.4 support
>>>>>
>>>>> Best,
>>>>> Jack Ye
>>>>>
>>>>> On Thu, Apr 13, 2023 at 10:59 AM Fokko Driesprong wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I'm working on moving to Hadoop 3.x, and one thing is that
>>>>>> it seems to be incompatible with Spark 2.4. I wanted to ask if people are
>>>>>> still on Spark 2.4 and what we think of dropping the support. The last
>>>>>> release of Spark 2.4.8 was on 2021-05-17 and it also looks like the 2.4
>>>>>> branch on the Spark Github repository is stale, so I don't expect any
>>>>>> further releases.
>>>>>>
>>>>>> Before creating a PR I would like to check on the mail-list if anyone
>>>>>> has any objections. If so, please let us know.
>>>>>>
>>>>>> Thanks,
>>>>>> Fokko Driesprong
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Tabular
>>>>
>>>
>
> --
> John Zhuge
>


-- 
John Zhuge


Re: [DISCUSS] Dropping Spark 2.4 support

2023-04-14 Thread John Zhuge
Netflix internal Spark 2.4 is different from OSS. It is closer to OSS 3.0
or 3.1 because it has DataSourceV2 and catalog support. So we don't rely on
Iceberg Spark 2.4 code.

On Fri, Apr 14, 2023 at 3:12 PM Russell Spitzer 
wrote:

> +1, Spark 2.4 is very out of sync with current developments as noted
> above. It's almost impossible for us to get any newer features to be
> compatible with it.
>
> On Fri, Apr 14, 2023 at 4:52 PM Anjali Norwood
>  wrote:
>
>> Hi Fokko, Ryan,
>>
>> Netflix is still on Spark-2.4.4 with Iceberg-0.9. We are
>> actively migrating to Spark-3.x and Iceberg 1.1 (or later). I do not
>> anticipate us using Spark-2.4.4 with newer versions of Iceberg (>0.9).
>> If the plan is to not support Spark-2.4.4 with Iceberg >= 1.X, that
>> should be ok.
>> @John Zhuge  can you please chime in?
>>
>> thanks,
>> Anjali
>>
>>
>> On Fri, Apr 14, 2023 at 10:56 AM Ryan Blue  wrote:
>>
>>> Overall I'm +1, but could be convinced otherwise.
>>>
>>> Spark 2.4 is old and doesn't really function properly because the Spark
>>> Catalog API was missing at the time. And people can still use older
>>> versions of Iceberg that support Spark 2.4 if they need it because the
>>> Iceberg spec guarantees forward compatibility.
>>>
>>> That said, I'd love to hear from more people on this. I think it
>>> would be great to drop support, but I don't know how many people still use
>>> it. Is upgrading Hadoop a good reason to drop support for an engine? Hadoop
>>> seems like a minor concern to me unless it is blocking something.
>>>
>>> Ryan
>>>
>>> On Thu, Apr 13, 2023 at 12:54 PM Jack Ye  wrote:
>>>
>>>> +1 for dropping 2.4 support
>>>>
>>>> Best,
>>>> Jack Ye
>>>>
>>>> On Thu, Apr 13, 2023 at 10:59 AM Fokko Driesprong wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm working on moving to Hadoop 3.x, and one thing is that
>>>>> it seems to be incompatible with Spark 2.4. I wanted to ask if people are
>>>>> still on Spark 2.4 and what we think of dropping the support. The last
>>>>> release of Spark 2.4.8 was on 2021-05-17 and it also looks like the 2.4
>>>>> branch on the Spark Github repository is stale, so I don't expect any
>>>>> further releases.
>>>>>
>>>>> Before creating a PR I would like to check on the mail-list if anyone
>>>>> has any objections. If so, please let us know.
>>>>>
>>>>> Thanks,
>>>>> Fokko Driesprong
>>>>>
>>>>
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>>>
>>

-- 
John Zhuge


Re: [DISCUSS] Dropping Spark 2.4 support

2023-04-14 Thread Russell Spitzer
+1, Spark 2.4 is very out of sync with current developments as noted above.
It's almost impossible for us to get any newer features to be compatible
with it.

On Fri, Apr 14, 2023 at 4:52 PM Anjali Norwood 
wrote:

> Hi Fokko, Ryan,
>
> Netflix is still on Spark-2.4.4 with Iceberg-0.9. We are
> actively migrating to Spark-3.x and Iceberg 1.1 (or later). I do not
> anticipate us using Spark-2.4.4 with newer versions of Iceberg (>0.9).
> If the plan is to not support Spark-2.4.4 with Iceberg >= 1.X, that should
> be ok.
> @John Zhuge  can you please chime in?
>
> thanks,
> Anjali
>
>
> On Fri, Apr 14, 2023 at 10:56 AM Ryan Blue  wrote:
>
>> Overall I'm +1, but could be convinced otherwise.
>>
>> Spark 2.4 is old and doesn't really function properly because the Spark
>> Catalog API was missing at the time. And people can still use older
>> versions of Iceberg that support Spark 2.4 if they need it because the
>> Iceberg spec guarantees forward compatibility.
>>
>> That said, I'd love to hear from more people on this. I think it would be
>> great to drop support, but I don't know how many people still use it. Is
>> upgrading Hadoop a good reason to drop support for an engine? Hadoop seems
>> like a minor concern to me unless it is blocking something.
>>
>> Ryan
>>
>> On Thu, Apr 13, 2023 at 12:54 PM Jack Ye  wrote:
>>
>>> +1 for dropping 2.4 support
>>>
>>> Best,
>>> Jack Ye
>>>
>>> On Thu, Apr 13, 2023 at 10:59 AM Fokko Driesprong 
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm working on moving to Hadoop 3.x, and one thing is that
>>>> it seems to be incompatible with Spark 2.4. I wanted to ask if people are
>>>> still on Spark 2.4 and what we think of dropping the support. The last
>>>> release of Spark 2.4.8 was on 2021-05-17 and it also looks like the 2.4
>>>> branch on the Spark Github repository is stale, so I don't expect any
>>>> further releases.
>>>>
>>>> Before creating a PR I would like to check on the mail-list if anyone
>>>> has any objections. If so, please let us know.
>>>>
>>>> Thanks,
>>>> Fokko Driesprong
>>>>
>>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>


Re: [DISCUSS] Dropping Spark 2.4 support

2023-04-14 Thread Anjali Norwood
Hi Fokko, Ryan,

Netflix is still on Spark-2.4.4 with Iceberg-0.9. We are actively migrating
to Spark-3.x and Iceberg 1.1 (or later). I do not anticipate us using
Spark-2.4.4 with newer versions of Iceberg (>0.9).
If the plan is to not support Spark-2.4.4 with Iceberg >= 1.X, that should
be ok.
@John Zhuge can you please chime in?

thanks,
Anjali


On Fri, Apr 14, 2023 at 10:56 AM Ryan Blue  wrote:

> Overall I'm +1, but could be convinced otherwise.
>
> Spark 2.4 is old and doesn't really function properly because the Spark
> Catalog API was missing at the time. And people can still use older
> versions of Iceberg that support Spark 2.4 if they need it because the
> Iceberg spec guarantees forward compatibility.
>
> That said, I'd love to hear from more people on this. I think it would be
> great to drop support, but I don't know how many people still use it. Is
> upgrading Hadoop a good reason to drop support for an engine? Hadoop seems
> like a minor concern to me unless it is blocking something.
>
> Ryan
>
> On Thu, Apr 13, 2023 at 12:54 PM Jack Ye  wrote:
>
>> +1 for dropping 2.4 support
>>
>> Best,
>> Jack Ye
>>
>> On Thu, Apr 13, 2023 at 10:59 AM Fokko Driesprong 
>> wrote:
>>
>>> Hi all,
>>>
>>> I'm working on moving to Hadoop 3.x, and one thing is that it
>>> seems to be incompatible with Spark 2.4. I wanted to ask if people are
>>> still on Spark 2.4 and what we think of dropping the support. The last
>>> release of Spark 2.4.8 was on 2021-05-17 and it also looks like the 2.4
>>> branch on the Spark Github repository is stale, so I don't expect any
>>> further releases.
>>>
>>> Before creating a PR I would like to check on the mail-list if anyone
>>> has any objections. If so, please let us know.
>>>
>>> Thanks,
>>> Fokko Driesprong
>>>
>>
>
> --
> Ryan Blue
> Tabular
>


Re: [DISCUSS] Dropping Spark 2.4 support

2023-04-14 Thread Ryan Blue
Overall I'm +1, but could be convinced otherwise.

Spark 2.4 is old and doesn't really function properly because the Spark
Catalog API was missing at the time. And people can still use older
versions of Iceberg that support Spark 2.4 if they need it because the
Iceberg spec guarantees forward compatibility.

That said, I'd love to hear from more people on this. I think it would be
great to drop support, but I don't know how many people still use it. Is
upgrading Hadoop a good reason to drop support for an engine? Hadoop seems
like a minor concern to me unless it is blocking something.

Ryan

On Thu, Apr 13, 2023 at 12:54 PM Jack Ye  wrote:

> +1 for dropping 2.4 support
>
> Best,
> Jack Ye
>
> On Thu, Apr 13, 2023 at 10:59 AM Fokko Driesprong 
> wrote:
>
>> Hi all,
>>
>> I'm working on moving to Hadoop 3.x, and one thing is that it
>> seems to be incompatible with Spark 2.4. I wanted to ask if people are
>> still on Spark 2.4 and what we think of dropping the support. The last
>> release of Spark 2.4.8 was on 2021-05-17 and it also looks like the 2.4
>> branch on the Spark Github repository is stale, so I don't expect any
>> further releases.
>>
>> Before creating a PR I would like to check on the mail-list if anyone has
>> any objections. If so, please let us know.
>>
>> Thanks,
>> Fokko Driesprong
>>
>

-- 
Ryan Blue
Tabular


Re: [DISCUSS] Dropping Spark 2.4 support

2023-04-13 Thread Jack Ye
+1 for dropping 2.4 support

Best,
Jack Ye

On Thu, Apr 13, 2023 at 10:59 AM Fokko Driesprong  wrote:

> Hi all,
>
> I'm working on moving to Hadoop 3.x, and one thing is that it
> seems to be incompatible with Spark 2.4. I wanted to ask if people are
> still on Spark 2.4 and what we think of dropping the support. The last
> release of Spark 2.4.8 was on 2021-05-17 and it also looks like the 2.4
> branch on the Spark Github repository is stale, so I don't expect any
> further releases.
>
> Before creating a PR I would like to check on the mail-list if anyone has
> any objections. If so, please let us know.
>
> Thanks,
> Fokko Driesprong
>