Re: [DISCUSS] Spark version support strategy

2021-09-16 Thread Peter Vary
Since you mentioned Hive, I'll chime in with what we do there. You might find it useful: - metastore module: only small differences, which DynConstructor solves for us - mr module: some bigger differences, but still manageable for Hive 2-3. It needs some new classes, but most of the code is reused - extra

Re: [DISCUSS] Spark version support strategy

2021-09-16 Thread Anton Okolnychyi
Okay, it looks like there is consensus around supporting multiple Spark versions at the same time. Some folks mentioned this on this thread, and others brought it up during the sync. Let's think through Options 2 and 3 in more detail, then. Option 2: In Option 2, there will

Re: [DISCUSS] Spark configuration

2021-09-16 Thread Anton Okolnychyi
Here is the PR: https://github.com/apache/iceberg/pull/3132 - Anton > On 16 Sep 2021, at 10:45, Anton Okolnychyi wrote: > Hey devs, > I think we can improve the way we handle our Spark configuration right now. > Specifically, I see a few

[DISCUSS] Spark configuration

2021-09-16 Thread Anton Okolnychyi
Hey devs, I think we can improve the way we handle our Spark configuration right now. Specifically, I see a few issues. Our SQL configs are scattered across a number of classes. For example, we have some SQL configs in SparkUtil, IcebergSource, SparkWriteBuilder, Spark3Util. I think having a

Re: [DISCUSS] Spark version support strategy

2021-09-16 Thread Ryan Blue
I'd support the option that Jack suggests if we can set a few expectations for keeping it clean. First, I'd like to avoid refactoring code to share it across Spark versions -- that introduces risk because we're relying on compiling against one version and running in another and both Spark and

Re: [DISCUSS] Spark version support strategy

2021-09-16 Thread Jack Ye
I think in Ryan's proposal we will create a ton of modules anyway. As Wing listed, we are just using a git branch as an additional dimension, but my understanding is that you will still have one core, one extension, and one runtime artifact published for each Spark version in either approach. In that case,

Re: Snapshot tagging, branching and retention

2021-09-16 Thread Eduard Tudenhoefner
Nice work, Jack; the proposal looks really good. On Sun, Aug 29, 2021 at 9:20 AM Jack Ye wrote: > Hi everyone, > Recently I published PR 2961 - add snapshot tags interface (https://github.com/apache/iceberg/pull/2961) and received a lot of great feedback. I have summarized everything