Re: [Discuss] Merge spark-3 branch into master

2020-04-21 Thread Saisai Shao
We're still facing the version constraint problem caused by the Gradle plugins :(
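
For reference, here is a minimal sketch of the approach Ryan describes later in this thread: define a set of projects and apply baseline only to them. The excluded module name (':iceberg-spark3') is hypothetical, and the sketch assumes the baseline plugin is already on the buildscript classpath. It only scopes the baseline checks. Whether the version-lock plugin can be scoped the same way is the open question here.

```groovy
// Root build.gradle (sketch only; the module name below is hypothetical)
def excludedFromBaseline = [':iceberg-spark3']

// Collect every subproject except the excluded ones.
def baselineProjects = subprojects.findAll { p ->
  !excludedFromBaseline.contains(p.path)
}

// Apply the baseline check only to the selected projects.
configure(baselineProjects) {
  apply plugin: 'com.palantir.baseline-checkstyle'
}
```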


On Wed, Apr 22, 2020 at 12:08 PM jiantao yu wrote:

> Hi Saisai,
> Would you please share your progress on merging the spark-3 branch into
> master?
> We are trying Iceberg with Spark SQL, which is only supported in Spark 3.

Re: [Discuss] Merge spark-3 branch into master

2020-04-21 Thread jiantao yu
Hi Saisai,
Would you please share your progress on merging the spark-3 branch into master?
We are trying Iceberg with Spark SQL, which is only supported in Spark 3.

On 2020/03/27 01:53:09, Saisai Shao wrote:
> Thanks Ryan, let me take a try.
>
> Best regards,
> Saisai
> 
> On Fri, Mar 27, 2020 at 12:15 AM Ryan Blue wrote:
>
> > Here’s how it was done before:
> > https://github.com/apache/incubator-iceberg/blob/867ec79a5c2f7619cb10546b5cc7f7bbc7d61621/build.gradle#L225-L244
> >
> > That defines a set of projects called baselineProjects and applies
> > baseline like this:
> >
> > configure(baselineProjects) {
> >   apply plugin: 'com.palantir.baseline-checkstyle'
> >   ...
> > }
> >
> > The baseline config has since been moved into baseline.gradle
> > <https://github.com/apache/incubator-iceberg/blob/master/baseline.gradle>
> > so changes should probably go into that file. Thanks for looking into
> > this!
> >
> > On Thu, Mar 26, 2020 at 6:23 AM Mass Dosage wrote:
> >
> >> We'd like to know how to do this too. We're working on the Hive
> >> integration, and Hive requires older versions of many of the libraries
> >> that Iceberg uses (Guava, Calcite, and Avro are the most problematic).
> >> We're going to need to shade some of these in the Iceberg modules we
> >> depend on, but it would also be very useful to be able to override the
> >> versions in the iceberg-hive and iceberg-mr modules so that they aren't
> >> locked to the same versions as the rest of the projects.
> >>
> >> On Thu, 26 Mar 2020 at 01:53, Saisai Shao wrote:
> >>
> >>> Hi Ryan,
> >>>
> >>> As mentioned in the meeting, would you please point me to a way to
> >>> exclude some submodules from the consistent-versions plugin?
> >>>
> >>> Thanks
> >>> Saisai
> >>>
> >>> On Wed, Mar 18, 2020 at 4:14 AM Anton Okolnychyi wrote:
>
> I am +1 on having spark-2 and spark-3 modules as well.
>
> On 7 Mar 2020, at 15:03, RD wrote:
>
> I'm +1 to separate modules for spark-2 and spark-3, after the 0.8
> release.
> I think it would be a big change for organizations to adopt Spark 3,
> since that brings in Scala 2.12, which is binary incompatible with
> previous Scala versions. Hence this adoption could take a lot of time.
> I know that in our company we have no near-term plans to move to Spark 3.
>
> -Best,
> R.
>
> On Thu, Mar 5, 2020 at 6:33 PM Saisai Shao wrote:
>
> > I was wondering whether it is possible to limit the version-lock plugin
> > to only the Iceberg core-related subprojects. It seems the current
> > consistent-versions plugin doesn't allow that, so I'm not sure whether
> > there are other plugins that provide similar functionality with more
> > flexibility.
> >
> > Any suggestions on this?
> >
> > Best regards,
> > Saisai
> >
> > On Thu, Mar 5, 2020 at 3:12 PM Saisai Shao wrote:
> >
> >> I think the requirement to support different versions should be quite
> >> common, since Iceberg is a table format that should be adaptable to
> >> different engines such as Hive, Flink, and Spark. Supporting different
> >> versions is a real problem. Spark is just one case; Hive and Flink
> >> could hit the same issue if their interfaces change across major
> >> versions. Version locking may also cause problems when several engines
> >> coexist in the same build, as they transitively introduce lots of
> >> dependencies that may conflict. It may be hard to find one version that
> >> satisfies all of them, and usually those dependencies are confined to a
> >> single module.
> >>
> >> So I think we should figure out a way to support such a scenario, not
> >> just maintain branches one by one.
> >>
> >> On Thu, Mar 5, 2020 at 2:53 AM Ryan Blue wrote:
> >>
> >>> I think the key is that this wouldn't be using the same published
> >>> artifacts. This work would create a spark-2.4 artifact and a spark-3.0
> >>> artifact. (And possibly a spark-common artifact.)
> >>>
> >>> It seems reasonable to me to have those in the same build instead of
> >>> in separate branches, as long as the Spark dependencies are not leaked
> >>> outside of the modules. That said, I'd rather have the additional
> >>> checks that baseline provides in general since this is a short-term
> >>> problem. It would just be nice if we could have versions that are
> >>> confined to a single module. The Nebula plugin that baseline uses
> >>> claims to support that, but I couldn't get it to work.
> >>>
> >>> On Wed, Mar 4, 2020 at 6:38 AM Saisai Shao
>