Re: [VOTE] Release Apache Spark 2.4.5 (RC1)

2020-01-15 Thread Jungtaek Lim
Now that we've decided to cancel RC1, what about including SPARK-29450
(https://github.com/apache/spark/pull/27209) in RC2?

SPARK-29450 was merged into master, and Xiao figured out that it fixed a
long-standing regression (broken since 2.3.0). The link above refers to the PR
for the 2.4 branch.

Thanks,
Jungtaek Lim (HeartSaVioR)

Re: [VOTE] Release Apache Spark 2.4.5 (RC1)

2020-01-15 Thread Dongjoon Hyun
Sure, Wenchen and Hyukjin.

I observed all of the reported issues above and have been waiting to
collect more information before cancelling the RC1 vote.

I've also observed that Marcelo and Sean requested reverting an existing
commit:
- https://github.com/apache/spark/pull/24732 (the spark.shuffle.io.backLog
change)

To all: we want your explicit feedback, so please reply on this thread.

Even if we get enough positive feedback here, I'll cancel this RC1.
I want to address at least the negative feedback above and roll RC2 next
Monday.

Bests,
Dongjoon.
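
[For context on the setting named above: spark.shuffle.io.backLog controls the
accept-queue length on the shuffle server's listen socket. Below is a minimal
sketch of where the knob is supplied — it illustrates the setting itself, not
the change under review in the PR, and the value is purely hypothetical.]

import org.apache.spark.{SparkConf, SparkContext}

// spark.shuffle.io.backLog is read at startup, so it must be set on SparkConf
// before the SparkContext (or the external shuffle service) is brought up;
// it cannot be changed at runtime.
val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("backlog-demo")
  .set("spark.shuffle.io.backLog", "8192") // hypothetical value, not a recommendation
val sc = new SparkContext(conf)
sc.stop()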


Re: [VOTE] Release Apache Spark 2.4.5 (RC1)

2020-01-15 Thread Hyukjin Kwon
If we go for RC2, we should include both:

https://github.com/apache/spark/pull/27210
https://github.com/apache/spark/pull/27184

just for the sake of completeness and to keep maintenance simple.


Re: [VOTE] Release Apache Spark 2.4.5 (RC1)

2020-01-15 Thread Wenchen Fan
Recently we merged several fixes to 2.4:
https://issues.apache.org/jira/browse/SPARK-30325   a driver hang issue
https://issues.apache.org/jira/browse/SPARK-30246   a memory leak issue
https://issues.apache.org/jira/browse/SPARK-29708   a correctness issue (for
a rarely used feature, so not merged to 2.4 yet)

Shall we include them?



Re: More publicly documenting the options under spark.sql.*

2020-01-15 Thread Nicholas Chammas
So do we want to repurpose
SPARK-30510 as an SQL config refactor?

Alternatively, what’s the smallest step forward I can take to publicly
document partitionOverwriteMode (which was my impetus for looking into this
in the first place)?
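
[For reference, the option in question is spark.sql.sources.partitionOverwriteMode,
available since 2.3.0. A minimal sketch of its use follows; the table names are
hypothetical.]

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[2]")
  .appName("partition-overwrite-demo")
  .getOrCreate()

// "dynamic" overwrites only the partitions present in the incoming data;
// the default, "static", first deletes every partition matched by the write.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

spark.table("events_staging")   // hypothetical source table
  .write
  .mode("overwrite")
  .insertInto("events")         // hypothetical partitioned target table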

On Wed, Jan 15, 2020 at 8:49 AM, Hyukjin Kwon wrote:

> Resending to the dev list for archiving purposes:
>
> I think automatically creating a configuration page isn't a bad idea,
> because we deprecate and remove the configurations that are not created
> via .internal() in SQLConf anyway.
>
> I already tried this kind of automatic generation from the code for the SQL
> built-in functions, and I'm pretty sure we can do the same thing for
> configurations as well.
>
> We could perhaps mimic what Hadoop does:
> https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/core-default.xml
>
> On Wed, 15 Jan 2020, 22:46 Hyukjin Kwon,  wrote:
>
>> I think automatically creating a configuration page isn't a bad idea,
>> because we deprecate and remove the configurations that are not created
>> via .internal() in SQLConf anyway.
>>
>> I already tried this kind of automatic generation from the code for the SQL
>> built-in functions, and I'm pretty sure we can do the same thing for
>> configurations as well.
>>
>> We could perhaps mimic what Hadoop does:
>> https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/core-default.xml
>>
>> On Wed, 15 Jan 2020, 10:46 Sean Owen,  wrote:
>>
>> Some of it is intentionally undocumented, as far as I know: experimental
>> options that may change, legacy options, or safety-valve flags. Certainly
>> anything that's marked an internal conf. (That does raise the question of
>> who it's for, if you have to read the source to find it.)
>>>
>>> I don't know if we need to overhaul the conf system, but there may
>>> indeed be some confs that could legitimately be documented. I don't
>>> know which.
>>>
>>> On Tue, Jan 14, 2020 at 7:32 PM Nicholas Chammas
>>>  wrote:
>>> >
>>> > I filed SPARK-30510 thinking that we had forgotten to document an
>>> option, but it turns out that there's a whole bunch of stuff under
>>> SQLConf.scala that has no public documentation under
>>> http://spark.apache.org/docs.
>>> >
>>> > Would it be appropriate to somehow automatically generate a
>>> documentation page from SQLConf.scala, as Hyukjin suggested on that ticket?
>>> >
>>> > Another thought that comes to mind is moving the config definitions
>>> out of Scala and into a data format like YAML or JSON, and then sourcing
>>> that both for SQLConf as well as for whatever documentation page we want to
>>> generate. What do you think of that idea?
>>> >
>>> > Nick
>>> >
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>
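
[To make the quoted suggestion concrete, here is a rough sketch of such a
generator. It assumes SQLConf.getAllDefinedConfs — the method behind `SET -v` —
which on the 2.4 line returns (key, default value, doc) triples and already
excludes configurations declared with .internal(); treat both the signature and
the output format as assumptions, not a settled design.]

import org.apache.spark.sql.internal.SQLConf

// Rough sketch: print a Markdown table of the public SQL configurations,
// in the spirit of Hadoop's generated core-default.xml page.
// Assumes getAllDefinedConfs yields (key, defaultValue, doc) triples,
// as it does on the 2.4 line.
val sqlConf = new SQLConf
println("| Property | Default | Meaning |")
println("| --- | --- | --- |")
sqlConf.getAllDefinedConfs.sortBy(_._1).foreach {
  case (key, default, doc) => println(s"| `$key` | $default | $doc |")
}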


Re: [VOTE] Release Apache Spark 2.4.5 (RC1)

2020-01-15 Thread Hyukjin Kwon
+1

On Wed, 15 Jan 2020, 08:24 Takeshi Yamamuro,  wrote:

> +1;
>
> I checked the links and materials, then ran the tests with
> `-Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pkubernetes
> -Psparkr`
> on macOS (Java 8).
> Everything looks fine, and I didn't see the errors on my env
> that Sean reported above.
>
> Thanks, Dongjoon!
>
> Bests,
> Takeshi
>
> On Wed, Jan 15, 2020 at 4:09 AM DB Tsai  wrote:
>
>> +1 Thanks.
>>
>> Sincerely,
>>
>> DB Tsai
>> --
>> Web: https://www.dbtsai.com
>> PGP Key ID: 42E5B25A8F7A82C1
>>
>> On Tue, Jan 14, 2020 at 11:08 AM Sean Owen  wrote:
>> >
>> > Yeah it's something about the env I spun up, but I don't know what. It
>> > happens frequently when I test, but not on Jenkins.
>> > The Kafka error comes up every now and then and a clean rebuild fixes
>> > it, but not in my case. I don't know why.
>> > But if nobody else sees it, I'm pretty sure it's just an artifact of
>> > the local VM.
>> >
>> > On Tue, Jan 14, 2020 at 12:57 PM Dongjoon Hyun 
>> wrote:
>> > >
>> > > Thank you, Sean.
>> > >
>> > > First of all, the `Ubuntu` job on the Amplab Jenkins farm is green.
>> > >
>> > >
>> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.4-test-sbt-hadoop-2.7-ubuntu-testing/
>> > >
>> > > For the failures,
>> > >1. Yes, the `HiveExternalCatalogVersionsSuite` flakiness is a
>> known one.
>> > >2. For the `HDFSMetadataLogSuite` failure, I also observed it a few
>> times before on CentOS.
>> > >3. The Kafka build error is new to me. Does it happen on a clean
>> `Maven` build?
>> > >
>> > > Bests,
>> > > Dongjoon.
>> > >
>> > >
>> > > On Tue, Jan 14, 2020 at 6:40 AM Sean Owen  wrote:
>> > >>
>> > >> +1 from me. I checked sigs/licenses, and built/tested from source on
>> > >> Java 8 + Ubuntu 18.04 with " -Pyarn -Phive -Phive-thriftserver
>> > >> -Phadoop-2.7 -Pmesos -Pkubernetes -Psparkr -Pkinesis-asl". I do get
>> > >> test failures, but these are ones I have always seen on Ubuntu, and I
>> > >> do not know why they happen. They don't seem to affect others, but
>> > >> let me know if anyone else sees them.
>> > >>
>> > >>
>> > >> Always happens for me:
>> > >>
>> > >> - HDFSMetadataLog: metadata directory collision *** FAILED ***
>> > >>   The await method on Waiter timed out.
>> (HDFSMetadataLogSuite.scala:178)
>> > >>
>> > >> This one has been flaky at times due to external dependencies:
>> > >>
>> > >> org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite ***
>> ABORTED ***
>> > >>   Exception encountered when invoking run on a nested suite -
>> > >> spark-submit returned with exit code 1.
>> > >>   Command line: './bin/spark-submit' '--name' 'prepare testing
>> tables'
>> > >> '--master' 'local[2]' '--conf' 'spark.ui.enabled=false' '--conf'
>> > >> 'spark.master.rest.enabled=false' '--conf'
>> > >>
>> 'spark.sql.warehouse.dir=/data/spark-2.4.5/sql/hive/target/tmp/warehouse-c2f762fd-688e-42b7-a822-06823a6bbd98'
>> > >> '--conf' 'spark.sql.test.version.index=0' '--driver-java-options'
>> > >>
>> '-Dderby.system.home=/data/spark-2.4.5/sql/hive/target/tmp/warehouse-c2f762fd-688e-42b7-a822-06823a6bbd98'
>> > >> '/data/spark-2.4.5/sql/hive/target/tmp/test7297526474581770293.py'
>> > >>
>> > >> Kafka doesn't build with this weird error. I tried a clean build. I
>> > >> think we've seen this before.
>> > >>
>> > >> [error] This symbol is required by 'method
>> > >> org.apache.spark.metrics.MetricsSystem.getServletHandlers'.
>> > >> [error] Make sure that term eclipse is in your classpath and check
>> for
>> > >> conflicting dependencies with `-Ylog-classpath`.
>> > >> [error] A full rebuild may help if 'MetricsSystem.class' was compiled
>> > >> against an incompatible version of org.
>> > >> [error] testUtils.sendMessages(topic, data.toArray)
>> > >> [error]
>> > >>
>> > >> On Mon, Jan 13, 2020 at 6:28 AM Dongjoon Hyun <
>> dongjoon.h...@gmail.com> wrote:
>> > >> >
>> > >> > Please vote on releasing the following candidate as Apache Spark
>> version 2.4.5.
>> > >> >
>> > >> > The vote is open until January 16th 5AM PST and passes if a
>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>> > >> >
>> > >> > [ ] +1 Release this package as Apache Spark 2.4.5
>> > >> > [ ] -1 Do not release this package because ...
>> > >> >
>> > >> > To learn more about Apache Spark, please see
>> http://spark.apache.org/
>> > >> >
>> > >> > The tag to be voted on is v2.4.5-rc1 (commit
>> 33bd2beee5e3772a9af1d782f195e6a678c54cf0):
>> > >> > https://github.com/apache/spark/tree/v2.4.5-rc1
>> > >> >
>> > >> > The release files, including signatures, digests, etc. can be
>> found at:
>> > >> > https://dist.apache.org/repos/dist/dev/spark/v2.4.5-rc1-bin/
>> > >> >
>> > >> > Signatures used for Spark RCs can be found in this file:
>> > >> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>> > >> >
>> > >> > The staging repository for this release can be found at:
>> > >> >
>> https://repositor
