FYI - filed a bunch of issues for flaky tests in recent CI builds

2019-09-17 Thread Jungtaek Lim
Hi devs,

I've found a bunch of intermittent test failures in yesterday's CI builds, both
for the master branch and for the PR builder (I only checked my own PR). I've
filed issues for the failures I observed, but I guess there are more, since I
only checked my PR.

https://issues.apache.org/jira/browse/SPARK-29129
https://issues.apache.org/jira/browse/SPARK-29130
https://issues.apache.org/jira/browse/SPARK-29131
https://issues.apache.org/jira/browse/SPARK-29132
https://issues.apache.org/jira/browse/SPARK-29133
https://issues.apache.org/jira/browse/SPARK-29134
https://issues.apache.org/jira/browse/SPARK-29135
https://issues.apache.org/jira/browse/SPARK-29136
https://issues.apache.org/jira/browse/SPARK-29137
https://issues.apache.org/jira/browse/SPARK-29138
https://issues.apache.org/jira/browse/SPARK-29139
https://issues.apache.org/jira/browse/SPARK-29140

Other than that, there are lots of other failures with the message below:

java.util.concurrent.ExecutionException: java.lang.IllegalStateException:
> Cannot call methods on a stopped SparkContext.


Some of the issues above might be affected by this as well.
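For context, this error usually means a test called into a SparkContext that an
earlier test, or a cleanup hook in a suite sharing the same context, had already
stopped. A minimal sketch of that failure mode, using an invented ScalaTest
suite rather than any of the actual failing suites:

import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.FunSuite

// Illustrative only: a suite sharing one SparkContext across tests.
// If anything stops the shared context early, every later use fails with
// IllegalStateException: "Cannot call methods on a stopped SparkContext"
// (wrapped in an ExecutionException when it happens on another thread).
class SharedContextSuite extends FunSuite {
  private val sc = new SparkContext(
    new SparkConf().setMaster("local[2]").setAppName("shared-ctx-demo"))

  test("first test stops the shared context") {
    assert(sc.parallelize(1 to 10).count() == 10)
    sc.stop() // simulates a premature stop, e.g. by a cleanup hook
  }

  test("second test then fails") {
    sc.parallelize(1 to 10).count() // throws: stopped SparkContext
  }
}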

I couldn't check whether these issues have already been resolved (not for the
PR builder, since those failures were from yesterday, but for the master
build), so any help is appreciated.

Thanks,
Jungtaek Lim (HeartSaVioR)


Re: Welcoming some new committers and PMC members

2019-09-17 Thread Li Jin
Congrats to all!

On Tue, Sep 17, 2019 at 6:51 PM Bryan Cutler  wrote:

> Congratulations, all well deserved!
>
> On Thu, Sep 12, 2019, 3:32 AM Jacek Laskowski  wrote:
>
>> Hi,
>>
>> What great news! Congrats to all those awarded, and to the community for voting
>> them in!
>>
>> p.s. I think it should go to the user mailing list too.
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> 
>> https://about.me/JacekLaskowski
>> The Internals of Spark SQL https://bit.ly/spark-sql-internals
>> The Internals of Spark Structured Streaming
>> https://bit.ly/spark-structured-streaming
>> The Internals of Apache Kafka https://bit.ly/apache-kafka-internals
>> Follow me at https://twitter.com/jaceklaskowski
>>
>>
>>
>> On Tue, Sep 10, 2019 at 2:32 AM Matei Zaharia 
>> wrote:
>>
>>> Hi all,
>>>
>>> The Spark PMC recently voted to add several new committers and one PMC
>>> member. Join me in welcoming them to their new roles!
>>>
>>> New PMC member: Dongjoon Hyun
>>>
>>> New committers: Ryan Blue, Liang-Chi Hsieh, Gengliang Wang, Yuming Wang,
>>> Weichen Xu, Ruifeng Zheng
>>>
>>> The new committers cover lots of important areas including ML, SQL, and
>>> data sources, so it’s great to have them here. All the best,
>>>
>>> Matei and the Spark PMC
>>>
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>


Re: Welcoming some new committers and PMC members

2019-09-17 Thread Bryan Cutler
Congratulations, all well deserved!

On Thu, Sep 12, 2019, 3:32 AM Jacek Laskowski  wrote:

> Hi,
>
> What great news! Congrats to all those awarded, and to the community for voting
> them in!
>
> p.s. I think it should go to the user mailing list too.
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://about.me/JacekLaskowski
> The Internals of Spark SQL https://bit.ly/spark-sql-internals
> The Internals of Spark Structured Streaming
> https://bit.ly/spark-structured-streaming
> The Internals of Apache Kafka https://bit.ly/apache-kafka-internals
> Follow me at https://twitter.com/jaceklaskowski
>
>
>
> On Tue, Sep 10, 2019 at 2:32 AM Matei Zaharia 
> wrote:
>
>> Hi all,
>>
>> The Spark PMC recently voted to add several new committers and one PMC
>> member. Join me in welcoming them to their new roles!
>>
>> New PMC member: Dongjoon Hyun
>>
>> New committers: Ryan Blue, Liang-Chi Hsieh, Gengliang Wang, Yuming Wang,
>> Weichen Xu, Ruifeng Zheng
>>
>> The new committers cover lots of important areas including ML, SQL, and
>> data sources, so it’s great to have them here. All the best,
>>
>> Matei and the Spark PMC
>>
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>


Re: [build system] weird mvn errors post-cache cleaning

2019-09-17 Thread Dongjoon Hyun
Oh, thank you for fixing that! :)

Bests,
Dongjoon.

On Tue, Sep 17, 2019 at 12:57 PM Shane Knapp  wrote:

> > ah, i found this sucker on amp-jenkins-worker-02:
>
> s/02/06
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [build system] weird mvn errors post-cache cleaning

2019-09-17 Thread Shane Knapp
> ah, i found this sucker on amp-jenkins-worker-02:

s/02/06

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [build system] weird mvn errors post-cache cleaning

2019-09-17 Thread Shane Knapp
ah, i found this sucker on amp-jenkins-worker-02:
-bash-4.1$ cat 
/home/jenkins/.m2/repository/org/apache/spark/spark-parent_2.12/maven-metadata-local.xml
<?xml version="1.0" encoding="UTF-8"?>
<metadata>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-parent_2.12</artifactId>
  <versioning>
<release>spark-155410</release>
<versions>
  <version>spark-155410</version>
</versions>
<lastUpdated>20190917185916</lastUpdated>
  </versioning>
</metadata>
tUpdated>  <--- not good!
  </versioning>
</metadata>

i rmed that file and hopefully it will repopulate w/o issue.  if not,
i'll kill builds on that worker and wipe all local caches.  again.
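
for reference, a rough sketch of the kind of sweep that could spot (and remove)
any other metadata files that no longer parse. the object name is invented and
this assumes scala-xml is on the classpath; it is not the script actually run
on the workers:

import java.nio.file.{Files, Path, Paths}
import scala.collection.JavaConverters._
import scala.util.Try
import scala.xml.XML

// Hypothetical sweep: find maven-metadata-local.xml files under ~/.m2 that
// fail to parse (e.g. stray text after </metadata>) and delete them so the
// next build regenerates them.
object CleanCorruptMetadata {
  def main(args: Array[String]): Unit = {
    val repo = Paths.get(sys.props("user.home"), ".m2", "repository")
    val metadataFiles: List[Path] = Files.walk(repo).iterator().asScala
      .filter(_.getFileName.toString == "maven-metadata-local.xml")
      .toList
    val corrupt = metadataFiles.filter(p => Try(XML.loadFile(p.toFile)).isFailure)
    corrupt.foreach { p =>
      println(s"removing unparseable metadata: $p")
      Files.delete(p)
    }
  }
}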

On Tue, Sep 17, 2019 at 12:53 PM Shane Knapp  wrote:
>
> that's what i literally just did!  i wiped the .m2, .ivy and
> per-executor sbt caches on all of these machines.
>
> maybe it was just a network burp.
>
> also, it's only 2 builds:
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110809/console
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110800/console
>
> i'll keep an eye on things and re-ping the list if i see more failures
> like this.
>
> devs, if you see weird failures like this in your builds reply here
> and i'll take a closer look.
>
> On Tue, Sep 17, 2019 at 12:52 PM Sean Owen  wrote:
> >
> > That's super weird; can you just delete ~/.m2 and let it download the
> > internet again? or at least blow away the downloaded Kafka dir?
> > Turning it on and off, so to speak, often works.
> >
> > On Tue, Sep 17, 2019 at 2:41 PM Shane Knapp  wrote:
> > >
> > > a bunch of the PRB builds are now failing w/various permutations of
> > > the following:
> > >
> > > [ERROR] Failed to execute goal
> > > org.apache.maven.plugins:maven-install-plugin:3.0.0-M1:install
> > > (default-cli) on project spark-sql-kafka-0-10_2.12:
> > > ArtifactInstallerException: Failed to install metadata
> > > org.apache.spark:spark-sql-kafka-0-10_2.12/maven-metadata.xml: Could
> > > not parse metadata
> > > /home/jenkins/.m2/repository/org/apache/spark/spark-sql-kafka-0-10_2.12/maven-metadata-local.xml:
> > > in epilog non whitespace content is not allowed but got t (position:
> > > END_TAG seen ...\nt... @13:2) -> [Help 1]
> > >
> > > when looking at the file on this worker:
> > > -bash-4.1$ cat -Avet
> > > /home/jenkins/.m2/repository/org/apache/spark/spark-sql-kafka-0-10_2.12/maven-metadata-local.xml
> > > <?xml version="1.0" encoding="UTF-8"?>$
> > > <metadata>$
> > >   <groupId>org.apache.spark</groupId>$
> > >   <artifactId>spark-sql-kafka-0-10_2.12</artifactId>$
> > >   <versioning>$
> > > <release>spark-346687</release>$
> > > <versions>$
> > >   <version>spark-346687</version>$
> > > </versions>$
> > > <lastUpdated>20190917192919</lastUpdated>$
> > >   </versioning>$
> > > </metadata>$
> > >
> > > (sorry for the terminal font setup, but there are no erroneous control
> > > characters popping up, and -e shows a $ at EOL)
> > >
> > > i'm confused that this is happening.  anyone have any ideas?
> > > --
> > > Shane Knapp
> > > UC Berkeley EECS Research / RISELab Staff Technical Lead
> > > https://rise.cs.berkeley.edu
> > >
> > > -
> > > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> > >
>
>
>
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu



-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [build system] weird mvn errors post-cache cleaning

2019-09-17 Thread Shane Knapp
that's what i literally just did!  i wiped the .m2, .ivy and
per-executor sbt caches on all of these machines.

maybe it was just a network burp.

also, it's only 2 builds:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110809/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110800/console

i'll keep an eye on things and re-ping the list if i see more failures
like this.

devs, if you see weird failures like this in your builds reply here
and i'll take a closer look.

On Tue, Sep 17, 2019 at 12:52 PM Sean Owen  wrote:
>
> That's super weird; can you just delete ~/.m2 and let it download the
> internet again? or at least blow away the downloaded Kafka dir?
> Turning it on and off, so to speak, often works.
>
> On Tue, Sep 17, 2019 at 2:41 PM Shane Knapp  wrote:
> >
> > a bunch of the PRB builds are now failing w/various permutations of
> > the following:
> >
> > [ERROR] Failed to execute goal
> > org.apache.maven.plugins:maven-install-plugin:3.0.0-M1:install
> > (default-cli) on project spark-sql-kafka-0-10_2.12:
> > ArtifactInstallerException: Failed to install metadata
> > org.apache.spark:spark-sql-kafka-0-10_2.12/maven-metadata.xml: Could
> > not parse metadata
> > /home/jenkins/.m2/repository/org/apache/spark/spark-sql-kafka-0-10_2.12/maven-metadata-local.xml:
> > in epilog non whitespace content is not allowed but got t (position:
> > END_TAG seen ...\nt... @13:2) -> [Help 1]
> >
> > when looking at the file on this worker:
> > -bash-4.1$ cat -Avet
> > /home/jenkins/.m2/repository/org/apache/spark/spark-sql-kafka-0-10_2.12/maven-metadata-local.xml
> > <?xml version="1.0" encoding="UTF-8"?>$
> > <metadata>$
> >   <groupId>org.apache.spark</groupId>$
> >   <artifactId>spark-sql-kafka-0-10_2.12</artifactId>$
> >   <versioning>$
> > <release>spark-346687</release>$
> > <versions>$
> >   <version>spark-346687</version>$
> > </versions>$
> > <lastUpdated>20190917192919</lastUpdated>$
> >   </versioning>$
> > </metadata>$
> >
> > (sorry for the terminal font setup, but there are no erroneous control
> > characters popping up, and -e shows a $ at EOL)
> >
> > i'm confused that this is happening.  anyone have any ideas?
> > --
> > Shane Knapp
> > UC Berkeley EECS Research / RISELab Staff Technical Lead
> > https://rise.cs.berkeley.edu
> >
> > -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >



-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [build system] weird mvn errors post-cache cleaning

2019-09-17 Thread Sean Owen
That's super weird; can you just delete ~/.m2 and let it download the
internet again? or at least blow away the downloaded Kafka dir?
Turning it on and off, so to speak, often works.

On Tue, Sep 17, 2019 at 2:41 PM Shane Knapp  wrote:
>
> a bunch of the PRB builds are now failing w/various permutations of
> the following:
>
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-install-plugin:3.0.0-M1:install
> (default-cli) on project spark-sql-kafka-0-10_2.12:
> ArtifactInstallerException: Failed to install metadata
> org.apache.spark:spark-sql-kafka-0-10_2.12/maven-metadata.xml: Could
> not parse metadata
> /home/jenkins/.m2/repository/org/apache/spark/spark-sql-kafka-0-10_2.12/maven-metadata-local.xml:
> in epilog non whitespace content is not allowed but got t (position:
> END_TAG seen ...\nt... @13:2) -> [Help 1]
>
> when looking at the file on this worker:
> -bash-4.1$ cat -Avet
> /home/jenkins/.m2/repository/org/apache/spark/spark-sql-kafka-0-10_2.12/maven-metadata-local.xml
> $
> $
>   org.apache.spark$
>   spark-sql-kafka-0-10_2.12$
>   $
> spark-346687$
> $
>   spark-346687$
> $
> 20190917192919$
>   $
> $
>
> (sorry for the terminal font setup, but there are no erroneous control
> characters popping up, and -e shows a $ at EOL)
>
> i'm confused that this is happening.  anyone have any ideas?
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



[build system] weird mvn errors post-cache cleaning

2019-09-17 Thread Shane Knapp
a bunch of the PRB builds are now failing w/various permutations of
the following:

[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-install-plugin:3.0.0-M1:install
(default-cli) on project spark-sql-kafka-0-10_2.12:
ArtifactInstallerException: Failed to install metadata
org.apache.spark:spark-sql-kafka-0-10_2.12/maven-metadata.xml: Could
not parse metadata
/home/jenkins/.m2/repository/org/apache/spark/spark-sql-kafka-0-10_2.12/maven-metadata-local.xml:
in epilog non whitespace content is not allowed but got t (position:
END_TAG seen ...\nt... @13:2) -> [Help 1]
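
That parse error means the metadata file is no longer well-formed XML: there is
stray non-whitespace text after the closing </metadata> tag (the XML "epilog").
A tiny reproduction sketch, assuming scala-xml on the classpath; the strings
here are illustrative, not the real file contents:

import scala.xml.XML

// Any non-whitespace content after the root end tag makes the document
// ill-formed, which is exactly the "epilog" complaint Maven surfaces.
object EpilogDemo extends App {
  val wellFormed = "<metadata><groupId>org.apache.spark</groupId></metadata>"
  val corrupted  = wellFormed + "\ntUpdated>" // stray fragment after </metadata>

  XML.loadString(wellFormed)  // parses fine
  XML.loadString(corrupted)   // throws org.xml.sax.SAXParseException
}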

when looking at the file on this worker:
-bash-4.1$ cat -Avet
/home/jenkins/.m2/repository/org/apache/spark/spark-sql-kafka-0-10_2.12/maven-metadata-local.xml
<?xml version="1.0" encoding="UTF-8"?>$
<metadata>$
  <groupId>org.apache.spark</groupId>$
  <artifactId>spark-sql-kafka-0-10_2.12</artifactId>$
  <versioning>$
<release>spark-346687</release>$
<versions>$
  <version>spark-346687</version>$
</versions>$
<lastUpdated>20190917192919</lastUpdated>$
  </versioning>$
</metadata>$

(sorry for the terminal font setup, but there are no erroneous control
characters popping up, and -e shows a $ at EOL)

i'm confused that this is happening.  anyone have any ideas?
-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [build system] short (~1hr max) downtime

2019-09-17 Thread Shane Knapp
this is done and jenkins is building again!

On Tue, Sep 17, 2019 at 10:14 AM Shane Knapp  wrote:
>
> i'm going to clean up the spark workspaces on the jenkins workers and
> clear out ivy and maven caches.
>
> this means no new builds will be started as of right now, and current
> builds cancelled.
>
> i'll wait a little bit for some nearly-complete PRB builds to finish,
> and i will restart any other builds that i manually cancel.
>
> shane
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu



-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Thoughts on Spark 3 release, or a preview release

2019-09-17 Thread Matt Cheah
I don’t know if it will be feasible to merge all of SPARK-25299 into Spark 3. 
There are a number of APIs that will be submitted for review, and I wouldn’t 
want to block the release on negotiating these changes, as the decisions we 
make for each API can be pretty involved.

 

Our original plan was to mark every API included in SPARK-25299 as private 
until the entirety was merged, sometime between the release of Spark 3 and 
Spark 3.1. Once the entire API is merged into the codebase, we’d promote all of 
them to Experimental status and ship them in Spark 3.1.
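
As a rough sketch of what that two-stage rollout could look like mechanically
(the trait and method below are invented for illustration, not the actual
SPARK-25299 API, and this assumes Spark's annotations from the spark-tags
module):

import org.apache.spark.annotation.{Experimental, Private}

// Hypothetical plugin surface, named for illustration only. While pieces are
// still being merged it stays @Private, so nothing is promised to users;
// once the full surface has landed, the annotation would be flipped to
// @Experimental for the 3.1 release.
@Private          // stage 1: in the codebase, but not a public promise
// @Experimental  // stage 2: flipped when the complete API ships
trait RemoteShuffleStorage {
  /** Root location where shuffle blocks for the given application live. */
  def rootPathFor(appId: String): String
}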

 

So, I’m -1 on blocking the Spark 3 preview release specifically on SPARK-25299.

 

-Matt Cheah

 

From: Xiao Li 
Date: Tuesday, September 17, 2019 at 12:00 AM
To: Erik Erlandson 
Cc: Sean Owen , dev 
Subject: Re: Thoughts on Spark 3 release, or a preview release

 

https://issues.apache.org/jira/browse/SPARK-28264
SPARK-28264 Revisiting Python / pandas UDF sounds critical for 3.0 preview

 

Xiao

 

On Mon, Sep 16, 2019 at 12:22 PM Erik Erlandson  wrote:

 

I'm in favor of adding SPARK-25299 - Use remote storage for
persisting shuffle data

https://issues.apache.org/jira/browse/SPARK-25299

 

If that is far enough along to get onto the roadmap.

 

 

On Wed, Sep 11, 2019 at 11:37 AM Sean Owen  wrote:

I'm curious what current feelings are about ramping down towards a
Spark 3 release. It feels close to ready. There is no fixed date,
though in the past we had informally tossed around "back end of 2019".
For reference, Spark 1 was May 2014, Spark 2 was July 2016. I'd expect
Spark 2 to last longer, so to speak, but feels like Spark 3 is coming
due.

What are the few major items that must get done for Spark 3, in your
opinion? Below are all of the open JIRAs for 3.0 (which everyone
should feel free to update with things that aren't really needed for
Spark 3; I already triaged some).

For me, it's:
- DSv2?
- Finishing touches on the Hive, JDK 11 update

What about considering a preview release earlier, as happened for
Spark 2, to get feedback much earlier than the RC cycle? Could that
even happen ... about now?

I'm also wondering what a realistic estimate of Spark 3 release is. My
guess is quite early 2020, from here.



SPARK-29014 DataSourceV2: Clean up current, default, and session catalog uses
SPARK-28900 Test Pyspark, SparkR on JDK 11 with run-tests
SPARK-28883 Fix a flaky test: ThriftServerQueryTestSuite
SPARK-28717 Update SQL ALTER TABLE RENAME  to use TableCatalog API
SPARK-28588 Build a SQL reference doc
SPARK-28629 Capture the missing rules in HiveSessionStateBuilder
SPARK-28684 Hive module support JDK 11
SPARK-28548 explain() shows wrong result for persisted DataFrames
after some operations
SPARK-28372 Document Spark WEB UI
SPARK-28476 Support ALTER DATABASE SET LOCATION
SPARK-28264 Revisiting Python / pandas UDF
SPARK-28301 fix the behavior of table name resolution with multi-catalog
SPARK-28155 do not leak SaveMode to file source v2
SPARK-28103 Cannot infer filters from union table with empty local
relation table properly
SPARK-28024 Incorrect numeric values when out of range
SPARK-27936 Support local dependency uploading from --py-files
SPARK-27884 Deprecate Python 2 support in Spark 3.0
SPARK-27763 Port test cases from PostgreSQL to Spark SQL
SPARK-27780 Shuffle server & client should be versioned to enable
smoother upgrade
SPARK-27714 Support Join Reorder based on Genetic Algorithm when the #
of joined tables > 12
SPARK-27471 Reorganize public v2 catalog API
SPARK-27520 Introduce a global config system to replace hadoopConfiguration
SPARK-24625 put all the backward compatible behavior change configs
under spark.sql.legacy.*
SPARK-24640 size(null) returns null
SPARK-24702 Unable to cast to calendar interval in spark sql.
SPARK-24838 Support uncorrelated IN/EXISTS subqueries for more operators
SPARK-24941 Add RDDBarrier.coalesce() function
SPARK-25017 Add test suite for ContextBarrierState
SPARK-25083 remove the type erasure hack in data source scan
SPARK-25383 Image data source supports sample pushdown
SPARK-27272 Enable blacklisting of node/executor on fetch failures by default
SPARK-27296 User Defined Aggregating Functions (UDAFs) have a major
efficiency problem
SPARK-25128 multiple simultaneous job submissions against k8s backend
cause driver pods to hang
SPARK-26731 remove EOLed spark jobs from jenkins
SPARK-26664 Make DecimalType's minimum adjusted scale configurable
SPARK-21559 Remove Mesos fine-grained mode
SPARK-24942 Improve cluster resource management with jobs containing
barrier stage
SPARK-25914 Separate projection from grouping and aggregate in logical Aggregate
SPARK-26022 PySpark Comparison with Pandas
SPARK-20964 Make some keywords reserved along with the ANSI/SQL standard
SPARK-26221 Improve Spark SQL instrumentation and metrics
SPARK-26425 Add more constraint checks in file streaming source to
avoid checkpoint corruption
SPARK-25843 Redesign 

[build system] short (~1hr max) downtime

2019-09-17 Thread Shane Knapp
i'm going to clean up the spark workspaces on the jenkins workers and
clear out ivy and maven caches.

this means no new builds will be started as of right now, and current
builds cancelled.

i'll wait a little bit for some nearly-complete PRB builds to finish,
and i will restart any other builds that i manually cancel.

shane
-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Thoughts on Spark 3 release, or a preview release

2019-09-17 Thread Xiao Li
https://issues.apache.org/jira/browse/SPARK-28264 SPARK-28264 Revisiting
Python / pandas UDF sounds critical for 3.0 preview

Xiao

On Mon, Sep 16, 2019 at 12:22 PM Erik Erlandson  wrote:

>
> I'm in favor of adding SPARK-25299
>  - Use remote storage
> for persisting shuffle data
> https://issues.apache.org/jira/browse/SPARK-25299
>
> If that is far enough along to get onto the roadmap.
>
>
> On Wed, Sep 11, 2019 at 11:37 AM Sean Owen  wrote:
>
>> I'm curious what current feelings are about ramping down towards a
>> Spark 3 release. It feels close to ready. There is no fixed date,
>> though in the past we had informally tossed around "back end of 2019".
>> For reference, Spark 1 was May 2014, Spark 2 was July 2016. I'd expect
>> Spark 2 to last longer, so to speak, but feels like Spark 3 is coming
>> due.
>>
>> What are the few major items that must get done for Spark 3, in your
>> opinion? Below are all of the open JIRAs for 3.0 (which everyone
>> should feel free to update with things that aren't really needed for
>> Spark 3; I already triaged some).
>>
>> For me, it's:
>> - DSv2?
>> - Finishing touches on the Hive, JDK 11 update
>>
>> What about considering a preview release earlier, as happened for
>> Spark 2, to get feedback much earlier than the RC cycle? Could that
>> even happen ... about now?
>>
>> I'm also wondering what a realistic estimate of Spark 3 release is. My
>> guess is quite early 2020, from here.
>>
>>
>>
>> SPARK-29014 DataSourceV2: Clean up current, default, and session catalog
>> uses
>> SPARK-28900 Test Pyspark, SparkR on JDK 11 with run-tests
>> SPARK-28883 Fix a flaky test: ThriftServerQueryTestSuite
>> SPARK-28717 Update SQL ALTER TABLE RENAME  to use TableCatalog API
>> SPARK-28588 Build a SQL reference doc
>> SPARK-28629 Capture the missing rules in HiveSessionStateBuilder
>> SPARK-28684 Hive module support JDK 11
>> SPARK-28548 explain() shows wrong result for persisted DataFrames
>> after some operations
>> SPARK-28372 Document Spark WEB UI
>> SPARK-28476 Support ALTER DATABASE SET LOCATION
>> SPARK-28264 Revisiting Python / pandas UDF
>> SPARK-28301 fix the behavior of table name resolution with multi-catalog
>> SPARK-28155 do not leak SaveMode to file source v2
>> SPARK-28103 Cannot infer filters from union table with empty local
>> relation table properly
>> SPARK-28024 Incorrect numeric values when out of range
>> SPARK-27936 Support local dependency uploading from --py-files
>> SPARK-27884 Deprecate Python 2 support in Spark 3.0
>> SPARK-27763 Port test cases from PostgreSQL to Spark SQL
>> SPARK-27780 Shuffle server & client should be versioned to enable
>> smoother upgrade
>> SPARK-27714 Support Join Reorder based on Genetic Algorithm when the #
>> of joined tables > 12
>> SPARK-27471 Reorganize public v2 catalog API
>> SPARK-27520 Introduce a global config system to replace
>> hadoopConfiguration
>> SPARK-24625 put all the backward compatible behavior change configs
>> under spark.sql.legacy.*
>> SPARK-24640 size(null) returns null
>> SPARK-24702 Unable to cast to calendar interval in spark sql.
>> SPARK-24838 Support uncorrelated IN/EXISTS subqueries for more operators
>> SPARK-24941 Add RDDBarrier.coalesce() function
>> SPARK-25017 Add test suite for ContextBarrierState
>> SPARK-25083 remove the type erasure hack in data source scan
>> SPARK-25383 Image data source supports sample pushdown
>> SPARK-27272 Enable blacklisting of node/executor on fetch failures by
>> default
>> SPARK-27296 User Defined Aggregating Functions (UDAFs) have a major
>> efficiency problem
>> SPARK-25128 multiple simultaneous job submissions against k8s backend
>> cause driver pods to hang
>> SPARK-26731 remove EOLed spark jobs from jenkins
>> SPARK-26664 Make DecimalType's minimum adjusted scale configurable
>> SPARK-21559 Remove Mesos fine-grained mode
>> SPARK-24942 Improve cluster resource management with jobs containing
>> barrier stage
>> SPARK-25914 Separate projection from grouping and aggregate in logical
>> Aggregate
>> SPARK-26022 PySpark Comparison with Pandas
>> SPARK-20964 Make some keywords reserved along with the ANSI/SQL standard
>> SPARK-26221 Improve Spark SQL instrumentation and metrics
>> SPARK-26425 Add more constraint checks in file streaming source to
>> avoid checkpoint corruption
>> SPARK-25843 Redesign rangeBetween API
>> SPARK-25841 Redesign window function rangeBetween API
>> SPARK-25752 Add trait to easily whitelist logical operators that
>> produce named output from CleanupAliases
>> SPARK-23210 Introduce the concept of default value to schema
>> SPARK-25640 Clarify/Improve EvalType for grouped aggregate and window
>> aggregate
>> SPARK-25531 new write APIs for data source v2
>> SPARK-25547 Pluggable jdbc connection factory
>> SPARK-20845 Support specification of column names in INSERT INTO
>> SPARK-24417 Build and Run Spark on JDK11
>> SPARK-24724 Discuss necessary info and access in barrier mode + 

Re: Ask for ARM CI for spark

2019-09-17 Thread Tianhua huang
@shane knapp  thank you very much, I opened an issue for this:
https://issues.apache.org/jira/browse/SPARK-29106; we can discuss the details
there :)
And we will prepare an ARM instance today and send the info to your email
later.

On Tue, Sep 17, 2019 at 4:40 AM Shane Knapp  wrote:

> @Tianhua huang  sure, i think we can get
> something sorted for the short-term.
>
> all we need is ssh access (i can provide an ssh key), and i can then have
> our jenkins master launch a remote worker on that instance.
>
> instance setup, etc, will be up to you.  my support for the time being
> will be to create the job and 'best effort' for everything else.
>
> this should get us up and running asap.
>
> is there an open JIRA for jenkins/arm test support?  we can move the
> technical details about this idea there.
>
> On Sun, Sep 15, 2019 at 9:03 PM Tianhua huang 
> wrote:
>
>> @Sean Owen  , so sorry to reply late, we had a
>> Mid-Autumn holiday:)
>>
>> If you hope to integrate ARM CI into amplab Jenkins, we can offer the ARM
>> instance, and then the ARM job would run together with the other x86 jobs.
>> Is there a guideline for doing this? @shane knapp 
>> would you help us?
>>
>> On Thu, Sep 12, 2019 at 9:36 PM Sean Owen  wrote:
>>
>>> I don't know what's involved in actually accepting or operating those
>>> machines, so can't comment there, but in the meantime it's good that you
>>> are running these tests and can help report changes needed to keep it
>>> working with ARM. I would continue with that for now.
>>>
>>> On Wed, Sep 11, 2019 at 10:06 PM Tianhua huang <
>>> huangtianhua...@gmail.com> wrote:
>>>
 Hi all,

 For the whole work process of spark ARM CI, we want to make 2 things
 clear.

 The first thing is:
 About spark ARM CI, we now have two periodic jobs: one job[1] based on
 commit[2] (which already fixes the failed replay tests issue[3]; we made a
 new test branch based on the date 09-09-2019), and the other job[4] based on
 spark master.

 The first job tests the specified branch, to show that our ARM CI is good
 and stable.
 The second job checks spark master every day, so we can find out whether the
 latest commits affect the ARM CI. The build history and results show that
 some problems are easier to find on ARM, like SPARK-28770, and also that we
 make the effort to trace and fix them; so far we have found and fixed several
 problems[5][6][7], thanks to everyone in the community :). And we believe
 that ARM CI is very necessary, right?

 The second thing is:
 We plan to run the jobs for a period of time, and you can see the results
 and logs in the 'build history' of each job's console. If everything goes
 well for one or two weeks, could the community accept the ARM CI? Or how
 long would the periodic jobs need to run before the community has enough
 confidence to accept the ARM CI? As you suggested before, it would be good
 to integrate ARM CI into amplab Jenkins; we agree, and we can donate the ARM
 instances and then maintain the ARM-related test jobs together with the
 community. Any thoughts?

 Thank you all!

 [1]
 http://status.openlabtesting.org/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64
 [2]
 https://github.com/apache/spark/commit/0ed9fae45769d4b06b8cf8128f462f09ff3d9a72
 [3] https://issues.apache.org/jira/browse/SPARK-28770
 [4]
 http://status.openlabtesting.org/builds?job_name=spark-master-unit-test-hadoop-2.7-arm64
 [5] https://github.com/apache/spark/pull/25186
 [6] https://github.com/apache/spark/pull/25279
 [7] https://github.com/apache/spark/pull/25673



 On Fri, Aug 16, 2019 at 11:24 PM Sean Owen  wrote:

> Yes, I think it's just local caching. After you run the build you
> should find lots of stuff cached at ~/.m2/repository and it won't download
> every time.
>
> On Fri, Aug 16, 2019 at 3:01 AM bo zhaobo 
> wrote:
>
>> Hi Sean,
>> Thanks for the reply, and apologies for the confusion.
>> I know the dependencies will be downloaded by SBT or Maven. But the
>> Spark QA job also runs "mvn clean package", so why doesn't the log show it
>> downloading jars from Maven Central [1], and why does it build so fast? Is
>> the reason that Spark Jenkins builds the Spark jars on physical machines
>> and doesn't destroy the test env after a job finishes? Then the next job
>> that builds Spark would get the dependency jars from the local cache, since
>> the earlier jobs ran "mvn package" and those dependencies had already been
>> downloaded onto the worker machine. Am I right? Is that the reason the job
>> log [1] doesn't print any download information from Maven Central?
>>
>> Thank you very much.
>>
>> [1]
>>