Revise the blocker policy

2020-01-31 Thread Dongjoon Hyun
Hi, All. We discussed the correctness/data loss policies for two weeks. Following our practice, I want to state our policy explicitly on our website. - Correctness and data loss issues should be considered Blockers + Correctness and data loss issues should be considered Blockers for their targ

new branch-3.0 jenkins job configs are ready to be deployed...

2020-01-31 Thread shane knapp ☠
...whenever i get the word. :) FWIW they will all be identical to the current group of master builds/tests. shane -- Shane Knapp Computer Guy / Voice of Reason UC Berkeley EECS Research / RISELab Staff Technical Lead https://rise.cs.berkeley.edu

Re: new branch-3.0 jenkins job configs are ready to be deployed...

2020-01-31 Thread Xiao Li
Thank you always, Shane! Xiao On Fri, Jan 31, 2020 at 11:19 AM shane knapp ☠ wrote: > ...whenever i get the word. :) > > FWIW they will all be identical to the current group of master > builds/tests. > > shane > -- > Shane Knapp > Computer Guy / Voice of Reason > UC Berkeley EECS Research / RI

Re: new branch-3.0 jenkins job configs are ready to be deployed...

2020-01-31 Thread Dongjoon Hyun
Thank you, Shane. BTW, we need to enable JDK11 unit runs for Python and R. (Currently, they are only tested in the PRBuilder.) https://issues.apache.org/jira/browse/SPARK-28900 Today, Thomas and I are hitting a Python UT failure in a JDK11 environment in independent PRs. ERROR [32.750s]: test_parameter_acc

Re: new branch-3.0 jenkins job configs are ready to be deployed...

2020-01-31 Thread Dongjoon Hyun
Oops. I found that this flaky test fails even in `Hadoop 2.7 with Hive 1.2`. https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7-hive-1.2/lastCompletedBuild/testReport/pyspark.mllib.tests.test_streaming_algorithms/StreamingLogisticRegression

[DISCUSS] Caching SparkPlan

2020-01-31 Thread Chang Chen
I'd like to start a discussion on caching SparkPlan. From what I benchmarked, if SQL execution time is less than 1 second, then we cannot ignore the following overheads, especially if we cache data in memory: 1. Parsing, analyzing, optimizing SQL 2. Generating Physical Plan (SparkPlan) 3. G
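The idea in the message above can be illustrated with a minimal, purely hypothetical sketch — this is not Spark's actual internals or API, just a memoization of a stand-in "compile" pipeline keyed by the SQL text, showing how repeated short-running queries could skip the parse/analyze/optimize steps entirely:

```python
# Hypothetical sketch of plan caching (NOT Spark's real implementation):
# memoize the expensive compile pipeline keyed by the SQL string, so a
# repeated query reuses the previously built physical plan.
from functools import lru_cache

def parse_and_analyze(sql: str) -> str:
    # stand-in for parsing + analysis (expensive in a real engine)
    return f"LogicalPlan({sql.strip()})"

def optimize_and_plan(logical_plan: str) -> str:
    # stand-in for optimization + physical planning
    return f"SparkPlan({logical_plan})"

@lru_cache(maxsize=128)
def compile_sql(sql: str) -> str:
    # on a cache hit, neither step above runs
    return optimize_and_plan(parse_and_analyze(sql))

plan1 = compile_sql("SELECT * FROM t")
plan2 = compile_sql("SELECT * FROM t")  # identical text: served from cache
assert plan1 is plan2
```

A real design would also need cache invalidation (e.g. when table metadata changes), which is one of the hard parts such a discussion would have to settle.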