date:20180810

Re: [R] discuss: removing lint-r checks for old branches

2018-08-10 Thread shane knapp

ugh... R unit tests failed on both of these builds. https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94583/artifact/R/target/ https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94584/artifact/R/target/ On Fri, Aug 10, 2018 at 1:58 PM, Shivaram Venkataraman <

Re: [R] discuss: removing lint-r checks for old branches

2018-08-10 Thread shane knapp

/agreemsg On Fri, Aug 10, 2018 at 4:02 PM, Sean Owen wrote: > Seems OK to proceed with shutting off lintr, as it was masking those. > > On Fri, Aug 10, 2018 at 6:01 PM shane knapp wrote: > >> ugh... R unit tests failed on both of these builds. >> https://amplab.cs.berkeley.edu/jenkins//job/ >>

Re: [R] discuss: removing lint-r checks for old branches

2018-08-10 Thread Sean Owen

Seems OK to proceed with shutting off lintr, as it was masking those. On Fri, Aug 10, 2018 at 6:01 PM shane knapp wrote: > ugh... R unit tests failed on both of these builds. > > https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94583/artifact/R/target/ > > https://amplab.cs.b

Re: [R] discuss: removing lint-r checks for old branches

2018-08-10 Thread Shivaram Venkataraman

Sounds good to me as well. Thanks Shane. Shivaram On Fri, Aug 10, 2018 at 1:40 PM Reynold Xin wrote: > > SGTM > > On Fri, Aug 10, 2018 at 1:39 PM shane knapp wrote: >> >> https://issues.apache.org/jira/browse/SPARK-25089 >> >> basically since these branches are old, and there will be a greater t

Re: [R] discuss: removing lint-r checks for old branches

2018-08-10 Thread Reynold Xin

SGTM On Fri, Aug 10, 2018 at 1:39 PM shane knapp wrote: > https://issues.apache.org/jira/browse/SPARK-25089 > > basically since these branches are old, and there will be a greater than > zero amount of work to get lint-r to pass (on the new ubuntu workers), sean > and i are proposing to remove t

[R] discuss: removing lint-r checks for old branches

2018-08-10 Thread shane knapp

https://issues.apache.org/jira/browse/SPARK-25089 basically since these branches are old, and there will be a greater than zero amount of work to get lint-r to pass (on the new ubuntu workers), sean and i are proposing to remove the lint-r checks for the builds. this is super not important for th

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-10 Thread shane knapp

> > > I also think it's a good idea to test against newer Python versions. But I > don't know how difficult it is and whether or not it's feasible to resolve > that between branch cut and RC cut. > > unless someone pops in to this thread and tells me w/o a doubt that all spark branches will happil

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-10 Thread Li Jin

I agree with Bryan. If it's acceptable to have another job to test with Python 3.5 and pyarrow 0.10.0, I am leaning towards upgrading arrow. Arrow 0.10.0 has tons of bug fixes and improvements since 0.8.0, including important memory leak fixes such as https://issues.apache.org/jira/browse/ARROW-1973. I

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-10 Thread shane knapp

python 3.5/pyarrow 0.10.0 build: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-sbt-hadoop-2.6-python-3.5-arrow-0.10.0-ubuntu-testing/ On Fri, Aug 10, 2018 at 10:44 AM, shane knapp wrote: > see: https://github.com/apache/spark/pull/21939#issuecomment-4121543

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-10 Thread shane knapp

see: https://github.com/apache/spark/pull/21939#issuecomment-412154343 yes, i can set up a build. have some Qs in the PR about building the spark package before running the python tests. On Fri, Aug 10, 2018 at 10:41 AM, Bryan Cutler wrote: > I agree that we should hold off on the Arrow upgra

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-10 Thread Bryan Cutler

I agree that we should hold off on the Arrow upgrade if it requires major changes to our testing. I did have another thought that maybe we could just add another job to test against Python 3.5 and pyarrow 0.10.0 and keep all current testing the same? I'm not sure how doable that is right now and do

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-10 Thread shane knapp

On Fri, Aug 10, 2018 at 9:47 AM, Wenchen Fan wrote: > It seems safer to skip the arrow 0.10.0 upgrade for Spark 2.4 and leave it > to Spark 3.0, so that we have more time to test. Any objections? > none here. -- Shane Knapp UC Berkeley EECS Research / RISELab Staff Technical Lead https://rise.

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-10 Thread Wenchen Fan

It seems safer to skip the arrow 0.10.0 upgrade for Spark 2.4 and leave it to Spark 3.0, so that we have more time to test. Any objections? On Fri, Aug 10, 2018 at 11:53 PM shane knapp wrote: > quick update from my end: > > SPARK-24433 (SparkR/k8s) depends on SPARK-25087 (move builds to ubuntu)

Re: [Performance] Spark DataFrame is slow with wide data. Polynomial complexity on the number of columns is observed. Why?

2018-08-10 Thread Marco Gaido

Hi Makatun, I think your problem has been solved in https://issues.apache.org/jira/browse/SPARK-16406, which is going to be in Spark 2.4. Please try the current master, where you should see that the problem has disappeared. Thanks, Marco 2018-08-09 12:56 GMT+02:00 makatun : > Here are the images missi

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-10 Thread shane knapp

quick update from my end:
SPARK-24433 (SparkR/k8s) depends on SPARK-25087 (move builds to ubuntu)
SPARK-23874 (arrow -> 0.10.0) now depends on SPARK-25079 (python 3.5 upgrade)
both SPARK-25087 and SPARK-25079 are in progress and i'm very very hesitant to do these upgrades before the code freeze/

Re: [DISCUSS][SQL] Control the number of output files

2018-08-10 Thread Koert Kuipers

we have found that to make shuffles reliable without OOMs we need to have spark.sql.shuffle.partitions at a high number, bigger than 2000 at least. yet this leads to a large amount of part files, which puts big pressure on spark driver programs. i tried to mitigate this with dataframe.coalesce to
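Koert's snippet hinges on the link between the number of shuffle partitions and the number of part files written. The block below is a toy, stdlib-only model under stated assumptions, not Spark's implementation: the function names (`hash_partition`, `coalesce`, `count_part_files`) are hypothetical, rows are hash-partitioned into a number of buckets standing in for `spark.sql.shuffle.partitions`, each non-empty partition is assumed to become one part file, and the coalesce step merges adjacent partitions without re-hashing rows.

```python
"""Toy model (hypothetical, not Spark code): shuffle partitions vs. part files."""


def hash_partition(rows, num_partitions):
    """Assign each (key, value) row to a bucket by hashing its key,
    loosely standing in for a shuffle with spark.sql.shuffle.partitions."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in rows:
        parts[hash(key) % num_partitions].append((key, value))
    return parts


def coalesce(parts, target):
    """Merge adjacent partitions into at most `target` groups without
    re-hashing any rows (similar in spirit to DataFrame.coalesce)."""
    target = max(1, min(target, len(parts)))
    merged = [[] for _ in range(target)]
    for i, part in enumerate(parts):
        merged[i * target // len(parts)].extend(part)
    return merged


def count_part_files(parts):
    """Assumption of this model: one part file per non-empty partition."""
    return sum(1 for part in parts if part)


rows = [(i, i) for i in range(10_000)]   # 10,000 distinct integer keys
shuffled = hash_partition(rows, 2000)    # e.g. spark.sql.shuffle.partitions=2000
print(count_part_files(shuffled))        # 2000
print(count_part_files(coalesce(shuffled, 10)))  # 10
```

In this model, coalescing to 10 packs 200 of the original shuffle partitions into each group, so fewer files means more rows handled by each writer.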

