Re: [VOTE] Apache Spark 2.2.0 (RC5)

2017-06-20 Thread Michael Armbrust
I will kick off the voting with a +1. On Tue, Jun 20, 2017 at 4:49 PM, Michael Armbrust wrote:

[VOTE] Apache Spark 2.2.0 (RC5)

2017-06-20 Thread Michael Armbrust
Please vote on releasing the following candidate as Apache Spark version 2.2.0. The vote is open until Friday, June 23rd, 2017 at 18:00 PST and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 2.2.0 [ ] -1 Do not release this package because

[build system] rolling back R to working version

2017-06-20 Thread shane knapp
i accidentally updated R during the system update, and will be rolling everything back to the known working versions. again, i'm really sorry about this. our jenkins is old, and the new ubuntu one is almost ready to go. i really can't wait to shut down the centos boxes... they're old and

Total memory tracking: request for comments

2017-06-20 Thread Jose Soltren
https://issues.apache.org/jira/browse/SPARK-21157 Hi - oftentimes, Spark applications are killed for overrunning available memory by YARN, Mesos, or the OS. In SPARK-21157, I propose a design for grabbing and reporting "total memory" usage for Spark executors - that is, memory usage as visible
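The "memory usage as visible to the OS" idea can be sampled today from outside Spark; a minimal Python sketch in the spirit of SPARK-21157 (names here are illustrative, not the proposal's actual API) that reads the current process's peak resident set size:

```python
# Hedged sketch: sample "memory as visible to the OS" for the current
# process. This is not the SPARK-21157 design, just an illustration of
# the kind of number an executor could report.
import resource
import sys

def peak_rss_bytes() -> int:
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return rss if sys.platform == "darwin" else rss * 1024

peak = peak_rss_bytes()  # a live process always has a nonzero peak RSS
```

A real implementation would also need to account for off-heap allocations and report per-executor rather than per-driver, which is exactly the gap the JIRA describes.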

Re: appendix

2017-06-20 Thread Wenchen Fan
you should make hbase a data source (it seems we already have an hbase connector?), create a dataframe from hbase, and do the join in Spark SQL. > On 21 Jun 2017, at 10:17 AM, sunerhan1...@sina.com wrote: > > Hello, > My scenario is like this: > 1. val df = hivecontext/carboncontext.sql("sql") >

Re: [build system] rolling back R to working version

2017-06-20 Thread shane knapp
this is done... i backported R to 3.1.1 and reinstalled all the R packages so we're starting w/a clean slate. the workers are all restarted, and i re-triggered as many PRBs as i could find. i'll check in first thing in the morning (PDT) and see how things are going. shane On Tue, Jun 20, 2017

Re: Re: appendix

2017-06-20 Thread sunerhan1...@sina.com

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-20 Thread Xiao Li
Found another bug in the case preserving of column names of persistent views. This regression was introduced in 2.2. https://issues.apache.org/jira/browse/SPARK-21150 Thanks, Xiao 2017-06-19 8:03 GMT-07:00 Liang-Chi Hsieh: > > I mean it is not a bug that has been fixed before

Re: [build system] rolling back R to working version

2017-06-20 Thread Felix Cheung
Thanks Shane! From: shane knapp Sent: Tuesday, June 20, 2017 9:23:57 PM To: dev Subject: Re: [build system] rolling back R to working version

Re: dataframe mappartitions problem

2017-06-20 Thread Wenchen Fan
`Dataset.mapPartitions` takes `func: Iterator[T] => Iterator[U]`, which means Spark needs to deserialize the internal binary format to type `T`, and this deserialization is costly. If you do need to do some hack, you can use the internal API: `Dataset.queryExecution.toRdd.mapPartitions`, which
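Outside Spark, the `Iterator[T] => Iterator[U]` contract that `mapPartitions` expects can be sketched in plain Python (hypothetical data, no Spark required; the function is invoked once per partition and sees the whole partition as a lazy iterator):

```python
# Plain-Python illustration of the iterator-to-iterator shape of
# Dataset.mapPartitions. Spark's real version runs func on each
# partition's deserialized rows; here a list of lists stands in
# for partitions.
from typing import Callable, Iterable, Iterator, List

def map_partitions(partitions: Iterable[Iterable[int]],
                   func: Callable[[Iterator[int]], Iterator[int]]) -> List[List[int]]:
    # Apply the user function once per partition, mirroring how
    # Spark invokes it.
    return [list(func(iter(part))) for part in partitions]

def double_evens(it: Iterator[int]) -> Iterator[int]:
    # The user function consumes the partition lazily and yields results.
    for x in it:
        if x % 2 == 0:
            yield x * 2

result = map_partitions([[1, 2, 3], [4, 5]], double_evens)
# result == [[4], [8]]
```

The per-partition shape is what makes the deserialization cost Wenchen mentions unavoidable on the typed API: every row must be turned into a `T` before `func` can see it.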

Re: Output Committers for S3

2017-06-20 Thread Steve Loughran
> On 20 Jun 2017, at 07:49, sririshindra wrote: > > Is there anything similar to the s3 connector for Google Cloud Storage? > Since Google Cloud Storage is also an object store rather than a file > system, I imagine the same problem that the s3 connector is trying to solve

[build system] [fixed] system update broke symlink for pypy-2.5.1, PRB builds failing

2017-06-20 Thread shane knapp
this is currently fixed, but did cause PRB failures this afternoon. i'll go retrigger as many as i can as penance. :\
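For reference, the kind of repair involved (repointing a stale symlink after a system update) can be sketched in a few lines of Python; all paths below are hypothetical, since the thread does not show the actual pypy-2.5.1 locations on the build boxes:

```python
# Hedged sketch of repairing a dangling symlink, demonstrated in a
# scratch directory. Paths are illustrative, not the Jenkins workers'.
import os
import tempfile

def repoint_symlink(link_path: str, new_target: str) -> None:
    # Remove the stale link if present, then recreate it pointing at
    # the known-good target.
    if os.path.islink(link_path):
        os.unlink(link_path)
    os.symlink(new_target, link_path)

# Demo: simulate a link left dangling by an update, then fix it.
d = tempfile.mkdtemp()
target = os.path.join(d, "pypy-new")
open(target, "w").close()
link = os.path.join(d, "pypy")
os.symlink(os.path.join(d, "gone"), link)   # broken: target does not exist
repoint_symlink(link, target)
fixed = os.path.realpath(link) == os.path.realpath(target)
```

On a shared build box a production version might create the new link under a temporary name and `os.replace` it into place so builds never observe a missing link.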

Re: Output Committers for S3

2017-06-20 Thread Steve Loughran
On 19 Jun 2017, at 16:55, Ryan Blue wrote: I agree, the problem is that Spark is trying to be safe and avoid the direct committer. We also modify Spark to avoid its logic. We added a property that causes Spark to always use the
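One related, known-to-exist knob from that era (not the custom property Ryan describes, which lived in their fork) is the v2 commit algorithm from MAPREDUCE-4815, which reduces, but does not eliminate, rename overhead on object stores. A hedged spark-defaults.conf fragment:

```
# v2 FileOutputCommitter: tasks commit directly into the destination
# directory, avoiding the job-commit rename pass. Note this trades
# away some failure atomicity, which matters on S3.
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version  2
```

This is a workaround rather than a fix; the thread's larger point is that a purpose-built S3 committer is needed.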

Re: [build system] immediate emergency updates and reboot to deal w/stack clash vulnerability

2017-06-20 Thread shane knapp
i have to apologize in advance, but it looks like we're going to have to do an emergency restart of jenkins -- we have two zombie jobs that aren't timing out and they're blocking new builds for those projects from starting. i've put jenkins into quiet mode, and will do a restart in ~30 mins to

Re: [build system] immediate emergency updates and reboot to deal w/stack clash vulnerability

2017-06-20 Thread shane knapp
(hopefully this is my last email on this subject...) jenkins is back up. the ray and alluxio-master builds have been de-zombified and are happily building (as well as everything else). :) shane On Tue, Jun 20, 2017 at 12:27 PM, shane knapp wrote: > i have to apologize in

Re: [build system] immediate emergency updates and reboot to deal w/stack clash vulnerability

2017-06-20 Thread shane knapp
ok, the centos packages have been released. i've put jenkins into quiet mode, and will be updating rpms and rebooting ASAP. updates as they come. shane On Mon, Jun 19, 2017 at 2:43 PM, shane knapp wrote: > i've updated the two ubuntu workers (amp-jenkins-staging-01 and

Re: [build system] immediate emergency updates and reboot to deal w/stack clash vulnerability

2017-06-20 Thread shane knapp
and we're back up and building! On Tue, Jun 20, 2017 at 8:23 AM, shane knapp wrote: