Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-14 Thread Michael Armbrust
So, it looks like SPARK-21085 has been fixed and SPARK-21093 is not a regression. Last call before I cut RC5. On Wed, Jun 14, 2017 at 2:28 AM, Hyukjin Kwon wrote: > Actually, I opened - https

Re: Performance regression for partitioned parquet data

2017-06-14 Thread Mike Wheeler
I might have a similar problem. In the spark-shell, when I run val data = spark.read.parquet("...") and hit Enter, it takes more than 30 seconds for the read to complete and return to the command line. I am running Spark 2.1.1, but I have also tested it on 2.0.2 and encountered the same issue. Thanks,

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-14 Thread Hyukjin Kwon
Actually, I opened https://issues.apache.org/jira/browse/SPARK-21093.

2017-06-14 17:08 GMT+09:00 Hyukjin Kwon:
> For a shorter reproducer ...
>
> df <- createDataFrame(list(list(1L, 1, "1", 0.1)), c("a", "b", "c", "d"))
> collect(gapply(df, "a", function(key, x) { x }, schema(df)))
>
> And r

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-14 Thread Hyukjin Kwon
For a shorter reproducer ...

    df <- createDataFrame(list(list(1L, 1, "1", 0.1)), c("a", "b", "c", "d"))
    collect(gapply(df, "a", function(key, x) { x }, schema(df)))

And running the below multiple times (5~7):

    collect(gapply(df, "a", function(key, x) { x }, schema(df)))

looks occasionally throwi
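Because the failure only shows up on some runs, one way to make it visible is to repeat the same call several times and record which attempts fail. The following is a minimal, language-agnostic sketch in Python, not part of SparkR: `run_repeatedly` is a hypothetical helper, and the callable passed to it stands in for the `collect(gapply(...))` call above (wiring it to a real SparkR session is not shown). The default of 7 attempts simply matches the upper end of the "5~7" runs mentioned in the message.

```python
from typing import Callable, List, Optional


def run_repeatedly(reproducer: Callable[[], object], attempts: int = 7) -> List[Optional[str]]:
    """Call `reproducer` `attempts` times.

    Each entry in the result is None for a run that succeeded, or
    "ExceptionType: message" for a run that raised.
    """
    outcomes: List[Optional[str]] = []
    for _ in range(attempts):
        try:
            reproducer()
            outcomes.append(None)
        except Exception as exc:  # record the failure and keep going
            outcomes.append(f"{type(exc).__name__}: {exc}")
    return outcomes


if __name__ == "__main__":
    # Hypothetical stand-in for the real reproducer: fails on the 3rd call only.
    results = iter([True, True, False, True, True])

    def fake_reproducer() -> None:
        if not next(results):
            raise RuntimeError("simulated intermittent failure")

    outcomes = run_repeatedly(fake_reproducer, attempts=5)
    failures = [i + 1 for i, o in enumerate(outcomes) if o is not None]
    print(failures)  # [3]
```

Recording every outcome, rather than stopping at the first exception, shows how often the problem occurs across a batch of runs.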

Re: Can I use ChannelTrafficShapingHandler to control the network read/write speed in shuffle?

2017-06-14 Thread Niu Zhaojie
Hi Shixiong,

Thanks for the reply. You are right: it seems it only supports the following two types. I will retry after adding the FileRegion type.

    protected long calculateSize(Object msg) {
        if (msg instanceof ByteBuf) {
            return ((ByteBuf) msg).readableBytes();
        }
        if (msg instanceof B
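The type dispatch under discussion, with the proposed FileRegion case added, can be modelled roughly as below. This is a minimal Python sketch, not Netty's API: `ByteBuf`, `ByteBufHolder` and `FileRegion` here are hypothetical stand-ins that expose only a size accessor, and returning -1 for unrecognised messages is an assumption, since the quoted Java snippet is cut off before that point.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for Netty's ByteBuf, ByteBufHolder and FileRegion;
# each models only the accessor that the size calculation needs.
@dataclass
class ByteBuf:
    data: bytes
    reader_index: int = 0

    def readable_bytes(self) -> int:
        # Bytes not yet consumed by a reader.
        return len(self.data) - self.reader_index


@dataclass
class ByteBufHolder:
    buf: ByteBuf

    def content(self) -> ByteBuf:
        return self.buf


@dataclass
class FileRegion:
    size: int  # total number of bytes the region covers

    def count(self) -> int:
        return self.size


def calculate_size(msg: object) -> int:
    """Return the byte size used for shaping, or -1 if the type is unknown (assumed)."""
    if isinstance(msg, ByteBuf):
        return msg.readable_bytes()
    if isinstance(msg, ByteBufHolder):
        return msg.content().readable_bytes()
    # Proposed extra branch: without it, a file region falls through
    # to the -1 "unknown" result.
    if isinstance(msg, FileRegion):
        return msg.count()
    return -1


print(calculate_size(ByteBuf(b"hello", reader_index=2)))  # 3
print(calculate_size(FileRegion(4096)))                    # 4096
print(calculate_size("not a buffer"))                      # -1
```

Using the region's total count is one possible choice; for a region that has already been partly transferred, the remaining bytes (count minus transferred) might be the better measure.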

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-14 Thread Hyukjin Kwon
Per https://github.com/apache/spark/tree/v2.1.1:

1. CentOS 7.2.1511 / R 3.3.3: this test hangs.

I messed things up a bit while downgrading R to 3.3.3 (it was an actual machine, not a VM), so it took me a while to retry this. I rebuilt it and, at least, confirmed that the R version is 3.3.3. I hop

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-14 Thread Felix Cheung
Thanks! Will try to set up RHEL/CentOS to test it out.

From: Nick Pentreath <nick.pentre...@gmail.com>
Sent: Tuesday, June 13, 2017 11:38 PM
Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)
To: Felix Cheung <felixcheun...@hotmail.com>, Hyukjin Kwon <gur