Re: Thoughts on Cloudpickle Update

2018-01-15 Thread Hyukjin Kwon
Hi Bryan, Yup, I support matching the version. I have pushed Spark's copy forward a few times before to match https://github.com/cloudpipe/cloudpickle, and also contributed a few fixes to cloudpickle itself. I believe our copy is closest to 0.4.1. I have been trying to follow up the changes in c

Thoughts on Cloudpickle Update

2018-01-15 Thread Bryan Cutler
Hi All, I've seen a couple of issues lately related to cloudpickle, notably https://issues.apache.org/jira/browse/SPARK-22674, and would like to get some feedback on updating the version in PySpark, which should fix these issues and allow us to remove some workarounds. Spark is currently using a fork

Re: Broken SQL Visualization?

2018-01-15 Thread Wenchen Fan
Hi, thanks for reporting. Can you include the steps to reproduce this bug? On Tue, Jan 16, 2018 at 7:07 AM, Ted Yu wrote: > Did you include any picture? > > Looks like the picture didn't go through. > > Please use a third-party site. > > Thanks > > Original message > From: Tomasz G

Re: Broken SQL Visualization?

2018-01-15 Thread Ted Yu
Did you include any picture? Looks like the picture didn't go through. Please use a third-party site. Thanks Original message From: Tomasz Gawęda Date: 1/15/18 2:07 PM (GMT-08:00) To: dev@spark.apache.org, u...@spark.apache.org Subject: Broken SQL Visualization? Hi, today I hav

Broken SQL Visualization?

2018-01-15 Thread Tomasz Gawęda
Hi, today I updated my test cluster to the current Spark master; after that, my SQL Visualization page started to crash with the following error in JS: [cid:part1.DB2FB812.D25D60D1@outlook.com] The screenshot was cut for readability and to hide internal server names ;) It may be caused by upgrade or b

Re: Join Strategies

2018-01-15 Thread Herman van Hövell tot Westerflier
Hey Marco, A Cartesian product is an inner join by definition :). The current Cartesian product operator does not support outer joins, so we use the only operator that does: BroadcastNestedLoopJoinExec. This is far from great, and it does have the potential to OOM; there are some safety nets in th

Limit the block size of data received by Spark Streaming receiver

2018-01-15 Thread Xilang Yan
Hey, We use a custom receiver to receive data from our MQ. We used to use def store(dataItem: T) to store data; however, I found that the block size can vary widely, from 0.5K to 5M. As a result, the processing time per data partition varies a lot. Shuffle is an option, but I want to avoid it.

Re: Inner join with the table itself

2018-01-15 Thread Jacek Laskowski
Hi Michael, -dev +user What's the query? How do you "fool spark"? Regards, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Kafka Streams https://bit.ly/ma

Inner join with the table itself

2018-01-15 Thread Michael Shtelma
Hi all, If I try joining a table with itself using join columns, I get the following error: "Join condition is missing or trivial. Use the CROSS JOIN syntax to allow cartesian products between these relations.;" This is not true: my join is not trivial and is not a real cross join. I