I'm using spark-1.4.0. Sure, I will put together steps to reproduce and file a JIRA ticket.
Thanks,
Peter Rudenko
On 2015-06-26 11:14, Josh Rosen wrote:
Which Spark version are you using? Can you file a JIRA for this issue?
On Thu, Jun 25, 2015 at 6:35 AM, Peter Rudenko petro.rude...@gmail.com wrote:
Hi, I have a small but very wide dataset (2000 columns). I'm trying to optimize a DataFrame pipeline for it, since it performs very poorly compared to the equivalent RDD operation.
With spark.sql.codegen=true it throws a StackOverflowError:
15/06/25 16:27:16 INFO CacheManager: Partition rdd_12_3 not found, computing it
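For context, a minimal sketch of how such a wide-DataFrame job might be set up against the Spark 1.4.x API (the object name, column count, and data are illustrative assumptions, not Peter's actual pipeline):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical sketch: a "small but very wide" DataFrame (2000 columns)
// with expression code generation enabled, as described in the thread.
object WideDataFrameSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("wide-df-codegen").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    // Codegen is off by default in Spark 1.4.0; the thread enables it.
    sqlContext.setConf("spark.sql.codegen", "true")

    import sqlContext.implicits._
    val numCols = 2000 // illustrative width
    // Start from one column and widen the DataFrame column by column.
    var df = sc.parallelize(1 to 100).toDF("c0")
    for (i <- 1 until numCols) {
      df = df.withColumn(s"c$i", df("c0") + i)
    }
    df.cache()
    df.count() // forces caching and evaluation of the wide plan
  }
}
```

Whether this exact construction reproduces the reported StackOverflowError would need to be confirmed in the JIRA ticket; it only illustrates the shape of workload under discussion.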