Re: Shrinking the DataFrame lineage

2016-06-12 · Joseph Bradley
>> GraphFrames? Suppose I want to use aggregateMessages in the iterative loop for implementing PageRank.
>> Best regards, Alexander
>> From: Joseph Bradley [mailto:jos...@databricks.com]
>> Sent: Friday, May 13, 20
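Here is a minimal sketch of the loop Alexander describes. It is written in plain Python rather than GraphFrames so that it runs on its own. The helper `aggregate_messages` is a hypothetical stand-in for one aggregateMessages step: every edge sends its source's rank divided by the source's out-degree to its destination, and messages are summed per destination. Several details are assumptions modeled on GraphX-style PageRank and are not taken from this thread: the reset probability of 0.15, the initial rank of 1.0, and the unnormalized update `rank = reset + (1 - reset) * sum`.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Edge = Tuple[str, str]  # (src, dst)


def aggregate_messages(edges: List[Edge],
                       send_to_dst: Callable[[float, int], float],
                       ranks: Dict[str, float],
                       out_degree: Dict[str, int]) -> Dict[str, float]:
    """Hypothetical stand-in for one aggregateMessages step: each edge sends
    a message to its destination, and messages are summed per destination."""
    sums: Dict[str, float] = defaultdict(float)
    for src, dst in edges:
        sums[dst] += send_to_dst(ranks[src], out_degree[src])
    return sums


def pagerank(vertices: List[str], edges: List[Edge], n_iter: int = 20,
             reset_prob: float = 0.15) -> Dict[str, float]:
    # Out-degree of every source vertex, used to split its rank across edges.
    out_degree: Dict[str, int] = defaultdict(int)
    for src, _ in edges:
        out_degree[src] += 1

    ranks = {v: 1.0 for v in vertices}  # assumed initial rank
    for _ in range(n_iter):
        msg_sums = aggregate_messages(
            edges, lambda rank, deg: rank / deg, ranks, out_degree)
        # Vertices that receive no messages keep only the reset term.
        ranks = {v: reset_prob + (1 - reset_prob) * msg_sums.get(v, 0.0)
                 for v in vertices}
    return ranks
```

In GraphFrames, each pass of such a loop would produce a new DataFrame derived from the previous one. That derivation is where the growing query plan discussed in this thread comes from.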

Re: Shrinking the DataFrame lineage

2016-05-15 · Hamel Kothari
> to use aggregateMessages in the iterative loop for implementing PageRank.
> Best regards, Alexander
> From: Joseph Bradley [mailto:jos...@databricks.com]
> Sent: Friday, May 13, 2016 12:38 PM
> To: Ulanov, Alexander
> Cc: dev@spark.apach

RE: Shrinking the DataFrame lineage

2016-05-13 · Ulanov, Alexander
, May 13, 2016 12:38 PM
To: Ulanov, Alexander
Cc: dev@spark.apache.org
Subject: Re: Shrinking the DataFrame lineage
Here's a JIRA for it: https://issues.apache.org/jira/browse/SPARK-13346
I don't have a great method currently, but hacks can get around it: convert the DataFrame to an RD

Re: Shrinking the DataFrame lineage

2016-05-13 · Joseph Bradley
Here's a JIRA for it: https://issues.apache.org/jira/browse/SPARK-13346
I don't have a great method currently, but hacks can get around it: convert the DataFrame to an RDD and back to truncate the query plan lineage.
Joseph
On Wed, May 11, 2016 at 12:46 PM, Ulanov, Alexander <alexander.ula...@h
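Joseph's workaround can be illustrated with a toy model of a lazily evaluated plan. `Plan`, `truncate` and `iterate` below are hypothetical names, not Spark APIs. Each `map` wraps the previous node, so the plan gains one node per iteration. `truncate` stands in for the DataFrame to RDD to DataFrame round trip by starting a fresh plan from the computed rows.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Plan:
    """Toy plan node (NOT a Spark class): either a source holding rows,
    or a map over a parent node."""
    parent: Optional["Plan"] = None
    fn: Optional[Callable[[int], int]] = None
    data: Optional[List[int]] = None  # set only on source nodes

    def depth(self) -> int:
        # Number of nodes from this one back to the source.
        return 1 if self.parent is None else 1 + self.parent.depth()

    def compute(self) -> List[int]:
        # Evaluating a node re-evaluates the whole chain beneath it.
        if self.parent is None:
            return list(self.data)
        return [self.fn(x) for x in self.parent.compute()]

    def map(self, fn: Callable[[int], int]) -> "Plan":
        # A transformation only adds a node; nothing is computed here.
        return Plan(parent=self, fn=fn)


def truncate(plan: Plan) -> Plan:
    """Hypothetical analogue of the RDD round trip: materialize the rows and
    start a new plan whose only node is the computed data."""
    return Plan(data=plan.compute())


def iterate(n_iter: int, truncate_every: int = 0) -> Plan:
    plan = Plan(data=[1, 2, 3])
    for i in range(1, n_iter + 1):
        plan = plan.map(lambda x: x + 1)
        if truncate_every and i % truncate_every == 0:
            plan = truncate(plan)
    return plan


if __name__ == "__main__":
    print(iterate(100).depth(), iterate(100, truncate_every=10).depth())  # 101 1
```

In PySpark, the round trip would be along the lines of `spark.createDataFrame(df.rdd, df.schema)` (on Spark 1.6, `sqlContext.createDataFrame(df.rdd, df.schema)`). Unlike the toy `truncate`, this computes nothing by itself. The RDD behind the new DataFrame also keeps its own lineage. Only the query plan that has to be analyzed is cut short.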

Shrinking the DataFrame lineage

2016-05-11 · Ulanov, Alexander
Dear Spark developers,
Recently, I was trying to switch my code from RDDs to DataFrames in order to compare the performance. The code computes an RDD in a loop. I use RDD.persist followed by RDD.count to force Spark to compute the RDD and cache it, so that it does not need to re-compute it on each it
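The reason for calling persist followed by count can be seen in a toy model of lazy evaluation. `LazyData` and `run_loop` are hypothetical names, not Spark classes. In this model, `persist` only marks a dataset for caching, and the first action (`count`) fills the cache. Without caching, the action in iteration k re-evaluates all k transformations, so 10 iterations cost 1 + 2 + ... + 10 = 55 evaluations. With caching, each iteration evaluates only its own transformation, for 10 in total.

```python
from typing import Callable, List, Optional


class LazyData:
    """Toy lazy dataset (NOT a Spark class)."""
    evaluations = 0  # class-wide count of transformation evaluations

    def __init__(self, parent: Optional["LazyData"] = None,
                 fn: Optional[Callable[[int], int]] = None,
                 source: Optional[List[int]] = None) -> None:
        self.parent = parent
        self.fn = fn
        self.source = source
        self.cached: Optional[List[int]] = None
        self.persisted = False

    def map(self, fn: Callable[[int], int]) -> "LazyData":
        # A transformation: builds a new node, computes nothing yet.
        return LazyData(parent=self, fn=fn)

    def persist(self) -> "LazyData":
        # Like RDD.persist in spirit: only marks the data; nothing runs yet.
        self.persisted = True
        return self

    def _rows(self) -> List[int]:
        if self.cached is not None:
            return self.cached
        if self.parent is None:
            rows = list(self.source)
        else:
            LazyData.evaluations += 1
            rows = [self.fn(x) for x in self.parent._rows()]
        if self.persisted:
            self.cached = rows  # the first action fills the cache
        return rows

    def count(self) -> int:
        # An action: forces evaluation (and caching, if persisted).
        return len(self._rows())


def run_loop(n_iter: int, use_persist: bool) -> int:
    LazyData.evaluations = 0
    data = LazyData(source=[1, 2, 3])
    for _ in range(n_iter):
        data = data.map(lambda x: x * 2)
        if use_persist:
            data.persist()
        data.count()
    return LazyData.evaluations


if __name__ == "__main__":
    print(run_loop(10, use_persist=False), run_loop(10, use_persist=True))  # 55 10
```

Caching only removes the recomputation. The `parent` chain, like Spark's lineage, still grows by one node per iteration, and that growth is what this thread is about.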