Hi all,
I have an iterative algorithm in Spark that feeds the output of each
iteration into the following one, but the size of the data does not change. I am
using localCheckpoint to cut the data's lineage (and also to facilitate some
computations that reuse DataFrames). However, this runs slower and slower a
v(df_clust, df_F), self.step_r(
df_clust, df_F)
df_clust = df_r.join(df_v, "id")
return (df_clust, self.dt)
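
For readers following along, the loop pattern described above (each iteration consumes the previous result, with the lineage cut periodically) could be sketched roughly as follows. This is only an illustration and not the code from the job above: `run_iterations`, `demo_step`, and `demo_truncate` are hypothetical names, and plain Python lists stand in for DataFrames so that the sketch runs without a cluster.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_iterations(
    state: T,
    step: Callable[[T], T],
    n_iters: int,
    truncate: Callable[[T], T] = lambda s: s,
    every: int = 1,
) -> T:
    """Apply `step` n_iters times, calling `truncate` after every `every` steps.

    With Spark, `truncate` would be the place to cut the lineage, for example
    `lambda df: df.localCheckpoint(eager=True)` (assumption: a PySpark DataFrame).
    """
    if every < 1:
        raise ValueError("every must be >= 1")
    for i in range(1, n_iters + 1):
        state = step(state)
        if i % every == 0:
            state = truncate(state)
    return state


# Stand-in demo: the "state" is a list that records its own lineage,
# and truncating collapses that lineage to a single marker.
def demo_step(lineage):
    return lineage + ["step"]


def demo_truncate(lineage):
    return ["checkpoint"]


result = run_iterations([], demo_step, n_iters=5, truncate=demo_truncate, every=2)
print(result)  # the lineage recorded since the last truncation
```

In a real job, the `step` callable would wrap whatever produces the next `df_clust`. Whether the slowdown comes from the lineage or from something else (for example, old checkpointed data never being released) would still need to be confirmed in the Spark UI.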
Regards,
Kalin
On Fri, Dec 4, 2020 at 1:59 PM Kalin Stoyanov wrote:
> Hi all,
>
> I have an iterative algorithm in spark that uses each iteration a
Hi all,
I noticed something a bit strange. When working with a cached DF, the SQL
query details graph starts from the point where the cache takes place, and doesn't
show the transformations before it. For example, this code
>>> df = sc.parallelize([[1,2,3],[1,4,5]]).toDF(['id','a','b'])
>>> renameCols = [f"
transformations cached and Spark runs
> only the transformations written after the cache. That is what caching
> means in Spark.
>
> On Farvardin 26, 1400 AP, at 17:24, Kalin Stoyanov
> wrote:
>
> Hi all,
>
> I noticed something a bit strange.. When working with a ca
put [3]: [id#131, a#132, b#133]
> Arguments: 21
>
>
> HTH,
>
>
> Mich
>
>
Hi all,
First of all, let me say that I am pretty new to Spark, so this could be
entirely my fault somehow...
I noticed this when I was running a job on an Amazon EMR cluster with Spark
2.4.4, and it finished more slowly than when I had run it locally (on Spark
2.4.1). I checked out the event logs, and t
eed to talk with them instead of posting questions
> in the Apache Spark community.
>
> Cheers,
>
> Xiao
>
> On Wed, Jan 15, 2020 at 9:53 AM, Kalin Stoyanov wrote:
>
>> Hi all,
>>
>> First of all let me say that I am pretty new to Spark so this could be
>> entirely
eries should hit such a
> major performance regression. Also, please try the 3.0 preview releases.
>
> Thanks,
>
> Xiao
>
> On Wed, Jan 15, 2020 at 10:53 AM, Kalin Stoyanov wrote:
>
>> Hi Xiao,
>>
>> Thanks, I didn't know that. This
>> https://aws.amazon.com/about-