Yeah, I figured it's not something fundamental to the task or Spark. The
error is very odd; I've never seen it before. Do you have a theory on
what's going on there? I don't!

On Fri, Apr 9, 2021 at 10:43 AM Attila Zsolt Piros <
piros.attila.zs...@gmail.com> wrote:

> Hi!
>
> I looked into the code and found a way to improve it.
>
> With the improvement your test runs just fine:
>
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /__ / .__/\_,_/_/ /_/\_\   version 3.2.0-SNAPSHOT
>       /_/
>
> Using Python version 3.8.1 (default, Dec 30 2020 22:53:18)
> Spark context Web UI available at http://192.168.0.199:4040
> Spark context available as 'sc' (master = local, app id =
> local-1617982367872).
> SparkSession available as 'spark'.
>
> In [1]:     import pyspark
>
> In [2]:
> conf=pyspark.SparkConf().setMaster("local[64]").setAppName("Test1")
>
> In [3]:     sc=pyspark.SparkContext.getOrCreate(conf)
>
> In [4]:     rows=70000
>
> In [5]:     data=list(range(rows))
>
> In [6]:     rdd=sc.parallelize(data,rows)
>
> In [7]:     assert rdd.getNumPartitions()==rows
>
> In [8]:     rdd0=rdd.filter(lambda x:False)
>
> In [9]:     assert rdd0.getNumPartitions()==rows
>
> In [10]:     rdd00=rdd0.coalesce(1)
>
> In [11]:     data=rdd00.collect()
> 21/04/09 17:32:54 WARN TaskSetManager: Stage 0 contains a task of very
> large size (4729 KiB). The maximum recommended task size is 1000 KiB.
>
> In [12]:     assert data==[]
>
> In [13]:
>
>
> I will create a JIRA and need to add some unit tests before opening the PR.
>
> Best Regards,
> Attila
>
