er than the default (10 MB) and trigger dynamic partition pruning
> although I can see it may be beneficial to implement dynamic partition
> pruning for broadcast joins as well...
>
>
>> On Dec 4, 2021, at 8:41 AM, Mohamadreza Rostami
>> mailto:mohamadrezarosta...@gmail.com
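For reference, the threshold and pruning behavior discussed above are controlled by real Spark SQL confs; a minimal config sketch, assuming `spark` is an existing SparkSession (the 50 MB value is illustrative):

```python
# Both confs are real Spark SQL settings; the value is illustrative.
# Assumes `spark` is an existing SparkSession.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", str(50 * 1024 * 1024))  # default is 10 MB
spark.conf.set("spark.sql.optimizer.dynamicPartitionPruning.enabled", "true")  # default since Spark 3.0
```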
Hi,
I think it’s because of the locality timeout. In streaming jobs you should
decrease the locality timeout.
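The relevant setting is `spark.locality.wait` (a real Spark conf, default 3s); a config sketch, assuming you build the session yourself (the app name is illustrative):

```python
# spark.locality.wait is a real Spark setting (default 3s); lowering it makes
# the scheduler give up on data-local placement sooner, which helps short
# streaming micro-batches that would otherwise sit waiting for a "local" slot.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("streaming-job")             # illustrative name
    .config("spark.locality.wait", "0s")  # don't wait for data-local executors
    .getOrCreate()
)
```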
Sent from my iPhone
> On Jun 20, 2021, at 11:55 PM, Siva Tarun Ponnada wrote:
>
>
> Hi Team,
> I have a Spark Streaming job which I am running in a single-node
> cluster. I
What kind of benchmark do you want to run? I mean, do you want to benchmark
Spark many-to-many joins, or another aspect of Spark or the cluster (such as
network or disk)?
If you only want to benchmark a many-to-many join, you can use a cross join or
repartition the data with another
I see a bug in executor memory allocation in the standalone cluster, but I
can't find which part of the Spark code causes this problem. That's why I
decided to raise the issue here.
Assume you have 3 workers, each with 10 CPU cores and 10 GB of memory. Assume
also that you have 2 Spark jobs that run
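For context, per-job resource limits in a standalone cluster are normally pinned at submit time; a sketch of the relevant flags (all are standard spark-submit options; the master URL and values are illustrative):

```shell
# Flags are standard spark-submit options; values and master URL are illustrative.
spark-submit \
  --master spark://master-host:7077 \
  --total-executor-cores 15 \   # cap cores across the whole standalone cluster
  --executor-cores 5 \          # cores per executor
  --executor-memory 5g \        # heap per executor
  app.py
```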
I have a Hadoop cluster that uses Apache Spark to query Parquet files stored on
HDFS. For example, I use the following PySpark code to find a word in the
Parquet files:
df = spark.read.parquet("hdfs://test/parquets/*")
df.filter(df['word'] == "jhon").show()
After running this code, I go to