Can you reproduce it on master?
I can't reproduce it with the following code:
>>> t2 = sqlContext.range(50).selectExpr("concat('A', id) as id")
>>> t1 = sqlContext.range(10).selectExpr("concat('A', id) as id")
>>> t1.join(t2).where(t1.id == t2.id).explain()
ShuffledHashJoin [id#21], [id#19], Buil
Hi Hao,
Each table is created with the following Python code snippet:
import json
from math import ceil
from random import random
data = [{'id': 'A%d' % i, 'value': ceil(random() * 10)} for i in range(0, 50)]
with open('A.json', 'w+') as output:
    json.dump(data, output)
The tables A and B contain 10 and 50 tuples respectively.
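For anyone trying to reproduce this, here is a minimal sketch of how both files could be generated from the same snippet. It assumes A holds the 10 tuples and B the 50, and that both tables use the same 'A%d' id pattern so the join keys overlap. The helper name write_table, the file name B.json, and the temporary output directory are hypothetical and not part of the original post.

```python
import json
import os
import tempfile
from math import ceil
from random import random


def write_table(path, n):
    """Write n records shaped like the snippet above to `path` as a JSON array."""
    # int() keeps the value an integer on both Python 2 and Python 3.
    data = [{'id': 'A%d' % i, 'value': int(ceil(random() * 10))} for i in range(n)]
    with open(path, 'w') as output:
        json.dump(data, output)
    return data


# Hypothetical output location; any writable directory would do.
out_dir = tempfile.mkdtemp()
table_a = write_table(os.path.join(out_dir, 'A.json'), 10)  # assumed: A has 10 tuples
table_b = write_table(os.path.join(out_dir, 'B.json'), 50)  # assumed: B has 50 tuples
```

With these sizes, every id in A (A0 to A9) also appears in B, so an equi-join on id should return 10 rows.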
In the Spark shell I type
sq
Hi Gsvic, can you please provide detailed code / steps to reproduce that?
Hao
-Original Message-
From: gsvic [mailto:victora...@gmail.com]
Sent: Monday, October 19, 2015 3:55 AM
To: dev@spark.apache.org
Subject: ShuffledHashJoin Possible Issue
I am doing some experiments with join algorit