Re: ShuffledHashJoin Possible Issue

2015-10-19 Thread Davies Liu
Can you reproduce it on master? I can't reproduce it with the following code:

>>> t2 = sqlContext.range(50).selectExpr("concat('A', id) as id")
>>> t1 = sqlContext.range(10).selectExpr("concat('A', id) as id")
>>> t1.join(t2).where(t1.id == t2.id).explain()
ShuffledHashJoin [id#21], [id#19], Buil

RE: ShuffledHashJoin Possible Issue

2015-10-19 Thread gsvic
Hi Hao,

Each table is created with the following Python code snippet:

    data = [{'id': 'A%d'%i, 'value':ceil(random()*10)} for i in range(0,50)]
    with open('A.json', 'w+') as output:
        json.dump(data, output)

Tables A and B contain 10 and 50 tuples respectively. In spark shell I type sq
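
For anyone trying to reproduce this, the generation step quoted above can be sketched as a standalone script. The imports are not shown in the original message, and the function names, the `B.json` file name, and the per-file row counts (10 for A, 50 for B, taken from the sentence about table sizes) are assumptions made for illustration only; the record shape and the value formula are copied from the snippet.

```python
import json
from math import ceil
from random import random


def make_rows(n_rows):
    """Build n_rows records shaped like the ones in the quoted snippet."""
    # The 'A%d' key format and the value formula come from the message;
    # int() keeps the value an integer on Python 2, where ceil returns a float.
    return [{'id': 'A%d' % i, 'value': int(ceil(random() * 10))}
            for i in range(n_rows)]


def write_table(path, n_rows):
    """Dump the records to path as a single JSON array and return them."""
    rows = make_rows(n_rows)
    with open(path, 'w') as output:
        json.dump(rows, output)
    return rows


if __name__ == '__main__':
    # Assumed file names and sizes: A.json with 10 rows, B.json with 50 rows.
    write_table('A.json', 10)
    write_table('B.json', 50)
```

Passing the row count as a parameter keeps the two tables on the same key format, so every id in the smaller table also appears in the larger one.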

RE: ShuffledHashJoin Possible Issue

2015-10-18 Thread Cheng, Hao
Hi Gsvic,

Can you please provide detailed code / steps to reproduce that?

Hao

-----Original Message-----
From: gsvic [mailto:victora...@gmail.com]
Sent: Monday, October 19, 2015 3:55 AM
To: dev@spark.apache.org
Subject: ShuffledHashJoin Possible Issue

I am doing some experiments with join algorit