Can you reproduce it on master? I can't reproduce it with the following code:
>>> t2 = sqlContext.range(50).selectExpr("concat('A', id) as id")
>>> t1 = sqlContext.range(10).selectExpr("concat('A', id) as id")
>>> t1.join(t2).where(t1.id == t2.id).explain()
ShuffledHashJoin [id#21], [id#19], BuildRight
 TungstenExchange hashpartitioning(id#21,200)
  TungstenProject [concat(A,cast(id#20L as string)) AS id#21]
   Scan PhysicalRDD[id#20L]
 TungstenExchange hashpartitioning(id#19,200)
  TungstenProject [concat(A,cast(id#18L as string)) AS id#19]
   Scan PhysicalRDD[id#18L]
>>> t1.join(t2).where(t1.id == t2.id).count()
10

On Mon, Oct 19, 2015 at 2:59 AM, gsvic <victora...@gmail.com> wrote:
> Hi Hao,
>
> Each table is created with the following Python code snippet:
>
> data = [{'id': 'A%d' % i, 'value': ceil(random()*10)} for i in range(0, 50)]
> with open('A.json', 'w+') as output:
>     json.dump(data, output)
>
> Tables A and B contain 10 and 50 tuples, respectively.
>
> In the Spark shell I type
>
> sqlContext.setConf("spark.sql.planner.sortMergeJoin", "false") to disable
> SortMergeJoin, and
> sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", "0") to disable
> BroadcastHashJoin, because the tables are too small and that join would
> otherwise be selected.
>
> Finally I run the following query:
>
> t1.join(t2).where(t1("id").equalTo(t2("id"))).count
>
> and the result I get is zero, while ShuffledHashJoin and
> SortMergeJoin return the right result (10).
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/ShuffledHashJoin-Possible-Issue-tp14672p14682.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
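
For anyone who wants to check this on their own build, here is a minimal PySpark sketch of the reported setup. It makes a few assumptions the thread doesn't spell out: the second file is named B.json (only A.json appears in the quoted snippet), A holds 10 rows and B holds 50 as described, the rows are written one JSON object per line because Spark's JSON reader expects line-delimited input rather than a single array, and sqlContext is the SQLContext available in the 1.5-era shell.

import json
from math import ceil
from random import random

# Write two small line-delimited JSON tables. The file name B.json and the
# 10/50 row split are assumptions based on the quoted description.
for name, n in [('A.json', 10), ('B.json', 50)]:
    data = [{'id': 'A%d' % i, 'value': ceil(random() * 10)} for i in range(0, n)]
    with open(name, 'w') as output:
        for row in data:
            output.write(json.dumps(row) + '\n')

# Steer the planner away from SortMergeJoin and BroadcastHashJoin so that
# ShuffledHashJoin is the join that gets selected.
sqlContext.setConf("spark.sql.planner.sortMergeJoin", "false")
sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", "0")

t1 = sqlContext.read.json("A.json")
t2 = sqlContext.read.json("B.json")

# explain() should show ShuffledHashJoin at the top of the physical plan;
# the reporter sees count() == 0 here, while the expected answer is 10.
t1.join(t2).where(t1.id == t2.id).explain()
print(t1.join(t2).where(t1.id == t2.id).count())

The range()/selectExpr() session at the top of this reply is the simpler check, since it avoids the JSON files entirely; the sketch above only matters if the JSON-backed tables are what trigger the wrong result.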