
Re: Broadcast join on multiple dataframes

2016-02-04, Srikanth

Hello,

Any pointers on what is causing the optimizer to convert the broadcast join into a shuffle join? This join is with a file that is just 4 KB in size.

Complete plan --> https://www.dropbox.com/s/apuomw1dg0t1jtc/plan_with_select.txt?dl=0
DAG from UI --> https://www.dropbox.com/s/4xc9d0rdkx2fun8/DAG_with_se
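One quick way to see which joins were planned as broadcast joins and which fell back to a shuffle is to scan the saved plan text for physical join operator names. The sketch below is plain Scala with no Spark dependency. The object name JoinStrategyScan is hypothetical. The operator list is an assumption based on names seen in Spark 1.x/2.x physical plans (for example BroadcastHashJoin and SortMergeJoin), and it may not cover every version.

```scala
import java.util.regex.Pattern
import scala.util.matching.Regex

// Hypothetical helper: counts broadcast vs. shuffle join operators in a plan
// dump, e.g. the text printed by explain(true) or DF.queryExecution.
object JoinStrategyScan {
  // Assumed physical operator names; not guaranteed exhaustive for all versions.
  val broadcastOps: Seq[String] = Seq(
    "BroadcastHashJoin", "BroadcastHashOuterJoin",
    "BroadcastLeftSemiJoinHash", "BroadcastNestedLoopJoin")
  val shuffleOps: Seq[String] = Seq(
    "SortMergeJoin", "SortMergeOuterJoin",
    "ShuffledHashJoin", "ShuffledHashOuterJoin")

  // Whole-word match so one operator name is never counted inside another.
  private def count(text: String, op: String): Int =
    new Regex("\\b" + Pattern.quote(op) + "\\b").findAllMatchIn(text).size

  /** Returns (number of broadcast joins, number of shuffle joins) in the plan text. */
  def summarize(planText: String): (Int, Int) =
    (broadcastOps.map(count(planText, _)).sum,
     shuffleOps.map(count(planText, _)).sum)
}
```

For example, calling JoinStrategyScan.summarize(scala.io.Source.fromFile("plan.txt").mkString) on a locally saved copy of the plan reports how many joins ended up as broadcast joins and how many as shuffle joins. The local file path is hypothetical.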

Re: Broadcast join on multiple dataframes

2016-01-29, Srikanth

Michael,

Output of DF.queryExecution is saved to https://www.dropbox.com/s/1vizuwpswza1e3x/plan.txt?dl=0
I don't see anything in it that suggests a switch in strategy. Hopefully you find this helpful.

Srikanth

On Thu, Jan 28, 2016 at 4:43 PM, Michael Armbrust wrote:
> Can you provide the analyz

Re: Broadcast join on multiple dataframes

2016-01-28, Michael Armbrust

Can you provide the analyzed and optimized plans (explain(true))?

On Thu, Jan 28, 2016 at 12:26 PM, Srikanth wrote:
> Hello,
>
> I have a use case where one large table has to be joined with several
> smaller tables.
> I've added a broadcast hint for all small tables in the joins.
>
> val large

Broadcast join on multiple dataframes

2016-01-28, Srikanth

Hello,

I have a use case where one large table has to be joined with several smaller tables. I've added a broadcast hint for all small tables in the joins.

val largeTableDF = sqlContext.read.format("com.databricks.spark.csv")
val metaActionDF = sqlContext.read.format("json")
val cidOrg
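The following is a minimal sketch of the "one large table, several broadcast-hinted small tables" pattern. It assumes Spark 1.6-era APIs (SQLContext and org.apache.spark.sql.functions.broadcast) and uses small in-memory DataFrames in place of the CSV/JSON sources. The object name and the column names (cid, actionId, bytes, actionName, orgName) are hypothetical, because the real schemas are cut off in the thread. The sketch wraps each small table in broadcast() at the point where it enters the join. It then prints explain(true), which gives the analyzed, optimized and physical plans Michael asks for, followed by queryExecution.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.broadcast

// Hypothetical example; schemas and data are made up for illustration.
object MultiBroadcastJoin {
  def buildJoin(sqlContext: SQLContext): DataFrame = {
    import sqlContext.implicits._

    // Stand-in for largeTableDF (read from CSV in the thread).
    val largeTableDF = Seq((1, 10, 100L), (2, 20, 200L), (1, 20, 300L), (3, 10, 400L))
      .toDF("cid", "actionId", "bytes")

    // Stand-ins for the small lookup tables (metaActionDF, cidOrg..., etc.).
    val metaActionDF = Seq((10, "view"), (20, "click")).toDF("actionId", "actionName")
    val cidOrgDF = Seq((1, "orgA"), (2, "orgB")).toDF("cid", "orgName")

    // Hint each small side directly at the join where it is used.
    // Inner joins: rows with no matching key (cid = 3 here) are dropped.
    largeTableDF
      .join(broadcast(metaActionDF), Seq("actionId"))
      .join(broadcast(cidOrgDF), Seq("cid"))
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("multi-broadcast-join").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)
    val joined = buildJoin(sqlContext)

    // Parsed, analyzed, optimized and physical plans; check whether the
    // physical plan shows BroadcastHashJoin or a shuffle-based join.
    joined.explain(true)
    // Same information via the (developer) QueryExecution object.
    println(joined.queryExecution)

    joined.show()
    sc.stop()
  }
}
```

Applying broadcast() at the join site keeps the hint directly on the join input. Whether a hint applied earlier survives an intervening select or other projection may depend on the Spark version, so it is worth checking the physical plan in both cases.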