Re: Spark SQL table Join, one task is taking long

2014-12-04 Thread Veeranagouda Mukkanagoudar
Have you tried joins on regular RDD instead of schemaRDD? We have found that its 10 times faster than joins between schemaRDDs. val largeRDD = ... val smallRDD = ... largeRDD.join(smallRDD) // otherway JOIN would run for long. Only limitation i see with that implementation is regular RDD suppor

Re: Spark SQL table Join, one task is taking long

2014-12-04 Thread Venkat Subramanian
Hi Cheng, Thank you very much for taking your time and providing a detailed explanation. I tried a few things you suggested and some more things. The ContactDetail table (8 GB) is the fact table and DAgents is the Dim table (<500 KB), reverse of what you are assuming, but your ideas still apply.

Re: Spark SQL table Join, one task is taking long

2014-12-03 Thread Cheng Lian
Hey Venkat, This behavior seems reasonable. According to the table name, I guess here |DAgents| should be the fact table and |ContactDetails| is the dim table. Below is an explanation of a similar query, you may see |src| as |DAgents| and |src1| as |ContactDetails|. |0: jdbc:hive2://localhos

Re: Spark SQL table Join, one task is taking long

2014-12-02 Thread Venkat Subramanian
Bump up. Michael Armbrust, anybody from Spark SQL team? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-table-Join-one-task-is-taking-long-tp20124p20218.html Sent from the Apache Spark User List mailing list archive at Nabble.com. ---