RE: Implement customized Join for SparkSQL

Cheng, Hao Mon, 05 Jan 2015 03:35:13 -0800

Can you paste the error log?

From: Dai, Kevin [mailto:yun...@ebay.com]
Sent: Monday, January 5, 2015 6:29 PM
To: user@spark.apache.org
Subject: Implement customized Join for SparkSQL


Hi all,

Suppose I want to join two tables A and B as follows:

Select * from A join B on A.id = B.id

A is a file, while B is a database that is indexed by id; I have wrapped B with 
the Data Source API.
The desired join flow is:

1.      Generate A's RDD[Row].

2.      Generate B's RDD[Row] from A: for each id in A, use B's Data Source API 
to fetch the matching row from the database.

3.      Merge these two RDDs into the final RDD[Row].
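
To make the intended flow concrete, here is a minimal sketch in plain Python, without Spark, of the per-row index lookup described in the steps above. All names here (lookup_join, fetch_b_by_id, b_index, a_rows) are hypothetical and only for illustration; a dict stands in for the indexed database, and the "A."/"B." column prefixes are an assumption made to keep the two id columns apart.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional

Row = Dict[str, object]


def lookup_join(
    a_rows: Iterable[Row],
    fetch_b_by_id: Callable[[object], Optional[Row]],
    key: str = "id",
) -> Iterator[Row]:
    """Inner-join rows of A against B by probing B's index once per A row."""
    for a_row in a_rows:
        # Step 2: use A's id to fetch the matching row from B.
        b_row = fetch_b_by_id(a_row[key])
        if b_row is None:
            # Inner join semantics: drop A rows with no match in B.
            continue
        # Step 3: merge both rows, prefixing column names by table.
        merged = {f"A.{name}": value for name, value in a_row.items()}
        merged.update({f"B.{name}": value for name, value in b_row.items()})
        yield merged


# Hypothetical stand-in for the database indexed by id.
b_index = {
    1: {"id": 1, "city": "Shanghai"},
    3: {"id": 3, "city": "Beijing"},
}

# Step 1: rows of A (here an in-memory list instead of a file).
a_rows = [
    {"id": 1, "name": "x"},
    {"id": 2, "name": "y"},
    {"id": 3, "name": "z"},
]

joined = list(lookup_join(a_rows, b_index.get))
```

In PySpark, a function like this could presumably be applied per partition with RDD.mapPartitions, so that each partition opens one database connection and probes B only for the ids it actually contains. That is an assumption about how it could be wired up, not something an existing join strategy does.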

However, it seems that the existing join strategies don't support this.

Is there any way to achieve it?

Best Regards,
Kevin.
