Have you taken a look at the join section in the streaming programming
guide?

http://spark.apache.org/docs/latest/streaming-programming-guide.html#stream-dataset-joins
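For your use case that could look roughly like the sketch below. This is a
minimal, untested sketch in Scala, and the details are assumptions: it
assumes the historical visits sit in a file of comma-separated lines keyed
by userId (the "hdfs:///data/visits" path is hypothetical) and that
transactions arrive over a socket in the same format — adjust the sources
and parsing to your setup.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("SalesAttribution")
val ssc = new StreamingContext(conf, Seconds(10))

// Historical visit data as a pair RDD: (userId, (visitSource, timestamp))
val visits = ssc.sparkContext
  .textFile("hdfs:///data/visits")   // hypothetical location/format
  .map(_.split(","))
  .map(v => (v(0), (v(1), v(2).toLong)))
  .cache()

// Streaming transactions as a pair DStream: (userId, (totalPrice, timestamp))
val transactions = ssc.socketTextStream("localhost", 9999)
  .map(_.split(","))
  .map(t => (t(0), (t(1).toDouble, t(2).toLong)))

// Join each transaction micro-batch against the static visit RDD
val attributed = transactions.transform(rdd => rdd.join(visits))

attributed.print()
ssc.start()
ssc.awaitTermination()

Since the visit history is large, you would want to keep that RDD cached
(and ideally hash-partitioned by userId) so each micro-batch join only
shuffles the small streaming side. The section linked above also shows how
to swap in an updated dataset inside transform() if the history changes.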

On Wed, Apr 29, 2015 at 7:11 AM, Rendy Bambang Junior <
rendy.b.jun...@gmail.com> wrote:

> Let's say I have transaction data and visit data:
>
> visit
> | userId | Visit source | Timestamp |
> | A      | google ads   | 1         |
> | A      | facebook ads | 2         |
>
> transaction
> | userId | total price | timestamp |
> | A      | 100         | 248384    |
> | B      | 200         | 43298739  |
>
> I want to join the transaction data with the visit data to do sales
> attribution, and I want to do it in real time whenever a transaction
> occurs (streaming).
>
> Is it scalable to join one stream against very big historical data using
> the join function in Spark? If not, how is this usually done?
>
> The visit data needs to be historical, since a visit can happen anytime
> before the transaction (e.g. a visit one year before the transaction
> occurs).
>
> Rendy
>