Have you taken a look at the join section in the streaming programming guide?
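
To make the per-batch lookup concrete, here is a minimal plain-Python sketch of the join logic, using the sample rows from your message. It is not Spark code: in Spark Streaming the same idea would be a join against an RDD of visits inside DStream.transform. The function names (build_visit_index, attribute_batch) are hypothetical, and "last-touch" attribution (credit the latest visit at or before the transaction) is my assumption, since you did not say which attribution model you want.

```python
from bisect import bisect_right
from collections import defaultdict


def build_visit_index(visits):
    """Group (user_id, source, timestamp) visits by user, sorted by timestamp."""
    index = defaultdict(list)
    for user_id, source, ts in visits:
        index[user_id].append((ts, source))
    for entries in index.values():
        entries.sort()
    return index


def attribute_batch(transactions, visit_index):
    """Attach to each transaction the latest visit source at or before it.

    Assumption: last-touch attribution. Transactions with no earlier
    visit get a source of None.
    """
    results = []
    for user_id, total_price, ts in transactions:
        entries = visit_index.get(user_id, [])
        timestamps = [t for t, _ in entries]
        # Number of visits with timestamp <= the transaction timestamp.
        pos = bisect_right(timestamps, ts)
        source = entries[pos - 1][1] if pos > 0 else None
        results.append((user_id, total_price, ts, source))
    return results


# Sample rows taken from the quoted message.
visits = [("A", "google ads", 1), ("A", "facebook ads", 2)]
transactions = [("A", 100, 248384), ("B", 200, 43298739)]

visit_index = build_visit_index(visits)
attributed = attribute_batch(transactions, visit_index)
print(attributed)
# [('A', 100, 248384, 'facebook ads'), ('B', 200, 43298739, None)]
```

The point of the sketch is that each incoming batch only needs a keyed lookup into the historical visits, so the cost per batch depends on how the visit data is partitioned and indexed by userId rather than on scanning all of it.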


On Wed, Apr 29, 2015 at 7:11 AM, Rendy Bambang Junior <
rendy.b.jun...@gmail.com> wrote:

> Let's say I have transaction data and visit data:
> visit
> | userId | Visit source | Timestamp |
> | A      | google ads   | 1         |
> | A      | facebook ads | 2         |
> transaction
> | userId | total price | timestamp |
> | A      | 100         | 248384    |
> | B      | 200         | 43298739  |
> I want to join transaction data with visit data to do sales attribution, and
> I want to do it in real time, whenever a transaction occurs (streaming).
> Is it scalable to join incoming data against very large historical data
> using the join function in Spark? If not, how is this usually done?
> The visit data needs to be historical, since a visit can happen any time
> before the transaction (e.g. a visit one year before the transaction occurs).
> Rendy
