Thanks.
Since the join will be done on a regular basis over a short period of time (let's
say every 20s), do you have any suggestions for how to make it faster?
I am thinking of partitioning the dataset and caching it.
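A minimal sketch of that idea, assuming a `StreamingContext` named `ssc`, a hypothetical `visitStream` DStream of visit events, and a hypothetical HDFS path for the transaction data (the partition count of 100 is an assumption to tune for the cluster):

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.streaming.{Seconds, StreamingContext}

val partitioner = new HashPartitioner(100) // assumption: tune to cluster size

// Static (or slowly changing) transaction dataset, keyed by userId,
// partitioned once and cached so every 20s batch reuses it.
val transactions = ssc.sparkContext
  .textFile("hdfs:///data/transactions") // hypothetical path
  .map { line =>
    val Array(userId, totalPrice, ts) = line.split(",")
    (userId, (totalPrice.toDouble, ts.toLong))
  }
  .partitionBy(partitioner)
  .cache()

// Streaming visits, keyed the same way (visitStream is hypothetical).
val visits = visitStream.map(v => (v.userId, (v.source, v.timestamp)))

// Join each micro-batch against the cached dataset inside transform().
// Using the same partitioner means the cached side is not re-shuffled
// on every batch; only the small batch RDD moves.
val joined = visits.transform { rdd =>
  rdd.partitionBy(partitioner).join(transactions)
}
```

The design choice here is to pay the shuffle cost of the large static side once up front, then let each short-interval batch do a co-partitioned join against it.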
Rendy
On Apr 30, 2015 6:31 AM, "Tathagata Das" wrote:
Have you taken a look at the join section in the streaming programming
guide?
http://spark.apache.org/docs/latest/streaming-programming-guide.html#stream-dataset-joins
On Wed, Apr 29, 2015 at 7:11 AM, Rendy Bambang Junior <
rendy.b.jun...@gmail.com> wrote:
Let's say I have transaction data and visit data:
visit
| userId | visit source | timestamp |
| A      | google ads   | 1         |
| A      | facebook ads | 2         |
transaction
| userId | total price | timestamp |
| A      | 100         | 248384    |
| B      | 200         | 43298739  |
I want