Thanks Jörn, sounds like there's nothing obvious I'm missing, which is
encouraging.
I've not used Redis, but it does seem that for most of my current and
likely future use-cases it would be the best fit (nice compromise of scale
and easy setup / access).
Thanks,
Tom
On Wed, Sep 14, 2016 at 10:0
Hmm, is it just a lookup, and are the values small? In that case I do not
think Redis needs to be installed on each worker node. Redis has a rather
efficient protocol, so one or a few dedicated Redis nodes are probably more
than enough for your purpose. Just try to reuse connections and do not
Hi all,
I'm interested in patterns people use in the wild for lookups against
reference datasets from a Spark Streaming job. The reference dataset will be
updated during the life of the job (although being 30 minutes out of date
wouldn't be an issue, for example).
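
To make the staleness requirement concrete, here is a minimal sketch of a lookup that keeps the reference data in memory and reloads it only once it is older than a configurable age. The names RefreshingLookup, loader and max_age_seconds are made up for illustration and are not part of Spark or Redis; the loader is assumed to be any function that returns the whole reference dataset as a dict, whatever store it reads from.

```python
import time


class RefreshingLookup:
    """In-memory reference data that is reloaded once it gets too old.

    `loader` is a hypothetical callable returning the full reference
    dataset as a dict; where it reads from is up to the caller.
    """

    def __init__(self, loader, max_age_seconds=30 * 60, clock=time.monotonic):
        self._loader = loader
        self._max_age = max_age_seconds
        self._clock = clock          # injectable so the refresh logic can be tested
        self._data = {}
        self._loaded_at = None       # None means "never loaded"

    def get(self, key, default=None):
        now = self._clock()
        # Reload on first use, or when the cached copy has reached max age.
        if self._loaded_at is None or now - self._loaded_at >= self._max_age:
            self._data = self._loader()
            self._loaded_at = now
        return self._data.get(key, default)
```

With this approach each long-lived worker process would hold one such object, so the full dataset is fetched at most once per max_age_seconds per process rather than once per record. It only fits when the dataset is small enough to keep in memory.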
So far I have come up with a few options