
Re: Streaming - lookup against reference data

2016-09-15 Thread Tom Davis

Thanks Jörn, sounds like there's nothing obvious I'm missing, which is encouraging. I've not used Redis, but it does seem that for most of my current and likely future use-cases it would be the best fit (nice compromise of scale and easy setup / access). Thanks, Tom On Wed, Sep 14, 2016 at 10:0

Re: Streaming - lookup against reference data

2016-09-14 Thread Jörn Franke

Hmm, is it just a lookup and the values are small? I do not think that in this case Redis needs to be installed on each worker node. Redis has a rather efficient protocol. Hence one or a few dedicated Redis nodes will probably more than fit your purpose. Just try to reuse connections and do not
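To illustrate the "reuse connections" advice, here is a minimal sketch in Python that opens one connection per partition rather than one per record. The function name `lookup_partition` and the `connect` factory are hypothetical; the client is only assumed to expose `get(key)` and `close()`, which a Redis client typically does.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple


def lookup_partition(
    keys: Iterable[str],
    connect: Callable[[], Any],
) -> Iterator[Tuple[str, Any]]:
    """Look up every key in one partition over a single shared connection.

    `connect` is a hypothetical factory returning a client object with
    `get(key)` and `close()` methods (an assumption, not a fixed API).
    """
    client = connect()  # opened once for the whole partition
    try:
        for key in keys:
            # Each record reuses the same connection.
            yield key, client.get(key)
    finally:
        # Release the connection once the partition is exhausted.
        client.close()
```

In a Spark job this would probably be passed to `mapPartitions`, for example `rdd.mapPartitions(lambda part: lookup_partition(part, connect))`, so that the connection is created on the worker and not serialized from the driver.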

Streaming - lookup against reference data

2016-09-14 Thread Tom Davis

Hi all, I'm interested in patterns people use in the wild for lookups against reference data sets from a Spark Streaming job. The reference dataset will be updated during the life of the job (although being 30 minutes out of date wouldn't be an issue, for example). So far I have come up with a few options
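One way to exploit the 30-minute staleness tolerance is a locally cached copy of the reference data that reloads itself once it is too old. The sketch below is only an illustration and not one of the options from the original message: the class `RefreshingLookup`, the `loader` callback, and the injectable `clock` are all hypothetical names.

```python
import time
from typing import Any, Callable, Dict, Optional


class RefreshingLookup:
    """Caches a reference dict and reloads it when older than max_age_s."""

    def __init__(
        self,
        loader: Callable[[], Dict[str, Any]],
        max_age_s: float = 30 * 60,  # 30 minutes, the tolerance in the post
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader        # hypothetical: fetches the full data set
        self._max_age_s = max_age_s
        self._clock = clock          # injectable so the refresh can be tested
        self._data: Dict[str, Any] = {}
        self._loaded_at: Optional[float] = None

    def get(self, key: str, default: Any = None) -> Any:
        now = self._clock()
        # Reload on first use, or once the cached copy has aged out.
        if self._loaded_at is None or now - self._loaded_at >= self._max_age_s:
            self._data = self._loader()
            self._loaded_at = now
        return self._data.get(key, default)
```

Holding one such object per executor process keeps lookups local between refreshes, at the cost of each executor loading its own copy of the data set.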