1. If you use a custom API library, you may run into serialization errors (for example, when the client object is captured in a closure that Spark has to ship to the workers). A plain HTTP REST API would work fine, but each call adds some latency, and such APIs may limit the number of requests you can make.
2. I would go with the library-based approach: either broadcast the IP data, or cache it in a normal RDD and join it with the stream data.

Thanks
Best Regards

On Tue, Dec 2, 2014 at 8:44 PM, Noam Kfir wrote:

> Hi,
>
> I'm new to Spark Streaming.
>
> I'm currently writing a Spark Streaming application to standardize events
> coming from Kinesis.
>
> As part of the logic, I want to use an IP-to-geo information library or
> service.
>
> My questions:
>
> 1) If I used a REST service for this task, do you think it would cause a
> performance penalty (compared with a library-based solution)?
>
> 2) If I used a library-based solution, I would have to use a local db file.
> What mechanism should I use to transfer such a db file? A broadcast
> variable?
>
> Tx, Noam.
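A minimal sketch of the broadcast option from point 2, written in Python. It assumes the local geo-IP db file has already been parsed into (first IP, last IP, location) ranges; the sample ranges (RFC 5737 documentation addresses), the "Region" labels, and the helpers `build_table`, `lookup`, and `enrich` are all hypothetical. Only the standard library runs here. The PySpark calls `sc.broadcast(...)` and `.value` appear in a comment to show where the table would be shipped to executors, and `events` stands in for a hypothetical DStream.

```python
import bisect
import ipaddress

# Hypothetical sample data: (first IP, last IP, location) ranges.
# A real job would load these from the local geo-IP db file instead.
RAW_RANGES = [
    ("192.0.2.0", "192.0.2.255", "Region A"),
    ("198.51.100.0", "198.51.100.255", "Region B"),
    ("203.0.113.0", "203.0.113.255", "Region C"),
]


def build_table(raw_ranges):
    """Convert ranges to integers and sort them by start address.

    The result is a plain, picklable Python object, which is the kind
    of value a Spark broadcast variable can hold.
    """
    rows = sorted(
        (int(ipaddress.ip_address(lo)), int(ipaddress.ip_address(hi)), loc)
        for lo, hi, loc in raw_ranges
    )
    starts = [r[0] for r in rows]
    return starts, rows


def lookup(table, ip):
    """Return the location whose range contains `ip`, or None."""
    starts, rows = table
    ip_int = int(ipaddress.ip_address(ip))
    # Index of the last range whose start address is <= ip_int.
    i = bisect.bisect_right(starts, ip_int) - 1
    if i >= 0 and rows[i][0] <= ip_int <= rows[i][1]:
        return rows[i][2]
    return None


def enrich(event, table):
    """Standardize one event by adding a 'geo' field (hypothetical schema)."""
    return {**event, "geo": lookup(table, event["ip"])}


if __name__ == "__main__":
    table = build_table(RAW_RANGES)
    # In PySpark the table would be sent once per executor, e.g.:
    #   geo_bc = sc.broadcast(table)
    #   enriched = events.map(lambda e: enrich(e, geo_bc.value))
    # Here enrich() is applied to a local list instead.
    for e in [{"ip": "198.51.100.7"}, {"ip": "10.0.0.1"}]:
        print(enrich(e, table))
```

Keeping the ranges sorted by start address turns each lookup into a binary search, so the per-event cost stays small even when the table is large.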
