> Does anyone have experience (performance-wise and in general) with whether
> it's a good idea to call MongoDB from within a LookUpBOLT to check if the
> GPS coordinate is within or outside of the geofence?
Is there a possibility you could cache the geofence data and do the lookups on the cached entries? If not, you could measure the execution time of the query against MongoDB, both for a single query and for parallel queries (with your expected parallelism), and calculate what that would mean for your Storm topology throughput.

Let's say the query takes 150 ms to execute (under a reasonable load), and your calculations take another 50 ms. Round that up to maybe 250 ms total processing time per GPS coordinate, and you could process 4 GPS coordinates per second on a topology with a parallelism of 1; ten times that with a parallelism of 10, naturally.

I guess the answer is: it depends. How fast are those MongoDB queries, and how many GPS coordinates do you need to process?

I was using MongoDB to persist raw event data on a system with no problems. The event streaming solution would write each incoming event into MongoDB before any processing was done, so that we could run ad-hoc queries against the raw event data or "replay" some of the event processing at a later date. I forget exactly how much traffic we put in there during our load tests, probably a few thousand events per second if I remember correctly, but MongoDB was never the bottleneck, not even close.

-TPP
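PS: the back-of-the-envelope estimate above can be written down as a couple of lines of code. All the numbers here (150 ms query, 50 ms calculation, padding up to roughly 250 ms) are the hypothetical figures from the reply, not measurements; the `overhead_factor` parameter is just my way of expressing that rounding-up.

```python
def max_throughput(query_ms, compute_ms, parallelism, overhead_factor=1.25):
    """Rough upper bound on tuples/second for a bolt that blocks on a DB query.

    query_ms        -- measured MongoDB query latency per tuple
    compute_ms      -- your own processing time per tuple
    overhead_factor -- pads the total (e.g. 200 ms -> 250 ms in the example)
    """
    per_tuple_ms = (query_ms + compute_ms) * overhead_factor
    return parallelism * 1000.0 / per_tuple_ms

# 150 ms query + 50 ms calculation, padded to ~250 ms per GPS coordinate:
print(max_throughput(150, 50, parallelism=1))   # -> 4.0 coordinates/s
print(max_throughput(150, 50, parallelism=10))  # -> 40.0 coordinates/s
```

Plug in your own measured latencies; if the number that comes out is well above your expected input rate, calling MongoDB from the bolt is probably fine, and if not, that's your cue to cache the geofences.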
