[Boston.pm] Passing large complex data structures between process

2013-04-03 Thread David Larochelle
I'm trying to optimize a database-driven web crawler and I was wondering if anyone could offer any recommendations for interprocess communication. Currently, the driver process periodically queries a database to get a list of URLs to crawl. It then stores these URLs to be downloaded in a

Re: [Boston.pm] Passing large complex data structures between process

2013-04-03 Thread Ricker, William
This would be a great topic for a meeting, either a report later on what you found and how you used it, or as a workshop to evaluate your options. [FYI, Schedule for next Tuesday is Federico will update us on his embedded Perl hardware hacking project.] bill@$dayjob #include

Re: [Boston.pm] Passing large complex data structures between process

2013-04-03 Thread Anthony Caravello
I've generally had luck with threads in Perl 5.10 and beyond, though if you're sharing variables containing large amounts of data they can be inefficient. Storing your data in Redis is often handy and fast, and Redis::List works nicely as a queuing system, even if you use threads. Hope that
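A minimal sketch of the Redis-as-queue idea above, assuming the CPAN Redis module and a server running on localhost:6379 (the `crawl_queue` key name and the URLs are illustrative, not from the original thread):

```perl
use strict;
use warnings;
use Redis;   # CPAN module; assumes a Redis server on localhost:6379

my $redis = Redis->new(server => 'localhost:6379');

# Driver: push URLs onto a Redis list used as a shared work queue.
my @urls = ('http://example.com/a', 'http://example.com/b');
$redis->rpush('crawl_queue', @urls);

# Crawler: BLPOP blocks until a URL is available (here with a 1-second
# timeout so the loop exits when the queue drains).
while (my (undef, $url) = $redis->blpop('crawl_queue', 1)) {
    print "fetching $url\n";   # real crawl logic would go here
}
```

Because the queue lives in Redis rather than in a shared Perl variable, any number of crawler processes (or threads) can pop from it without copying the whole structure between them.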

Re: [Boston.pm] Passing large complex data structures between process

2013-04-03 Thread David Larochelle
Thanks, I think that Redis would work but the system currently uses Postgresql, I'm looking for something simpler than having to maintain another service. (We're actively looking at NoSQL solutions but that would be a huge rearchitecting of the system.) Is there a clean way to do this with pipes

Re: [Boston.pm] Passing large complex data structures between process

2013-04-03 Thread Anthony Caravello
To keep it simple, try threads and the Thread::Queue module. That will keep it all 'in-house'. On Wed, Apr 3, 2013 at 11:28 AM, David Larochelle da...@larochelle.name wrote: Thanks, I think that Redis would work but the system currently uses Postgresql, I'm looking for something simpler than
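The Thread::Queue suggestion might look like the following sketch, using only core modules on a threaded perl (the worker count and URLs are hypothetical):

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $work = Thread::Queue->new;
my $done = Thread::Queue->new;

# Four workers block on dequeue; an undef item is the shutdown signal.
my @workers = map {
    threads->create(sub {
        while (defined(my $url = $work->dequeue)) {
            # real crawl logic would go here; we just report completion
            $done->enqueue($url);
        }
    });
} 1 .. 4;

# Driver: enqueue the URLs, then one undef per worker to shut them down.
$work->enqueue($_) for ('http://example.com/a', 'http://example.com/b');
$work->enqueue(undef) for @workers;
$_->join for @workers;

print "crawled ", $done->pending, " urls\n";
```

Thread::Queue handles the locking internally, so the driver and crawlers never share variables directly; they only pass items through the queue.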

Re: [Boston.pm] Passing large complex data structures between process

2013-04-03 Thread Gyepi SAM
On Wed, Apr 03, 2013 at 10:34:17AM -0400, David Larochelle wrote: I'm trying to optimize a database driven web crawler and I was wondering if anyone could offer any recommendations for interprocess communications. Currently, the driver process periodically queries a database to get a list
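One common answer to the pipes question raised above is to freeze the structure with core Storable and send it length-prefixed over a pipe to a forked child, so the reader knows exactly how many bytes make up one message. A minimal sketch (the payload is illustrative):

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Parent freezes a complex structure and writes it to a forked child
# over a pipe, prefixed with a 4-byte network-order length.
pipe(my $reader, my $writer) or die "pipe: $!";
binmode $_ for $reader, $writer;

my $pid = fork // die "fork: $!";
if ($pid == 0) {                                      # child (crawler)
    close $writer;
    read($reader, my $len_buf, 4) == 4  or die "short length read";
    my $len = unpack 'N', $len_buf;
    read($reader, my $frozen, $len) == $len or die "short body read";
    my $data = thaw($frozen);
    die "bad payload" unless $data->{urls}[0] eq 'http://example.com/a';
    print "child got $data->{urls}[0]\n";
    exit 0;
}

close $reader;                                        # parent (driver)
my $frozen = freeze({ urls => ['http://example.com/a'] });
print {$writer} pack('N', length $frozen), $frozen;
close $writer;
waitpid $pid, 0;
```

The length prefix matters because Storable's wire format has no delimiter of its own; without it the reader cannot tell where one frozen structure ends and the next begins.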

Re: [Boston.pm] Passing large complex data structures between process

2013-04-03 Thread Allen Barriere
Another option that I've used in similar situations:
1. have a process hit the database and generate a storable of the data
2. have multiple crawlers execute and unfreeze the storable into memory
3. do what you need to do with the data, pushing back to the database when necessary.
Instead of
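The steps above can be sketched with core Storable's file helpers; `nstore` writes in network byte order so the snapshot is portable across machines (the file path and data are hypothetical):

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);

# Step 1: the driver snapshots the URL list to a file after each DB query.
my $urls = { pending => ['http://example.com/a', 'http://example.com/b'] };
nstore($urls, '/tmp/crawl_snapshot.stor');

# Step 2: each crawler thaws the snapshot back into memory.
my $data = retrieve('/tmp/crawl_snapshot.stor');

# Step 3: work through the list, writing results back to the database.
print "have ", scalar @{ $data->{pending} }, " URLs\n";
```

This avoids any live IPC channel entirely: the crawlers only need read access to the snapshot file, at the cost of each one holding its own full copy of the data.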