Re: [Boston.pm] Passing large complex data structures between process

2013-04-05 Thread Ben Tilly
Pro tip. I've seen both push-based systems and pull-based systems at work. The push-based systems tend to break whenever the thing that you're pushing to has problems. Pull-based systems tend to be much more reliable in my experience. You have described a push-based system. I would therefore
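
A minimal sketch of the pull-based pattern mentioned above, in Python only for brevity (the thread itself concerns Perl). `WorkSource`, `claim`, `worker`, and `run_pull_demo` are hypothetical names, and the in-memory deque is an assumed stand-in for wherever pending URLs actually live. The point it illustrates: workers ask for work when they are ready, so an unhealthy worker just stops asking and nothing is pushed at it.

```python
import threading
from collections import deque


class WorkSource:
    """Hypothetical in-memory stand-in for the store of pending URLs."""

    def __init__(self, urls):
        self._pending = deque(urls)
        self._lock = threading.Lock()

    def claim(self):
        """Hand out the next pending URL, or None when nothing is left."""
        with self._lock:
            return self._pending.popleft() if self._pending else None


def worker(name, source, done, healthy=True):
    """Pull work until the source is empty."""
    if not healthy:
        return  # an unhealthy worker simply never asks for work
    while (url := source.claim()) is not None:
        done.append((name, url))  # real code would download the URL here


def run_pull_demo(urls):
    """Run one healthy and one unhealthy worker against the same source."""
    source = WorkSource(urls)
    done = []
    threads = [
        threading.Thread(target=worker, args=("w1", source, done)),
        threading.Thread(target=worker, args=("w2", source, done, False)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done
```

In this sketch the healthy worker ends up processing every URL; in a push-based design the dispatcher would instead have to notice that `w2` is down and reroute its share.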

Re: [Boston.pm] Passing large complex data structures between process

2013-04-05 Thread John Redford
Ben Tilly emitted: Pro tip. I've seen both push-based systems and pull-based systems at work. The push-based systems tend to break whenever the thing that you're pushing to has problems. Pull-based systems tend to be much more reliable in my experience. [...] If you disregard this tip,

Re: [Boston.pm] Passing large complex data structures between process

2013-04-05 Thread Anthony Caravello
Queuing systems aren't really new or 'technofrippery'. In-memory FIFO queues are ridiculously fast compared to transaction-safe RDBMSs for this simple purpose. Databases incur a lot of overhead for wonderful things that don't aid this cause. This isn't magic; sometimes it's just the right tool

Re: [Boston.pm] Passing large complex data structures between process

2013-04-05 Thread Ben Tilly
On Fri, Apr 5, 2013 at 12:04 PM, John Redford eire...@hotmail.com wrote: Ben Tilly emitted: Pro tip. I've seen both push-based systems and pull-based systems at work. The push-based systems tend to break whenever the thing that you're pushing to has problems. Pull-based systems tend to be

Re: [Boston.pm] Passing large complex data structures between process

2013-04-05 Thread John Redford
Anthony Caravello writes: Queuing systems aren't really new or 'technofrippery'. In-memory FIFO queues are ridiculously fast compared to transaction-safe RDBMSs for this simple purpose. Databases incur a lot of overhead for wonderful things that don't aid this cause. No one said queuing

Re: [Boston.pm] Passing large complex data structures between process

2013-04-05 Thread John Redford
Ben Tilly expands: On Fri, Apr 5, 2013 at 12:04 PM, John Redford eire...@hotmail.com wrote: Your writing is FUD. Are you reading something into what I wrote that wasn't there? Because I'm pretty sure that what I wrote isn't FUD. It was. Ask anyone. I'm not your English tutor. A

Re: [Boston.pm] Passing large complex data structures between process

2013-04-05 Thread Anthony Caravello
I bow to you. I've been on this list for a long time and figured my 20 years of development and engineering experience might be of assistance and for the first time I offered it. From now on, you should answer all the questions. -unsubscribe On Apr 5, 2013 6:05 PM, John Redford

Re: [Boston.pm] Passing large complex data structures between process

2013-04-04 Thread Morse, Richard E.MGH
On Apr 3, 2013, at 10:34 AM, David Larochelle da...@larochelle.name wrote: Currently, the driver process periodically queries a database to get a list of URLs to crawl. It then stores these URLs to be downloaded in a complex in-memory data structure and pipes them to separate processes that do the actual

Re: [Boston.pm] Passing large complex data structures between process

2013-04-04 Thread David Larochelle
Thanks for all the feedback. I left out a lot of details about the system because I didn't want to complicate things. The purpose of the system is to comprehensively study online media. We need the system to run 24 hours a day to download news articles from media sources such as the New York Times. We

Re: [Boston.pm] Passing large complex data structures between process

2013-04-04 Thread Anthony Caravello
This sounds like a perfect fit for a queuing service like RabbitMQ. Logstash uses Redis lists for this as it's simple to set up and pretty reliable, but there are many such applications available. The queues would allow multiple backend processes to check for and take items as they became

Re: [Boston.pm] Passing large complex data structures between process

2013-04-04 Thread Gyepi SAM
On Thu, Apr 04, 2013 at 04:21:54PM -0400, David Larochelle wrote: My hope is to split the engine process into two pieces that run in parallel: one to query the database and another to send downloads to fetchers. This way it won't matter how long the db query takes as long as we can get URLs
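
A rough sketch of the split described in the quoted message, in Python for brevity: one thread stands in for the piece that queries the database, another for the piece that hands URLs to fetchers, with a bounded queue between them. Everything here is an assumption for illustration (`query_batches`, `producer`, `dispatcher`, `run_split_engine`, the example.com URLs, and the use of threads rather than separate processes); it is not the poster's code.

```python
import queue
import threading

SENTINEL = None  # marks the end of the URL stream


def query_batches():
    """Hypothetical stand-in for the periodic (possibly slow) database query."""
    yield ["http://example.com/a", "http://example.com/b"]
    yield ["http://example.com/c"]


def producer(buf):
    """Query side: keep the buffer topped up with URLs."""
    for batch in query_batches():
        for url in batch:
            buf.put(url)  # blocks while the buffer is full (backpressure)
    buf.put(SENTINEL)


def dispatcher(buf, sent):
    """Dispatch side: take URLs from the buffer as soon as they are available."""
    while True:
        url = buf.get()
        if url is SENTINEL:
            break
        sent.append(url)  # real code would hand the URL to a fetcher here


def run_split_engine(maxsize=2):
    """Run both halves in parallel and return the URLs in dispatch order."""
    buf = queue.Queue(maxsize=maxsize)
    sent = []
    threads = [
        threading.Thread(target=producer, args=(buf,)),
        threading.Thread(target=dispatcher, args=(buf, sent)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sent
```

The bounded buffer is the design choice worth noting: the dispatcher never waits on a query as long as the buffer is non-empty, and the query side cannot run arbitrarily far ahead of the fetchers.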

Re: [Boston.pm] Passing large complex data structures between process

2013-04-04 Thread John Redford
David Larochelle wrote: [...] We're using PostgreSQL 8.4 and running on Ubuntu. Almost all data is stored in the database. The system contains a list of media sources with associated RSS feeds. We have a downloads table that has all of the URLs that we want to download or have downloaded in

[Boston.pm] Passing large complex data structures between process

2013-04-03 Thread David Larochelle
I'm trying to optimize a database-driven web crawler and I was wondering if anyone could offer any recommendations for interprocess communications. Currently, the driver process periodically queries a database to get a list of URLs to crawl. It then stores these URLs to be downloaded in a

Re: [Boston.pm] Passing large complex data structures between process

2013-04-03 Thread Ricker, William
not_speaking_for_the_firm On Wednesday, April 03, 2013 10:34 AM, David Larochelle wrote: I'm trying

Re: [Boston.pm] Passing large complex data structures between process

2013-04-03 Thread Anthony Caravello

Re: [Boston.pm] Passing large complex data structures between process

2013-04-03 Thread David Larochelle

Re: [Boston.pm] Passing large complex data structures between process

2013-04-03 Thread Anthony Caravello

Re: [Boston.pm] Passing large complex data structures between process

2013-04-03 Thread Gyepi SAM
On Wed, Apr 03, 2013 at 10:34:17AM -0400, David Larochelle wrote: I'm trying to optimize a database-driven web crawler and I was wondering if anyone could offer any recommendations for interprocess communications. Currently, the driver process periodically queries a database to get a list

Re: [Boston.pm] Passing large complex data structures between process

2013-04-03 Thread Allen Barriere
Another option that I've used in similar situations:
1. Have a process hit the database and generate a Storable file of the data.
2. Have multiple crawlers execute and unfreeze the Storable data into memory.
3. Do what you need to do with the data, pushing back to the database when necessary.
Instead of
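
Steps 1 and 2 above can be sketched roughly as follows. This is a Python analogue using `pickle`, swapped in for brevity; in Perl the corresponding tool would be the Storable module (`store`/`retrieve`, `freeze`/`thaw`). The function names, the file name, and the write-then-rename step are assumptions for illustration and are not taken from the message; step 3 (pushing results back to the database) is not shown.

```python
import os
import pickle
import tempfile


def write_snapshot(data, path):
    """Step 1: serialize the data pulled from the database to a file."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as fh:
        pickle.dump(data, fh, protocol=pickle.HIGHEST_PROTOCOL)
    # Rename into place so a crawler never reads a half-written file.
    os.replace(tmp, path)


def load_snapshot(path):
    """Step 2: each crawler loads the whole structure back into memory."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def snapshot_round_trip(data):
    """Write a snapshot to a temporary directory and read it back."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "urls.pickle")  # hypothetical file name
        write_snapshot(data, path)
        return load_snapshot(path)
```

As with any pickle-style format, snapshots should only be loaded from a trusted source, since loading can execute code embedded in the file.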