>
> Our approach is that "money" should never be an issue. If you can "prove"
> us that polling is cheaper for you, then : 1- we'll match that, 2- we want
> to learn how :)
>

I'm currently doing it for 250K feeds on a Unix box with 8GB RAM. I use the
SimplePie PHP library to parse the feeds, and it does a decent job of
insulating me from the quirks of each format.

I have a PHP script that picks a few hundred feeds at a time from the db and
fetches them in a loop. A cron job runs this script every 10 minutes, with 50
instances at a time (my crontab file has 50 copies of the line that runs the
script). That's my poor man's approach to multi-threading, but it works. Each
feed has an integer ID, and I use the ID and the current time, each taken
modulo a fixed number, in a clever equation to decide which feeds to pull at
each point in time. This means I don't have to keep track of the last time a
feed was pulled, which saves quite a bit of db access.
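
The exact equation isn't spelled out above, so here is only a rough sketch of
how an ID-and-time modulo scheme of this kind could work. It is written in
Python rather than PHP for brevity. The 50 workers and the 10-minute interval
come from the description above; the slot count of 17 and the names
current_slot and is_due are assumptions made up for illustration.

```python
NUM_WORKERS = 50     # from the post: 50 copies of the cron line
RUN_INTERVAL = 600   # from the post: cron fires every 10 minutes (in seconds)
NUM_SLOTS = 17       # assumption: number of cron runs in one full polling cycle


def current_slot(now, run_interval=RUN_INTERVAL, num_slots=NUM_SLOTS):
    """Index of the current cron run within the polling cycle."""
    return (int(now) // run_interval) % num_slots


def is_due(feed_id, worker, now,
           num_workers=NUM_WORKERS, num_slots=NUM_SLOTS):
    """True if this worker should fetch this feed during the current run.

    Every feed ID maps to exactly one (slot, worker) pair, so each feed is
    fetched once per cycle and no "last fetched" timestamp has to be stored.
    """
    bucket = feed_id % (num_workers * num_slots)
    return bucket == current_slot(now) * num_workers + worker
```

Under these assumptions, worker number w (0 to 49) would call
is_due(feed_id, w, time.time()) for each candidate feed, or push the same
modulo test into the db query as a condition on the ID column.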

It's not a very sophisticated setup, but it works. Effectively, the ongoing
cost is the cost of renting an 8GB box (about $200/month). I'm reaching the
limits of what I can do on one box, though. Not because of the polling
process itself, but mostly because of the size of the data and the disk
swapping that comes with it. So I can either get a bigger box, shard my data,
or move to App Engine. I prefer App Engine because I've had good experience
scaling other projects on it in the recent past, and because I want to solve
the scaling problem once and for all rather than delay it.
