Waleed,

That really looks like what most people build. Can I ask what the
"thruput" of your whole system is? How long does it take you to fetch all the
feeds in the DB? (BTW, maybe you want to take this conversation off that
mailing list.)

If you want to go with GAE because you think you'll get good enough
results with it and don't care about maintenance and all that jazz, that's
ultimately your decision, but I'd suggest that you at least give our
solution a try, so you can benchmark/test it (and even use it for free while
we're still in beta :D).

Let me know,

Julien

--
Julien Genestoux,

http://twitter.com/julien51
http://superfeedr.com

+1 (415) 254 7340
+33 (0)9 70 44 76 29
Sent from San Francisco, CA, United States

On Fri, Nov 13, 2009 at 10:21 PM, Waleed Abdulla wrote:

>> Our approach is that "money" should never be an issue. If you can "prove"
>> to us that polling is cheaper for you, then: 1) we'll match that, and 2) we
>> want to learn how :)
>>
>
> I'm currently doing it for 250K feeds on a Unix box with 8GB RAM. I use the
> SimplePie PHP library to parse the feeds, and it does a decent job of
> isolating me from the oddities of each format.
>
> I have a PHP file that picks a few hundred feeds at a time from the DB and
> fetches them in a loop, and I run it from a cron job every 10 minutes. But
> I run 50 instances of that file at a time (my crontab file has 50 copies of
> the line that runs it). That's my poor man's approach to multi-threading,
> but it works. Each feed has an integer ID, and I use the mod of the ID and
> the mod of the time in a clever equation to decide which feeds to pull at
> each point in time. This means I don't have to keep track of when each feed
> was last pulled, which saves quite a bit of DB access.
>
> It's not a very sophisticated setup, but it works. Effectively, the ongoing
> cost is the cost of renting an 8GB box (about $200/month). I'm reaching the
> limits of what I can do on one box, though, not because of the polling
> process itself but mostly because of the size of the data and the disk
> swapping that comes with it. So I can either get a bigger box, shard my
> data, or move to App Engine. I prefer App Engine because I've had good
> experience scaling other projects on it recently, and because I want to
> solve the scaling problem once and for all rather than postpone it.
>
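Waleed doesn't spell out his "clever equation", so the following is only one plausible way to pick feeds statelessly from the feed ID and the clock, written in Python rather than his PHP. The constants NUM_WORKERS (his 50 crontab copies), PERIOD_MINUTES (his 10-minute cron period) and SLOTS, along with every function name, are hypothetical choices for illustration and are not taken from his code.

```python
"""Hypothetical sketch of stateless, mod-based feed scheduling.

Assumptions (not from the original email): 50 cron copies, a 10-minute
cron period, and SLOTS runs per full polling cycle.
"""
import time
from typing import Iterable, List

NUM_WORKERS = 50     # one worker per crontab copy (assumed indexing 0..49)
PERIOD_MINUTES = 10  # how often cron fires
SLOTS = 12           # hypothetical: each feed is polled once every SLOTS runs


def current_slot(epoch_seconds: int) -> int:
    """Map wall-clock time to a slot; all runs in the same 10-minute window agree."""
    run_number = epoch_seconds // (PERIOD_MINUTES * 60)
    return run_number % SLOTS


def is_due(feed_id: int, worker_index: int, slot: int) -> bool:
    """Each feed belongs to exactly one worker (ID mod workers) and one slot."""
    if feed_id % NUM_WORKERS != worker_index:
        return False
    return (feed_id // NUM_WORKERS) % SLOTS == slot


def feeds_for_run(feed_ids: Iterable[int], worker_index: int,
                  epoch_seconds: int) -> List[int]:
    """Select the feeds this worker should fetch now; no last-polled state is used."""
    slot = current_slot(epoch_seconds)
    return [f for f in feed_ids if is_due(f, worker_index, slot)]


def full_cycle_minutes() -> int:
    """How long until every feed has been polled once."""
    return SLOTS * PERIOD_MINUTES


if __name__ == "__main__":
    due = feeds_for_run(range(250_000), worker_index=7,
                        epoch_seconds=int(time.time()))
    print(f"worker 7 polls {len(due)} feeds this run; "
          f"full cycle = {full_cycle_minutes()} minutes")
```

With these assumed values, each worker handles about 250,000 / (50 × 12) ≈ 417 feeds per run, in line with "a few hundred", and every feed is revisited once per 12 × 10 = 120 minutes, roughly 2,100 feeds per minute overall. In a real deployment the same predicate would more likely live in the SQL WHERE clause so each worker loads only its own rows; either way, nothing about when a feed was last fetched has to be read or written.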
