Re: [twitter-dev] Best practice - Stream API into a FILE or MySQL or neither?

2010-01-18 Thread John Kalucki
Writing directly into the database ensures data loss during any sort of
database maintenance, performance degradation, or outage. Writing first to a
log file (or other asynchronous queueing mechanism) allows for
considerable operational flexibility. The wiki sketches the recommended
architecture.
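
For concreteness, here is a minimal PHP sketch of the loading half of that pattern: a separate worker sweeps completed stream files into MySQL and deletes them once committed. The directory, DSN, credentials, and table name are illustrative assumptions, not anything prescribed by the wiki.

<?php
// A minimal sketch of the "write to a log first, load asynchronously" pattern:
// a worker, run from cron, bulk-loads completed stream files into MySQL and
// removes them. Paths, DSN, credentials, and table are placeholder assumptions.

$pdo = new PDO('mysql:host=localhost;dbname=stream', 'user', 'pass',
               array(PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION));

foreach (glob('/var/spool/stream/stream-*.json') as $file) {
    // Skip the file the collector is presumably still appending to.
    if (time() - filemtime($file) < 60) {
        continue;
    }

    $pdo->beginTransaction();
    $insert = $pdo->prepare('INSERT INTO statuses (payload) VALUES (?)');

    foreach (file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        $insert->execute(array($line));   // one JSON status per line
    }

    $pdo->commit();
    unlink($file);   // if the database is down, files simply accumulate until it returns
}

The decoupling is the point: database maintenance or an outage only pauses the sweep, while the collector keeps appending to disk.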

-John Kalucki
http://twitter.com/jkalucki
Infrastructure, Twitter Inc.




[twitter-dev] Best practice - Stream API into a FILE or MySQL or neither?

2010-01-16 Thread GeorgeMedia
Just looking for thoughts on this.

I am consuming the gardenhose via a PHP app on my web server. So far
so good. The script simply creates a new file at a fixed interval and
starts feeding the stream into it, so I get a continuous supply of
fresh data and can delete old data via cron. I plan to access the
stream (files) with separate processes for further JSON parsing and
data mining.
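
For reference, a minimal sketch of that collector, assuming the stream is read with PHP's curl extension; the endpoint URL, credentials, output directory, and rotation interval are placeholders, not values from this post.

<?php
// Hold the streaming connection open with curl and append the raw data to a
// file that is rotated at a fixed interval. Parsing happens in other processes.

const ROTATE_SECONDS = 300;                 // start a new file every 5 minutes (placeholder)
const OUT_DIR        = '/var/spool/stream'; // placeholder output directory

$fh    = null;
$since = 0;

$onData = function ($curl, $chunk) use (&$fh, &$since) {
    if ($fh === null || time() - $since >= ROTATE_SECONDS) {
        if ($fh !== null) {
            fclose($fh);
        }
        $fh    = fopen(OUT_DIR . '/stream-' . date('Ymd-His') . '.json', 'ab');
        $since = time();
    }
    fputs($fh, $chunk);          // append raw JSON as it arrives
    return strlen($chunk);       // tell curl the whole chunk was handled
};

$ch = curl_init('https://stream.example.com/sample.json');  // placeholder endpoint
curl_setopt($ch, CURLOPT_USERPWD, 'username:password');     // placeholder credentials
curl_setopt($ch, CURLOPT_WRITEFUNCTION, $onData);
curl_exec($ch);                  // blocks for as long as the stream stays open
curl_close($ch);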

But that got me thinking about feeding the data straight into a MySQL
database for easier manipulation and indexing. Would the constant
INSERT queries put more load on the server than a process just dumping
the data into a perpetually open file [via PHP fputs()]?
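
If the data were written to MySQL, one way to soften the per-INSERT overhead is to buffer a batch of statuses and write them in a single multi-row statement inside a transaction. A rough sketch, with an illustrative table name and a captured sample file standing in for the live stream:

<?php
// Batch rows into one multi-row INSERT rather than issuing one query per status.

function flushBatch(PDO $pdo, array $batch)
{
    if (empty($batch)) {
        return;
    }
    $placeholders = implode(',', array_fill(0, count($batch), '(?)'));
    $pdo->beginTransaction();
    $pdo->prepare("INSERT INTO statuses (payload) VALUES $placeholders")
        ->execute($batch);
    $pdo->commit();
}

$pdo   = new PDO('mysql:host=localhost;dbname=stream', 'user', 'pass');
$batch = array();

foreach (file('stream-sample.json', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
    $batch[] = $line;
    if (count($batch) >= 500) {      // amortize query overhead over 500 rows (placeholder size)
        flushBatch($pdo, $batch);
        $batch = array();
    }
}
flushBatch($pdo, $batch);            // write any remainder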

What about simply running the PHP process and accessing the stream
directly, only grabbing a snapshot of the data when a process needs
it? I'm not really concerned with historical data, as my web-based app
is focused on trends at a given moment. I'm just wondering whether
letting the process run in the background grabbing data would
eventually fill up any caches or system memory.
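
One way to keep such a long-running process from growing without bound is to hold only the most recent statuses in a fixed-size window and discard the oldest as new ones arrive. A minimal sketch, with a placeholder endpoint, credentials, and window size:

<?php
// Keep only the last WINDOW statuses in memory; other code can snapshot $recent.

const WINDOW = 5000;     // roughly the last 5,000 statuses (placeholder)

$buffer = '';
$recent = array();

$onData = function ($curl, $chunk) use (&$buffer, &$recent) {
    $buffer .= $chunk;
    // Statuses arrive one JSON object per line; split on newlines.
    while (($pos = strpos($buffer, "\n")) !== false) {
        $line   = substr($buffer, 0, $pos);
        $buffer = substr($buffer, $pos + 1);
        if (trim($line) === '') {
            continue;             // skip keep-alive newlines
        }
        $recent[] = json_decode($line, true);
        if (count($recent) > WINDOW) {
            array_shift($recent); // drop the oldest so memory stays flat
        }
    }
    return strlen($chunk);
};

$ch = curl_init('https://stream.example.com/sample.json');  // placeholder endpoint
curl_setopt($ch, CURLOPT_USERPWD, 'username:password');     // placeholder credentials
curl_setopt($ch, CURLOPT_WRITEFUNCTION, $onData);
curl_exec($ch);
curl_close($ch);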