Writing directly into the database guarantees data loss during any sort of database maintenance, performance degradation, or outage. Writing first to a log file (or another asynchronous queueing mechanism) gives you considerable operational flexibility. The wiki sketches the recommended architecture.
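
To make that concrete, here is a minimal sketch of the collection half in PHP. It is not the exact wiki architecture: the URL, credentials, and spool path are placeholders, and reconnect/error handling is omitted.

    <?php
    // Append the hose to minute-bucketed log files so the consumer
    // never blocks on (or loses data to) the database.
    // Assumes allow_url_fopen; user:pass and paths are placeholders.
    $hose = fopen('http://user:pass@stream.twitter.com/1/statuses/sample.json', 'r');

    $bucket = null;   // current time bucket, one file per minute
    $log    = null;

    while (!feof($hose)) {
        $line = fgets($hose);             // one JSON status per line
        if ($line === false || trim($line) === '') {
            continue;                     // keep-alive newline, skip it
        }
        $now = gmdate('YmdHi');           // rotate once per minute
        if ($now !== $bucket) {
            if ($log) {
                fclose($log);
            }
            $bucket = $now;
            $log = fopen("/var/spool/hose/$bucket.json", 'a');
        }
        fwrite($log, $line);              // append-only: cheap, and the
                                          // database can be down without
                                          // losing a single status
    }
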
-John Kalucki
http://twitter.com/jkalucki
Infrastructure, Twitter Inc.

On Sat, Jan 16, 2010 at 10:13 AM, GeorgeMedia <georgeme...@gmail.com> wrote:
> Just looking for thoughts on this.
>
> I am consuming the gardenhose via a PHP app on my web server. So far
> so good. The script simply creates a new file every X amount of time
> and starts feeding the stream into it, so I get a continuous supply of
> fresh data and can delete old data via cron. I plan to access the
> stream (files) with separate processes for further JSON parsing and
> data mining.
>
> But that got me thinking about feeding the data into a MySQL database
> instead, for easier manipulation and indexing. Would the constant
> INSERT queries cause a heavier server load than a process just dumping
> the data into a perpetually open file [via PHP fputs()]?
>
> What about simply running the PHP process and accessing the "stream"
> directly, only grabbing a snapshot of the data when a process needs
> it? I'm not really concerned with historical data, as my web-based app
> is focused on trends at a given moment. Just wondering out loud
> whether letting the process run in the background grabbing data would
> eventually fill up any caches or system memory.
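
For what it's worth, the other half of the log-first approach, a cron-driven loader that drains finished files into MySQL, can be as small as the sketch below. Table and column names are hypothetical, and LOCAL INFILE has to be enabled on the server.

    <?php
    // Drain every finished minute-bucket into MySQL in one bulk
    // statement each: a few big writes instead of a constant stream
    // of single-row INSERTs.
    $pdo = new PDO(
        'mysql:host=localhost;dbname=hose', 'user', 'pass',
        array(PDO::MYSQL_ATTR_LOCAL_INFILE => true)
    );

    $current = '/var/spool/hose/' . gmdate('YmdHi') . '.json';

    foreach (glob('/var/spool/hose/*.json') as $file) {
        if ($file === $current) {
            continue;                     // still being appended to
        }
        // ESCAPED BY '' keeps the backslashes in the JSON intact.
        $pdo->exec("LOAD DATA LOCAL INFILE " . $pdo->quote($file) . "
                    INTO TABLE raw_statuses
                    FIELDS ESCAPED BY ''
                    (status_json)");
        unlink($file);                    // or archive if history matters
    }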