Hey Guys,

I'm the author of Phirehose, so I thought I'd jump in with some quick
thoughts.

Russ, you're spot on - what you describe there is exactly how
Phirehose is designed to be used.

For both of your use cases, the number of tweets seems to be very
low (i.e. ~200/week), in which case, to be completely honest, you could
_probably_ just decode/process the tweets inline in the
enqueueStatus() method.

The reason this is highly discouraged (as you mention, Russ) is that
if you ever did get a sudden peak (e.g. the term you're tracking became
a trending topic), you could end up with a backlog in your stream,
which can cause you to be disconnected. To quote the Twitter wiki
on API clients:

"...should be isolated from any subsequent downstream processing
backlog or maintenance, otherwise queuing will occur in the Streaming
API. Eventually your client will be disconnected, resulting in data
loss..."
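
A minimal sketch of that isolation, assuming the collector does nothing
but append each raw status to a queue file. The helper name
appendRawStatus() and the file path are made up for illustration and are
not part of Phirehose; in a Phirehose subclass, the body of
enqueueStatus() would simply hand the raw status to something like this.
It uses PHP 7+ type declarations.

```php
<?php
// Hypothetical helper (not part of Phirehose): append one raw status,
// i.e. the JSON string as delivered by the stream, to a queue file,
// one status per line. Deliberately no decoding or DB work here.
function appendRawStatus(string $queueFile, string $status): bool
{
    $line = trim($status);
    if ($line === '') {
        return false; // skip blank lines (a stream may send empty keep-alives)
    }
    // FILE_APPEND adds to the end of the file; LOCK_EX takes an exclusive
    // lock for the duration of the write.
    return file_put_contents($queueFile, $line . "\n", FILE_APPEND | LOCK_EX) !== false;
}

// Usage with made-up data.
$queueFile = sys_get_temp_dir() . '/phirehose-queue-' . uniqid() . '.txt';
appendRawStatus($queueFile, '{"id":1,"text":"hello"}');
appendRawStatus($queueFile, "\r\n"); // ignored
appendRawStatus($queueFile, '{"id":2,"text":"world"}');

$queued = count(file($queueFile));
echo $queued, " statuses queued\n";
unlink($queueFile);
```

Because the write is a single append, the time spent per status stays
small and roughly constant, whatever the downstream processing later
does with the file.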

maestrojed: Although I disclaim my examples as not being "production
ready", this is simply because I have not tested them extensively; I
have no reason to believe they would not work. There are quite a few
people using it in production, e.g. Thomas here:
http://groups.google.com/group/phirehose-users/msg/af2c6eb424d7e117
who has it processing 1000 tweets per minute quite happily.

If you prefer not to work with flat files, you could definitely insert
straight into a database. The reason flat files are nice (for
single-threaded applications) is that you're relying on the disk only.
A modern hard drive can easily write 40MB/s, which should be far higher
than any Twitter stream you're likely to encounter. Conversely, if you
write to a DB, there are other things to consider (row/table locks,
concurrency, etc).
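
As a rough sketch of the second stage, here is one way a separate
process could read a queue file (one JSON status per line) and insert
the tweets into a database. Everything named here is an assumption for
illustration: the function processQueueFile(), the "tweets" table and
its columns, and the use of an in-memory SQLite database via PDO (which
needs the pdo_sqlite extension). "INSERT OR IGNORE" is SQLite syntax;
other databases spell this differently.

```php
<?php
// Hypothetical second-stage processor: reads queued raw statuses and
// stores them. Returns the number of rows actually inserted.
function processQueueFile(string $queueFile, PDO $db): int
{
    $insert = $db->prepare(
        'INSERT OR IGNORE INTO tweets (id, screen_name, text) VALUES (?, ?, ?)'
    );
    $stored = 0;
    $db->beginTransaction(); // one transaction per file keeps inserts cheap
    foreach (file($queueFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        $tweet = json_decode($line, true);
        if (!is_array($tweet) || !isset($tweet['id'], $tweet['text'])) {
            continue; // malformed line or not a status; skip it
        }
        $insert->execute([
            (string) $tweet['id'],
            $tweet['user']['screen_name'] ?? null,
            $tweet['text'],
        ]);
        $stored += $insert->rowCount();
    }
    $db->commit();
    return $stored;
}

// Usage with made-up data and an in-memory database.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE tweets (id TEXT PRIMARY KEY, screen_name TEXT, text TEXT)');

$queueFile = tempnam(sys_get_temp_dir(), 'queue');
file_put_contents(
    $queueFile,
    '{"id":1,"text":"hello","user":{"screen_name":"alice"}}' . "\n" .
    'not json' . "\n" .
    '{"id":2,"text":"world","user":{"screen_name":"bob"}}' . "\n"
);

$stored = processQueueFile($queueFile, $db);
echo $stored, " tweets stored\n";
unlink($queueFile);
```

Since this runs outside the streaming connection, a slow or locked
database only delays the processor; the collector keeps appending to
disk in the meantime.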

To be completely honest, the caveats above are overly paranoid:
unless you're processing a large number of tweets per second, you
probably don't need to worry about this sort of thing. However, you
should always remember that Twitter is growing rapidly, and what may be
a "quiet stream" today could be a raging torrent tomorrow.

Cheers!

  Fenn.

On Jan 27, 10:03 am, phptek <theruss....@gmail.com> wrote:
> I too am using Phirehose for a similarly small number of tweets.
>
> The general idea with streaming is not to process stuff live "on the
> wire" but to parse it into files or a DB and then further process this
> data from some additional code.
>
> As an example: I am looking to get geocoded tweets for specific areas
> in New Zealand which I can do with Phirehose. The idea with libs like
> Phirehose and others is to use it as a base for your own further work.
> So what I will do (as soon as I figure out why I'm getting blank lines
> out of curl) is perhaps to modify the enqueueStatus() function in
> filter-track-geo.php and, instead of using print_r() and printing
> everything to the screen, create my own routine to insert the
> tweet data into a DB.
>
> I will then likely create a new Phirehose method (or procedural code
> first!) to query the DB and post-process the tweets for specific
> keywords.
>
> I do it in this 2-stage process 1) because inserting into a DB first
> gives you some 'buffer' from issues that the API states can arise in a
> stream and 2) AFAIK you can't concatenate parameters together like
> 'location' and 'track' as they are logically "OR'd" together (probably
> due to the potential for the API to easily overload Twitter's servers).
>
> Does any of that make sense? I haven't written any code as yet so
> can't give any concrete example, but I'm assuming you are at least
> familiar with PHP?
>
> Cheers
> Russ
>
> On Jan 27, 7:21 am, maestrojed <maestro...@highfivefriday.com> wrote:
>
> > For a project I want to collect all tweets containing a keyword and
> > store them in a database. I have built this functionality using the
> > search api but was missing tweets. I was told to switch to the
> > streaming API. You can see that post here:
> > http://groups.google.com/group/twitter-development-talk/browse_thread...
>
> > Although using the search API was straightforward, even easy, I am
> > quite out of my league with this streaming API. In fact, I have never
> > worked with streaming data at all. I have read the documentation and
> > the only 2 examples I could find anywhere on the internet. There is a
> > PHP library called phirehose (http://code.google.com/p/phirehose/) but
> > it only helps with the streaming connection. I am still at a loss as
> > to how to process the tweets. The example included with the phirehose
> > library is the most complete example I could find but it states that
> > it is not ready for production. This example writes all the tweets to
> > a flat file which I guess can then be parsed and stored in a db. Is
> > this a necessary step? Could one go straight to the db? Can anyone
> > help me out? Of course any example of production ready code would be
> > amazingly cool but realistically if someone can point out what issues
> > need to be addressed in this code that could help a lot too.
> > Here is that example: http://pastebin.com/fe677e00
>
> > Maybe I am asking for too much. I know I probably am. I just would
> > love to have these tweets stored in my db and know that as it is now I
> > don't have the knowledge to confidently release code into production.
> > BTW, the keyword I am targeting is not very popular, ~200 tweets a
> > week and this project is just to indefinitely store these tweets and
> > display them on a web page. I am not building a client or anything
> > like that. It's a fairly small project.
>
> > This was the other example I worked on and although it worked, the
> > author states it is lacking some necessities:
> > http://blog.corunet.com/twitter-alerts-using-twitter-streaming-api/
