[twitter-dev] Re: illegal unicode character \uffff

2009-12-04 Thread braver
Mark, great to see you here! Now I trust the platform is in the right hands. :) Cheers, Alexy

[twitter-dev] Re: illegal unicode character \uffff

2009-12-03 Thread braver
On Dec 1, 10:49 pm, John Kalucki jkalu...@gmail.com wrote: Perhaps someone from Platform could weigh in on this? In [vulgar] Russian, I'd say it seems Platform retracted its tongue into a [bodily cavity]. :) Platform, hey! :) Cheers, Alexy

[twitter-dev] Re: illegal unicode character \uffff

2009-12-01 Thread braver
Gardenhose apparently returns illegal Unicode, as confirmed by both PostgreSQL and Perl's Encode module, which is well-trusted, high-mileage code. We can certainly trap illegal-Unicode errors ourselves, but we need to know whether you're aware of the issue, what the rationale is, and what the plan of action is, if any. -- Alexy On Nov 21, 5:10 pm, braver
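For context, U+FFFF is one of the 66 Unicode noncharacters (U+FDD0..U+FDEF plus the last two code points of every plane), which strict validators may reject for interchange. Python's own UTF-8 codec encodes U+FFFF without complaint, so a check has to be explicit. Below is a minimal sketch, assuming statuses are decoded with the standard `json` module; the helper names are hypothetical. It also flags lone surrogates, which a JSON escape such as `\ud800` can produce and which cannot be encoded as UTF-8 at all.

```python
import json

def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and any code point ending in FFFE or FFFF."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def is_surrogate(cp: int) -> bool:
    """True for UTF-16 surrogate code points, which UTF-8 cannot encode."""
    return 0xD800 <= cp <= 0xDFFF

def find_suspect_code_points(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for every noncharacter or surrogate in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if is_noncharacter(ord(ch)) or is_surrogate(ord(ch))
    ]

# Example: a status whose JSON contains the escape \uffff.
status = json.loads('{"text": "abc\\uffff"}')
print(find_suspect_code_points(status["text"]))  # prints [(3, 'U+FFFF')]
```

Logging these positions per status would give concrete examples to report back to the platform team.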

[twitter-dev] Re: illegal unicode character \uffff

2009-12-01 Thread braver
John -- thanks for the clarification! Clearly the issue is in the data in Twitter's database as a whole, not just in the Streaming API. One question is whether you should accept illegal Unicode at all. Accepting it is probably the safer thing to do, to avoid scaring clients, but maybe you'd want to apply some filter before
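On the receiving side, the filtering idea could look like this: replace noncharacters and surrogates with U+FFFD REPLACEMENT CHARACTER so the text always encodes as UTF-8. This is a hypothetical client-side sketch with illustrative names, not a description of anything Twitter does; dropping the characters instead of replacing them would be an equally reasonable choice.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER

def is_illegal(cp: int) -> bool:
    """Surrogates plus the Unicode noncharacters (U+FDD0..U+FDEF, U+xxFFFE/U+xxFFFF)."""
    return (
        0xD800 <= cp <= 0xDFFF
        or 0xFDD0 <= cp <= 0xFDEF
        or (cp & 0xFFFE) == 0xFFFE
    )

def sanitize(text: str, replacement: str = REPLACEMENT) -> tuple[str, int]:
    """Return (cleaned text, number of code points replaced)."""
    out = []
    replaced = 0
    for ch in text:
        if is_illegal(ord(ch)):
            out.append(replacement)
            replaced += 1
        else:
            out.append(ch)
    return "".join(out), replaced

clean, n = sanitize("ok\uffff and \ud800")
# Encoding the raw input would raise UnicodeEncodeError (lone surrogate);
# the cleaned text encodes fine.
encoded = clean.encode("utf-8")
```

Returning the replacement count makes it easy to keep statistics on how often the problem occurs.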

[twitter-dev] Re: historical trends

2009-11-05 Thread braver
Well, trends shown on Twitter itself have a self-reinforcing effect: once a trend breaks into the Top 10, it snowballs from there. So it's not sufficient to study tweets alone when identifying trends; breaking into the Top 10 is a major event in itself. Thus I suggest Twitter carefully records when
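If Top 10 membership is sampled periodically, the moments when a trend breaks in can be recovered by diffing consecutive snapshots. A minimal sketch, assuming a hypothetical feed of (timestamp, ranked trend list) pairs in time order; it says nothing about how Twitter actually stores trends.

```python
from datetime import datetime
from typing import Iterable

def top10_entries(
    snapshots: Iterable[tuple[datetime, list[str]]],
) -> list[tuple[datetime, str, int]]:
    """Given (timestamp, ranked trends) snapshots in time order, return
    (timestamp, trend, rank) each time a trend is in the Top 10 without
    having been in the previous snapshot's Top 10."""
    events = []
    previous: set[str] = set()
    for ts, ranked in snapshots:
        current = ranked[:10]
        for rank, trend in enumerate(current, start=1):
            if trend not in previous:
                events.append((ts, trend, rank))
        previous = set(current)
    return events
```

A trend that drops out and comes back is recorded again, which matters if each break-in restarts the snowball effect.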

[twitter-dev] Re: The Gardenhose Cooperative

2009-07-22 Thread braver
I don't see anything vulnerable in a reasonably done verification -- e.g., I'll ask you to grep for a word in a day's worth of data you have and tell me the count. I'll google you, and preferably see you here or on Twitter. Heck, Twitter, I'll pay you guys $1/day for a backup fetch! Preferably then to the starting
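The "grep a word and tell me the count" check works best if both sides run the same count. A minimal sketch, assuming a day's data is stored as one JSON status per line with a `text` field; the file layout and function name are hypothetical. It matches whole words case-insensitively and skips blank, malformed, or non-status lines.

```python
import json
import re
from typing import Iterable

def count_word(lines: Iterable[str], word: str) -> int:
    """Count statuses whose 'text' contains `word` as a whole word."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            status = json.loads(line)
        except json.JSONDecodeError:
            continue  # partial or corrupted line
        text = status.get("text") if isinstance(status, dict) else None
        if isinstance(text, str) and pattern.search(text):
            count += 1
    return count
```

Usage would be something like `count_word(open("2009-07-21.json", encoding="utf-8"), "weather")` on each side, then comparing the two numbers.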

[twitter-dev] updating follow/shadow/birddog list of users

2009-07-08 Thread braver
If you have thousands of users, do you really have to cook up a following file with, say, 100,000 comma-separated user IDs? Should it all be on one line? And what happens if we want to drop some IDs and add others -- do we have to restart and re-upload the whole list? I see when the curl -d
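For reference, a form-encoded POST body puts the whole list into one parameter on one line. A minimal sketch, assuming the parameter is named `follow` (an assumption, not confirmed here) and that, as the question implies, a changed list means reconnecting with the full set; the helper names are hypothetical. `diff_follow` only reports what changed between two versions of the list.

```python
from urllib.parse import urlencode

def follow_body(user_ids) -> str:
    """Form-encode de-duplicated, sorted IDs as one comma-separated 'follow' value."""
    ids = sorted({int(u) for u in user_ids})
    return urlencode({"follow": ",".join(map(str, ids))})

def diff_follow(old_ids, new_ids) -> tuple[list[int], list[int]]:
    """Return (added, dropped) IDs between two versions of the list."""
    old, new = set(old_ids), set(new_ids)
    return sorted(new - old), sorted(old - new)

body = follow_body(range(1, 100_001))  # 100,000 IDs in a single line
```

Commas are percent-encoded as `%2C`. Written to a file, such a body can be sent with `curl -d @file`, which reads the POST data from that file.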

[twitter-dev] Illegal byte sequence 0x00 in UTF8

2009-07-08 Thread braver
I'm loading tweets into PostgreSQL and get a few hundred errors for the illegal sequence 0x00 in UTF8, e.g. (each leading . is 10,000 gardenhose tweets): .org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding UTF8: 0x00 [loving the weather here in sunny birmingham uk at the
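PostgreSQL text values cannot contain the NUL character, so a JSON `\u0000` escape decodes to a string that fails on insert. A minimal sketch of one workaround: recursively strip U+0000 from every string (keys included) in a decoded status before handing it to the database. The function name is hypothetical, and removing rather than replacing the character is a design choice.

```python
import json

def strip_nul(value):
    """Recursively remove U+0000 from all strings in a decoded JSON value."""
    if isinstance(value, str):
        return value.replace("\x00", "")
    if isinstance(value, list):
        return [strip_nul(v) for v in value]
    if isinstance(value, dict):
        return {strip_nul(k): strip_nul(v) for k, v in value.items()}
    return value  # numbers, booleans, None pass through unchanged

raw = '{"text": "loving the weather\\u0000", "user": {"id": 1}}'
clean = strip_nul(json.loads(raw))
```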

[twitter-dev] catching up with gardenhose

2009-07-07 Thread braver
We've lost gardenhose data for 6/28-7/7; if anybody could share it, we'd appreciate it very much! I'm @khrabrov, authorized for it. Cheers, Alexy

[twitter-dev] length limits for all fields

2009-06-18 Thread braver
In designing an SQL schema for statuses as returned by the Streaming API, we need to know the length limits for all string fields. Is there a single table with such lengths, and/or can you guys please specify them here? Cheers, Alexy
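Until official limits are published, one stopgap is to measure the longest value seen for each string field in a sample of statuses. These are lower bounds on the real limits, not limits. A minimal sketch, assuming newline-delimited JSON; nested objects are flattened into dotted names such as `user.screen_name`, and the helper names are hypothetical. Lengths are counted in code points, which matches PostgreSQL's `varchar(n)` (counted in characters) but not the UTF-8 byte length.

```python
import json
from typing import Iterable

def _walk(prefix: str, obj: dict, longest: dict[str, int]) -> None:
    """Record string lengths under dotted field names, recursing into objects."""
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, str):
            longest[name] = max(longest.get(name, 0), len(value))
        elif isinstance(value, dict):
            _walk(name, value, longest)

def max_lengths(lines: Iterable[str]) -> dict[str, int]:
    """Longest observed length of each string field across JSON statuses."""
    longest: dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        status = json.loads(line)
        if isinstance(status, dict):
            _walk("", status, longest)
    return longest
```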

[twitter-dev] all conversations

2009-06-14 Thread braver
What percentage of all tweets are replies to others, i.e. contain @nick? We do research on dialogue, and I'd like to get as many conversations as possible. So far the only reliable way I see to do it is to crawl. Even with the /gardenhose I'm not sure that I'm capturing enough from each
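On a sample such as the gardenhose, the share can be estimated directly, and it helps to separate tweets that merely contain an @nick from explicit replies. A minimal sketch, assuming decoded statuses carry `text` and `in_reply_to_status_id` fields (treat the field name as an assumption). The mention pattern is an approximation of screen names (1 to 15 letters, digits, or underscores) that skips e-mail addresses.

```python
import re
from typing import Iterable

# '@' not preceded by a name character, followed by 1-15 name characters.
MENTION = re.compile(r"(?<![A-Za-z0-9_])@[A-Za-z0-9_]{1,15}")

def reply_stats(statuses: Iterable[dict]) -> dict[str, float]:
    """Fraction of statuses that mention someone vs. are explicit replies."""
    total = mentions = replies = 0
    for status in statuses:
        total += 1
        if MENTION.search(status.get("text") or ""):
            mentions += 1
        if status.get("in_reply_to_status_id") is not None:
            replies += 1
    if total == 0:
        return {"mention_share": 0.0, "reply_share": 0.0}
    return {"mention_share": mentions / total, "reply_share": replies / total}
```

Since a sample only catches some turns of each conversation, the `in_reply_to_status_id` values also give the IDs a crawler would need to fetch to fill in the rest.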