Mark, great to see you here! Now I trust the platform is in the right
hands. :)
Cheers,
Alexy
On Dec 1, 10:49 pm, John Kalucki jkalu...@gmail.com wrote:
Perhaps someone from Platform could weigh in on this?
In [vulgar] Russian, I'd say it seems Platform retracted its tongue
into a [bodily cavity]. :) Platform, hey! :)
Cheers,
Alexy
Gardenhose apparently returns illegal Unicode, as confirmed by
PostgreSQL and Perl's Encode, which is very trusted, high-mileage code.
We surely can trap illegal Unicode errors, but need to know whether
you're aware of it, the rationale, and the plan of action, if any. -- Alexy
On Nov 21, 5:10 pm, braver
John -- thanks for the clarification! Certainly it's the data in
Twitter's database as a whole, not just the Streaming API. One
question is whether you should accept illegal Unicode at all. Accepting
it is probably safer, to avoid scaring the clients, but maybe you'd want
to apply some filter before
Well, trends shown on Twitter itself have a self-reinforcing effect:
once a trend breaks into the Top 10, it snowballs after that.
Thus, it's not sufficient to just study tweets when identifying
trends. Breaking into the Top 10 is a major event.
Thus I suggest Twitter carefully records when
I don't see anything vulnerable in a reasonably done verification --
e.g., I'll ask you to grep a word in a day you have and tell me the
count. I'll google you, and preferably see you here or on Twitter.
Heck, Twitter, I'll pay you guys $1/day for a backup fetch!
Preferably then to the starting
If you have thousands of users, do you really have to cook up a
following file with a comma-separated list of, say, 100,000 user IDs?
Should it
all be on one line? Now what happens if we want to drop some and add
some IDs -- do we have to restart and re-upload all that list again?
I see when the curl -d
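A minimal sketch of maintaining such a list, assuming the following file
really is just one comma-separated line as the question supposes. The
helper names build_follow_line and update_ids are hypothetical, and the
sketch says nothing about whether the stream must be restarted after a
change; that remains the open question.

```python
def build_follow_line(user_ids):
    """Return one comma-separated line of unique user IDs, sorted ascending."""
    return ",".join(str(uid) for uid in sorted(set(user_ids)))


def update_ids(current, add=(), drop=()):
    """Return a new set of IDs with some added and some dropped."""
    return (set(current) | set(add)) - set(drop)
```

For example, build_follow_line(update_ids([1, 2, 3], add=[4], drop=[2]))
yields the line "1,3,4", which could then be written back to the file.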
I'm loading tweets into PostgreSQL, and get a few hundred errors
for illegal sequence 0x00 in UTF8, e.g. (each leading . is 10,000
gardenhose tweets):
.org.postgresql.util.PSQLException: ERROR: invalid byte sequence for
encoding UTF8: 0x00 [loving the weather here in sunny birmingham uk
at the
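One possible workaround, sketched in Python rather than the JDBC setup
the error message comes from: PostgreSQL text columns do not accept the
NUL character (U+0000), so it can be stripped before the insert. The
function names strip_nul and decode_status are hypothetical, and
replacing malformed bytes with U+FFFD is an assumption about what the
loader should do, not something the thread settles.

```python
def strip_nul(text: str) -> str:
    """Remove NUL (U+0000) characters, which PostgreSQL text columns reject."""
    return text.replace("\x00", "")


def decode_status(raw: bytes) -> str:
    """Decode bytes as UTF-8, replacing malformed sequences, then drop NULs."""
    # errors="replace" substitutes U+FFFD for each malformed byte sequence
    return strip_nul(raw.decode("utf-8", errors="replace"))
```

Counting how many statuses were changed by strip_nul would also give a
figure to report back alongside the error count.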
We've lost gardenhose data for 6/28-7/7; if anybody could share it, we'd
appreciate it very much! I'm @khrabrov, authorized for it.
Cheers,
Alexy
In designing an SQL schema for statuses as returned by Streaming API,
we need to know the length limits for all strings. Is there a single
table with such lengths, and/or can you guys please specify them here?
Cheers,
Alexy
What percentage of all tweets are replies to others, i.e. contain
@nick? We do research on dialogue and I'd like to get as many
conversations as possible. So far the only reliable way I see to do
it is to crawl. Even with the /gardenhose I'm not sure that I'm
capturing enough from each
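A rough way to estimate that percentage on a sample already captured,
assuming "reply" is approximated as "the text contains an @nick". The
regular expression and the name mention_fraction are illustrative only;
the pattern skips an @ that follows a word character, so e-mail
addresses are not counted.

```python
import re

# '@' followed by word characters, not preceded by a word character
MENTION = re.compile(r"(?<!\w)@\w+")


def mention_fraction(texts):
    """Return the fraction of texts that contain at least one @mention."""
    texts = list(texts)
    if not texts:
        return 0.0
    hits = sum(1 for t in texts if MENTION.search(t))
    return hits / len(texts)
```

This measures the sample only; it says nothing about how much of each
conversation the sample actually captures.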