
On Wed, Oct 14, 2009 at 8:38 AM, Kyle B kylebarn...@gmail.com wrote:



 1. How are tweet IDs incremented?  Do they increase by a factor of 1,
 2, 5, 10...?


I've asked that question previously and the answer was a definitive "We
aren't telling."  It seems to be considered a significant enough trade
secret that I wouldn't be at all surprised if they are skipping IDs randomly
to prevent people from doing exactly what you're seeking to do.  Nor would I
be surprised if they refuse to say a word about it now.

Short of figuring out an indirect approach, I don't think you'll be able to
come up with an accurate number.

Nick

[twitter-dev] Re: Help estimating tweets per day...

On Wed, Oct 14, 2009 at 3:27 PM, Kyle B kylebarn...@gmail.com wrote:


 Thanks for the info. It helps a lot.  Figuring out an accurate number
 is essential to my model, so much so that I am determined to find some
 method of estimating it to acceptable margins of error!


It occurs to me that perhaps this might not be so hard... and please do
share your results with us.

Just test a good-sized sample of IDs and see how many don't exist.  That
will give you an idea of how many there really are.  I'll be curious to see
if you get consistent results from one day to the next.  I won't be too
surprised to see if you don't, which would mean that Twitter is skipping a
random (or at least somewhat random) number of IDs each day.

However, if you want to continue to know this number, you'll have to
continue to sample.  And your sample might have to span multiple days to get
a reliable answer.
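
As a rough illustration of the sampling idea above (a sketch only, not a tested method): pick random IDs between two known IDs, check each one, and scale the hit rate by the size of the ID range. The `fake_id_exists` function is a made-up stand-in for whatever lookup you would actually use, and the ID range is invented for the example.

```python
import math
import random


def estimate_existing_fraction(sample_ids, id_exists, z=1.96):
    """Return (fraction, low, high) for the share of sampled IDs that exist.

    The interval is a normal-approximation confidence interval;
    z=1.96 corresponds to roughly 95% confidence.
    """
    n = len(sample_ids)
    hits = sum(1 for tweet_id in sample_ids if id_exists(tweet_id))
    p = hits / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - margin), min(1.0, p + margin)


def fake_id_exists(tweet_id):
    # Hypothetical stand-in: pretend every fourth ID was skipped.
    return tweet_id % 4 != 0


rng = random.Random(42)
first_id, last_id = 1_000_000, 2_000_000  # invented IDs bracketing one day
sample = rng.sample(range(first_id, last_id), 5000)

p, low, high = estimate_existing_fraction(sample, fake_id_exists)
estimated_tweets = p * (last_id - first_id)
print(f"fraction existing: {p:.3f} ({low:.3f} to {high:.3f})")
print(f"estimated tweets in range: {estimated_tweets:,.0f}")
```

Running the same estimate on several days and comparing the intervals would show whether the skip rate is stable.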

And I hate to say this, because if they're not already doing it, this might
make them start... Twitter could be monitoring for any process that
repeatedly asks for deliberately non-existent IDs, in order to block them,
to maintain the obfuscation.  Then you're stuck again, unless you can find a
way around that defense.

Assuming there are millions of IDs a day, you'll need a pretty good sample
size if you want to maintain a good number.
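
For a feel of what "a pretty good sample size" means, here is a back-of-the-envelope sketch using the textbook normal-approximation formula for estimating a proportion. It assumes simple random sampling of IDs; the function name is mine.

```python
import math


def sample_size(margin, p=0.5, z=1.96):
    """Number of IDs to test so the estimated fraction is within +/- margin.

    Uses n = z^2 * p * (1 - p) / margin^2. p=0.5 is the worst case, and
    z=1.96 corresponds to roughly 95% confidence.
    """
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


n_five_percent = sample_size(0.05)
n_one_percent = sample_size(0.01)
print(n_five_percent, n_one_percent)
```

Under these assumptions the required sample depends on the margin of error you want much more than on how many millions of IDs there are.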

The good news in all this is that IIRC, Twitter has guaranteed that IDs will
increase chronologically.

The bad news is that I'm writing this off the top of my head and there's
probably an easy defense I haven't thought of, which somebody at Twitter
will think of just because they see this conversation.

Put 'em on double-secret probation, I say.

Nick

[twitter-dev] Re: Help estimating tweets per day...

On Wed, Oct 14, 2009 at 3:56 PM, Scott Haneda talkli...@newgeo.com wrote:


 And you don't think the streaming API will answer that for you?


It can't, can it?  It isn't the complete stream, only a sampled subset.
There's no way to know which IDs were skipped in order to obfuscate the
actual number of tweets.  A missing ID could either just have not been
sampled or not exist.

Nick

[twitter-dev] Re: Help estimating tweets per day...

On Wed, Oct 14, 2009 at 4:10 PM, Nick Arnett nick.arn...@gmail.com wrote:



 On Wed, Oct 14, 2009 at 3:27 PM, Kyle B kylebarn...@gmail.com wrote:


 Thanks for the info. It helps a lot.  Figuring out an accurate number
 is essential to my model, so much so that I am determined to find some
 method of estimating it to acceptable margins of error!



A couple more thoughts dawned on me.

If the approach I'm suggesting violates the TOS, please realize that it is
not my intention to encourage anybody to violate the TOS.

Second, thinking more evil-like, one way around the kind of defense I
imagined would be to distribute the problem -- find a bunch of people who
would like the same data and coordinate the testing to see what percentage
of IDs actually exist.

Did I just describe a DDOS?  Please, no.

Another possible evil defense -- there's a fake tweet generator at Twitter,
really messing with the statistics; tweets that are ONLY visible to people
who try to retrieve them via IDs that appear nowhere in public.  A
honey-trap, in other words.

I've spent too much time working with intelligence agencies.

Nick

[twitter-dev] Re: At Symbol (@) in Twitter Search

2009-10-08 Thread Nick


Karthik - I'm not sure what you mean by decorated - the names do
hyperlink to the profile pages.  But typically, I think the @ being
found is within the tweet content.  All of these non-english tweets
either have the @ symbol alone, or they contain a mention.

I've looked through a lot of the API docs and online forums, and I saw
that Twitter doesn't really have a wildcard search.  Is that true?
If I want to search posts for mentions or *partial* words, is there
any way to do that?  For instance, if I want to find mentions that
start with @T2*, where * can be anything or nothing...(and I would
find, for example, @T2thenike) is there a way to do that?  This
partial search would be handy for searching hashtags too, so you don't
have to nail the hashtag term exactly

I guess I'm looking for a fuzzier way to search.  Any suggestions?
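
If search really has no wildcard, one workaround is to pull a broader set of tweets and filter on the client side. A minimal sketch, assuming you already have the tweet texts in hand (the function name and sample texts are invented for illustration):

```python
import re


def find_mentions_with_prefix(texts, prefix):
    """Return mentions (without the @) whose name starts with prefix.

    Matching is case-insensitive. The lookbehind skips an @ that sits
    in the middle of a word, such as in an email address.
    """
    pattern = re.compile(
        r"(?<!\w)@(" + re.escape(prefix) + r"\w*)", re.IGNORECASE
    )
    found = []
    for text in texts:
        found.extend(pattern.findall(text))
    return found


sample_texts = [
    "thanks @T2thenike for the tip",
    "lunch @ the diner downtown",
    "cc @t2 and @other",
    "mail me at a@T2x.com",
]
matches = find_mentions_with_prefix(sample_texts, "T2")
print(matches)
```

The same pattern works for hashtags by swapping the @ for a #.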

On Oct 8, 8:20 am, Karthik Murugan fermis...@gmail.com wrote:
 This is strange. Did you also notice that for Non-English tweets
 returned from http://twitter.com/#search?q=%40, the user names are
 decorated with links to their profile pages?

 Well, Twitter doesn't index symbols like @ # $ ^. If you'd like to
 gather the tweets containing references to twitter names, you should
 use the Streaming API to do so, because Search API doesn't treat @ as
 a keyword.

 On Oct 8, 3:18 am, Nick t2then...@gmail.com wrote:

  The at symbol (@) does not seem to work when searching tweets.  When
  the @ symbol is alone (surrounded by nothing or whitespace) in a
  search, the search returns 0 English results.  Sometimes languages
  that use non-ASCII characters seem to be found.  However, when the @
  is followed by at least 1 alphanumeric character, the query seems to
  work fine.

  I found this by searching for "@ the diner downtown", and I was
  getting back 0 results, but at least 1 result should have been
  returned, because that's what I tweeted.  This probably affects
  searches for mentions too, because you can't just search for @.

  I've used the web site and the search API:
  http://twitter.com/#search?q=%40
  http://search.twitter.com/search.atom...

  Has anyone gotten a query to work with @ standing alone?  Is there a
  different way to perform the same search?

[twitter-dev] At Symbol (@) in Twitter Search

2009-10-07 Thread Nick


The at symbol (@) does not seem to work when searching tweets.  When
the @ symbol is alone (surrounded by nothing or whitespace) in a
search, the search returns 0 English results.  Sometimes languages
that use non-ascii characters seem to be found.  However, when the @
is followed by at least 1 alphanumeric character, the query seems to
work fine.

I found this by searching for "@ the diner downtown", and I was
getting back 0 results, but at least 1 result should have been
returned, because that's what I tweeted.  This probably affects
searches for mentions too, because you can't just search for @.

I've used the web site and the search API:
http://twitter.com/#search?q=%40
http://search.twitter.com/search.atom?q=%40

Has anyone gotten a query to work with @ standing alone?  Is there a
different way to perform the same search?
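
For anyone reproducing this, here is a small sketch of how the query strings above get encoded. It only builds the URLs and does not send anything; the endpoint is the one quoted above, and the helper name is mine.

```python
from urllib.parse import urlencode

# Endpoint as quoted in the post above.
SEARCH_ENDPOINT = "http://search.twitter.com/search.atom"


def build_search_url(query):
    """Return the search URL with the query percent-encoded (@ becomes %40)."""
    return SEARCH_ENDPOINT + "?" + urlencode({"q": query})


bare_at = build_search_url("@")
phrase = build_search_url("@ the diner downtown")
print(bare_at)
print(phrase)
```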

[twitter-dev] Re: how are the ten trends born?

2009-10-02 Thread Nick Arnett

On Fri, Oct 2, 2009 at 1:00 PM, David Fisher tib...@gmail.com wrote:



 For the most part it's just a frequency count of words over a short
 time period, minus stop words, filtering out usernames (notice @foo is
 never a trend) and URLs. How it combines Wave OR Google Wave I'm
 unsure of, and then there's some basic spam filtering in there
 additionally.


I hope it isn't that naive -- do you know what they're doing, or are you
speculating?

For one thing, systems that count the unique individuals mentioning a term,
rather than just raw term counts, are far more accurate in predictive
modeling.
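
To make the distinction concrete, here is a toy sketch that computes both numbers side by side. It is not Twitter's algorithm, which nobody here knows; the stop-word list, tokenizer and sample tweets are all invented.

```python
import re
from collections import defaultdict

STOP_WORDS = {"the", "a", "is", "to", "and", "of", "in", "on"}  # toy list
TOKEN = re.compile(r"https?://\S+|@\w+|#?\w+")


def term_counts(tweets):
    """Return (raw_counts, unique_user_counts) for (user, text) pairs.

    URLs, @usernames and stop words are skipped, as described above.
    """
    raw = defaultdict(int)
    users = defaultdict(set)
    for user, text in tweets:
        for token in TOKEN.findall(text.lower()):
            if token.startswith(("http://", "https://", "@")):
                continue
            if token in STOP_WORDS:
                continue
            raw[token] += 1
            users[token].add(user)
    unique = {term: len(names) for term, names in users.items()}
    return dict(raw), unique


sample_tweets = [
    ("alice", "wave wave wave wave"),
    ("bob", "google wave is out"),
    ("carol", "trying google wave http://example.com @bob"),
]
raw, unique = term_counts(sample_tweets)
print(raw["wave"], unique["wave"])
```

One chatty user can inflate the raw count, while the unique-user count only moves when more people pick up the term.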

Furthermore, Twitter has plenty of data to incorporate traffic and social
network analysis to further improve this buzz analysis.

FYI, I've been doing social network buzz analytics for about ten years and
have some patents in that area (which don't belong to me, but to
Nielsen/Buzzmetrics).

Nick

[twitter-dev] Re: how are the ten trends born?

2009-10-01 Thread Nick Arnett

On Thu, Oct 1, 2009 at 8:20 AM, Martin Dudek goosegoesgro...@gmail.com wrote:


 Good morning

 wonder if somebody knows how twitter determines the ten trends it
 declares every five minutes? Is this a pure word/phrase frequency
 algorithm or some more complexity behind.


I wouldn't expect an answer to that.  I'd bet quite a bit of money that
Twitter considers the algorithm to be a trade secret.  If they disclosed it,
they'd be making it easier for people to manipulate the rankings.

What problem are you trying to solve?

Nick

[twitter-dev] Re: how are the ten trends born?

2009-10-01 Thread Nick Arnett

On Thu, Oct 1, 2009 at 10:05 PM, Martin Dudek goosegoesgro...@gmail.com wrote:


 just curious ...


That can be the most difficult and dangerous problem of all!

Nick

[twitter-dev] Re: About the oneforty application directory

2009-09-28 Thread Nick Arnett

On Mon, Sep 28, 2009 at 3:05 PM, Dewald Pretorius dpr...@gmail.com wrote:


 The other thing that really bugs me is the payment of the 70% in the
 form of a gift or donation. I cannot show that in the Sales Revenue of
 my business. If the amount becomes substantial, how do I explain to
 the tax man why my for-profit incorporated company is getting all
 these gifts and donations? And how do I do the accounting for my
 product units that were sold, but did not generate any top-line
 revenue?


Not sure how it works in other countries, but in the U.S. revenue is revenue
is revenue; most gifts are income to the person who receives them.  Even if
you are a non-profit, if you're making a profit from a substantial part of
your operations, you can end up owing taxes on it, even if you call the
income a gift.  Otherwise, everybody would call everything a gift and nobody
would pay taxes!

The fundamental rule is that when the gift is actually in exchange for
something of value, it is income to the receiver and not deductible as a
donation to the giver.

Nick

[twitter-dev] Re: master thesis related to Twitter

2009-09-27 Thread Nick Arnett

On Sun, Sep 27, 2009 at 5:11 PM, Stefna mstefa...@gmail.com wrote:


 strict formed data - 140 chars, #tag, @username, RT etc. - that's
 why there are so many sites presenting graphs, charts, trends,
 tendencies etc.


I suspect a large part of the answer to that is simply "Because they can."
Unlike other large social networks, the data in Twitter is open by default.

I say this partly because I realized that's what grabbed me about Twitter,
much more than the service itself.  I was attracted to the fact that a lot
of social data was easily accessible.

I'm curious if anyone can cite a similarly open, large social network (are
there any?) that hasn't seen much third-party analysis and such.

Nick

[twitter-dev] Re: تم افتتاح منتدى الع جمى سوفت زيارتك شرف لنا

2009-09-19 Thread Nick Arnett

2009/9/19 alagmy hossamala...@gmail.com

 تم افتتاح منتدى العجمى سوفت زيارتك شرف لنا



 زورنا على منتدى العجمى سوفت لتحصل على كل ما هو جديد برامج العاب افلام
 صور

 شاهد بنفسك

 http://alagmy.almslol.net


And for those who were wondering, here's what Google translation yields for
this:

Opened Agami forum should visit an honor for us

Zorna forum Agami should to get all new games movies software
Photos

See for yourself

Sounds like spam to me...

Nick

[twitter-dev] Re: One letter hashtag for search API

2009-09-19 Thread Nick Arnett

On Fri, Sep 18, 2009 at 11:36 AM, Nobu Funaki nob.fun...@gmail.com wrote:


 Hi,

 I'm talking about hashtag.

 I tried to use #A, #B or #C whatever one letter hashtag, sometimes
 couldn't find them in search result. But sometimes I could.

 Could you tell me if there is any certain rules? Perhaps it is
 involved this rule.


I don't know if it's true of Twitter's search, but many search engines treat
single-letter words as stop words - they don't index them.  Many don't index
two-letter words, either, but apparently Twitter does.

Having said all that, I will add that many search engines have become less
aggressive about stop words as computing resources have become less of a
constraint than they used to be.
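
For illustration only, this is the kind of length-based stop-word rule I mean, in a toy inverted index. Whether Twitter's search does anything like it is a guess; the function and the `min_len` cutoff are invented.

```python
from collections import defaultdict


def build_index(docs, min_len=2):
    """Map each term to the set of document IDs containing it.

    Terms shorter than min_len are treated as stop words and never
    indexed, so a one-letter hashtag such as #A cannot be found later.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            term = token.lstrip("#")
            if len(term) < min_len:
                continue
            index[term].add(doc_id)
    return index


docs = {1: "grade #A today", 2: "plan #AB ready"}
index = build_index(docs)
print(sorted(index))
```

Lowering `min_len` to 1 indexes the single-letter tag as well, at the cost of a larger index.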

Nick

[twitter-dev] Re: Update on the Retweet API (we collapse retweets, not you we're adding statuses/retweets)

2009-09-18 Thread Nick Arnett

On Fri, Sep 18, 2009 at 1:57 PM, Marcel Molina mar...@twitter.com wrote:



 Asking developers to collapse retweets in timelines is onerous,
 complicated and confusing. We're not going to do it that way. We are
 going to add a resource that gives you all retweets for a given tweet.
 In timelines you will get only the first retweet. You can then request
 all retweets for that tweet at any time to get up to 100 retweets that
 have been created for it.


Will timelines show if additional retweets exist for each tweet?  Otherwise,
won't we have to make the request for every tweet to find out if there are
others?

Nick

[twitter-dev] Re: Can the Twitter API call me?

2009-09-12 Thread Nick Arnett

On Fri, Sep 11, 2009 at 8:24 AM, Duncan dun...@therecoveryplace.net wrote:


 Does Twitter have something in place where I can build a listener app
 that Twitter can HTTP/POST to when a new follower follows me or
 someone sends me a direct message, etc?


Gnip can do this for Twitter data and there are free accounts available.

Nick

[twitter-dev] Re: Know the number of results only

2009-09-08 Thread Nick Arnett

On Mon, Sep 7, 2009 at 8:33 PM, 8-30 silverfrien...@gmail.com wrote:


 How do I get the number of results for a given search phrase? I don't
 want the results themselves, I just want to know the size of the
 results set for any given phrase. For example "jnni hinklebootmurgh"
 returns 0 results, where "michael jackson" returns gazillions. How can
 I get just the total number of results? I thought max_id might have
 something to do with it, but evidently not.


This question has been asked a few times before - it isn't available.

Nick

[twitter-dev] Re: Recent Following and Follower Issues and Some Background on Social Graph

2009-09-06 Thread Nick Arnett

On Sun, Sep 6, 2009 at 11:18 AM, Jesse Stay jesses...@gmail.com wrote:

 Thanks John.  I appreciate the various ways of accessing this data, but
 when you guys make updates to any of these, can you either do it in a beta
 environment we can test in first, or earlier in the week?  Where there are
 very few Twitter engineers monitoring these lists during the weekends, and
 we ourselves often have other plans, this really makes for an interesting
 weekend for all of us when changes go into production that break code.  It
 happens, but it would be nice to have this earlier in the week, or in a beta
 environment we can test in.



I think that's probably asking a lot of a company trying to grow as fast as
Twitter.  Graphs are very hard to scale.  Ask anybody who has tried.

Now if the graph weren't dependent on a centralized system

Nick

[twitter-dev] Re: Recent Following and Follower Issues and Some Background on Social Graph

2009-09-06 Thread Nick Arnett

On Sun, Sep 6, 2009 at 1:52 PM, Jesse Stay jesses...@gmail.com wrote:

 I don't understand how asking to release features earlier in the week is
 asking a lot?  What does that have to do with scaling social graphs?


I was referring to a beta environment.

Nick

[twitter-dev] Re: Platform downtime is expected

2009-08-17 Thread Nick L.


 Hey All,

   Definitely experiencing OAuth issues as well.  Still in development
on my local machine but I keep getting timeout errors upon trying to
authenticate OAuth here:

  http://twitter.com/oauth/authenticate

  *Work-A-Round*

  So I could keep things going (and keep coding), my quick workaround
was to sign into twitter.com first, and then let my app authorize
OAuth that way. This has worked, but I think it's because I've set my
app to sign-in-with twitter. You can do so on your app config page
on twitter (http://www.twitter.com/apps/). Not sure if this will help
some.

 The big issue is that I know other people (and their users) are
still unable to authorize the OAuth tokens, so hopefully that gets
resolved soon.

  Keep up the good work Twitts. Crossing my fingers this isn't an
ongoing issue.

On Aug 16, 6:41 pm, Ryan Sarver rsar...@twitter.com wrote:
 Everyone,

 Please see the updated post on status.twitter.com
 - http://status.twitter.com/post/164410057/trouble-with-oauth-and-api-c

 We are continuing to assess the issue and will report back when we know more.

 Thanks for your patience, Ryan

 On Sun, Aug 16, 2009 at 2:32 PM, Hwee-Boon Yar hweeb...@gmail.com wrote:

  Can you confirm if OAuth access is the only known issue? I feel silly
  repeating the same question over and over again: Even /
  rate_limit_status calls are timing out on my server. I have no API
  access *at all*.

  --
  Hwee-Boon

  On Aug 17, 5:21 am, Chad Etzel jazzyc...@gmail.com wrote:
  We've asked the keeper-o-the-blog to post something to that effect.
  Hopefully it will appear soon.
  -Chad

  On Sun, Aug 16, 2009 at 4:42 PM, Dewald Pretorius dpr...@gmail.com wrote:

   Can you at the very least PLEASE publish something on
   status.twitter.com about the API being down and/or very unresponsive
   at times, so that I have a link where I can refer my users, so that
   they can see I am not shitting them?

   Dewald

[twitter-dev] Re: My Issue with the ReTweet API and my solutions

2009-08-17 Thread Nick Arnett

For a long time, I've thought that retweeting was the most interesting thing
about Twitter - and not just explicit retweeting, but also implicit
retweeting (people posting the same URL around the same time, which may or
may not really be an intentional retweet).  I've thought of them as similar
to links in hypertext and like others, I created a site that analyzes
relationships among people by looking at their retweeting patterns.

This may sound odd, but making a user action easier for everyone is not
always a good idea.  An overly simple explanation for this is that the less
effort it is to do something, the less significant the action becomes.  That
doesn't mean that everything should be hard, it means there's an optimal
level of difficulty v. reward in social behavior.  I'd rather not see
Twitter encouraging a particular kind of social connection until the
structures it supports are better understood.  Has anybody really shown the
value of retweeting in creating strong social networks?  If so, is it clear
that the API would tend to further strengthen them?  I fear that the API is
motivated by a more naive assumption - people are doing this anyway, so
let's make it easier.  While that assumption is fine for things like soap,
it isn't right for social behavior.

I've been doing social media analytics for a long time.  One of the things
I'm always trying to measure is how much energy went into a particular user
behavior or action.  For example, a message that contains more original
words took more energy than a shorter one.  A message that quotes more than
one person takes more energy than one that quotes just one person.  A
message that contains a URL probably took more energy than one that
doesn't.  If the URL is unique in the medium, it probably took more energy
to create than a URL that already existed.
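
To show what I mean, here is a toy scoring sketch built from those four signals. The weights are arbitrary numbers picked for the example, counting @mentions is only a crude proxy for quoting people, and this is not the code from any real system.

```python
import re

URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"(?<!\w)@\w+")


def effort_score(text, seen_urls):
    """Rough 'energy' estimate for one message; higher means more effort.

    seen_urls is the set of URLs already observed in the medium.
    """
    urls = URL.findall(text)
    people = set(MENTION.findall(text))
    words = [
        word for word in URL.sub(" ", text).split()
        if not word.startswith("@")
    ]
    score = len(words)  # more of the poster's own words
    if len(people) > 1:
        score += 5  # refers to more than one person
    if urls:
        score += 5  # contains a URL
    if any(url not in seen_urls for url in urls):
        score += 5  # URL is new to the medium
    return score


first_post = effort_score("read this http://example.com/a", set())
repost = effort_score("read this http://example.com/a", {"http://example.com/a"})
print(first_post, repost)
```

The interesting part is the comparison: the same text scores lower once its URL is already circulating.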

If the effect of the retweet API is to make retweeting so simple that the
act of retweeting loses much of its significance, that's a net loss.  More
people might retweet, but fewer of them will be deeply engaged.  Social
systems should never have the goal of getting everyone to the same level of
engagement.  It is human nature for some to be opinion leaders, but they
don't easily emerge when playing the game is made easy for everyone.
Unfortunately, the idea of getting as many people as possible to be as
active as possible is a deeply engrained habit in the media industry.  But
any successful community manager or analyst will tell you that it is far
more important to pay attention to and nurture the core community that exists
in any social network.

The sweet spot for ease of retweeting lies somewhere between it being so
hard that only the most committed users do it (and the current manual method
is far better than that) and being so easy that everybody essentially votes
on everything, which would be bad.  Even though that sounds like democracy,
it is really demarchy.  Seen any successful demarchies?  I didn't think so.

I'm not so sure that Twitter isn't already in the sweet spot and the API is
going to drive it away from there.  I suspect that Twitter and those who
analyze it haven't had enough time to really figure out how it will fit into
the social networking ecosystem in the long run, so any decisions about this
are premature.  I'd rather see them continue to make the social network easier
to analyze, not just for the sake of analytics, but because the results of
analytics are getting fed back into the network, which makes the network
smarter and smarter.

[twitter-dev] Trademark infringement

2009-08-12 Thread Nick Arnett

In case anybody wants some decent facts on this issue, Wikipedia has a
pretty good article.

http://en.wikipedia.org/wiki/Trademark_infringement

I'm not a lawyer, but I published a book on IP for developers a number of
years ago and learned a lot in the process.

In the end, it is extraordinarily unlikely that anybody could possibly fail
to infringe when using Twitter and/or their logo in a product or service
name that is based on Twitter.  That inevitably could cause confusion about
whether or not it is from the company Twitter or not.

Infringement has nothing to do with whether or not the infringement is
connected with a good or bad use, such as spamming, etc.

It will be interesting to see what they do with "tweet."  Many attorneys
advise that if you want to preserve rights in a mark, you have to use it
only as a proper adjective.  Are they going to grant everybody permission to
use "tweet" but only that way?  "I'll send you a Tweet(tm) message"?  Seems
unlikely.  "Tweet" has become a noun and a verb, which I suspect means it is
fated to become genericized, if it hasn't already.

Nick

[twitter-dev] Re: Following Churn: Specific guidance needed

2009-08-11 Thread Nick Arnett

On Mon, Aug 10, 2009 at 8:31 PM, IDOLpeeps belm...@grandcentralholdings.com wrote:

 Lots of community members and developers are leaving Twitter because
 of what appears to them to be arbitrary suspension of accounts they've
 invested considerable time and good citizenship developing only to
 have them removed without notice and opportunity to remedy.


I don't mean this argumentatively, but I am curious how you know this?  Is
it possible to quantify?

Always interested in community metrics...

Nick

[twitter-dev] Re: Twitter API is unresponsive, can we crawl the twitter.com website?

2009-08-09 Thread Nick Arnett

On Sun, Aug 9, 2009 at 7:55 AM, Cameron Kaiser spec...@floodgap.com wrote:


  I guess if you crawl like a crawler it should be okay; otherwise how would a
  newly developed spider work? But if you crawl like a scraper
  you'll be banned.
  But I am not sure if Twitter can differentiate between them. I mean
  the request pattern. Does twitter check it?

 I'm sure they have ways of determining it internally, and I'm sure they
 won't reveal what those ways are.


They already have, to a certain extent.  It appears that they want up to a
10-second delay, which is an eternity in web crawling.  It limits a crawler
to 8,640 requests a day, which is peanuts, as I'm sure most people here
would realize immediately.

http://twitter.com/robots.txt

#Google Search Engine Robot
User-agent: Googlebot
# Crawl-delay: 10 -- Googlebot ignores crawl-delay ftl
Disallow: /*?
Disallow: /*/with_friends

#Yahoo! Search Engine Robot
User-Agent: Slurp
Crawl-delay: 1
Disallow: /*?
Disallow: /*/with_friends

#Microsoft Search Engine Robot
User-Agent: msnbot
Crawl-delay: 10
Disallow: /*?
Disallow: /*/with_friends

# Every bot that might possibly read and respect this file.
User-agent: *
Disallow: /*?
Disallow: /*/with_friends


What you don't see here is whatever internal list of user-agents and IP
addresses they might block.
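
For anyone who wants to check the arithmetic, here is a short sketch that reads the Crawl-delay values out of the file quoted above with Python's standard robots.txt parser and turns them into a daily request budget. The helper name is mine, and the file is pasted in as a string rather than fetched.

```python
from urllib.robotparser import RobotFileParser

# robots.txt exactly as quoted above.
ROBOTS_TXT = """\
#Google Search Engine Robot
User-agent: Googlebot
# Crawl-delay: 10 -- Googlebot ignores crawl-delay ftl
Disallow: /*?
Disallow: /*/with_friends

#Yahoo! Search Engine Robot
User-Agent: Slurp
Crawl-delay: 1
Disallow: /*?
Disallow: /*/with_friends

#Microsoft Search Engine Robot
User-Agent: msnbot
Crawl-delay: 10
Disallow: /*?
Disallow: /*/with_friends

# Every bot that might possibly read and respect this file.
User-agent: *
Disallow: /*?
Disallow: /*/with_friends
"""

SECONDS_PER_DAY = 24 * 60 * 60


def max_requests_per_day(delay_seconds):
    """Most requests a crawler can make in a day at one per delay period."""
    return SECONDS_PER_DAY // delay_seconds


parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# crawl_delay() returns None when no delay is set (Googlebot's is commented out).
delays = {
    agent: parser.crawl_delay(agent)
    for agent in ("Googlebot", "Slurp", "msnbot")
}
budget = {
    agent: max_requests_per_day(delay)
    for agent, delay in delays.items()
    if delay
}
print(delays)
print(budget)
```

Note that the standard parser does not interpret the `*` wildcards in the Disallow paths the way Googlebot does, so this sketch only uses it for the delay values.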

Having spent countless days identifying robots on dozens of big consumer and
business sites, I know that it's very hard to eliminate the low-volume ones
that spoof normal web clients.  But any one that starts grabbing a
significant number of pages, or the same page more often than every 24 hours
or so, stands out from the data quickly... and probably gets blocked.  Of
course, that's essentially what the DDoS is doing, except that it probably
also probes for and uses specific vulnerabilities, rather than just making
page requests.

Nick
(who, among other things, was the product manager for the first commercial
web crawler and helped set the robots.txt standard)

[twitter-dev] Re: PubSubHubbub and Twitter RSS

2009-08-09 Thread Nick Arnett

On Sat, Aug 8, 2009 at 9:06 PM, Jesse Stay jesses...@gmail.com wrote:

 I know Twitter has bigger priorities, so if you can put this on your "to
 think about" list for after the DDoS problems are taken care of, I'd
 appreciate it.  Perhaps this question is for John since it has to do with
 real-time.  Anyway, is there any plan to support the PubSubHubbub protocol
 with Twitter's RSS feeds for users?  I think that could be a great
 alternative to Twitter real-time that's standards compliant and open.  It
 would also make things really easy for me for a project I'm working on.
  Here's the standard in case anyone needs a refresher:

 http://code.google.com/p/pubsubhubbub/

 You guys would rule if you supported this.  It would probably take a bit
 less strain on what you're doing now as well for real-time feeds.  It could
 also reduce repeated polling on RSS.


Couldn't app developers do this on their own, by allowing the user to
configure "Also publish to PubSubHubbub server" in the app?  There's a
potential revenue stream there for developers - charge a small fee for this
use of the server. That would make the system even more robust, since there
would still be a publishing path even if Twitter were completely down.
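
As a sketch of how little an app would need to do: in the PubSubHubbub spec, a publisher notifies a hub with a form-encoded POST carrying `hub.mode=publish` and `hub.url` set to the updated feed. The code below only builds that request and sends nothing; the hub and feed URLs are placeholders, and the function name is mine.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_publish_ping(hub_url, topic_url):
    """Build (but do not send) a 'feed updated' notification for a hub."""
    body = urlencode({"hub.mode": "publish", "hub.url": topic_url})
    return Request(
        hub_url,
        data=body.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder URLs: substitute a real hub and the app's own feed.
ping = build_publish_ping(
    "http://hub.example.com/", "http://example.com/feed.atom"
)
print(ping.get_method(), ping.data)
```

Passing the request to `urllib.request.urlopen` would actually send it; the app would do that right after writing the user's update to its own feed.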

Seems to me that there are good reasons for both to exist... and I don't see
why Twitter needs to take the lead on this.  Current Twitter apps are sort
of like email clients that can only talk to one brand of mail server.

To put this another way, I think app developers need to start thinking of it
the way they really are using it - as infrastructure.  Complaining about the
current problem is a bit like a mechanic complaining that an auto parts
store doesn't have a particular part when there are ten other stores that
have it in stock.

Nick

[twitter-dev] Re: The silence is deafening....

2009-08-08 Thread Nick Arnett

Ask me "Are we there yet?" one more time and I'll turn this car around and
you won't go to Disneyland at all!

;-)

Nick

On Sat, Aug 8, 2009 at 2:39 PM, Dewald Pretorius dpr...@gmail.com wrote:


 tick tock tick tock tick tock tick tock tick tock tick tock

Possibly curmudgeonly thoughts about the DDoS and architecture... (was Re: [twitter-dev] Re: The silence is deafening....)

2009-08-08 Thread Nick Arnett

On Sat, Aug 8, 2009 at 5:40 PM, Dewald Pretorius dpr...@gmail.com wrote:


 Twitter needs to realize that our apps are NOT still down because of
 the ongoing denial-of-service attack. That's a cop-out to blame the
 attack.

 Our apps are still down because they cannot allow known, white-listed
 IP addresses through the defenses.

 And that is why I am getting frustrated, because I have asked multiple
 times months ago that they distinguish between friend and foe, and not
 kill everyone on sight when they are attacked.


What makes you think that they can?  What if the DDoS attacks are spoofing
white-listed IP addresses sometimes?  That would totally fit with using 302s
as a response.

It's not a good idea to make assumptions about what they can and cannot do.
For Twitter to have grown as large as it is, I assume that they have some
very competent IT people, who surely are doing the best they can.  Even
though Twitter isn't taking a direct revenue hit on this, I'm sure that they
know that the damage to their reputation could cost them more and more as
this continues.

Hmmm... now does the idea of publishing tweetstreams as distributed RSS
feeds sound more attractive?  If there's a criticism to be leveled, seems to
me it should be at the dependence on a single point of failure, not their
inability to cope with the inevitable sophisticated attack.  DDoS and such
would have a far harder time causing this kind of trouble on a distributed
system.

As I've said before, this isn't really a criticism of Twitter - what they've
created shows the demand for this kind of service.  But imagine if right now
all the dead applications could fall back to reading RSS-published
twitterstreams instead of depending entirely on Twitter for them?

Hope that doesn't sound like I'm taking advantage of a bad situation, but I
really think this points out the serious limitations of their architecture,
not the competence of their IT people.  And no, those aren't the same
things.

Nick

[twitter-dev] Re: Twitpocalypse: The Second Coming is on the horizon

2009-07-31 Thread Nick Arnett

On Fri, Jul 31, 2009 at 3:59 PM, John Adams j...@twitter.com wrote:


 On Jul 31, 2009, at 3:37 PM, Josh Roesslein wrote:

  Well 64 bit should last for a while. Curious how long it will be until 128
 bit will be required.




 Mathematica tells me:
 Fri 24 Sep 58821 22:55:00


Darn it - I was planning to be on vacation that day!

Nick

[twitter-dev] Tool that shows who is using specific tags?

2009-07-19 Thread Nick Arnett

I've searched a bit, but it's hard to write a good query for this one -
anybody know of a tool that will show a list of users who have used a
specific tag?  It would be simple and I'll write it myself if need be.  All
it has to do is search for the tag and then compile and present a list of
unique users.
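
Roughly what I have in mind, as a sketch: it assumes the search results have already been fetched and parsed into dictionaries, and the `from_user` field name and sample data are assumptions for illustration.

```python
def unique_users(results, user_key="from_user"):
    """Return usernames in first-seen order, ignoring case differences.

    results is an iterable of dicts, one per tweet; user_key names the
    field holding the author (an assumed field name).
    """
    seen = set()
    ordered = []
    for result in results:
        name = result.get(user_key)
        if name and name.lower() not in seen:
            seen.add(name.lower())
            ordered.append(name)
    return ordered


sample_results = [
    {"from_user": "alice", "text": "at #cls09"},
    {"from_user": "bob", "text": "see you at #cls09"},
    {"from_user": "Alice", "text": "great session #cls09"},
    {"text": "malformed result with no author"},
]
attendees = unique_users(sample_results)
print(attendees)
```

Paging through all the results and feeding each page into the same function would cover the "don't want to page manually" part.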

This occurred to me because I was thinking I'd like to see who is attending
the Community Leadership Summit, but I don't want to have to page through
all the results and manually assemble a list.  It would be extra cool if
there were a tool that did this for Twitter and blogs together.

Anybody know of something like this?

Nick

[twitter-dev] Re: Tool that shows who is using specific tags?

2009-07-19 Thread Nick Arnett

On Sun, Jul 19, 2009 at 6:16 PM, whoiskb whoi...@gmail.com wrote:


 Do you mean something like this?

 http://hashtags.org/


I'm assuming that you're joking... unless there's something there that
returns a unique list of users rather than a list of tweets.

Nick

[twitter-dev] Re: Twitter is not making money

On Sat, Jul 18, 2009 at 12:49 AM, David Fisher tib...@gmail.com wrote:


 Show me these killer companies doing great NLP with social networks. I
 find the ones that are doing stuff right now themselves are far behind
 the curve and not really pushing stuff to the edge. They are often
 marketing companies that have hired one NLP guy (and underpaid them)
 and are just pushing the marketing side. I have yet to see anything
 truly revolutionary come from most of these monitoring companies yet
 and they are all too narrow focused. Plus, none of them have the VC
 funding to really expand and grow (and not many people are getting new
 funding these days)


Who says that a company has to create something truly revolutionary to be
successful?  There are plenty of big successes that got where they are by
packaging and distributing better than anyone else, not with great
breakthroughs.  Heard of Microsoft?

Sentiment analysis, like everything else that depends on computers figuring
out language, isn't great.  Nor is anyone really close to writing software
that understands language with context, nuance, etc.  Language isn't
even well understood enough for anyone to write code to emulate it; it is at
the core of human intelligence.  Language in 140 character chunks is
*really* hard.

If you think there are no well-funded, successful companies in this domain,
take a look at Nielsen/Buzzmetrics.  They've been at this for more than 10
years.  They acquired my patents, from a startup where we demonstrated basic
sentiment analysis in 2000 and 2001, showing that our software could rate
the sentiment of Usenet movie reviews with 80 percent accuracy and forecast
box office.

I would love to see more people tackling this kind of problem, but nobody is
likely to succeed if they don't realize what has worked and what hasn't over
the last decade and more.  Intelligence agencies and law enforcement have
used relevant techniques for 20-30 years.  For example, traffic analysis is
fundamental and doesn't require any NLP, just as the NSA is able to identify
command and control centers by their behavior without having to decode a
single encrypted transmission.  The danger of focusing on NLP and other
really hard problems is that you fail to apply known techniques in new ways.

Having said all that, I'll add that a lot of what I saw over the last few
years in social media analytics was pretty eye candy without much behind
it.  If that's all you look at, then yes, it seems quite shallow.  But I
would hope that serious developers know that that's not all there is.  The
systems I've built over the last decade have been based first on traffic
analysis, then social network analysis, and last, text/linguistic analysis...
and to do the latter well, humans were involved in the final summarization
of topics, trends and so forth.

Nick

[twitter-dev] Developers unite – throw off the yoke of Twitter centralization and publish your tweetstreams!

Chuck Shotton's recent "Twitter is a prototype" comment inspired me to write
a blog post about overcoming the limitations of Twitter's design... I'm
suggesting that Twitter apps should publish their tweetstreams locally or to
a hosted service, as tagged RSS, so that anybody can aggregate, index and
otherwise add value to them... without having to rely exclusively on Twitter
to make the data available.  I'm not saying there isn't a role for Twitter
in the future, but I do believe Chuck hit the nail on the head in terms of
their limitations.

http://www.nickarnett.net/2009/07/18/developers-unite-throw-off-the-yoke-of-twitter-centralization-and-publish-your-tweetstreams/

And now I believe I had better duck.

Nick

[twitter-dev] Re: Developers unite – throw off the yoke of Twitter centralization and publish your tweetstreams!

On Sat, Jul 18, 2009 at 12:36 PM, Andrew Badera and...@badera.us wrote:


 Old news. This topic of conversation has been around since the
 internetworked opensourced clones like laconi.ca started growing in
 popularity.

I think you missed the point.  What if TweetDeck, for example, by default
also published the user's tweetstream as an RSS feed, letting the user
choose where to publish it?  What if every app did that?  Everybody's
tweetstreams would be distributed on the Internet, rather than centralized
at Twitter.
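
To make the "tagged RSS" idea concrete, here is a rough sketch of how an app might serialize a user's tweets as an RSS 2.0 feed with one category element per tag. The tweet fields (`text`, `url`, `tags`) are hypothetical, and nothing here reflects what TweetDeck or any other client actually does.

```python
import xml.etree.ElementTree as ET

def tweets_to_rss(title, link, tweets):
    """Serialize tweets as an RSS 2.0 document, one <item> per tweet."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Locally published tweetstream"
    for tweet in tweets:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "description").text = tweet["text"]
        ET.SubElement(item, "guid").text = tweet["url"]
        # Each hashtag becomes a <category>, which is what makes the feed "tagged".
        for tag in tweet.get("tags", []):
            ET.SubElement(item, "category").text = tag
    return ET.tostring(rss, encoding="unicode")
```

Where the resulting document gets published (local disk, a hosted service) would be the user's choice, as described above.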

Before Twitter existed, nobody had the traction to make this happen.  There
wasn't even a place for developers to *talk* about this level of
cooperation.  But now there is, right here.

Does anybody really think that the current centralized model can scale as
fast as the market wants?

Seems to me that it is in the best interests of app developers to work
together toward less dependency on Twitter as a repository.  And even though
it might seem like it is against Twitter's interest to do so, in the long
run I suspect its very survival depends on finding a role in which it
doesn't have to have every tweet on the planet flow through its servers.

Nick

[twitter-dev] Re: Twitter is not making money

On Sat, Jul 18, 2009 at 3:53 PM, M. Edward (Ed) Borasky zzn...@gmail.com wrote:



  And there is
 http://twittersentiment.appspot.com/, a research project by some grad
 students at Stanford. Perhaps you've heard of two other Stanford grad
 students, Sergey Brin and Larry Page?


Gilt by association?

Nick

[twitter-dev] Re: Twitter is not making money

On Sat, Jul 18, 2009 at 4:02 PM, M. Edward (Ed) Borasky zzn...@gmail.com wrote:


 Netflix, even without the contributions of the contest teams, is doing
 pretty well too. ;-)


Different problem - they're aggregating votes, not trying to interpret
language.  Although it is certainly possible that some of the competitors
are using third-party sources and linguistic analysis... I thought briefly
about giving that a shot.


 Man, it is so good to hear this from someone who's actually done it!
 The other point, though, is that the real thing, even traffic /
 social network analysis, is compute-resource intensive and requires a
 kind of programming knowledge that few have. So if something simple,
 like emoticon counting, provides *some* clues about sentiment, it may
 be worth doing. I'm not convinced, though, that it is worth doing.


I'm not sure that's so true... there are a lot of tools out there that can
be hooked together.  The statistics and time series analytics call for some
advanced knowledge, but I doubt if much of it is beyond a master's degree
level.  I found the harder parts to be figuring out what business problems
can be solved, then packaging and presenting the data to people in a useful
manner that also can be automated.  There are a lot of graphs and
visualizations, especially network visualizations, that work for the data at
one point in time and become a mess when the data changes, so they are
useless in an automated system.  I was designing for executives who wanted
everything summarized in a page... definitely a challenge.  All the plumbing
is hard to maintain, too, which is an argument for standards that would
allow the pain to be shared.

Nick

[twitter-dev] Re: Hashing standard for URLs to find the Twitter version of shortened URLs

2009-07-17 Thread Nick Arnett

On Fri, Jul 17, 2009 at 5:50 AM, Bjoern bjoer...@googlemail.com wrote:


 In fact if such a scheme was in place, it would also give people a way
 to officially link to a site. They could add the hash of the
 destination URL in their tweet and become searchable. I realize that
 would probably be too geeky for widespread adaption, but in theory I
 like the idea ;-)


This issue goes well beyond Twitter.  Those of us who have created any sort
of URL tracking and measurement application would benefit from it. There's
great value, I am certain, in being able to identify, as close to real-time
as possible, URLs that are being cited by a lot of people (or by
influencers/opinion leaders, etc.)  Each cite is a significant vote for the
page and when it occurs in real-time media (v. static web pages), it
provides a relevance metric that Google and its competitors aren't touching
yet.
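
A sketch of what a shared URL hash could look like, purely as an illustration. The normalization rules, the choice of SHA-1 and the 10-character truncation are all assumptions; no such standard exists in this thread.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit

def url_hash_tag(url, length=10):
    """Return a short hex digest identifying the destination URL."""
    parts = urlsplit(url.strip())
    # Normalize so trivially different spellings of one URL hash alike:
    # lower-case scheme and host, default the path to "/", drop the fragment.
    canonical = urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )
    digest = hashlib.sha1(canonical.encode("utf-8")).hexdigest()
    return digest[:length]
```

Anyone tracking citations could then count occurrences of the same digest without caring which shortener produced the link, provided everyone agreed on the same normalization.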

This seemed to be worth a blog post:

http://www.nickarnett.net/2009/07/17/whats-really-wrong-with-url-shorteners/

Nick

[twitter-dev] Re: Matt Sanford, signing off.

2009-07-17 Thread Nick Arnett

On Fri, Jul 17, 2009 at 2:18 PM, Matt Sanford m...@twitter.com wrote:



 Good night, and good luck;
  – Matt Sanford / @mzsanford
 Twitter Dev


Bonne nuit et bonne chance!

(Just thought I'd throw in a language detection challenge.  A small one.)

Nick

[twitter-dev] Re: Twitter is not making money

On Thu, Jul 16, 2009 at 12:26 PM, Michael Yardley middleto...@gmail.com wrote:


 They are just running on Venture Capital. When the money runs out they
 will have to start charging. You cannot run a business for FREE. People
 should have to pay to Twitter. So much a Tweet. LOL


Yeah, just like Google started charging us per search when they ran out of
money.

Nick

[twitter-dev] Social media developer groups?

Anybody here know of any organized social media developer groups, more than
just a forum or mailing list?
I did some searching and didn't come up with much of anything.

I'm wondering where developers would be likely to meet other developers,
network, etc., outside of application or platform-specific communities.

Thanks for any pointers.

Nick

[twitter-dev] Re: Social media developer groups?

On Thu, Jul 16, 2009 at 12:52 PM, JDG ghil...@gmail.com wrote:

 have you tried business-oriented social networking sites like LinkedIn? It
 has a bazillion groups and I'd be shocked if there weren't a social
 computing-oriented group.


Well, I'm really thinking more of something beyond just a web-based group -
meetups and such.  But there might be some such things loosely organized via
LinkedIn or similar.  Maybe that sort of loosely organized group is just
what the Internet is good for...

Nick

[twitter-dev] Re: Twitter is not making money

On Thu, Jul 16, 2009 at 1:14 PM, Stuart stut...@gmail.com wrote:


 Twitter have a business plan, we're just not worthy enough to know all
 the details. What we know so far is that they're planning to launch a
 premium account type with a bunch of tools to aid brand and engagement
 tracking. If Twitter can maintain their popularity big business will
 pay a small fortune to be able to measure the effectiveness of the way
 they're using their accounts.


At the risk of really deviating from developer talk... We know this?  Who's
we and how do we know this?

I have a hard time seeing how analysis of Twitter alone would compete with
existing services that monitor brands in conversations across many
platforms.  I started one of the first companies to do that, ten years ago,
which is quite a head start... and it is now owned by one of the biggest
brand monitoring companies on the planet.  Lots of competition has come
along since then.

Anyway, this was fun, but it's not about developing code as such, so I'll
shut up.  Maybe this is a conversation for that non-platform-specific social
media developer community I was wondering about... ;-)

Nick

[twitter-dev] Re: Searching for tweets that refer to an URL still impossible with bit.ly (and others)

On Wed, Jul 15, 2009 at 7:19 AM, Andrew Badera and...@badera.us wrote:



 But I believe bit.ly returns different, unique URLs for logged-in users


That is an option, but in my experience, it is relatively rare.

Nick

[twitter-dev] Re: Failed API returning over capacity HTML page content

On Wed, Jul 15, 2009 at 8:03 AM, J.D. jeremy.d.mul...@gmail.com wrote:


 This is really a pain because I'm calling the API and expecting JSON
 data back. Do I need to check the data each time and see if I actually
 got html by mistake? If so, then I'm uncertain what I should do with
 the html.


In my experience, that's necessary anyway - I wouldn't trust that it would
never happen.  My code waits a few seconds and tries again if the JSON parse
fails.  A bunch of fails in a row and it gives up.
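
Roughly, the retry logic looks like the sketch below. The function and parameter names are made up for illustration, and `fetch` stands in for whatever call returns the raw response body.

```python
import json
import time

def fetch_json(fetch, max_failures=5, delay_seconds=3.0, sleep=time.sleep):
    """Call fetch() until its body parses as JSON, giving up after max_failures bad bodies."""
    failures = 0
    while True:
        body = fetch()
        try:
            return json.loads(body)
        except ValueError:
            # An HTML "over capacity" page lands here, as does any other non-JSON body.
            failures += 1
            if failures >= max_failures:
                raise RuntimeError("gave up after %d unparseable responses" % failures)
            sleep(delay_seconds)
```

Passing `sleep` in as a parameter is only there so the wait can be skipped when testing.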

Nick

[twitter-dev] Re: Searching for tweets that refer to an URL still impossible with bit.ly (and others)

On Wed, Jul 15, 2009 at 8:45 AM, Bill Kocik bko...@gmail.com wrote:



 So for 10 URLs, you post 10 status updates, then retrieve your own
 last 10 updates in one call by retrieving your own timeline via /
 statuses/user_timeline(and that's the one hit against your rate limit).


If Twitter will shorten multiple URLs in the same tweet, you could get even
more than that.  I just tried putting two longer URLs in a tweet and it
didn't shorten them at all, just did the ellipsis thing, so that was
inconclusive.  This method is rather unreliable, I suppose... and I don't
want to post more test tweets.  My mother will see them on Facebook and
become confused.  ;-)

Nick

[twitter-dev] Re: Rules About Making Money

On Wed, Jul 15, 2009 at 1:30 PM, MakeMoney chicagolocalde...@gmail.com wrote:


 I have a business plan and I am looking to roll it out.  It involves
 using Twitter as a medium.  I have already gotten interest from
 parties willing to pay for my service, but I believe it may infringe
 upon how Twitter will eventually make money.  I do not want to invest
 in this service, and then have Twitter shut it down to replace it with
 their own.  I sent Twitter an email today asking them for a possible
 discussion time, but I am guessing they get a ton of these and most
 likely won't respond.  If not does anyone know the legality of using
 their service to make money?  And the legality of them being able to
 shut off my account?  Thanks.


Generally speaking, any company that uses its terms of service to stifle
competition is running the risk of violating anti-trust laws.  For that
reason, I seriously doubt if they'll even answer your email.  That's a very
dangerous conversation to have.  Companies have to compete on their
offerings, not by making deals with potential competitors.

Consider the fact that there are hundreds or thousands of software
developers who use Windows to compete with Microsoft.  Not only is that
legal, Microsoft has found itself in legal hot water when it tries to
prevent it.  Imagine, for example, if Microsoft tried to stop OpenOffice
from running under Windows.  The U.S. DOJ would jump all over that.

On the other hand, when you're dancing with the elephant it is easy to get
stepped on.  As I think Heidi Roizen used to say, "How do you know that
Microsoft likes you?  They crush you last."

A lot of people don't understand anti-trust laws and how they affect
communities and their conversations.  For example, it would be a huge
problem if developers here began discussing and comparing how much they
charge for their work.  That sort of conversation tends to be interpreted by
the courts as price fixing, which is unlawful.

Nick

Hash tag standards? (Re: [twitter-dev] Re: @username matches and hash tags)

2009-06-30 Thread Nick Arnett

On Tue, Jun 30, 2009 at 10:57 AM, Chad Etzel jazzyc...@gmail.com wrote:



 I don't think #foo/bar is a valid hashtag, so I don't account for those.


Are there standards for hashtags?  I just did some searching and didn't come
up with anything.  Seems like the de facto standard is everything from the
hash mark through white space or ending.  But maybe the beauty of hashtags
is that vagueness...
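
Taken literally, that de facto rule is a one-line regular expression. This is a sketch of the rule as described here, not of how Twitter or any client actually parses tags.

```python
import re

# "Everything from the hash mark through white space or ending."
HASHTAG = re.compile(r"#\S+")

def find_hashtags(text):
    """Return every hashtag in text under the whitespace-delimited rule."""
    return HASHTAG.findall(text)
```

Under this rule `#foo/bar` counts as a single tag, and trailing punctuation (as in `#tag.`) is swallowed into the tag, which shows how loose the convention is.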

Nick

[twitter-dev] Re: Counting retweets

2009-06-30 Thread Nick Arnett

On Tue, Jun 30, 2009 at 4:24 PM, Peter Denton petermden...@gmail.com wrote:

 I was playing around with retweeting and also found there is a pretty
 substantial amount of non-crediting RTs. So if you're trying to be scientific
 about the whole thing, you might want to search for the string. Just FYI.


More than that - if you're looking at retweeted URLs, you have to resolve
shortened URLs to their original.
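
As a sketch of why resolution matters when counting: the same page can arrive under several short links, so the tally has to be keyed on the resolved URL. The `resolve` callable is a placeholder for whatever follows the redirects; it is an assumption, not a real API.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")

def count_cited_urls(tweet_texts, resolve):
    """Count citations per destination URL after resolving each shortened link."""
    counts = Counter()
    for text in tweet_texts:
        for short_url in URL_RE.findall(text):
            # Key on the resolved URL so different short links to one page add up.
            counts[resolve(short_url)] += 1
    return counts
```

In practice `resolve` would issue an HTTP request and follow redirects, and its results would be worth caching.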

Nick

[twitter-dev] Re: in_reply_to_status_id validation has changed

2009-06-28 Thread Nick


This is excellent news! (Sorry, I'm a little behind.)

However, documentation still asserts that the @mention must be at the
beginning of the tweet (maybe someone just needs to update this wiki
page: 
http://apiwiki.twitter.com/Twitter-REST-API-Method%3A-statuses%C2%A0update)
and the Twitter web interface still appears to follow this rule.

That is, if I click reply from the web, it starts the tweet with the
@mention at the very beginning.  And if I move the @mention to
somewhere else in the tweet (to avoid the #fixreplies bug and let my
other followers see it), in_reply_to is silently dropped.  From the
web interface, this makes it impossible to send a reply that your
followers can both see and follow to the conversation thread.

Is the web interface just behind the times?  Or are we encouraged not
to use this newly-relaxed feature?

Thanks,
Nick

On Apr 30, 6:02 pm, Doug Williams d...@twitter.com wrote:
 Before today, the value of the in_reply_to_status_id field was validated by
 two requirements:

 1) It was set to a valid status_id
 2) The valid status_id's author from #1 was @replied in the update (@reply
 here is the old definition where @user was at the beginning of the tweet).

 If the value of in_reply_to_status_id did not meet these criteria, it was
 silently dropped.

 We have relaxed requirement #2 to permit mentions, meaning that the user of
 the referenced tweet needs to be included somewhere in the update. Enjoy the
 new data!

 Thanks,
 Doug
 --

 Doug Williams
 Twitter Platform Support
 http://twitter.com/dougw
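
The second requirement Doug describes, before and after the change, can be modeled with the sketch below. It is only a reading of the rule as stated in this thread, not Twitter's implementation. Requirement #1 (the status ID being valid) is not modeled, and matching screen names case-insensitively is an assumption.

```python
import re

def keeps_in_reply_to(update_text, replied_to_author, relaxed=True):
    """Return True if in_reply_to_status_id would be kept rather than silently dropped."""
    name = re.escape(replied_to_author)
    if relaxed:
        # New rule: the author may be mentioned anywhere in the update.
        pattern = r"(?<![A-Za-z0-9_])@%s(?![A-Za-z0-9_])" % name
        return re.search(pattern, update_text, re.IGNORECASE) is not None
    # Old rule: the update must begin with @author.
    pattern = r"@%s(?![A-Za-z0-9_])" % name
    return re.match(pattern, update_text, re.IGNORECASE) is not None
```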

[twitter-dev] Re: user ambiguity

2009-06-27 Thread Nick Arnett

On Sat, Jun 27, 2009 at 4:05 PM, Christine christine.kar...@gmail.com wrote:


 On Jun 27, 9:55 pm, Nick Arnett nick.arn...@gmail.com wrote:
  There are parameters available to avoid the ambiguity.
  See
 http://apiwiki.twitter.com/Twitter-REST-API-Method%3A-statuses-user_t...

 Yes, I know, that's what I am now using. I was just wondering why
 there should be such an ambiguity in the api.


That's what happens when people don't think of everything in the first
release.

;-)

Nick

[twitter-dev] Re: Search beyond 7 days

2009-06-22 Thread Nick Arnett

On Sun, Jun 21, 2009 at 8:16 PM, Doug Williams d...@twitter.com wrote:


 Unfortunately not. We currently do not offer a method to retrieve
 tweets past what is available within our pagination limits [1].


Not meant in a smart-ass way: may I point out that Google will return
results much older?  I've used queries like "site:twitter.com keyword" to
find older tweets.  The biggest problem, though, is that you're searching all
the text, not just tweets.

The query "site:twitter.com money" has about 2 million hits, though it is
returning results from at least three domains - twitter.com, m.twitter.com
and (new to me) explore.twitter.com.

Nick

[twitter-dev] Re: Don't take twitter down for maintenance today!

2009-06-16 Thread Nick Toumpelis


Excellent decision! Kudos to Twitter and NTT.

On 16 Jun 2009, at 3:41 AM, Doug Williams wrote:

For posterity's sake, I'm including a link explaining our rescheduling of
this downtime:
http://blog.twitter.com/2009/06/down-time-rescheduled.html


Thanks,
Doug


Nick Toumpelis

email: n...@toumpelis.me.uk
twitter: macsphere
web: http://www.canaryapp.com

[twitter-dev] Re: Spinn3r Twitter Social Media Rank

2009-06-15 Thread Nick Arnett

Kevin Burton? Reputation guy? Hey, it IS you! Cool.

Your post got me thinking about this topic... blog post in the works about
reputation portability and social networking APIs. And then I realized who
you are.

The ranking stuff is always interesting... but it always seems to be a
solution in search of a problem. The problem usually seems to be, "Who
should I follow?" and then the rankings seem so obvious as to be low
value... and I start wondering how this could be personalized. The big
rankings tell me who *everybody* should follow, but I want to know who *I*
should follow... and now I'm wondering how, with all the APIs around, I can
base that on much more than just Twitter-sourced data.

I keep thinking about community detection these days, too. In other words,
analyzing the social graph to identify communities that may not have
identified themselves yet. In fact, I tend to think these two things are
really the same thing, from different angles. The people who I should
follow are people I'm implicitly in community with anyway - shared contact,
shared interests, etc. On the other hand, all this begs the question of
what community is, why it has value, etc. What's the value in giving a
group of people the feedback that "hey, you guys are behaving like a
community"? What does that self-awareness trigger?

I'm rambling... but it was good to suddenly realize that this burtonater is
the same guy I've been paying attention to over the years. I'll be up in SF
next week if you'd like to connect.

Nick

On Mon, Jun 15, 2009 at 1:43 PM, burton burtona...@gmail.com wrote:

Hey guys.

We just pushed this today:

http://spinn3r.com/rank/twitter.php

as part of our Spinn3r 3.1 release:

http://blog.spinn3r.com/2009/06/spinn3r-31---now-with-twitter-support-and-social-media-ranking.html

Would love feedback.

If this is valuable for the community we would be willing to compute
deeper rankings (on a deeper crawl) and recompute this more regularly
(once every two weeks or so).

Kevin

[twitter-dev] Re: Streaming API + PHP and Python

2009-06-08 Thread Nick Arnett

Try calling encode("utf-8") on the strings before you do anything else with
them.  When you do, though, you may find that you have to add Python
components.
In other words, if the string is foo, do this:

foo = foo.encode("utf-8")
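
For anyone reading this later on Python 3, the same idea looks slightly different, since `encode` there returns bytes rather than a printable string. The sketch below assumes the tweet dictionary has the shape used in the traceback quoted below (`t['user']['screen_name']`, `t['text']`); the function name is made up.

```python
import sys

def format_tweet(tweet):
    """Build the output line and encode it explicitly as UTF-8 bytes."""
    line = "%s -- %s" % (tweet["user"]["screen_name"], tweet["text"])
    # Encoding explicitly avoids depending on the terminal's default codec,
    # which is what fails with 'ascii' in the traceback.
    return line.encode("utf-8")

def print_tweet(tweet):
    """Write the encoded line to stdout as raw bytes, followed by a newline."""
    sys.stdout.buffer.write(format_tweet(tweet) + b"\n")
```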

Nick

On Mon, Jun 8, 2009 at 4:52 PM, Chad Etzel jazzyc...@gmail.com wrote:


 Hi Jason,

 Thanks!  I've tried it out, and it seems that it doesn't like unicode
 characters?  Here's the traceback I get:

 Exception in thread Thread-2:
 Traceback (most recent call last):
   File "/usr/lib/python2.5/threading.py", line 486, in __bootstrap_inner
     self.run()
   File "spritzer.py", line 31, in run
     print '%s -- %s' % (t['user']['screen_name'], t['text'])
 UnicodeEncodeError: 'ascii' codec can't encode characters in position 11-14: ordinal not in range(128)

 Exception in thread Thread-1:
 Traceback (most recent call last):
   File "/usr/lib/python2.5/threading.py", line 486, in __bootstrap_inner
     self.run()
   File "spritzer.py", line 31, in run
     print '%s -- %s' % (t['user']['screen_name'], t['text'])
 UnicodeEncodeError: 'ascii' codec can't encode character u'\u2625' in position 9: ordinal not in range(128)



 I'm not fluent in python, so I'm not sure of the unicode
 capabilites... but otherwise it looks like it's connecting and
 receiving data.

 -Chad

 On Mon, Jun 8, 2009 at 8:36 PM, Jason Emerick jemer...@gmail.com wrote:
  Here is some rough python code that I quickly wrote last weekend to handle
  the json spritzer feed: http://gist.github.com/126173
 
  During the 3 or so days that I ran it, I didn't notice it die at any time...
 
  Jason Emerick
 
 
 
  On Mon, Jun 8, 2009 at 5:25 PM, Chad Etzel jazzyc...@gmail.com wrote:
 
  Well, glad I'm not the only one :) But still a bummer it's happening...
 
  Another strange thing is that this does *not* seem to happen with the
  /follow streams.  I have a PHP script running (same source, just
  requesting /follow instead of /spritzer) that has been connected for
  over 2 days.  Of course, it may die at any moment, I'm not sure..
 
  One big difference is that the throughput for that stream is much much
  less than the /hose streams, and I'm wondering if the sheer volume of
  bytes being pushed has something to do with it? That would be quite
  sad.
 
  I have PHP scripts acting as Jabber/XMPP clients that use the similar
  fsockopen/fread/fgets/fwrite mechanisms that have been up for months
  at a time, so I know those socket connections *can* stay up a long
  long time in theory.
 
  -Chad
 
  On Mon, Jun 8, 2009 at 5:00 PM, jstrellner j...@twitturly.com wrote:
  
   Hi Chad,
  
   We too have noticed the same behavior in PHP.  Initially I wrote
   something very similar to your example, and noticed that I'd get a
   random time's worth of data before it disconnected.  Then I rewrote
   it, which you can see at the below URL (modified to remove irrelevant
   code to this discussion), but I am still seeing similar results.  Now
   it goes for 2-3 days, and then stops getting data.
  
   I can see that the script is still running via ps on the command
   line, and I can still see data going through the server, just PHP
   doesn't process it anymore.
  
   http://pastie.org/505012
  
   I'd love to find out what is causing it.  I do have a couple of
   theories specific to my code that I am trying - the only thing that
   sucks is that it is random, so the tests take a few minutes or days,
   depending on when it feels like dying.
  
   Let me know if this code works or helps you in any way. Feel free to
   bounce any ideas off of me, maybe we can come up with a stable
   solution.
  
   -Joel
  
  
   On Jun 8, 1:36 pm, Chad Etzel jazzyc...@gmail.com wrote:
   I thought those things, too... but the following things made me think
   otherwise:
  
   a) The stream stops after a different number of updates/bytes each
   time, and will happily go on forever if I put an error-catching loop
   in the script.
  
   b) The same thing is happening in the python script.
  
   c) Curl/telnet works fine, so it's not a system resource depletion
   issue
  
   ...still confused,
   -Chad
  
   On Mon, Jun 8, 2009 at 4:31 PM, John Kalucki jkalu...@gmail.com
   wrote:
  
A theory: The PHP client has stopped reading data, for whatever
reason. The TCP buffers fill on the client host, the TCP window
closes, and wireshark shows no data flowing. netstat(1) will show

[twitter-dev] Re: random sampling of users....do we know anything about user id range?

2009-06-04 Thread Nick Arnett

On Wed, Jun 3, 2009 at 7:13 PM, TechRavingMad techraving...@gmail.com wrote:


 There are a little over 44.5 million twitter IDs as of right now
 (10:10pm cst 6/3/9) with what seems to be about 10 being added every
 second.


However, Twitter has been quite clear about not saying if status IDs
correspond to the actual number of statuses, so I'd guess that they're
equally circumspect about whether or not the number of user IDs corresponds
to the number of users.  In other words, we can be sure there are not more
than 44.5 million users, but we don't know how much lower the actual number
is.  We don't know if all IDs have been used... and even Twitter doesn't
know how many of those IDs belong to the same users.

I would think that if one wants a random sample of users, one would have to
propose a selection method and ask Twitter if there's any reason that it
would introduce a selection bias... and hope that they are willing to reply.

Seems to me that the biggest problem would be to include quiet users,
since only those who post in public become visible.
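
One possible selection method, sketched only to make the discussion concrete: draw IDs uniformly from the known range and see what fraction resolve to an account. The `exists` callable is hypothetical, and this does nothing about the biases mentioned above (it cannot tell quiet users from abandoned accounts, or detect several IDs belonging to one person).

```python
import random

def estimate_live_ids(max_id, sample_size, exists, rng=None):
    """Estimate how many IDs in 1..max_id are in use from a uniform random sample."""
    rng = rng or random.Random()
    # Sample without replacement so no ID is probed twice.
    sample = rng.sample(range(1, max_id + 1), sample_size)
    hits = sum(1 for user_id in sample if exists(user_id))
    # Scale the observed hit rate up to the whole ID range.
    return max_id * hits / sample_size
```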

Nick

[twitter-dev] Re: Bulk id - screen_name resolution.

2009-06-01 Thread Nick Arnett

On Sun, May 31, 2009 at 3:57 PM, Stuart stut...@gmail.com wrote:


 Much as I respect Twitter and the great people who work there, I don't
 buy that this would place too much demand on their servers. They
 already use Memcached extensively, and this would be a pretty simple
 addition to that data store.


For that very reason, I'm not sure it makes sense for third parties to
collaborate on a single-purpose distributed store.  There are user/account
properties that Twitter won't implement, at least not until there's a lot of
demonstrated value.  In other words, the developer community could
collaborate on problems that have marginal value to Twitter in the short
run.

Nick

[twitter-dev] Re: Change Tracker

2009-06-01 Thread Nick Arnett

On Sun, May 31, 2009 at 7:22 PM, TechRavingMad techraving...@gmail.com wrote:


 I was wondering if there would be any way that Twitter could publish
 an RSS feed of when certain profile items are changed?


See http://code.google.com/p/twitter-api/issues/detail?id=334 for the bad
news about this.  Twitter doesn't seem to see this as an actual issue,
unfortunately.

I don't know how to vote for getting it out of the WontFix pile.

Nick

[twitter-dev] Re: urls in tweets

2009-05-31 Thread Nick Arnett

On Sun, May 31, 2009 at 4:53 AM, grand_unifier jijodasgu...@gmail.com wrote:


 i have written a code to get all tweets that have urls in them in atom
 or json format.

 now i want a way to:

 1. separate the urls from the tweets, like a tweetmeme way...
 2. find out if the url represents a video...

 how will i do that??


I don't think anyone can answer this in detail without knowing what language
you are writing this code in.  You should be able to use a regular
expression to extract the URLs and then use the file extension to detect
whether or not it is a direct link to a video file.  But if it is a link to
a page that contains a video, you'll have to fetch the page and examine its
links.

There are some URL patterns that you probably can assume point to pages that
contain video, such as YouTube URLs.
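
If the language happens to be Python, the two easy cases might look like the sketch below. The URL pattern is deliberately simple, and the extension and host lists are illustrative guesses rather than a complete inventory. Pages that merely embed a video still need to be fetched and inspected, and shortened URLs need resolving first.

```python
import re
from urllib.parse import urlsplit

URL_RE = re.compile(r"https?://[^\s<>\"']+")
VIDEO_EXTENSIONS = (".mp4", ".mov", ".avi", ".flv", ".wmv", ".m4v")
VIDEO_HOSTS = ("youtube.com", "www.youtube.com")

def extract_urls(text):
    """Return the URLs found in a tweet, minus trailing sentence punctuation."""
    return [url.rstrip(".,;:!?)") for url in URL_RE.findall(text)]

def looks_like_video(url):
    """Guess whether a URL is a direct video link or a known video-hosting page."""
    parts = urlsplit(url)
    if parts.path.lower().endswith(VIDEO_EXTENSIONS):
        return True
    return parts.netloc.lower() in VIDEO_HOSTS
```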

Nick

[twitter-dev] Re: Geographical distribution / latency of api servers

2009-05-31 Thread Nick Arnett

On Fri, May 29, 2009 at 2:27 AM, John Adams j...@twitter.com wrote:


 On May 29, 2009, at 2:14 AM, jmathai wrote:

  What's the geographical distribution of the api servers?  And, are
 requests routed to the nearest farm/colo?


 All servers are currently on the west coast.


Thus, daily prayers are in order, that the Big One doesn't hit until Twitter
has a redundant operations center elsewhere.

Nick

[twitter-dev] Re: What kind of server setups do Twitter search engines use for indexing public tweets?

2009-05-23 Thread Nick Arnett

On Sat, May 23, 2009 at 9:09 AM, J... celebur...@gmail.com wrote:


 I am curious what kind of server/hosting plans are used for sites like
 tweetmeme.com where twitter links are being indexed.


I operate TwURLed News, which is doing that sort of thing (
http://twURLedNews.com).  The database and analytics code are running on
Python and MySQL on an Intel-based BSD box.  The site is hosted at Bluehost,
using Wordpress.  I originally had the database there, too, but it looked
like it would eat too many CPU cycles, so I moved it back to a machine at my
office.

TwURLed News doesn't need a lot of processing power because it doesn't try
to drink the entire firehose.  It follows and crawls Twitter users based on
their track record of citing URLs that became popular, their proximity in
the social network to such people and their use of two-word phrases used by
such people.  In other words, recursive graph exploration in which citing a
URL that becomes popular adds to your weight in the graph.  Every few
minutes, it publishes the URLs that were posted by the currently highest
scoring people.  It also follows the highest-scoring people, periodically
un-following aggregators and those whose scores have fallen too low.  At the
moment it is following about 2,000 people (http://twitter.com/twurlednews).
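
A heavily simplified sketch of the scoring idea, not the actual TwURLed News code. It covers only the "citing a URL that becomes popular adds to your weight" part and ignores social-network proximity and two-word phrases. The threshold and all names are made up for illustration.

```python
from collections import Counter, defaultdict

def score_users(citations, popular_threshold=3):
    """Give each user one point per URL they cited that enough distinct users also cited."""
    citers = defaultdict(set)
    for user, url in citations:
        citers[url].add(user)
    scores = Counter()
    for url, users in citers.items():
        if len(users) >= popular_threshold:
            for user in users:
                scores[user] += 1
    return scores

def urls_from_top_users(citations, scores, top_n=10):
    """List, in first-seen order, the URLs posted by the highest-scoring users."""
    leaders = {user for user, _ in scores.most_common(top_n)}
    urls = []
    for user, url in citations:
        if user in leaders and url not in urls:
            urls.append(url)
    return urls
```

Re-running both steps every few minutes over a growing citation list gives the recursive flavor: today's leaders determine whose links are published and followed next.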

Nick

[twitter-dev] Re: Date time string

2009-05-22 Thread Nick Arnett

On Thu, May 21, 2009 at 2:21 PM, John Meyer john.l.me...@gmail.com wrote:


 I've noticed that most of the date time strings in the XML responses are
 formatted like this: "Thu May 21 03:15:28 +0000 2009".  What exactly is
 that +0000?


It is the offset, from GMT, of the time zone being stamped.  For example,
from your email's headers, there's this time stamp in a Received header:

Thu, 21 May 2009 14:21:53 -0700 (PDT)

That says that it was stamped in a time zone that is 7 hours behind
GMT.  The PDT specifies which of those time zones it is.  The date
header on your email was six hours behind GMT (-0600).

And if you are wondering why it is four digits, for hours and minutes,
it's because there are time zones whose offset is a number of hours
plus 30 minutes.
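
In Python 3 the offset can be parsed directly with the `%z` directive. This sketch assumes the elided offset in the question was +0000, drops the trailing "(PDT)" comment from the email stamp for simplicity, and uses +0530 as an example of a half-hour zone.

```python
from datetime import datetime, timedelta

TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"
EMAIL_FORMAT = "%a, %d %b %Y %H:%M:%S %z"

# %z reads the four-digit offset: two digits of hours, two of minutes.
created = datetime.strptime("Thu May 21 03:15:28 +0000 2009", TWITTER_FORMAT)
received = datetime.strptime("Thu, 21 May 2009 14:21:53 -0700", EMAIL_FORMAT)
half_hour = datetime.strptime("Thu, 21 May 2009 14:21:53 +0530", EMAIL_FORMAT)

offsets = [created.utcoffset(), received.utcoffset(), half_hour.utcoffset()]
```

The resulting datetimes are timezone-aware, so they can be compared or converted to GMT without further bookkeeping.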

Nick

[twitter-dev] Re: How To: Remove Follower

2009-05-15 Thread Nick Arnett

On Thu, May 14, 2009 at 10:32 PM, TweetClean tweetcl...@gntmidnight.com wrote:


 I see that there is the ability to remove people I am following using
 the friendships/destroy API call.  How do I remove someone who is
 following me?  I am sure it is right in front of my eyes but I am not
 making any connections.


I think that would be blocks/create.

Nick

[twitter-dev] Re: Can Somebody Help Me Setup My Twitter API?

2009-05-14 Thread Nick Arnett

On Thu, May 14, 2009 at 10:53 AM, J... celebur...@gmail.com wrote:


 Hello Twitter API Group,

 I am a novice programmer who would like to experiment with the Twitter
 API, however I have spent the last week attempting to setup the API


What language are you using?

Nick

[twitter-dev] Re: Status ID closing in on maximum unsigned integer

2009-05-13 Thread Nick Arnett

On Wed, May 13, 2009 at 7:40 AM, Matt Sanford m...@twitter.com wrote:


 Make that *signed* integer. Writing too many emails at once, sorry.


That would be the MySQL (and any 32-bit OS) maximum signed integer, or
2,147,483,647, I'm assuming.  So if we're using an unsigned INT in our
tables, we're fine for a while (months?) - status IDs will always be
positive, right?  And BIGINT, especially unsigned, is safe for a long,
long time?
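
For reference, a quick sketch of the arithmetic behind those limits,
assuming MySQL's INT is 32 bits wide and BIGINT is 64 bits wide.  The
int_range helper is just an illustrative name.

```python
def int_range(bits, signed):
    """Return (min, max) for a two's-complement integer of the given width."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


# INT is assumed to be 32 bits, BIGINT 64 bits.
for name, bits in (("INT", 32), ("BIGINT", 64)):
    for signed in (True, False):
        low, high = int_range(bits, signed)
        label = "signed" if signed else "unsigned"
        print(f"{name:6} {label:8} max = {high:,}")
```

An unsigned INT roughly doubles the headroom over a signed one, while
either flavor of BIGINT is billions of times larger still.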

Just curious... does this mean Twitter really is closing in on 2 billion
tweets, or were some IDs skipped?  Is there a prize for whoever posts tweet
number 2 billion?  ;-)

Nick

[twitter-dev] Re: About the phenomenon of change line of no intention when it contributes entering

2009-05-12 Thread Nick Arnett

2009/5/12 moz syo...@gmail.com


 Because the following phenomenon was discovered, it reports.


Because the preceding syntax was observed to be stilted, it is wondered if
human is reporting or AI.

Nick

[twitter-dev] Re: About the phenomenon of change line of no intention when it contributes entering

2009-05-12 Thread Nick Arnett


That's actually what I thought.  I hope you realize I meant it only as
humor, not criticism.

Nick

On 5/12/09, moz syo...@gmail.com wrote:

 Nick Arnett wrote:

 I'm sorry. I get help of the translation software because it is not good
 at English.

[twitter-dev] Re: Planned site maintenance Friday, May 8th 2PM-3PM PST and Monday, May 11th Noon-1PM PST

On Thu, May 7, 2009 at 10:33 PM, Abraham Williams 4bra...@gmail.com wrote:

 That is usually what site maintenance means...

In my experience, a maintenance window means that the site could be down for
any or all of that period.  The way Doug wrote it, I'd imagine that they
expect the site will only be down for a short time, but they're reserving an
hour in case it takes longer for unanticipated reasons.

We shall see...

Nick

[twitter-dev] Abuse of multiple accounts

I knew this would happen... one person with a bunch of accounts has managed
to spam my social network analysis:
http://www.twurlednews.com/2009/05/08/entrepreneurs-wanted-12/

In this case, it is very obviously the same person, since she is using the
same picture for every account and only slight variations of her real name.

I can detect some of this by seeing real names that correlate to multiple
identical tweets... Curious if anybody else has thoughts on ways to identify
this sort of abuse.  Perhaps if the API told us what percentage of people
block each user?

Just noticed that most of her profiles have the same home page URL, so
that's a strong clue... and most of her tweets contain the same URL.
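
A rough sketch of that home-page clue, assuming each profile has already
been fetched as a (screen name, home page URL) pair.  The function name,
the threshold, and the sample data are all hypothetical, and a shared URL
is only evidence, not proof, of a single owner.

```python
from collections import defaultdict


def accounts_sharing_url(profiles, min_accounts=3):
    """Group screen names by home page URL and keep the suspicious groups.

    profiles: iterable of (screen_name, url) pairs; url may be None or "".
    min_accounts: hypothetical threshold for calling a shared URL suspicious.
    """
    by_url = defaultdict(set)
    for screen_name, url in profiles:
        if not url:
            continue  # profiles without a home page carry no signal here
        # Light normalization so trivial variations still match.
        key = url.strip().lower().rstrip("/")
        by_url[key].add(screen_name)
    return {url: names for url, names in by_url.items()
            if len(names) >= min_accounts}


# Made-up sample data for illustration only.
sample = [
    ("jane_a", "http://example.com/offer"),
    ("jane_b", "http://example.com/offer/"),
    ("jane_c", "HTTP://EXAMPLE.COM/offer"),
    ("someone_else", "http://example.org"),
    ("no_homepage", None),
]
print(accounts_sharing_url(sample))
```

The same grouping could be run on the URLs inside tweets, or on tweet text
itself, to catch the identical-tweet pattern.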

I'm sure that Twitter's fraud group uses some sort of scoring system... any
chance that any of that data could be shared in the API to help automated
systems avoid retweeting spam?

Nick

[twitter-dev] Re: Abuse of multiple accounts

On Fri, May 8, 2009 at 8:32 AM, Matt Sanford m...@twitter.com wrote:

 Hi there,
 We do have a slew of reports and tools for our abuse team looking at
 blocking, duplicates and some secret sauce to find bad accounts. I'll pass
 this on and see if it wasn't caught for some reason or is in the process of
 being handled. As far as sharing our data via the API, we have no plans to
 do that. The issue isn't showing the data to friends, it's showing it to
 enemies. I think the development community could probably come up with some
 cool analysis on this, but so could the spammers. If you show your opponent
 all of your cards they will raise the stakes.


I certainly understand that, but I was thinking more of a score, rather than
any information about what's behind the score, to use as evidential logic. I
can see why it is safer and easier to just keep it all behind the scenes
until and unless the account is shut down.

Any chance of sharing the percentage of people who have blocked each user?
That's feedback from your users, after all, and thus somewhat belongs to
the community.  (There's probably a huge hole in that argument somewhere,
but I'm not going to think about it).

Nick

[twitter-dev] Re: Abuse of multiple accounts

On Fri, May 8, 2009 at 10:49 AM, Doug Williams d...@twitter.com wrote:

 Actually this set of accounts are prime targets to eventually get swept up
 by one of our automated spam algorithms.


That's good to hear.  I'm going to wait and see how often this happens
before I start working on new code to detect it myself.

Nick

[twitter-dev] Re: Abuse of multiple accounts

On Fri, May 8, 2009 at 11:56 AM, Doug Williams d...@twitter.com wrote:

 Nick,
 We have a chief scientist in house who actually manages all of these
 algorithms. It is his job to determine how to spot spam and sketchy users
 (his words) through the data. I'm sure you can understand why we cannot
 share this part of our secret sauce openly.


Really, I wasn't asking for algorithms... I was hoping for scores, but
that's okay.



 Also, if there are isolated incidents of spam or abuse that you want to
 report, you can always send an @reply to @dougw and I can take care of them
 on my own.


As long as they are isolated, I'm not going to worry much about 'em.  ;-)

Given that this was the first one I've noticed in months, it seems that you
guys are doing a good job.

Nick

[twitter-dev] Re: Planned site maintenance Friday, May 8th 2PM-3PM PST and Monday, May 11th Noon-1PM PST