[twitter-dev] Re: Snowflake: An update and some very important information

CodeWarden Wed, 20 Oct 2010 11:59:03 -0700

Matt,

In your example you show an unsigned long that uses the full 64 bits,
which cannot be represented properly on a Java platform.  In a later
explanation you say that only 63 bits will ever be used.  Can you
confirm one way or the other?  i.e. will we ever see an unsigned
integer that takes more than 63 bits to represent?
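
To make the concern concrete, here is a minimal Python sketch (mine, not
from Matt's post) that checks how many bits the sample ID from the example
needs. The helper name fits_signed_64 is made up for illustration;
JAVA_LONG_MAX mirrors Java's Long.MAX_VALUE.

```python
# Range check for the sample ID taken from the quoted JSON example.
SAMPLE_ID = 10765432100123456789
JAVA_LONG_MAX = 2**63 - 1  # same value as Long.MAX_VALUE in Java


def fits_signed_64(value: int) -> bool:
    """Return True if a non-negative value fits in a signed 64-bit long."""
    return 0 <= value <= JAVA_LONG_MAX


# Number of bits needed to represent the sample ID.
print(SAMPLE_ID.bit_length())
# Whether a Java long could hold the sample ID.
print(fits_signed_64(SAMPLE_ID))
```

If the real IDs stay within 63 bits, the same check would pass for them
even though it fails for this sample value.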


-Paul


On Oct 19, 7:52 pm, Matt Harris <thematthar...@twitter.com> wrote:
> Hey everyone,
>
> Thank you to all of you for your questions, patience and contributions to
> this thread.  Hearing your views and knowing how you use the API helps us
> provide more information where there wasn't enough, and clarify details
> where there was ambiguity.
>
> I've collated the questions I've received from you directly, over Twitter to
> @twitterapi and through this list.  I hope the comments below provide enough
> information to answer those questions and explain the reasoning behind our
> decisions.
>
> Thanks for your support and patience,
> @themattharris
>
> 1) Will search.twitter.com also include id_str and
> in_reply_to_status_id_str?
> Yes, Search will include the String representations of those IDs.
>
> 2) Which fields are affected by this change?
> All IDs which are transmitted as Integers will have a String representation
> in the API response. Only Tweet IDs (which include mentions and retweets)
> will be moving to new Snowflake IDs. Messages (DMs), Saved Searches and
> Users may change to a Snowflake ID scheme in the future but this isn’t
> planned for this year.
>
> We are adding String representations of the Integer IDs now so that you can
> update all of your code to use the String representations throughout, and be
> prepared should any other IDs break the 53bit boundary.
>
> 3) Which fields will have String representations?
> The fields which will have String representations are:
>
> id (DM, Saved Search, User, List)
> in_reply_to_status_id
> in_reply_to_user_id
> new_id (Streaming only. Will be removed when Snowflake is enabled)
> current_user_retweet_id (When include_my_retweet=1 is passed)
>
> 4) Can you provide a complete Tweet example with Snowflake ID to test?
>
> [{"coordinates":null,"truncated":false,
> "created_at":"Thu Oct 14 22:20:15 +0000 2010","favorited":false,
> "entities":{"urls":[],"hashtags":[],"user_mentions":[{"name":"Matt Harris",
> "id":777925,"id_str":"777925","indices":[0,14],
> "screen_name":"themattharris"}]},
> "text":"@themattharris hey how are things?","annotations":null,
> "contributors":[{"id":819797,"id_str":"819797","screen_name":"episod"}],
> "id":10765432100123456789,"id_str":"10765432100123456789",
> "retweet_count":0,"geo":null,"retweeted":false,
> "in_reply_to_user_id":777925,"in_reply_to_user_id_str":"777925",
> "in_reply_to_screen_name":"themattharris",
> "user":{"id":6253282,"id_str":"6253282"},"source":"web","place":null,
> "in_reply_to_status_id":10586268426842688951,
> "in_reply_to_status_id_str":"10586268426842688951"}]
>
> 5) What is happening with new_id in the Streaming API?
> new_id and new_id_str will be switched off when or soon after Snowflake is
> enabled on November 4th.
>
> 6) Why not restrict IDs to 53bits?
>
> A Snowflake ID is composed of:
>  * 41bits for millisecond precision time (69 years)
>  * 10bits for a configured machine identity (1024 machines)
>  * 12bits for a sequence number (4096 per machine)
>
> The factor influencing the length of the ID is the time. For a 53bit ID this
> would mean only 31bits (53 - 10 - 12) are available for the time. 31bits is
> only enough for 24 days (2147483648/(1000*60*60*24)) of time.
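
The arithmetic in this answer can be checked with a short Python sketch.
The constant and function names are illustrative only and are not part of
Snowflake itself; the 365.25-day year is an assumption used to convert days
to years.

```python
# Bit budget from the answer above: 10 machine bits and 12 sequence bits.
MACHINE_BITS = 10
SEQUENCE_BITS = 12
MS_PER_DAY = 1000 * 60 * 60 * 24


def time_bits(total_bits: int) -> int:
    """Bits left for the millisecond timestamp in an ID of total_bits."""
    return total_bits - MACHINE_BITS - SEQUENCE_BITS


def days_covered(bits: int) -> float:
    """Days representable by a millisecond counter of the given width."""
    return 2**bits / MS_PER_DAY


# A 53-bit ID leaves 31 bits for time, which lasts under 25 days.
print(time_bits(53), days_covered(time_bits(53)))
# The 41-bit timestamp lasts roughly 69 years (assuming 365.25-day years).
print(days_covered(41) / 365.25)
# 41 + 10 + 12 bits adds up to a 63-bit ID.
print(41 + MACHINE_BITS + SEQUENCE_BITS)
```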
>
> Reducing the resolution of the timestamp would prevent a K-sorted resolution
> of 1 second or less.
>
> Reducing the configured machine identity or sequence number by 1bit would
> mean we couldn’t scale Twitter, or operate our infrastructure in an
> uncoordinated, highly available way.
>
> 7) When will the 53bit Integer overflow happen?
> 24 days after Snowflake starts counting.
>
> 8) Is it safe to parse and store IDs as signed 64bit Integers?
> Yes.
>
> 9) Why offer both the String and Integer versions of the ID?
> The String representation is needed to ensure languages which cannot convert
> the >53bit Integer can still use the ID in other API requests.
>
> The Integer value is being retained for languages which can handle
> numbers >53bit and to prevent applications which have not converted from
> being cut off from Twitter.
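
As a rough illustration of why both forms are useful, the Python sketch
below parses the sample snippet twice. Passing parse_int=float to
json.loads is my own way of imitating a parser that stores every number as
an IEEE-754 double; it is an assumption for demonstration, not a description
of any particular client library.

```python
import json

payload = '{"id": 10765432100123456789, "id_str": "10765432100123456789"}'

# Assumption: parse_int=float imitates a parser limited to double precision.
lossy = json.loads(payload, parse_int=float)
# Python's default parser keeps arbitrary-precision integers.
exact = json.loads(payload)

# True if the integer field survived the double-precision round trip.
print(int(lossy["id"]) == exact["id"])
# True if the string field still carries the exact ID.
print(int(lossy["id_str"]) == exact["id"])
```

A client in the first situation would use id_str everywhere; a client in
the second could keep using the integer field.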
>
> 10) When ID is null what will the _str representation be?
> The _str representation will also be null.
>
> 11) Did you really mean ‘unsigned’ 64bit Integer?
> Strictly speaking the Snowflake is a signed 64bit long under the hood. That
> being said, we will never use the negative bit and won’t require the full
> 64bits for positive numbers for about 69 years:
>
> http://www.google.com/search?q=%282**41%29+%2F+%2860*60*24*1000%29+%2...
>
> 12) Why not make the strings opt-in?
> We did consider this as an option but decided against it for a number of
> reasons. The first reason is that the ID is fundamental to being able to
> work with the data from the API so receiving the correct ID shouldn’t be
> something you have to opt into. The second, more influential reason is that
> making the _str representations opt-in would create significant
> performance-affecting issues for the API.
>
> On Mon, Oct 18, 2010 at 5:34 PM, themattharris
> <thematthar...@twitter.com> wrote:
>
>
>
> > Thanks to @gotwalt for spotting the missing commas.
>
> > Fixed JSON sample ...
>
> > [
> >  {
> >    "coordinates": null,
> >    "truncated": false,
> >    "created_at": "Thu Oct 14 22:20:15 +0000 2010",
> >    "favorited": false,
> >    "entities": {
> >      "urls": [
> >      ],
> >      "hashtags": [
> >      ],
> >      "user_mentions": [
> >        {
> >          "name": "Matt Harris",
> >          "id": 777925,
> >          "id_str": "777925",
> >          "indices": [
> >            0,
> >            14
> >          ],
> >          "screen_name": "themattharris"
> >        }
> >      ]
> >    },
> >    "text": "@themattharris hey how are things?",
> >    "annotations": null,
> >    "contributors": [
> >      {
> >        "id": 819797,
> >        "id_str": "819797",
> >        "screen_name": "episod"
> >      }
> >    ],
> >    "id": 12738165059,
> >    "id_str": "12738165059",
> >    "retweet_count": 0,
> >    "geo": null,
> >    "retweeted": false,
> >    "in_reply_to_user_id": 777925,
> >    "in_reply_to_user_id_str": "777925",
> >    "in_reply_to_screen_name": "themattharris",
> >    "user": {
> >      "id": 6253282,
> >      "id_str": "6253282"
> >    },
> >    "source": "web",
> >    "place": null,
> >    "in_reply_to_status_id": 12738040524,
> >    "in_reply_to_status_id_str": "12738040524"
> >  }
> > ]
>
> > Best,
> > @themattharris
>
> > On Oct 18, 5:19 pm, Matt Harris <thematthar...@twitter.com> wrote:
> > > Last week you may remember Twitter planned to enable the new Status ID
> > > generator - 'Snowflake' but didn't. The purpose of this email is to explain
> > > the reason why this didn't happen, what we are doing about it, and what the
> > > new release plan is.
>
> > > So what is Snowflake?
> > > ------------------------------
> > > Snowflake is a service we will be using to generate unique Tweet IDs. These
> > > Tweet IDs are unique 64bit unsigned integers, which, instead of being
> > > sequential like the current IDs, are based on time. The full ID is composed
> > > of a timestamp, a worker number, and a sequence number.
>
> > > The problem
> > > -----------------
> > > Before launch it came to our attention that some programming languages, such
> > > as JavaScript, cannot support numbers with >53bits. This can be easily
> > > examined by running a command similar to: (90071992547409921).toString() in
> > > your browser's console or by running the following JSON snippet through your
> > > JSON parser.
>
> > >     {"id": 10765432100123456789, "id_str": "10765432100123456789"}
>
> > > In affected JSON parsers the ID will not be converted successfully and will
> > > lose accuracy. In some parsers there may even be an exception.
>
> > > The solution
> > > ----------------
> > > To allow JavaScript and JSON parsers to read the IDs we need to include a
> > > string version of any ID when responding in the JSON format. What this means
> > > is Status, User, Direct Message and Saved Search IDs in the Twitter API will
> > > now be returned as an integer and a string in JSON responses. This will
> > > apply to the main Twitter API, the Streaming API and the Search API.
>
> > > For example, a status object will now contain an id and an id_str. The
> > > following JSON representation of a status object shows the two versions of
> > > the ID fields for each data point.
>
> > > [
> > >   {
> > >     "coordinates": null,
> > >     "truncated": false,
> > >     "created_at": "Thu Oct 14 22:20:15 +0000 2010",
> > >     "favorited": false,
> > >     "entities": {
> > >       "urls": [
> > >       ],
> > >       "hashtags": [
> > >       ],
> > >       "user_mentions": [
> > >         {
> > >           "name": "Matt Harris",
> > >           "id": 777925,
> > >           "id_str": "777925",
> > >           "indices": [
> > >             0,
> > >             14
> > >           ],
> > >           "screen_name": "themattharris"
> > >         }
> > >       ]
> > >     },
> > >     "text": "@themattharris hey how are things?",
> > >     "annotations": null,
> > >     "contributors": [
> > >       {
> > >         "id": 819797,
> > >         "id_str": "819797",
> > >         "screen_name": "episod"
> > >       }
> > >     ],
> > >     "id": 12738165059,
> > >     "id_str": "12738165059",
> > >     "retweet_count": 0,
> > >     "geo": null,
> > >     "retweeted": false,
> > >     "in_reply_to_user_id": 777925,
> > >     "in_reply_to_user_id_str": "777925",
> > >     "in_reply_to_screen_name": "themattharris",
> > >     "user": {
> > >       "id": 6253282
> > >       "id_str": "6253282"
> > >     },
> > >     "source": "web",
> > >     "place": null,
> > >     "in_reply_to_status_id": 12738040524
> > >     "in_reply_to_status_id_str": "12738040524"
> > >   }
> > > ]
>
> > > What should you do - RIGHT NOW
> > > ----------------------------------------------
> > > The first thing you should do is attempt to decode the JSON snippet above
> > > using your production code parser. Observe the output to confirm the ID has
> > > not lost accuracy.
>
> > > What you do next depends on what happens:
>
> > > * If your code converts the ID successfully without losing accuracy you are
> > > OK but should consider converting to the _str versions of IDs as soon as
> > > possible.
> > > * If your code has lost accuracy, convert your code to using the _str
> > > version immediately. If you do not do this your
>
> ...

-- 
Twitter developer documentation and resources: http://dev.twitter.com/doc
API updates via Twitter: http://twitter.com/twitterapi
Issues/Enhancements Tracker: http://code.google.com/p/twitter-api/issues/list
Change your membership to this group: 
http://groups.google.com/group/twitter-development-talk
