[twitter-dev] Re: Snowflake: An update and some very important information

SM Wed, 10 Nov 2010 21:35:14 -0800

Hello. Couple questions:

1.) Are you planning on eventually eliminating the integer
representation and only using strings for IDs?


2.) If an application doesn't use JavaScript to parse JSON (for
example, YAJL-OBJC and NSNumbers in Obj-C), is it necessary to make
any changes at all?

Thanks.



On Oct 19, 3:52 pm, Matt Harris <[email protected]> wrote:
> Hey everyone,
>
> Thank you to all of you for your questions, patience and contributions to
> this thread.  Hearing your views and knowing how you use the API helps us
> provide more information where there wasn't enough, and clarify details
> where there was ambiguity.
>
> I've collated the questions I've received from you directly, over Twitter to
> @twitterapi and through this list.  I hope the comments below provide enough
> information to answer those questions and explain the reasoning behind our
> decisions.
>
> Thanks for your support and patience,
> @themattharris
>
> 1) Will search.twitter.com also include id_str and
> in_reply_to_status_id_str?
> Yes, Search will include the String representations of those IDs.
>
> 2) Which fields are affected by this change?
> All IDs which are transmitted as Integers will have a String representation
> in the API response. Only Tweet IDs (which include mentions and retweets)
> will be moving to new Snowflake IDs. Messages (DMs), Saved Searches and
> Users may change to a Snowflake ID scheme in the future but this isn’t
> planned for this year.
>
> We are adding String representations of the Integer IDs now so that you can
> update all of your code to use the String representations throughout. This
> allows developers to make the change now for all the ID fields and be
> prepared should any other IDs break the 53bit boundary.
>
> 3) Which fields will have String representations?
> The fields which will have String representations are:
>
> id (DM, Saved Search, User, List)
> in_reply_to_status_id
> in_reply_to_user_id
> new_id (Streaming only. Will be removed when Snowflake is enabled)
> current_user_retweet_id (When include_my_retweet=1 is passed)
>
> 4) Can you provide a complete Tweet example with Snowflake ID to test?
>
> [
>   {
>     "coordinates": null,
>     "truncated": false,
>     "created_at": "Thu Oct 14 22:20:15 +0000 2010",
>     "favorited": false,
>     "entities": {
>       "urls": [],
>       "hashtags": [],
>       "user_mentions": [
>         {
>           "name": "Matt Harris",
>           "id": 777925,
>           "id_str": "777925",
>           "indices": [0, 14],
>           "screen_name": "themattharris"
>         }
>       ]
>     },
>     "text": "@themattharris hey how are things?",
>     "annotations": null,
>     "contributors": [
>       {
>         "id": 819797,
>         "id_str": "819797",
>         "screen_name": "episod"
>       }
>     ],
>     "id": 10765432100123456789,
>     "id_str": "10765432100123456789",
>     "retweet_count": 0,
>     "geo": null,
>     "retweeted": false,
>     "in_reply_to_user_id": 777925,
>     "in_reply_to_user_id_str": "777925",
>     "in_reply_to_screen_name": "themattharris",
>     "user": {
>       "id": 6253282,
>       "id_str": "6253282"
>     },
>     "source": "web",
>     "place": null,
>     "in_reply_to_status_id": 10586268426842688951,
>     "in_reply_to_status_id_str": "10586268426842688951"
>   }
> ]
>
> 5) What is happening with new_id in the Streaming API?
> new_id and new_id_str will be switched off when or soon after Snowflake is
> enabled on November 4th.
>
> 6) Why not restrict IDs to 53bits?
>
> A Snowflake ID is composed of:
>  * 41bits for millisecond precision time (69 years)
>  * 10bits for a configured machine identity (1024 machines)
>  * 12bits for a sequence number (4096 per machine)
>
> The factor influencing the length of the ID is the time. For a 53bit ID this
> would mean only 31bits are available for the time. 31bits is only enough for
> 24 days (2147483648/(1000*60*60*24)) of time.
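The arithmetic above can be reproduced in a few lines of JavaScript, assuming (as the answer does) that the 10bit machine identity and the 12bit sequence number are kept at full width:

```javascript
// Bits left for the timestamp if an ID had to fit in 53 bits while keeping
// the 10bit machine identity and the 12bit sequence number.
const timeBits = 53 - 10 - 12;               // 31
const msPerDay = 1000 * 60 * 60 * 24;        // 86,400,000
const windowDays = 2 ** timeBits / msPerDay; // 2147483648 ms expressed in days
console.log(`${timeBits} bits of milliseconds last ${windowDays.toFixed(2)} days`);
// prints "31 bits of milliseconds last 24.86 days"
```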
>
> Reducing the resolution of the timestamp would prevent the IDs from being
> K-sorted at a resolution of 1 second or less.
>
> Reducing the configured machine identity or sequence number by 1bit would
> mean we couldn’t scale Twitter, or operate our infrastructure in an
> uncoordinated, highly available way.
>
> 7) When will the 53bit Integer overflow happen?
> 24 days after Snowflake starts counting.
>
> 8) Is it safe to parse and store IDs as signed 64bit Integers?
> Yes.
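A minimal sketch of parsing an `id_str` and confirming it fits a signed 64bit integer, with JavaScript's BigInt standing in for a 64bit long. The function name `parseInt64Id` is hypothetical; in languages with a native 64bit type the equivalent is the standard string-to-long conversion (for example Java's `Long.parseLong`, which rejects out-of-range values).

```javascript
// Hypothetical helper: parse a decimal ID string and check that it fits in
// a signed 64bit integer (-2^63 .. 2^63 - 1).
function parseInt64Id(idStr) {
  if (!/^-?\d+$/.test(idStr)) {
    throw new SyntaxError(`not an integer: ${idStr}`);
  }
  const id = BigInt(idStr);
  // BigInt.asIntN(64, x) wraps x into the signed 64bit range, so x is
  // unchanged only if it already fit.
  if (BigInt.asIntN(64, id) !== id) {
    throw new RangeError(`does not fit in a signed 64bit integer: ${idStr}`);
  }
  return id;
}

console.log(parseInt64Id("12738165059")); // 12738165059n
```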
>
> 9) Why offer both the String and Integer versions of the ID?
> The String representation is needed to ensure languages which cannot convert
> the >53bit Integer can still use the ID in other API requests.
>
> The Integer value is being retained for languages which can handle numbers
> larger than 53bits, and to prevent applications which have not converted
> from being cut off from Twitter.
>
> 10) When an ID is null, what will the _str representation be?
> The _str representation will also be null.
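One way to follow answers 2 and 10 in client code is to read IDs only through their `_str` twins and treat them as opaque strings, so a null ID simply stays null. The helper name `readId` is hypothetical, and the sketch assumes the response has already been decoded with `JSON.parse`.

```javascript
// Hypothetical helper: return the string form of an ID field, reading only
// the `_str` twin so precision never depends on how the parser handled the
// integer. A null ID comes back as null.
function readId(obj, field) {
  const value = obj[`${field}_str`];
  if (value === undefined) {
    throw new Error(`response has no ${field}_str field`);
  }
  return value; // a decimal string, or null
}

const tweet = JSON.parse(
  '{"id": 12738165059, "id_str": "12738165059",' +
  ' "in_reply_to_status_id": null, "in_reply_to_status_id_str": null}'
);
console.log(readId(tweet, "id"));                    // 12738165059
console.log(readId(tweet, "in_reply_to_status_id")); // null
```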
>
> 11) Did you really mean ‘unsigned’ 64bit Integer?
> Strictly speaking the Snowflake is a signed 64bit long under the hood. That
> being said, we will never use the sign bit and won’t require the full
> 64bits for positive numbers for about 69 years:
>
> http://www.google.com/search?q=%282**41%29+%2F+%2860*60*24*1000%29+%2...
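The 69 year figure can be checked without the search link; this sketch assumes a 365.25-day year:

```javascript
// Span of a 41bit millisecond timestamp, assuming a 365.25-day year.
const msPerYear = 1000 * 60 * 60 * 24 * 365.25; // 31,557,600,000
const years41 = 2 ** 41 / msPerYear;
console.log(`2^41 ms is about ${years41.toFixed(1)} years`);
// prints "2^41 ms is about 69.7 years"
```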
>
> 12) Why not make the strings opt-in?
> We did consider this as an option but decided against it for a number of
> reasons. The first reason is that the ID is fundamental to being able to
> work with the data from the API, so receiving the correct ID shouldn’t be
> something you have to opt into. The second, and more influential, reason is
> that making the _str representations opt-in would create significant
> performance issues for the API.
>
> On Mon, Oct 18, 2010 at 5:34 PM, themattharris
> <[email protected]> wrote:
>
> > Thanks to @gotwalt for spotting the missing commas.
>
> > Fixed JSON sample ...
>
> > [
> >  {
> >    "coordinates": null,
> >    "truncated": false,
> >    "created_at": "Thu Oct 14 22:20:15 +0000 2010",
> >    "favorited": false,
> >    "entities": {
> >      "urls": [
> >      ],
> >      "hashtags": [
> >      ],
> >      "user_mentions": [
> >        {
> >          "name": "Matt Harris",
> >          "id": 777925,
> >          "id_str": "777925",
> >          "indices": [
> >            0,
> >            14
> >          ],
> >          "screen_name": "themattharris"
> >        }
> >      ]
> >    },
> >    "text": "@themattharris hey how are things?",
> >    "annotations": null,
> >    "contributors": [
> >      {
> >        "id": 819797,
> >        "id_str": "819797",
> >        "screen_name": "episod"
> >      }
> >    ],
> >    "id": 12738165059,
> >    "id_str": "12738165059",
> >    "retweet_count": 0,
> >    "geo": null,
> >    "retweeted": false,
> >    "in_reply_to_user_id": 777925,
> >    "in_reply_to_user_id_str": "777925",
> >    "in_reply_to_screen_name": "themattharris",
> >    "user": {
> >      "id": 6253282,
> >      "id_str": "6253282"
> >    },
> >    "source": "web",
> >    "place": null,
> >    "in_reply_to_status_id": 12738040524,
> >    "in_reply_to_status_id_str": "12738040524"
> >  }
> > ]
>
> > Best,
> > @themattharris
>
> > On Oct 18, 5:19 pm, Matt Harris <[email protected]> wrote:
> > > Last week you may remember that Twitter planned to enable the new Status
> > > ID generator, 'Snowflake', but didn't. The purpose of this email is to
> > > explain why this didn't happen, what we are doing about it, and what the
> > > new release plan is.
>
> > > So what is Snowflake?
> > > ------------------------------
> > > Snowflake is a service we will be using to generate unique Tweet IDs. These
> > > Tweet IDs are unique 64bit unsigned integers, which, instead of being
> > > sequential like the current IDs, are based on time. The full ID is composed
> > > of a timestamp, a worker number, and a sequence number.
>
> > > The problem
> > > -----------------
> > > Before launch it came to our attention that some programming languages,
> > > such as JavaScript, cannot support numbers with >53bits. This can be easily
> > > examined by running a command similar to (90071992547409921).toString() in
> > > your browser's console, or by running the following JSON snippet through
> > > your JSON parser.
>
> > >     {"id": 10765432100123456789, "id_str": "10765432100123456789"}
>
> > > In affected JSON parsers the ID will not be converted successfully and will
> > > lose accuracy. In some parsers there may even be an exception.
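A short JavaScript demonstration of the problem described above, using only the built-in `JSON.parse` and `BigInt`: every JSON number is decoded into an IEEE-754 double with a 53bit significand, so the integer `id` is rounded while `id_str` arrives intact. The sketch assumes an engine with BigInt support (ES2020 or later), which did not exist when this thread was written.

```javascript
// The snippet from the message above, decoded with the built-in JSON.parse.
const text = '{"id": 10765432100123456789, "id_str": "10765432100123456789"}';
const parsed = JSON.parse(text);

// The integer id becomes a double and is silently rounded.
console.log(Number.isSafeInteger(parsed.id));     // false
console.log(String(parsed.id) === parsed.id_str); // false: the low digits were lost

// The string twin survives and converts exactly with BigInt.
console.log(BigInt(parsed.id_str) === 10765432100123456789n); // true

// Above 2^53, consecutive integers can no longer all be told apart.
console.log(2 ** 53 === 2 ** 53 + 1); // true
```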
>
> > > The solution
> > > ----------------
> > > To allow JavaScript and JSON parsers to read the IDs we need to include a
> > > string version of any ID when responding in the JSON format. This means
> > > Status, User, Direct Message and Saved Search IDs in the Twitter API will
> > > now be returned as both an integer and a string in JSON responses. This
> > > will apply to the main Twitter API, the Streaming API and the Search API.
>
> > > For example, a status object will now contain an id and an id_str. The
> > > following JSON representation of a status object shows the two versions of
> > > the ID fields for each data point.
>
> > > [
> > >   {
> > >     "coordinates": null,
> > >     "truncated": false,
> > >     "created_at": "Thu Oct 14 22:20:15 +0000 2010",
> > >     "favorited": false,
> > >     "entities": {
> > >       "urls": [
> > >       ],
> > >       "hashtags": [
> > >       ],
> > >       "user_mentions": [
> > >         {
> > >           "name": "Matt Harris",
> > >           "id": 777925,
> > >           "id_str": "777925",
> > >           "indices": [
> > >             0,
> > >             14
> > >           ],
> > >           "screen_name": "themattharris"
> > >         }
> > >       ]
> > >     },
> > >     "text": "@themattharris hey how are things?",
> > >     "annotations": null,
> > >     "contributors": [
> > >       {
> > >         "id": 819797,
> > >         "id_str": "819797",
> > >         "screen_name": "episod"
> > >       }
> > >     ],
> > >     "id": 12738165059,
> > >     "id_str": "12738165059",
> > >     "retweet_count": 0,
> > >     "geo": null,
> > >     "retweeted": false,
> > >     "in_reply_to_user_id": 777925,
> > >     "in_reply_to_user_id_str": "777925",
> > >     "in_reply_to_screen_name": "themattharris",
> > >     "user": {
> > >       "id": 6253282,
> > >       "id_str": "6253282"
> > >     },
> > >     "source": "web",
> > >     "place": null,
> > >     "in_reply_to_status_id": 12738040524,
> > >     "in_reply_to_status_id_str": "12738040524"
> > >   }
> > > ]
>
> > > What should you do - RIGHT NOW
> > > ----------------------------------------------
> > > The first thing you should do is attempt to decode the JSON snippet above
> > > using your production code parser. Observe the output to confirm the ID has
> > > not lost accuracy.
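A possible way to automate that check for JavaScript's `JSON.parse`: decode a response and list every integer ID whose value no longer matches its `_str` twin. `findLossyIds` is a hypothetical helper; it only compares fields that appear as `X`/`X_str` pairs, and other stacks would need the same comparison written against their own parser.

```javascript
// Hypothetical helper: walk a decoded response and return the paths of all
// integer fields whose value no longer matches the matching `_str` field.
function findLossyIds(node, path = "$", problems = []) {
  if (Array.isArray(node)) {
    node.forEach((item, i) => findLossyIds(item, `${path}[${i}]`, problems));
  } else if (node !== null && typeof node === "object") {
    for (const [key, value] of Object.entries(node)) {
      const twin = node[`${key}_str`];
      // A safe integer prints back exactly; a rounded one does not.
      if (typeof value === "number" && typeof twin === "string" && String(value) !== twin) {
        problems.push(`${path}.${key}`);
      }
      findLossyIds(value, `${path}.${key}`, problems);
    }
  }
  return problems;
}

const snippet = '{"id": 10765432100123456789, "id_str": "10765432100123456789"}';
console.log(findLossyIds(JSON.parse(snippet))); // [ '$.id' ]
```

An empty result means every paired ID survived decoding; any listed path is a field that should be read from its `_str` version instead.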
>
> > > What you do next depends on what happens:
>
> > > * If your code converts the ID successfully without losing accuracy, you
> > > are OK but should consider converting to the _str versions of IDs as soon
> > > as possible.
> > > * If your code has lost accuracy, convert your code to use the _str
> > > version immediately. If you do not do this your...
>

-- 
Twitter developer documentation and resources: http://dev.twitter.com/doc
API updates via Twitter: http://twitter.com/twitterapi
Issues/Enhancements Tracker: http://code.google.com/p/twitter-api/issues/list
Change your membership to this group: 
http://groups.google.com/group/twitter-development-talk
