Re: [twitter-dev] Re: Snowflake: An update and some very important information

Tom van der Woerdt Wed, 10 Nov 2010 22:56:35 -0800

1) Please don't, I don't want to have to convert everything back to
integers within my code. I consider the string representation a hack
around some issues with certain programming languages, and not an
optimal solution. Wouldn't want this to become the default option.


2) No

Tom


On 11/11/10 6:34 AM, SM wrote:
> Hello. Couple questions:
> 
> 1.) Are you planning on eventually eliminating the integer
> representation and only using strings for id's?
> 
> 2.) If an application doesn't use Javascript to parse JSON (for
> example, YAJL-OBJC and NSNumbers in Obj-C), is it necessary to make
> any changes at all?
> 
> Thanks.
> 
> 
> 
> On Oct 19, 3:52 pm, Matt Harris <[email protected]> wrote:
>> Hey everyone,
>>
>> Thank you to all of you for your questions, patience and contributions to
>> this thread.  Hearing your views and knowing how you use the API helps us
>> provide more information where there wasn't enough, and clarify details
>> where there was ambiguity.
>>
>> I've collated the questions I've received from you directly, over Twitter to
>> @twitterapi and through this list.  I hope the comments below provide enough
>> information to answer those questions and explain the reasoning behind our
>> decisions.
>>
>> Thanks for your support and patience,
>> @themattharris
>>
>> 1) Will search.twitter.com also include id_str and
>> in_reply_to_status_id_str?
>> Yes, Search will include the String representations of those IDs.
>>
>> 2) Which fields are affected by this change?
>> All IDs which are transmitted as Integers will have a String representation
>> in the API response. Only Tweet IDs (which includes mentions and retweets)
>> will be moving to new Snowflake IDs. Messages (DMs), Saved Searches and
>> Users may change to a Snowflake ID scheme in the future but this isn’t
>> planned for this year.
>>
>> We are adding String representations of the Integer IDs now so that you can
>> update all of your code to use the String representations throughout. This
>> lets developers make the change now for all the ID fields and be prepared
>> should any other IDs break the 53bit boundary.
>>
>> 3) Which fields will have String representations?
>> The fields which will have String representations are:
>>
>> id (DM, Saved Search, User, List)
>> in_reply_to_status_id
>> in_reply_to_user_id
>> new_id (Streaming only. Will be removed when Snowflake is enabled)
>> current_user_retweet_id (When include_my_retweet=1 is passed)
>>
>> 4) Can you provide a complete Tweet example with Snowflake ID to test?
>>
>> [{"coordinates":null,"truncated":false,
>> "created_at":"Thu Oct 14 22:20:15 +0000 2010","favorited":false,
>> "entities":{"urls":[],"hashtags":[],"user_mentions":[{"name":"Matt Harris",
>> "id":777925,"id_str":"777925","indices":[0,14],
>> "screen_name":"themattharris"}]},
>> "text":"@themattharris hey how are things?","annotations":null,
>> "contributors":[{"id":819797,"id_str":"819797","screen_name":"episod"}],
>> "id":10765432100123456789,"id_str":"10765432100123456789",
>> "retweet_count":0,"geo":null,"retweeted":false,
>> "in_reply_to_user_id":777925,"in_reply_to_user_id_str":"777925",
>> "in_reply_to_screen_name":"themattharris",
>> "user":{"id":6253282,"id_str":"6253282"},"source":"web","place":null,
>> "in_reply_to_status_id":10586268426842688951,
>> "in_reply_to_status_id_str":"10586268426842688951"}]
>>
>> 5) What is happening with new_id in the Streaming API?
>> new_id and new_id_str will be switched off when, or soon after, Snowflake is
>> enabled on November 4th.
>>
>> 6) Why not restrict IDs to 53bits?
>>
>> A Snowflake ID is composed of:
>>  * 41bits for millisecond precision time (69 years)
>>  * 10bits for a configured machine identity (1024 machines)
>>  * 12bits for a sequence number (4096 per machine)
>>
>> The factor influencing the length of the ID is the time. For a 53bit ID this
>> would mean only 31bits are available for the time. 31bits is only enough for
>> 24 days (2147483648/(1000*60*60*24)) of time.
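
The arithmetic in the answer above can be checked with a short Python sketch. The constant and function names are illustrative only and are not part of any Twitter API; the 365.25-day year used for the conversion is an assumption.

```python
# Bit widths as listed in the quoted answer.
TIMESTAMP_BITS = 41
MACHINE_BITS = 10
SEQUENCE_BITS = 12

MS_PER_DAY = 1000 * 60 * 60 * 24


def timestamp_span_days(bits: int) -> float:
    """Days of millisecond timestamps representable in `bits` bits."""
    return (2 ** bits) / MS_PER_DAY


# All three fields together use 63 bits.
total_bits = TIMESTAMP_BITS + MACHINE_BITS + SEQUENCE_BITS

# Bits left for the timestamp if the whole ID had to fit in 53 bits.
bits_left_in_53 = 53 - MACHINE_BITS - SEQUENCE_BITS

days_53bit = timestamp_span_days(bits_left_in_53)
years_full = timestamp_span_days(TIMESTAMP_BITS) / 365.25  # assumed year length

print(total_bits, bits_left_in_53)
print(f"{days_53bit:.1f} days with 31 bits, {years_full:.1f} years with 41 bits")
```

With these widths a 53bit ID leaves just under 25 days of millisecond timestamps, while the full 41bit timestamp lasts a little under 70 years.
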
>>
>> Reducing the resolution of the timestamp would prevent a K-sorted resolution
>> of 1 second or less.
>>
>> Reducing the configured machine identity or sequence number by 1bit would
>> mean we couldn’t scale Twitter, or operate our infrastructure in an
>> uncoordinated, highly available way.
>>
>> 7) When will the 53bit Integer overflow happen?
>> 24 days after Snowflake starts counting.
>>
>> 8) Is it safe to parse and store IDs as signed 64bit Integers?
>> Yes.
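
A minimal sketch of why a signed 64bit Integer is enough, assuming the three fields from question 6 are packed in the order listed there (timestamp in the high bits, then machine identity, then sequence number). The field order and the names pack_id and unpack_id are assumptions made for illustration; the thread does not spell out the exact layout.

```python
from typing import Tuple

TIMESTAMP_BITS, MACHINE_BITS, SEQUENCE_BITS = 41, 10, 12
INT64_MAX = 2 ** 63 - 1  # largest signed 64bit value


def pack_id(timestamp_ms: int, machine: int, sequence: int) -> int:
    """Pack the three fields into one integer (assumed field order)."""
    if not 0 <= timestamp_ms < 2 ** TIMESTAMP_BITS:
        raise ValueError("timestamp out of range")
    if not 0 <= machine < 2 ** MACHINE_BITS:
        raise ValueError("machine identity out of range")
    if not 0 <= sequence < 2 ** SEQUENCE_BITS:
        raise ValueError("sequence number out of range")
    return (
        (timestamp_ms << (MACHINE_BITS + SEQUENCE_BITS))
        | (machine << SEQUENCE_BITS)
        | sequence
    )


def unpack_id(value: int) -> Tuple[int, int, int]:
    """Split a packed ID back into (timestamp_ms, machine, sequence)."""
    sequence = value & (2 ** SEQUENCE_BITS - 1)
    machine = (value >> SEQUENCE_BITS) & (2 ** MACHINE_BITS - 1)
    timestamp_ms = value >> (MACHINE_BITS + SEQUENCE_BITS)
    return timestamp_ms, machine, sequence


# The largest ID this layout can produce has every one of the 63 bits set.
largest = pack_id(2 ** TIMESTAMP_BITS - 1, 2 ** MACHINE_BITS - 1, 2 ** SEQUENCE_BITS - 1)
print(largest <= INT64_MAX)
```

Since 41 + 10 + 12 is 63 bits, the sign bit is never needed under this layout.
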
>>
>> 9) Why offer both the String and Integer versions of the ID?
>> The String representation is needed to ensure languages which cannot convert
>> the >53bit Integer can still use the ID in other API requests.
>>
>> The Integer value is being retained for languages which can handle numbers
>> larger than 53bit and to prevent applications which have not converted from
>> being cut off from Twitter.
>>
>> 10) When ID is null what will the _str representation be?
>> The _str representation will also be null.
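
One way a client might read IDs under these rules, sketched in Python: prefer the _str field and pass a null through as None. The helper name read_id is hypothetical, and the fallback to the Integer field is an assumption that is only safe in a language whose JSON parser keeps integers exact, as Python's does.

```python
import json
from typing import Optional


def read_id(obj: dict, field: str = "id") -> Optional[str]:
    """Return the ID as a string, preferring `<field>_str`; None when null."""
    value = obj.get(field + "_str")
    if value is not None:
        return value
    # Fallback for responses without a _str field (assumed safe only when
    # the parser did not round the Integer value).
    raw = obj.get(field)
    return None if raw is None else str(raw)


status = json.loads(
    '{"id": 12738165059, "id_str": "12738165059", '
    '"in_reply_to_status_id": null, "in_reply_to_status_id_str": null}'
)

print(read_id(status))
print(read_id(status, "in_reply_to_status_id"))
```
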
>>
>> 11) Did you really mean ‘unsigned’ 64bit Integer?
>> Strictly speaking the Snowflake is a signed 64bit long under the hood. That
>> being said, we will never use the negative bit and won’t require the full
>> 64bits for positive numbers for about 69 years:
>>
>> http://www.google.com/search?q=%282**41%29+%2F+%2860*60*24*1000%29+%2...
>>
>> 12) Why not make the strings opt-in?
>> We did consider this as an option but decided against it for a number of
>> reasons. The first reason is that the ID is fundamental to being able to
>> work with the data from the API so receiving the correct ID shouldn’t be
>> something you have to opt into. The second, more influential reason, is that
>> making the _str representations opt-in would create significant,
>> performance-affecting issues for the API.
>>
>> On Mon, Oct 18, 2010 at 5:34 PM, themattharris <[email protected]> wrote:
>>
>>> Thanks to @gotwalt for spotting the missing commas.
>>
>>> Fixed JSON sample ...
>>
>>> [
>>>  {
>>>    "coordinates": null,
>>>    "truncated": false,
>>>    "created_at": "Thu Oct 14 22:20:15 +0000 2010",
>>>    "favorited": false,
>>>    "entities": {
>>>      "urls": [
>>>      ],
>>>      "hashtags": [
>>>      ],
>>>      "user_mentions": [
>>>        {
>>>          "name": "Matt Harris",
>>>          "id": 777925,
>>>          "id_str": "777925",
>>>          "indices": [
>>>            0,
>>>            14
>>>          ],
>>>          "screen_name": "themattharris"
>>>        }
>>>      ]
>>>    },
>>>    "text": "@themattharris hey how are things?",
>>>    "annotations": null,
>>>    "contributors": [
>>>      {
>>>        "id": 819797,
>>>        "id_str": "819797",
>>>        "screen_name": "episod"
>>>      }
>>>    ],
>>>    "id": 12738165059,
>>>    "id_str": "12738165059",
>>>    "retweet_count": 0,
>>>    "geo": null,
>>>    "retweeted": false,
>>>    "in_reply_to_user_id": 777925,
>>>    "in_reply_to_user_id_str": "777925",
>>>    "in_reply_to_screen_name": "themattharris",
>>>    "user": {
>>>      "id": 6253282,
>>>      "id_str": "6253282"
>>>    },
>>>    "source": "web",
>>>    "place": null,
>>>    "in_reply_to_status_id": 12738040524,
>>>    "in_reply_to_status_id_str": "12738040524"
>>>  }
>>> ]
>>
>>> Best,
>>> @themattharris
>>
>>> On Oct 18, 5:19 pm, Matt Harris <[email protected]> wrote:
>>>> Last week you may remember Twitter planned to enable the new Status ID
>>>> generator - 'Snowflake' but didn't. The purpose of this email is to explain
>>>> the reason why this didn't happen, what we are doing about it, and what the
>>>> new release plan is.
>>
>>>> So what is Snowflake?
>>>> ------------------------------
>>>> Snowflake is a service we will be using to generate unique Tweet IDs. These
>>>> Tweet IDs are unique 64bit unsigned integers, which, instead of being
>>>> sequential like the current IDs, are based on time. The full ID is composed
>>>> of a timestamp, a worker number, and a sequence number.
>>
>>>> The problem
>>>> -----------------
>>>> Before launch it came to our attention that some programming languages such
>>>> as JavaScript cannot support numbers with >53bits. This can be easily
>>>> examined by running a command similar to: (90071992547409921).toString() in
>>>> your browser's console or by running the following JSON snippet through your
>>>> JSON parser.
>>
>>>>     {"id": 10765432100123456789, "id_str": "10765432100123456789"}
>>
>>>> In affected JSON parsers the ID will not be converted successfully and will
>>>> lose accuracy. In some parsers there may even be an exception.
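
The same check can be sketched in Python. Python's own json module keeps integers exact, so the sketch passes parse_int=float to imitate a parser that stores every number as an IEEE 754 double, as JavaScript does. That emulation is an assumption made for illustration and does not describe any particular parser.

```python
import json

SNIPPET = '{"id": 10765432100123456789, "id_str": "10765432100123456789"}'

# Exact parse: Python integers have arbitrary precision.
exact = json.loads(SNIPPET)

# Lossy parse: every integer literal is forced through a double.
lossy = json.loads(SNIPPET, parse_int=float)

id_exact = exact["id"]
id_lossy = int(lossy["id"])

print("exact id matches id_str:", id_exact == int(exact["id_str"]))
print("double-based id matches:", id_lossy == id_exact)
print("id_str survives either way:", lossy["id_str"] == exact["id_str"])

# The sample value is also larger than the largest signed 64bit integer,
# which may be one reason a strict parser raises an exception instead of
# rounding.
print("exceeds signed 64bit max:", id_exact > 2 ** 63 - 1)
```

Whatever the parser does to the number, the id_str field arrives unchanged, which is the point of adding it.
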
>>
>>>> The solution
>>>> ----------------
>>>> To allow JavaScript and JSON parsers to read the IDs we need to include a
>>>> string version of any ID when responding in the JSON format. What this means
>>>> is Status, User, Direct Message and Saved Search IDs in the Twitter API will
>>>> now be returned as an integer and a string in JSON responses. This will
>>>> apply to the main Twitter API, the Streaming API and the Search API.
>>
>>>> For example, a status object will now contain an id and an id_str. The
>>>> following JSON representation of a status object shows the two versions of
>>>> the ID fields for each data point.
>>
>>>> [
>>>>   {
>>>>     "coordinates": null,
>>>>     "truncated": false,
>>>>     "created_at": "Thu Oct 14 22:20:15 +0000 2010",
>>>>     "favorited": false,
>>>>     "entities": {
>>>>       "urls": [
>>>>       ],
>>>>       "hashtags": [
>>>>       ],
>>>>       "user_mentions": [
>>>>         {
>>>>           "name": "Matt Harris",
>>>>           "id": 777925,
>>>>           "id_str": "777925",
>>>>           "indices": [
>>>>             0,
>>>>             14
>>>>           ],
>>>>           "screen_name": "themattharris"
>>>>         }
>>>>       ]
>>>>     },
>>>>     "text": "@themattharris hey how are things?",
>>>>     "annotations": null,
>>>>     "contributors": [
>>>>       {
>>>>         "id": 819797,
>>>>         "id_str": "819797",
>>>>         "screen_name": "episod"
>>>>       }
>>>>     ],
>>>>     "id": 12738165059,
>>>>     "id_str": "12738165059",
>>>>     "retweet_count": 0,
>>>>     "geo": null,
>>>>     "retweeted": false,
>>>>     "in_reply_to_user_id": 777925,
>>>>     "in_reply_to_user_id_str": "777925",
>>>>     "in_reply_to_screen_name": "themattharris",
>>>>     "user": {
>>>>       "id": 6253282
>>>>       "id_str": "6253282"
>>>>     },
>>>>     "source": "web",
>>>>     "place": null,
>>>>     "in_reply_to_status_id": 12738040524
>>>>     "in_reply_to_status_id_str": "12738040524"
>>>>   }
>>>> ]
>>
>>>> What should you do - RIGHT NOW
>>>> ----------------------------------------------
>>>> The first thing you should do is attempt to decode the JSON snippet above
>>>> using your production code parser. Observe the output to confirm the ID has
>>>> not lost accuracy.
>>
>>>> What you do next depends on what happens:
>>
>>>> * If your code converts the ID successfully without losing accuracy you are
>>>> OK but should consider converting to the _str versions of IDs as soon as
>>>> possible.
>>>> * If your code has lost accuracy, convert your code to use the _str
>>>> version immediately. If you do not do this your...
>>
> 

-- 
Twitter developer documentation and resources: http://dev.twitter.com/doc
API updates via Twitter: http://twitter.com/twitterapi
Issues/Enhancements Tracker: http://code.google.com/p/twitter-api/issues/list
Change your membership to this group: 
http://groups.google.com/group/twitter-development-talk
