[twitter-dev] Re: Snowflake: An update and some very important information

2010-12-03 Thread TCI

Help.
I had been using BINARY(16) to store tweet IDs in MySQL. I can't find
a good resource to tell me which new data type I should use.
Reading the MySQL documentation, I believe it should be unsigned
BIGINT - but would BINARY(64) be a better option?

I also read this post arguing against BIGINT, though it has minimal comments:

http://blog.salientdigital.com/2010/11/13/twitter-api-snowflake-and-mysql/

Anybody storing tweet IDs in MySQL care to share what datatype
they're using?

TCI

-- 
Twitter developer documentation and resources: http://dev.twitter.com/doc
API updates via Twitter: http://twitter.com/twitterapi
Issues/Enhancements Tracker: http://code.google.com/p/twitter-api/issues/list
Change your membership to this group: 
http://groups.google.com/group/twitter-development-talk

Re: [twitter-dev] Re: Snowflake: An update and some very important information

2010-12-03 Thread Tom van der Woerdt


Signed bigint is what Snowflake itself uses.

Tom
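For anyone weighing the column types mentioned above, the ranges can be checked with a few lines of Python. This is only a sketch: the helper names are made up for illustration, and the limits are the standard 8-byte integer ranges (a MySQL BIGINT is 8 bytes wide). Note that BINARY(64) reserves 64 bytes, not 64 bits, so it is much wider than a BIGINT.

```python
# Standard ranges for 8-byte integers (a MySQL BIGINT is 8 bytes wide).
SIGNED_BIGINT_MAX = 2**63 - 1
UNSIGNED_BIGINT_MAX = 2**64 - 1


def fits_signed_bigint(tweet_id: int) -> bool:
    """Return True if a non-negative ID fits in a signed 64-bit column."""
    return 0 <= tweet_id <= SIGNED_BIGINT_MAX


def fits_unsigned_bigint(tweet_id: int) -> bool:
    """Return True if a non-negative ID fits in an unsigned 64-bit column."""
    return 0 <= tweet_id <= UNSIGNED_BIGINT_MAX


# Largest value a 41 + 10 + 12 = 63-bit Snowflake ID could take.
largest_snowflake = 2**63 - 1
# Test value quoted in the Snowflake announcement in this thread.
announcement_test_id = 10765432100123456789

print(fits_signed_bigint(largest_snowflake))       # True
print(fits_signed_bigint(announcement_test_id))    # False
print(fits_unsigned_bigint(announcement_test_id))  # True
```

If the 41/10/12 bit layout described elsewhere in this thread holds, real Snowflake IDs stay inside the signed range. Only the test value from the announcement is too large for a signed column.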



[twitter-dev] Re: Snowflake: An update and some very important information

2010-11-23 Thread DustyReagan

Maybe this is a little naive, and I know you gotta consider backwards
compatibility, but it seems like a bad idea to rely on status ID for
chronological sorting. If you want tweets displayed in order, sort by
the timestamp. And if you get rid of the sorting requirement on the ID,
why not do what Dean said? Just use a GUID for the ID. Languages
without GUIDs would treat it as a unique string.

It might be a pain to switch to GUIDs, but it sounds like switching to
Snowflake is no walk in the park either.


[twitter-dev] Re: Snowflake: An update and some very important information

2010-11-22 Thread jough

I gather the reason for the 64-bit int type was to maintain some
backwards-compatibility around the old sequential IDs, so both the old-
style and Snowflake IDs could be sorted and you could glean that
smaller IDs are older than larger integers.  U/GUIDs wouldn't be
sortable in any meaningful fashion.

- Jough


[twitter-dev] Re: Snowflake: An update and some very important information

2010-11-20 Thread dean

Why not just use a GUID or UUID type for the ID type (e.g.
3F2504E0-4F89-11D3-9A0C-0305E82C3301)?  This way you're not restricted
by using a numeric data type that each language could potentially
define differently.

For languages that don't directly have a GUID or UUID type, they can
treat that ID as a string, and the higher level languages can use the
GUID data type directly.

On Oct 18, 7:19 pm, Matt Harris thematthar...@twitter.com wrote:
 Last week you may remember Twitter planned to enable the new Status ID
 generator - 'Snowflake' but didn't. The purpose of this email is to explain
 the reason why this didn't happen, what we are doing about it, and what the
 new release plan is.

 So what is Snowflake?
 ---------------------
 Snowflake is a service we will be using to generate unique Tweet IDs. These
 Tweet IDs are unique 64bit unsigned integers, which, instead of being
 sequential like the current IDs, are based on time. The full ID is composed
 of a timestamp, a worker number, and a sequence number.

 The problem
 -----------
 Before launch it came to our attention that some programming languages such
 as JavaScript cannot support numbers larger than 53 bits. This can be easily
 examined by running a command similar to: (90071992547409921).toString() in
 your browser's console or by running the following JSON snippet through your
 JSON parser.

     {"id": 10765432100123456789, "id_str": "10765432100123456789"}

 In affected JSON parsers the ID will not be converted successfully and will
 lose accuracy. In some parsers there may even be an exception.
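
The loss of accuracy is easy to reproduce outside a browser. The Python sketch below is an illustration, not anything Twitter shipped: it asks the standard json module to decode integers as floats (parse_int=float), which imitates a parser that stores every number as a 64-bit double the way JavaScript does.

```python
import json

raw = '{"id": 10765432100123456789, "id_str": "10765432100123456789"}'

# Decode integers as doubles to imitate a JavaScript-style parser.
lossy = json.loads(raw, parse_int=float)
# Python's default decoder keeps arbitrary-precision integers.
exact = json.loads(raw)

lossy_id = int(lossy["id"])
print(lossy_id == exact["id"])              # False: the double rounded the ID
print(str(exact["id"]) == exact["id_str"])  # True: no precision lost
print(lossy["id_str"] == exact["id_str"])   # True: strings are unaffected
```

The string field survives either way, which is the reason for reading id_str whenever the parser's number handling is in doubt.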

 The solution
 ------------
 To allow JavaScript and JSON parsers to read the IDs we need to include a
 string version of any ID when responding in the JSON format. What this means
 is Status, User, Direct Message and Saved Search IDs in the Twitter API will
 now be returned as an integer and a string in JSON responses. This will
 apply to the main Twitter API, the Streaming API and the Search API.

 For example, a status object will now contain an id and an id_str. The
 following JSON representation of a status object shows the two versions of
 the ID fields for each data point.

 [
   {
     "coordinates": null,
     "truncated": false,
     "created_at": "Thu Oct 14 22:20:15 +0000 2010",
     "favorited": false,
     "entities": {
       "urls": [
       ],
       "hashtags": [
       ],
       "user_mentions": [
         {
           "name": "Matt Harris",
           "id": 777925,
           "id_str": "777925",
           "indices": [
             0,
             14
           ],
           "screen_name": "themattharris"
         }
       ]
     },
     "text": "@themattharris hey how are things?",
     "annotations": null,
     "contributors": [
       {
         "id": 819797,
         "id_str": "819797",
         "screen_name": "episod"
       }
     ],
     "id": 12738165059,
     "id_str": "12738165059",
     "retweet_count": 0,
     "geo": null,
     "retweeted": false,
     "in_reply_to_user_id": 777925,
     "in_reply_to_user_id_str": "777925",
     "in_reply_to_screen_name": "themattharris",
     "user": {
       "id": 6253282,
       "id_str": "6253282"
     },
     "source": "web",
     "place": null,
     "in_reply_to_status_id": 12738040524,
     "in_reply_to_status_id_str": "12738040524"
   }
 ]

 What should you do - RIGHT NOW
 ------------------------------
 The first thing you should do is attempt to decode the JSON snippet above
 using your production code parser. Observe the output to confirm the ID has
 not lost accuracy.

 What you do next depends on what happens:

 * If your code converts the ID successfully without losing accuracy you are
 OK but should consider converting to the _str versions of IDs as soon as
 possible.
 * If your code has lost accuracy, convert your code to using the _str
 version immediately. If you do not do this your code will be unable to
 interact with the Twitter API reliably.
 * In some language parsers, the JSON may throw an exception when reading the
 ID value. If this happens in your parser you will need to ‘pre-parse’ the
 data, removing or replacing ID parameters with their _str versions.
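
The 'pre-parse' step in the last bullet could look roughly like the Python sketch below. It is a naive illustration under stated assumptions: the function name quote_bare_ids and the regular expression are invented here, and the pattern simply wraps any bare integer that follows a key named "id" or ending in "_id" in quotes. It does not understand JSON structure, so it should be tested against real payloads before being relied on.

```python
import json
import re

# Matches keys such as "id" or "in_reply_to_status_id" followed by a bare
# integer, capturing the key part and the digits separately.
_BARE_ID = re.compile(r'("(?:\w+_)?id"\s*:\s*)(\d+)')


def quote_bare_ids(raw_json: str) -> str:
    """Wrap bare integer ID values in quotes so they decode as strings."""
    return _BARE_ID.sub(r'\1"\2"', raw_json)


raw = (
    '{"id": 10765432100123456789, "id_str": "10765432100123456789", '
    '"in_reply_to_status_id": null, "user": {"id": 6253282}}'
)
data = json.loads(quote_bare_ids(raw))

print(data["id"])                     # 10765432100123456789 (now a string)
print(data["user"]["id"])             # 6253282 (now a string)
print(data["in_reply_to_status_id"])  # None: null values are left alone
```

Keys ending in _str are untouched because their values are already quoted.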

 Summary
 -------
 1) If you develop in Javascript, know that you will have to update your code
 to read the string version instead of the integer version.

 2) If you use a JSON decoder, validate that the example JSON, above, decodes
 without throwing exceptions. If exceptions are thrown, you will need to
 pre-parse the data. Please let us know the name, version, and language of
 the parser which throws the exception so we can investigate.

 Timeline
 --------
 by 22nd October 2010 (Friday): String versions of ID numbers will start
 appearing in the API responses
 4th November 2010 (Thursday) : Snowflake will be turned on but at ~41bit
 length
 26th November 2010 (Friday) : Status IDs will break 53bits in length and
 cease being usable as Integers in Javascript based languages

 We understand this isn’t as seamless a transition as we had planned and
 appreciate for some of

[twitter-dev] Re: Snowflake: An update and some very important information

2010-11-19 Thread mattpaul

+1 for bumping the version number and maintaining backwards
compatibility.

On Oct 20, 2:23 pm, Josh Roesslein jroessl...@gmail.com wrote:
 Isn't the point of having versioned APIs so changes can be rolled out w/o
 breaking a bunch of applications at once?
 Why not increment to version 2 and replace all IDs with strings in the JSON
 format? Keep version 1 around for a few months,
 allowing everyone to upgrade, and then kill it off. This can also give
 Twitter a chance to make any other breaking changes.

 If Twitter is never going to take advantage of the versioning they added
 what is the point of having it?
 I think just creating new fields to avoid versioning issues is unclean and
 messy.


[twitter-dev] Re: Snowflake: An update and some very important information

2010-11-11 Thread SM

Thanks, Tom. I also hope the answer to my first question is No.

I asked the 2nd question because in Matt's message that started this
thread, he says:

* If your code converts the ID successfully without losing accuracy you are
OK but should consider converting to the _str versions of IDs as soon as
possible.


Why would we consider converting to the _str version if we can already
parse the integers?




[twitter-dev] Re: Snowflake: An update and some very important information

2010-11-10 Thread SM

Hello. Couple questions:

1.) Are you planning on eventually eliminating the integer
representation and only using strings for IDs?

2.) If an application doesn't use Javascript to parse JSON (for
example, YAJL-OBJC and NSNumbers in Obj-C), is it necessary to make
any changes at all?

Thanks.



On Oct 19, 3:52 pm, Matt Harris thematthar...@twitter.com wrote:
 Hey everyone,

 Thank you to all of you for your questions, patience and contributions to
 this thread.  Hearing your views and knowing how you use the API helps us
 provide more information where there wasn't enough, and clarify details
 where there was ambiguity.

 I've collated the questions I've received from you directly, over Twitter to
 @twitterapi and through this list.  I hope the comments below provide enough
 information to answer those questions and explain the reasoning behind our
 decisions.

 Thanks for your support and patience,
 @themattharris

 1) Will search.twitter.com also include id_str and
 in_reply_to_status_id_str?
 Yes, Search will include the String representations of those IDs.

 2) Which fields are affected by this change?
 All IDs which are transmitted as Integers will have a String representation
 in the API response. Only Tweet IDs (which includes mentions and retweets)
 will be moving to new Snowflake IDs. Messages (DMs), Saved Searches and
 Users may change to a Snowflake ID scheme in the future but this isn’t
 planned for this year.

 We are adding String representations of the Integer IDs now so that
 developers can update all of their code to use the String representations
 throughout, and be prepared should any other IDs break the 53bit boundary.

 3) Which fields will have String representations?
 The fields which will have String representations are:

 id (DM, Saved Search, User, List )
 in_reply_to_status_id
 in_reply_to_user_id
 new_id (Streaming only. Will be removed when Snowflake is enabled)
 current_user_retweet_id (When include_my_retweet=1 is passed)

 4) Can you provide a complete Tweet example with Snowflake ID to test?

 [{"coordinates":null,"truncated":false,
 "created_at":"Thu Oct 14 22:20:15 +0000 2010","favorited":false,
 "entities":{"urls":[],"hashtags":[],"user_mentions":[{"name":"Matt Harris",
 "id":777925,"id_str":"777925","indices":[0,14],
 "screen_name":"themattharris"}]},
 "text":"@themattharris hey how are things?","annotations":null,
 "contributors":[{"id":819797,"id_str":"819797","screen_name":"episod"}],
 "id":10765432100123456789,"id_str":"10765432100123456789",
 "retweet_count":0,"geo":null,"retweeted":false,
 "in_reply_to_user_id":777925,"in_reply_to_user_id_str":"777925",
 "in_reply_to_screen_name":"themattharris",
 "user":{"id":6253282,"id_str":"6253282"},"source":"web","place":null,
 "in_reply_to_status_id":10586268426842688951,
 "in_reply_to_status_id_str":"10586268426842688951"}]

 5) What is happening with new_id in the Streaming API?
 new_id and new_id_str will be switched off when or soon after Snowflake is
 enabled on November 4th.

 6) Why not restrict IDs to 53bits?

 A Snowflake ID is composed of:
  * 41bits for millisecond precision time (69 years)
  * 10bits for a configured machine identity (1024 machines)
  * 12bits for a sequence number (4096 per machine)

 The factor influencing the length of the ID is the time. For a 53bit ID this
 would mean only 31bits are available for the time. 31bits is only enough for
 24 days (2147483648/(1000*60*60*24)) of time.

 Reducing the resolution of the timestamp would prevent a K-sorted resolution
 of 1 second or less.

 Reducing the configured machine identity or sequence number by 1bit would
 mean we couldn’t scale Twitter, or operate our infrastructure in an
 uncoordinated high-available way.
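
The arithmetic in answer 6 can be checked with a short Python sketch. Two things here are assumptions rather than facts from the message: the fields are packed with the timestamp in the highest bits, followed by the machine identity and then the sequence number, and the timestamp is counted in milliseconds from an unspecified starting point. The names pack and unpack are illustrative only.

```python
from typing import Tuple

TIMESTAMP_BITS = 41
MACHINE_BITS = 10
SEQUENCE_BITS = 12
MS_PER_DAY = 1000 * 60 * 60 * 24


def pack(timestamp_ms: int, machine: int, sequence: int) -> int:
    """Combine the three fields, timestamp in the highest bits (assumed order)."""
    return (
        (timestamp_ms << (MACHINE_BITS + SEQUENCE_BITS))
        | (machine << SEQUENCE_BITS)
        | sequence
    )


def unpack(snowflake: int) -> Tuple[int, int, int]:
    """Split an ID back into (timestamp_ms, machine, sequence)."""
    sequence = snowflake & ((1 << SEQUENCE_BITS) - 1)
    machine = (snowflake >> SEQUENCE_BITS) & ((1 << MACHINE_BITS) - 1)
    timestamp_ms = snowflake >> (MACHINE_BITS + SEQUENCE_BITS)
    return timestamp_ms, machine, sequence


# Bits left for the timestamp if the whole ID had to fit in 53 bits.
bits_for_time = 53 - MACHINE_BITS - SEQUENCE_BITS
days_in_31_bits = 2**bits_for_time / MS_PER_DAY
years_in_41_bits = 2**TIMESTAMP_BITS / (MS_PER_DAY * 365)

print(bits_for_time)              # 31
print(f"{days_in_31_bits:.1f}")   # 24.9
print(f"{years_in_41_bits:.1f}")  # 69.7
# An ID issued 2**31 ms after the starting point already needs 54 bits.
print(pack(2**31, 0, 0).bit_length())  # 54
```

Under these assumptions the 24-day and 69-year figures quoted above both fall out of the bit widths directly.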

 7) When will the 53bit Integer overflow happen?
 24 days after Snowflake starts counting.

 8) Is it safe to parse and store IDs as signed 64bit Integers?
 Yes.

 9) Why offer both the String and Integer versions of the ID?
 The String representation is needed to ensure languages which cannot convert
 the 53bit Integer can still use the ID in other API requests.

 The Integer value is being retained for languages which can handle numbers
 larger than 53 bits and to prevent applications which have not converted
 from being cut off from Twitter.

 10) When ID is null what will the _str representation be?
 The _str representation will also be null.
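
A client that wants to prefer the string fields while still coping with null values might use a small helper like the one sketched below. The function name read_id is hypothetical, and the fallback to the integer field is only safe when the JSON parser has kept the integer at full precision.

```python
from typing import Any, Mapping, Optional


def read_id(obj: Mapping[str, Any], field: str = "id") -> Optional[str]:
    """Return the ID as a string, preferring the `<field>_str` key."""
    text = obj.get(field + "_str")
    if text is not None:
        return text
    # Fallback for responses that predate the _str fields.
    number = obj.get(field)
    return None if number is None else str(number)


status = {
    "id": 12738165059,
    "id_str": "12738165059",
    "in_reply_to_status_id": None,
    "in_reply_to_status_id_str": None,
}

print(read_id(status))                           # 12738165059
print(read_id(status, "in_reply_to_status_id"))  # None
print(read_id({"id": 777925}))                   # 777925
```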

 11) Did you really mean ‘unsigned’ 64bit Integer?
 Strictly speaking the Snowflake is a signed 64bit long under the hood. That
 being said, we will never use the negative bit and won’t require the full
 64bits for positive numbers for about 69 years:

 http://www.google.com/search?q=%282**41%29+%2F+%2860*60*24*1000%29+%2...

 12) Why not make the strings opt-in?
 We did consider this as an option but decided against it for a number of
 reasons. The first reason is that the ID is fundamental to being able to
 work with the data from the API so receiving the correct ID shouldn’t be
 something you have to opt into. The second, more influential

Re: [twitter-dev] Re: Snowflake: An update and some very important information

2010-11-10 Thread Tom van der Woerdt

1) Please don't, I don't want to have to convert everything back to
integers within my code. I consider the string representation a hack
around some issues with certain programming languages, and not an
optimal solution. Wouldn't want this to become the default option.

2) No

Tom


On 11/11/10 6:34 AM, SM wrote:
 Hello. Couple questions:
 
 1.) Are you planning on eventually eliminating the integer
 representation and only using strings for id's?
 
 2.) If an application doesn't use Javascript to parse JSON (for
 example, YAJL-OBJC and NSNumbers in Obj-C), is it necessary to make
 any changes at all?
 
 Thanks.
 
 
 
 On Oct 19, 3:52 pm, Matt Harris thematthar...@twitter.com wrote:
 Hey everyone,

 Thank you to all of you for your questions, patience and contributions to
 this thread.  Hearing your views and knowing how you use the API helps us
 provide more information where there wasn't enough, and clarify details
 where there was ambiguity.

 I've collated the questions i've received from you directly, over Twitter to
 @twitterapi and through this list.  I hope the comments below provide enough
 information to answer those questions and explain the reasoning being our
 decisions.

 Thanks for your support and patience,
 @themattharris

 1) Will search.twitter.com also include id_str and
 in_reply_to_status_id_str?
 Yes, Search will include the String representations of those IDs.

 2) Which fields are affected by this change?
 All IDs which are transmitted as Integers will have a String representation
 in the API response. Only Tweet IDs (which includes mentions and retweets)
 will be moving to new Snowflake IDs. Messages (DMs), Saved Searches and
 Users may change to a Snowflake ID scheme in the future but this isn’t
 planned for this year.

 We are adding String representations of the Integer IDs now so you can
 update all of your code to use the String representations throughout. to
 allow developers to make the change now for all the ID fields and be
 prepared should any other IDs break the 53bit boundary.

 3) Which fields will have String representations?
 The fields which will have String representations are:

 id (DM, Saved Search, User, List )
 in_reply_to_status_id
 in_reply_to_user_id
 new_id (Streaming only. Will be removed when Snowflake is enabled)
 current_user_retweet_id (When include_my_retweet=1 is passed)

 4) Can you provide a complete Tweet example with Snowflake ID to test?

 [{coordinates:null,truncated:false,created_at:Thu Oct 14 22:20:15
 +
 2010,favorited:false,entities:{urls:[],hashtags:[],user_mentions 
 :[{name:Matt
 Harris,id:777925,id_str:777925,indices:[0,14],screen_name:thema 
 ttharris}]},text:@themattharris
 hey how are
 things?,annotations:null,contributors:[{id:819797,id_str:819797, 
 screen_name:episod}],id:10765432100123456789,id_str:10765432100123 
 456789,retweet_count:0,geo:null,retweeted:false,in_reply_to_user_id 
 :777925,in_reply_to_user_id_str:777925,in_reply_to_screen_name:them 
 attharris,user:{id:6253282,id_str:6253282},source:web,place: 
 null,in_reply_to_status_id:10586268426842688951,in_reply_to_status_id_st 
 r:10586268426842688951}]

 5) What is happening with new_id in the Streaming API?
 new_id and new_id_str will be switched off when or soon after Snowflake is
 enabled on November 4th.

 6) Why not restrict IDs to 53bits?

 A Snowflake ID is composed:
  * 41bits for millisecond precision time (69 years)
  * 10bits for a configured machine identity (1024 machines)
  * 12bits for a sequence number (4096 per machine)

 The factor influencing the length of the ID is the time. For a 53bit ID this
 would mean only 31bits are available for the time. 31bits is only enough for
 24 days (2147483648/(1000*60*60*24)) of time.
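The arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation quoted in the answer; the helper name `span_days` is made up for illustration.

```python
MS_PER_DAY = 1000 * 60 * 60 * 24


def span_days(time_bits: int) -> float:
    """Days that fit in a millisecond counter of the given bit width."""
    return 2 ** time_bits / MS_PER_DAY


# A 53-bit ID minus 10 machine bits and 12 sequence bits leaves 31 time bits.
days_31 = span_days(53 - 10 - 12)
# The full Snowflake timestamp is 41 bits wide.
years_41 = span_days(41) / 365.25

print(f"31 time bits last about {days_31:.1f} days")
print(f"41 time bits last about {years_41:.1f} years")
```

Both figures agree with the "24 days" and "69 years" quoted in the answer.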

 Reducing the resolution of the timestamp would prevent a K-sorted resolution
 of 1 second or less.

 Reducing the configured machine identity or sequence number by 1bit would
 mean we couldn’t scale Twitter, or operate our infrastructure in an
 uncoordinated, highly available way.
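The bit budget listed in this answer can be sketched as a pack/unpack pair. The field widths come from the list above; the field order (timestamp in the high bits, then machine, then sequence) and the function names are assumptions for illustration, and the epoch the timestamp counts from is deliberately left out.

```python
from typing import Tuple

TIME_BITS, MACHINE_BITS, SEQ_BITS = 41, 10, 12  # widths quoted in the answer


def compose(ms: int, machine: int, seq: int) -> int:
    """Pack the three fields into one integer (assumed order: time|machine|seq)."""
    if not 0 <= ms < 2 ** TIME_BITS:
        raise ValueError("timestamp does not fit in 41 bits")
    if not 0 <= machine < 2 ** MACHINE_BITS:
        raise ValueError("machine id does not fit in 10 bits")
    if not 0 <= seq < 2 ** SEQ_BITS:
        raise ValueError("sequence number does not fit in 12 bits")
    return (ms << (MACHINE_BITS + SEQ_BITS)) | (machine << SEQ_BITS) | seq


def decompose(snowflake: int) -> Tuple[int, int, int]:
    """Split an ID back into (milliseconds, machine, sequence)."""
    seq = snowflake & (2 ** SEQ_BITS - 1)
    machine = (snowflake >> SEQ_BITS) & (2 ** MACHINE_BITS - 1)
    ms = snowflake >> (MACHINE_BITS + SEQ_BITS)
    return ms, machine, seq


# The first millisecond that needs 32 time bits produces exactly 2**53,
# which is the boundary discussed in this thread.
first_over_53_bits = compose(2 ** 31, 0, 0)
```

Under this layout the largest possible ID is `compose(2**41 - 1, 1023, 4095)`, which equals 2**63 - 1 and so still fits a signed 64bit integer.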

 7) When will the 53bit Integer overflow happen?
 24 days after Snowflake starts counting.

 8) Is it safe to parse and store IDs as signed 64bit Integers?
 Yes.

 9) Why offer both the String and Integer versions of the ID?
 The String representation is needed to ensure languages which cannot convert
 the >53bit Integer can still use the ID in other API requests.

 The Integer value is being retained for languages which can handle
 numbers >53bit and to prevent applications which have not converted from being
 cut off from Twitter.

  10) When ID is null what will the _str representation be?
 The _str representation will also be null.

 11) Did you really mean ‘unsigned’ 64bit Integer?
 Strictly speaking the Snowflake is a signed 64bit long under the hood. That
 being said, we will never use the negative bit and won’t require the full
 64bits for positive numbers for about 69 years:

[twitter-dev] Re: Snowflake: An update and some very important information

2010-10-25 Thread Johannes la Poutre

@Themattharris: was there any change to the implementation timeline?
Quote: by 22nd October 2010 (Friday): String versions of ID numbers
will start appearing in the API responses

I'm still not seeing id_str, to_user_id_str  and from_user_id_str etc.
in the current search API output, example:

http://search.twitter.com/search.json?geocode=52.155018%2C4.487658%2C1km

is there an updated timeline or did I miss something?

Best,

-- Johannes / @jlapoutre / @tweepsaround


On Oct 19, 2:19 am, Matt Harris thematthar...@twitter.com wrote:
 Last week you may remember Twitter planned to enable the new Status ID
 generator - 'Snowflake' but didn't. The purpose of this email is to explain
 the reason why this didn't happen, what we are doing about it, and what the
 new release plan is.

 So what is Snowflake?
 ---------------------
 Snowflake is a service we will be using to generate unique Tweet IDs. These
 Tweet IDs are unique 64bit unsigned integers, which, instead of being
 sequential like the current IDs, are based on time. The full ID is composed
 of a timestamp, a worker number, and a sequence number.

 The problem
 -----------
 Before launch it came to our attention that some programming languages such
 as Javascript cannot support numbers with >53bits. This can be easily
 examined by running a command similar to: (90071992547409921).toString() in
 your browser's console or by running the following JSON snippet through your
 JSON parser.

     {"id": 10765432100123456789, "id_str": "10765432100123456789"}

 In affected JSON parsers the ID will not be converted successfully and will
 lose accuracy. In some parsers there may even be an exception.
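The loss of accuracy described above can be reproduced outside a browser. The sketch below uses Python's standard `json` module; passing `parse_int=float` is a stand-in for a parser that stores every number as a 64bit double, which is an assumption about how the affected parsers behave.

```python
import json

SNIPPET = '{"id": 10765432100123456789, "id_str": "10765432100123456789"}'

# Python integers have arbitrary precision, so the default decoder is exact.
exact = json.loads(SNIPPET)

# Forcing integers through float mimics a parser limited to doubles.
lossy = json.loads(SNIPPET, parse_int=float)

exact_ok = exact["id"] == int(exact["id_str"])
lossy_ok = int(lossy["id"]) == int(lossy["id_str"])

# A double has a 53-bit significand, so 2**53 + 1 cannot be represented.
collapses = float(2 ** 53 + 1) == float(2 ** 53)

print("exact decoder keeps the ID:", exact_ok)
print("double-based decoder keeps the ID:", lossy_ok)
print("2**53 + 1 collapses to 2**53:", collapses)
```

If the second check reports a mismatch in your own parser, the `id_str` field is the value to rely on.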

 The solution
 ------------
 To allow javascript and JSON parsers to read the IDs we need to include a
 string version of any ID when responding in the JSON format. What this means
 is Status, User, Direct Message and Saved Search IDs in the Twitter API will
 now be returned as an integer and a string in JSON responses. This will
 apply to the main Twitter API, the Streaming API and the Search API.

 For example, a status object will now contain an id and an id_str. The
 following JSON representation of a status object shows the two versions of
 the ID fields for each data point.

 [
   {
     "coordinates": null,
     "truncated": false,
     "created_at": "Thu Oct 14 22:20:15 +0000 2010",
     "favorited": false,
     "entities": {
       "urls": [
       ],
       "hashtags": [
       ],
       "user_mentions": [
         {
           "name": "Matt Harris",
           "id": 777925,
           "id_str": "777925",
           "indices": [
             0,
             14
           ],
           "screen_name": "themattharris"
         }
       ]
     },
     "text": "@themattharris hey how are things?",
     "annotations": null,
     "contributors": [
       {
         "id": 819797,
         "id_str": "819797",
         "screen_name": "episod"
       }
     ],
     "id": 12738165059,
     "id_str": "12738165059",
     "retweet_count": 0,
     "geo": null,
     "retweeted": false,
     "in_reply_to_user_id": 777925,
     "in_reply_to_user_id_str": "777925",
     "in_reply_to_screen_name": "themattharris",
     "user": {
       "id": 6253282,
       "id_str": "6253282"
     },
     "source": "web",
     "place": null,
     "in_reply_to_status_id": 12738040524,
     "in_reply_to_status_id_str": "12738040524"
   }
 ]

 What should you do - RIGHT NOW
 ------------------------------
 The first thing you should do is attempt to decode the JSON snippet above
 using your production code parser. Observe the output to confirm the ID has
 not lost accuracy.

 What you do next depends on what happens:

 * If your code converts the ID successfully without losing accuracy you are
 OK but should consider converting to the _str versions of IDs as soon as
 possible.
 * If your code has lost accuracy, convert your code to using the _str
 version immediately. If you do not do this your code will be unable to
 interact with the Twitter API reliably.
 * In some language parsers, the JSON may throw an exception when reading the
 ID value. If this happens in your parser you will need to ‘pre-parse’ the
 data, removing or replacing ID parameters with their _str versions.
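One way to 'pre-parse' the data is to wrap oversized integer IDs in quotes before the text reaches the JSON parser. The sketch below is an illustration only: the regular expression, the 16-digit threshold and the name `quote_big_ids` are assumptions, and a pattern like this could also match text inside a Tweet body that happens to look like an ID field.

```python
import json
import re

# Matches keys such as "id", "new_id" or "in_reply_to_status_id" followed by
# a bare integer of 16 or more digits (large enough to risk exceeding 2**53).
BIG_ID = re.compile(r'("(?:\w*_)?id"\s*:\s*)(\d{16,})')


def quote_big_ids(raw: str) -> str:
    """Turn oversized integer ID values into JSON strings."""
    return BIG_ID.sub(r'\1"\2"', raw)


raw = (
    '{"id": 10765432100123456789, '
    '"id_str": "10765432100123456789", '
    '"user": {"id": 6253282}}'
)

# parse_int=float stands in for a parser that only has doubles.
parsed = json.loads(quote_big_ids(raw), parse_int=float)
```

After the rewrite the large `id` arrives as the string "10765432100123456789", while the small user ID is still decoded as a number.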

 Summary
 -------
 1) If you develop in Javascript, know that you will have to update your code
 to read the string version instead of the integer version.

 2) If you use a JSON decoder, validate that the example JSON, above, decodes
 without throwing exceptions. If exceptions are thrown, you will need to
 pre-parse the data. Please let us know the name, version, and language of
 the parser which throws the exception so we can investigate.

 Timeline
 --------
 by 22nd October 2010 (Friday): String versions of ID numbers will start
 appearing in the API responses
 4th November 2010 (Thursday) : Snowflake will be turned on but at ~41bit
 length
 26th November 2010 (Friday) : Status IDs will break 53bits in length and
 cease being usable as Integers in Javascript based languages

 We

[twitter-dev] Re: Snowflake: An update and some very important information

2010-10-25 Thread Tim

What about methods that list IDs without keys? e.g. blocks/ids which
produces an array like [ 12345, 6789,  ]
These still appear to be cast as integers.

I don't see the point in having integer IDs at all. Surely the case of
people wishing to handle them as integers is far smaller than the case
of people who simply need a unique reference. It is much easier for
the minority to cast string IDs to integers than it is for the rest of
us to do the reverse.

-- 
Twitter developer documentation and resources: http://dev.twitter.com/doc
API updates via Twitter: http://twitter.com/twitterapi
Issues/Enhancements Tracker: http://code.google.com/p/twitter-api/issues/list
Change your membership to this group: 
http://groups.google.com/group/twitter-development-talk

Re: [twitter-dev] Re: Snowflake: An update and some very important information

2010-10-25 Thread Slate Smith


It's there. But for some reason not in all types of json requests -


[twitter-dev] Re: Snowflake: An update and some very important information

2010-10-25 Thread Matt Harris

The Search API hasn't rolled those fields in yet but they are due soon. The
API does include these fields but as Tim has highlighted the xyz/ids methods
don't yet have a String version.

The engineering teams are aware of this and are working on it. I'll post an
update once those additional responses have the fields.

Thanks for noticing they were missing and commenting on the thread.

@themattharris
Developer Advocate, Twitter
http://twitter.com/themattharris



[twitter-dev] Re: Snowflake: An update and some very important information

2010-10-25 Thread Matt Harris

Hey everyone,

Further to this morning's email, the Search API now contains the _str
representations of its IDs.

Best
@themattharris
Developer Advocate, Twitter
http://twitter.com/themattharris



[twitter-dev] Re: Snowflake: An update and some very important information

2010-10-22 Thread Reivax

What's the point? That would still require a change in the JSON
output, to replace integers with strings.

On 21 oct, 15:38, David Nicol davidni...@gmail.com wrote:
 Unless I did the math wrong, a 64 bit quantity is expressible in

 (64 * log(2)) / log(62) = 10.7487219

 eleven characters drawn from A-Za-z0-9

 and they can still be sortable!


Re: [twitter-dev] Re: Snowflake: An update and some very important information

2010-10-21 Thread David Nicol

Unless I did the math wrong, a 64 bit quantity is expressible in

(64 * log(2)) / log(62) = 10.7487219

eleven characters drawn from A-Za-z0-9

and they can still be sortable!
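The calculation above can be sketched in Python. Two details are assumptions added here so that string order matches numeric order: the alphabet is arranged in ASCII order (0-9, A-Z, a-z) and every value is padded to a fixed width. The function names are illustrative.

```python
import math

# ASCII order, so comparing encoded strings matches comparing the numbers.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
WIDTH = math.ceil(64 * math.log(2) / math.log(62))  # characters for 64 bits


def encode62(n: int) -> str:
    """Encode an unsigned 64-bit integer as a fixed-width base-62 string."""
    if not 0 <= n < 2 ** 64:
        raise ValueError("value does not fit in 64 bits")
    digits = []
    for _ in range(WIDTH):
        n, remainder = divmod(n, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode62(text: str) -> int:
    """Invert encode62."""
    n = 0
    for ch in text:
        n = n * 62 + ALPHABET.index(ch)
    return n


sample_ids = [12738165059, 10765432100123456789, 777925, 2 ** 64 - 1]
encoded = [encode62(i) for i in sample_ids]
```

Sorting `encoded` as plain strings gives the same order as sorting `sample_ids` numerically, which is the sortable property mentioned above.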


Re: [twitter-dev] Re: Snowflake: An update and some very important information

2010-10-20 Thread M. Edward (Ed) Borasky


6) Why not restrict IDs to 53bits?

A Snowflake ID is composed:
 * 41bits for millisecond precision time (69 years)
 * 10bits for a configured machine identity (1024 machines)
 * 12bits for a sequence number (4096 per machine)

The factor influencing the length of the ID is the time. For a 53bit ID this
would mean only 31bits are available for the time. 31bits is only enough for
24 days (2147483648/(1000*60*60*24)) of time.

Reducing the resolution of the timestamp would prevent a K-sorted resolution
of 1 second or less.

Reducing the configured machine identity or sequence number by 1bit would
mean we couldn’t scale Twitter, or operate our infrastructure in an
uncoordinated high-available way.


Interesting ... so you have the theoretical capacity to scale to 2**22  
(about 4 million) tweets per millisecond? Even 4 million tweets a  
second seems unrealistic, as does a single machine only being able  
to generate 4096 IDs. I think if you're really expecting this kind of  
volume, the FPGA vendors probably can help you out. We are talking  
clocks and counters, here, right, not Javascript interpreters or  
robust linear regressions? ;-)


Ah, well, I'll check back on you guys in 69 years to see how you're  
holding up. ;-)


--
M. Edward (Ed) Borasky
http://borasky-research.net http://twitter.com/znmeb

A mathematician is a device for turning coffee into theorems. - Paul Erdos



[twitter-dev] Re: Snowflake: An update and some very important information

2010-10-20 Thread Navin Kabra

On Oct 19, 5:19 am, Matt Harris thematthar...@twitter.com wrote:
 So what is Snowflake?
 --
 Snowflake is a service we will be using to generate unique Tweet IDs. These
 Tweet IDs are unique 64bit unsigned integers, which, instead of being
 sequential like the current IDs, are based on time. The full ID is composed
 of a timestamp, a worker number, and a sequence number.

Will the new IDs be monotonically increasing? If my programming
language can handle 64bit integers, can I compare two IDs and expect
that the lower ID represents an older tweet (approximately)?
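A practical note on the comparison asked about here: if the IDs are held as strings (the _str fields), they need to be compared by numeric value, because plain string comparison puts "9999999999" after "12738165059". The sketch below assumes only that the IDs are decimal strings; `sort_ids` is an illustrative name. The FAQ in this thread describes the ordering as K-sorted, so the order should be treated as approximately chronological rather than exact.

```python
def sort_ids(id_strs):
    """Sort decimal ID strings by numeric value, smallest (oldest) first."""
    return sorted(id_strs, key=int)


ids = ["12738165059", "9999999999", "10765432100123456789"]

by_text = sorted(ids)     # lexicographic order, misleading for IDs
by_value = sort_ids(ids)  # numeric order
```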


Re: [twitter-dev] Re: Snowflake: An update and some very important information

2010-10-20 Thread Tom van der Woerdt

Yes, you can.

Tom



[twitter-dev] Re: Snowflake: An update and some very important information

2010-10-20 Thread CodeWarden

Matt,

In your example you show an unsigned long that uses the full 64 bits,
which fails to be properly represented on a Java platform.   In
a later explanation you say that there will only ever be 63 bits
used.  Can you confirm one way or the other?   i.e. will we ever see
an unsigned integer that will take more than 63 bits to represent?
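The observation about the sample ID can be checked directly. The sketch below uses Python's `struct` module, whose `q` format is a signed 64-bit integer; the helper name `fits_signed_64` is illustrative.

```python
import struct

INT64_MAX = 2 ** 63 - 1


def fits_signed_64(n: int) -> bool:
    """True if n can be stored in a signed 64-bit integer (a Java long)."""
    return -(2 ** 63) <= n <= INT64_MAX


sample = 10765432100123456789  # the test ID from the example JSON

sample_bits = sample.bit_length()
sample_fits = fits_signed_64(sample)

# Packing as a signed 64-bit value fails for the sample ID.
try:
    struct.pack(">q", sample)
    packs_as_signed = True
except struct.error:
    packs_as_signed = False

# Packing as an unsigned 64-bit value succeeds.
packed_unsigned = struct.pack(">Q", sample)
```

The sample needs all 64 bits, so it only fits an unsigned type; any ID limited to 63 bits would pass `fits_signed_64`.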

-Paul


On Oct 19, 7:52 pm, Matt Harris thematthar...@twitter.com wrote:
 Hey everyone,

 Thank you to all of you for your questions, patience and contributions to
 this thread.  Hearing your views and knowing how you use the API helps us
 provide more information where there wasn't enough, and clarify details
 where there was ambiguity.

 I've collated the questions I've received from you directly, over Twitter to
 @twitterapi and through this list.  I hope the comments below provide enough
 information to answer those questions and explain the reasoning behind our
 decisions.

 Thanks for your support and patience,
 @themattharris

 1) Will search.twitter.com also include id_str and
 in_reply_to_status_id_str?
 Yes, Search will include the String representations of those IDs.

 2) Which fields are affected by this change?
 All IDs which are transmitted as Integers will have a String representation
 in the API response. Only Tweet IDs (which includes mentions and retweets)
 will be moving to new Snowflake IDs. Messages (DMs), Saved Searches and
 Users may change to a Snowflake ID scheme in the future but this isn’t
 planned for this year.

 We are adding String representations of the Integer IDs now so that
 developers can update all of their code to use the String representations
 throughout, make the change for all the ID fields at once, and be prepared
 should any other IDs break the 53bit boundary.

 3) Which fields will have String representations?
 The fields which will have String representations are:

 id (DM, Saved Search, User, List )
 in_reply_to_status_id
 in_reply_to_user_id
 new_id (Streaming only. Will be removed when Snowflake is enabled)
 current_user_retweet_id (When include_my_retweet=1 is passed)

 4) Can you provide a complete Tweet example with Snowflake ID to test?

 [{"coordinates":null,"truncated":false,
 "created_at":"Thu Oct 14 22:20:15 +0000 2010","favorited":false,
 "entities":{"urls":[],"hashtags":[],"user_mentions":[{"name":"Matt Harris",
 "id":777925,"id_str":"777925","indices":[0,14],
 "screen_name":"themattharris"}]},
 "text":"@themattharris hey how are things?","annotations":null,
 "contributors":[{"id":819797,"id_str":"819797","screen_name":"episod"}],
 "id":10765432100123456789,"id_str":"10765432100123456789",
 "retweet_count":0,"geo":null,"retweeted":false,
 "in_reply_to_user_id":777925,"in_reply_to_user_id_str":"777925",
 "in_reply_to_screen_name":"themattharris",
 "user":{"id":6253282,"id_str":"6253282"},"source":"web","place":null,
 "in_reply_to_status_id":10586268426842688951,
 "in_reply_to_status_id_str":"10586268426842688951"}]

 5) What is happening with new_id in the Streaming API?
 new_id and new_id_str will be switched off when or soon after Snowflake is
 enabled on November 4th.

 6) Why not restrict IDs to 53bits?

 A Snowflake ID is composed of:
  * 41bits for millisecond precision time (69 years)
  * 10bits for a configured machine identity (1024 machines)
  * 12bits for a sequence number (4096 per machine)

 The factor influencing the length of the ID is the time. For a 53bit ID this
 would mean only 31bits are available for the time. 31bits is only enough for
 24 days (2147483648/(1000*60*60*24)) of time.

 Reducing the resolution of the timestamp would prevent a K-sorted resolution
 of 1 second or less.

 Reducing the configured machine identity or sequence number by 1bit would
 mean we couldn’t scale Twitter, or operate our infrastructure in an
 uncoordinated, highly available way.

 7) When will the 53bit Integer overflow happen?
 24 days after Snowflake starts counting.

 8) Is it safe to parse and store IDs as signed 64bit Integers?
 Yes.

 9) Why offer both the String and Integer versions of the ID?
 The String representation is needed to ensure languages which cannot convert
 the >53bit Integer can still use the ID in other API requests.

 The Integer value is being retained for languages which can handle
 numbers >53bit and to prevent applications which have not converted from being
 cut off from Twitter.

  10) When ID is null what will the _str representation be?
 The _str representation will also be null.

 11) Did you really mean ‘unsigned’ 64bit Integer?
 Strictly speaking the Snowflake is a signed 64bit long under the hood. That
 being said, we will never use the negative bit and won’t require the full
 64bits for positive numbers for about 69 years:

 http://www.google.com/search?q=%282**41%29+%2F+%2860*60*24*1000%29+%2...

 12) Why not make the strings opt-in?
 We did consider this as an option but decided against it for a number of
 reasons. The first reason is that the ID is fundamental to being able to
 work with the data from the API so receiving the correct ID shouldn’t be

Re: [twitter-dev] Re: Snowflake: An update and some very important information

2010-10-20 Thread Josh Roesslein

Isn't the point of having versioned APIs so changes can be rolled out w/o
breaking a bunch of applications at once?
Why not increment to version 2 and replace all IDs with strings in the JSON
format? Keep version 1 around for a few months,
allowing everyone to upgrade, and then kill it off. This can also give
Twitter a chance to make any other breaking changes.

If Twitter is never going to take advantage of the versioning they added
what is the point of having it?
I think just creating new fields to avoid versioning issues is unclean and
messy.


[twitter-dev] Re: Snowflake: An update and some very important information

2010-10-19 Thread jon

Hi,

Will user ids be generated by snowflake in the near future?  Is it
safe to parse and store them as signed 64bit integers?

Thanks.

On Oct 18, 8:34 pm, themattharris thematthar...@twitter.com wrote:
 Thanks to @gotwalt for spotting the missing commas.

 Fixed JSON sample ...

 [
   {
     "coordinates": null,
     "truncated": false,
     "created_at": "Thu Oct 14 22:20:15 +0000 2010",
     "favorited": false,
     "entities": {
       "urls": [
       ],
       "hashtags": [
       ],
       "user_mentions": [
         {
           "name": "Matt Harris",
           "id": 777925,
           "id_str": "777925",
           "indices": [
             0,
             14
           ],
           "screen_name": "themattharris"
         }
       ]
     },
     "text": "@themattharris hey how are things?",
     "annotations": null,
     "contributors": [
       {
         "id": 819797,
         "id_str": "819797",
         "screen_name": "episod"
       }
     ],
     "id": 12738165059,
     "id_str": "12738165059",
     "retweet_count": 0,
     "geo": null,
     "retweeted": false,
     "in_reply_to_user_id": 777925,
     "in_reply_to_user_id_str": "777925",
     "in_reply_to_screen_name": "themattharris",
     "user": {
       "id": 6253282,
       "id_str": "6253282"
     },
     "source": "web",
     "place": null,
     "in_reply_to_status_id": 12738040524,
     "in_reply_to_status_id_str": "12738040524"
   }
 ]

 Best,
 @themattharris


[twitter-dev] Re: Snowflake: An update and some very important information

2010-10-19 Thread jon

Hi,

You wrote that the IDs are unsigned 64 bit ints, but the IdWorker is
pumping out java Longs which are signed. I'm assuming that was a
typo, but please clarify.

http://github.com/twitter/snowflake/blob/master/src/main/scala/com/twitter/service/snowflake/IdWorker.scala

Thanks,

- Jon

On Oct 18, 8:19 pm, Matt Harris thematthar...@twitter.com wrote:
Last week you may remember Twitter planned to enable the new Status ID
generator - 'Snowflake' but didn't. The purpose of this email is to explain
the reason why this didn't happen, what we are doing about it, and what the
new release plan is.

So what is Snowflake?
--
Snowflake is a service we will be using to generate unique Tweet IDs. These
Tweet IDs are unique 64bit unsigned integers, which, instead of being
sequential like the current IDs, are based on time. The full ID is composed
of a timestamp, a worker number, and a sequence number.
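To illustrate how an ID composed of a timestamp, a worker number and a sequence number can be packed into one integer, here is a minimal Python sketch. It is not taken from the Snowflake source: the 41-bit timestamp and 10-bit worker widths are assumptions (only the 12-bit sequence is stated elsewhere in this thread), and the function names and the idea of counting milliseconds from some custom epoch are hypothetical.

```python
from typing import Tuple

# Assumed field widths, for illustration only.
TIMESTAMP_BITS = 41
WORKER_BITS = 10
SEQUENCE_BITS = 12

WORKER_SHIFT = SEQUENCE_BITS
TIMESTAMP_SHIFT = SEQUENCE_BITS + WORKER_BITS


def compose_id(ms_since_epoch: int, worker: int, sequence: int) -> int:
    """Pack the three fields into one integer, timestamp in the high bits."""
    if not 0 <= ms_since_epoch < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp out of range")
    if not 0 <= worker < (1 << WORKER_BITS):
        raise ValueError("worker out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    return (ms_since_epoch << TIMESTAMP_SHIFT) | (worker << WORKER_SHIFT) | sequence


def decompose_id(snowflake_id: int) -> Tuple[int, int, int]:
    """Split an ID back into (ms_since_epoch, worker, sequence)."""
    sequence = snowflake_id & ((1 << SEQUENCE_BITS) - 1)
    worker = (snowflake_id >> WORKER_SHIFT) & ((1 << WORKER_BITS) - 1)
    ms_since_epoch = snowflake_id >> TIMESTAMP_SHIFT
    return ms_since_epoch, worker, sequence


earlier = compose_id(1000, 3, 7)
later = compose_id(1001, 0, 0)
print(decompose_id(earlier))  # (1000, 3, 7)
print(earlier < later)        # True
```

Because the timestamp occupies the high bits, an ID generated in a later millisecond always compares greater than one generated earlier, whichever worker produced it.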

The problem
-
Before launch it came to our attention that some programming languages such
as Javascript cannot support numbers with >53bits. This can be easily
examined by running a command similar to: (90071992547409921).toString() in
your browser's console or by running the following JSON snippet through your
JSON parser.

{"id": 10765432100123456789, "id_str": "10765432100123456789"}

In affected JSON parsers the ID will not be converted successfully and will
lose accuracy. In some parsers there may even be an exception.
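The loss of accuracy can be reproduced outside a browser. The Python sketch below is only a simulation: Python integers are arbitrary precision, so the standard json module keeps the ID exact, and passing parse_int=float is used here as a stand-in for a parser that stores every number as an IEEE 754 double, as Javascript does.

```python
import json

raw = '{"id": 10765432100123456789, "id_str": "10765432100123456789"}'

# Default behaviour: integers are parsed exactly.
exact = json.loads(raw)

# Simulated double-only parser: every integer literal becomes a float.
lossy = json.loads(raw, parse_int=float)

print(exact["id"] == int(exact["id_str"]))       # True
print(int(lossy["id"]) == int(lossy["id_str"]))  # False

# The browser console example from the text, expressed with a float.
print(int(float(90071992547409921)))             # 90071992547409920
```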

The solution

To allow javascript and JSON parsers to read the IDs we need to include a
string version of any ID when responding in the JSON format. What this means
is Status, User, Direct Message and Saved Search IDs in the Twitter API will
now be returned as an integer and a string in JSON responses. This will
apply to the main Twitter API, the Streaming API and the Search API.

For example, a status object will now contain an id and an id_str. The
following JSON representation of a status object shows the two versions of
the ID fields for each data point.

[
  {
    "coordinates": null,
    "truncated": false,
    "created_at": "Thu Oct 14 22:20:15 +0000 2010",
    "favorited": false,
    "entities": {
      "urls": [
      ],
      "hashtags": [
      ],
      "user_mentions": [
        {
          "name": "Matt Harris",
          "id": 777925,
          "id_str": "777925",
          "indices": [
            0,
            14
          ],
          "screen_name": "themattharris"
        }
      ]
    },
    "text": "@themattharris hey how are things?",
    "annotations": null,
    "contributors": [
      {
        "id": 819797,
        "id_str": "819797",
        "screen_name": "episod"
      }
    ],
    "id": 12738165059,
    "id_str": "12738165059",
    "retweet_count": 0,
    "geo": null,
    "retweeted": false,
    "in_reply_to_user_id": 777925,
    "in_reply_to_user_id_str": "777925",
    "in_reply_to_screen_name": "themattharris",
    "user": {
      "id": 6253282,
      "id_str": "6253282"
    },
    "source": "web",
    "place": null,
    "in_reply_to_status_id": 12738040524,
    "in_reply_to_status_id_str": "12738040524"
  }
]

What should you do - RIGHT NOW
--
The first thing you should do is attempt to decode the JSON snippet above
using your production code parser. Observe the output to confirm the ID has
not lost accuracy.

What you do next depends on what happens:

* If your code converts the ID successfully without losing accuracy you are
OK but should consider converting to the _str versions of IDs as soon as
possible.
* If your code has lost accuracy, convert your code to using the _str
version immediately. If you do not do this your code will be unable to
interact with the Twitter API reliably.
* In some language parsers, the JSON may throw an exception when reading the
ID value. If this happens in your parser you will need to ‘pre-parse’ the
data, removing or replacing ID parameters with their _str versions.
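One possible shape for the 'pre-parse' step is sketched below in Python. It is a variant of what the message describes: rather than removing the integer fields, it wraps their bare numeric values in quotes before the JSON reaches the parser. The regular expression and the quote_ids helper are hypothetical, and a regex pass over raw JSON is a blunt tool that should be tested against real payloads before being relied on.

```python
import json
import re

# Matches the key "id" or any key ending in "_id" followed by a bare integer.
# Keys such as "id_str" are not matched, and null values are left alone.
ID_FIELD = re.compile(r'("(?:\w+_)?id"\s*:\s*)(\d+)')


def quote_ids(raw_json: str) -> str:
    """Wrap bare integer ID values in quotes so the parser sees strings."""
    return ID_FIELD.sub(r'\1"\2"', raw_json)


raw = (
    '{"id": 10765432100123456789, '
    '"in_reply_to_status_id": null, '
    '"retweet_count": 0}'
)
safe = json.loads(quote_ids(raw))
print(type(safe["id"]).__name__)  # str
print(safe["id"])                 # 10765432100123456789
```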

Summary
-
1) If you develop in Javascript, know that you will have to update your code
to read the string version instead of the integer version.

2) If you use a JSON decoder, validate that the example JSON, above, decodes
without throwing exceptions. If exceptions are thrown, you will need to
pre-parse the data. Please let us know the name, version, and language of
the parser which throws the exception so we can investigate.
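A decoder can be validated mechanically by comparing each integer field with its _str twin after parsing. The Python sketch below assumes that convention holds for every ID field in the payload; find_mismatched_ids is a hypothetical helper, and parse_int=float again stands in for a parser limited to double precision.

```python
import json


def find_mismatched_ids(node, path=""):
    """Return paths of numeric fields whose value differs from their _str twin."""
    mismatches = []
    if isinstance(node, dict):
        for key, value in node.items():
            twin = node.get(f"{key}_str")
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if isinstance(twin, str) and twin.isdigit() and is_number:
                if int(value) != int(twin):
                    mismatches.append(f"{path}/{key}")
            mismatches.extend(find_mismatched_ids(value, f"{path}/{key}"))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            mismatches.extend(find_mismatched_ids(item, f"{path}/{index}"))
    return mismatches


sample = (
    '[{"id": 10765432100123456789, "id_str": "10765432100123456789", '
    '"user": {"id": 6253282, "id_str": "6253282"}}]'
)

print(find_mismatched_ids(json.loads(sample)))                   # []
print(find_mismatched_ids(json.loads(sample, parse_int=float)))  # ['/0/id']
```

An empty list means every ID survived decoding; any path reported is a field that should be read from its _str version instead.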

Timeline
---
by 22nd October 2010 (Friday): String versions of ID numbers will start
appearing in the API responses
4th November 2010 (Thursday) : Snowflake will be turned on but at ~41bit
length
26th November 2010 (Friday) : Status IDs will break 53bits in length and
cease being usable as Integers in Javascript based languages

We understand this isn’t as seamless a transition as we had planned and
appreciate for some of you this change requires an update to your code.
We’ve tried to give as much time as

[twitter-dev] Re: Snowflake: An update and some very important information

2010-10-19 Thread Detroitpro

Out of curiosity; what advantage is there of including both the int
and the string? Why not only offer the string and put the parsing on
the client side?

Thanks,
@detroitpro


[twitter-dev] Re: Snowflake: An update and some very important information

2010-10-19 Thread Reivax

Hi

In the case where an id is null (as in in_reply_to_status_id:null )
what will the value of in_reply_to_status_id_str be ?

Thanks
Xavier


Re: [twitter-dev] Re: Snowflake: An update and some very important information

2010-10-19 Thread Tom van der Woerdt

Because a lot of Twitter applications (including my own) would go crazy
immediately.

Tom


On 10/19/10 6:09 AM, Detroitpro wrote:
 Out of curiosity; what advantage is there of including both the int
 and the string? Why not only offer the string and put the parsing on
 the client side?
 
 Thanks,
 @detroitpro
 

Re: [twitter-dev] Re: Snowflake: An update and some very important information

2010-10-19 Thread Tom van der Woerdt

Probably null as well.

Tom


On 10/19/10 11:49 AM, Reivax wrote:
 Hi
 
 In the case where an id is null (as in in_reply_to_status_id:null )
 what will the value of in_reply_to_status_id_str be ?
 
 Thanks
 Xavier
 

Re: [twitter-dev] Re: Snowflake: An update and some very important information

2010-10-19 Thread Dan Checkoway

I'm also patiently awaiting a response from twitter about this. Are the ids
sane for 64-bit *signed* long?

Dan

On Mon, Oct 18, 2010 at 9:08 PM, jon jonhoff...@gmail.com wrote:

Hi,

You wrote that the IDs are unsigned 64 bit ints, but the IdWorker is
pumping out java Longs which are signed. I'm assuming that was a
typo, but please clarify.

http://github.com/twitter/snowflake/blob/master/src/main/scala/com/twitter/service/snowflake/IdWorker.scala

Thanks,

- Jon

On Oct 18, 8:19 pm, Matt Harris thematthar...@twitter.com wrote:
Last week you may remember Twitter planned to enable the new Status ID
generator - 'Snowflake' but didn't. The purpose of this email is to
explain
the reason why this didn't happen, what we are doing about it, and what
the
new release plan is.

So what is Snowflake?
--
Snowflake is a service we will be using to generate unique Tweet IDs.
These
Tweet IDs are unique 64bit unsigned integers, which, instead of being
sequential like the current IDs, are based on time. The full ID is
composed
of a timestamp, a worker number, and a sequence number.

The problem
-
Before launch it came to our attention that some programming languages
such
as Javascript cannot support numbers with 53bits. This can be easily
examined by running a command similar to: (90071992547409921).toString()
in
your browsers console or by running the following JSON snippet through
your
JSON parser.

{id: 10765432100123456789, id_str: 10765432100123456789}

In affected JSON parsers the ID will not be converted successfully and
will
lose accuracy. In some parsers there may even be an exception.

The solution

To allow javascript and JSON parsers to read the IDs we need to include a
string version of any ID when responding in the JSON format. What this
means
is Status, User, Direct Message and Saved Search IDs in the Twitter API
will
now be returned as an integer and a string in JSON responses. This will
apply to the main Twitter API, the Streaming API and the Search API.

For example, a status object will now contain an id and an id_str. The
following JSON representation of a status object shows the two versions
of
the ID fields for each data point.

What you do next depends on what happens:

* If your code converts the ID successfully without losing accuracy you are
OK but should consider converting to the _str versions of IDs as soon as
possible.
* If your code has lost accuracy, convert your code to using the _str
version immediately. If you do not do this your code will be unable to
interact with the Twitter API reliably.
* In some language parsers, the JSON may throw an exception when reading the
ID value. If this happens in your parser you will need to ‘pre-parse’ the
data, removing or replacing ID parameters with their _str versions.
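
One possible shape for such a pre-parse step, sketched in JavaScript. The helper quoteIds and its regular expression are assumptions for illustration, not something the API provides: it rewrites bare integer values of keys named "id" or ending in "_id" into strings before the text reaches the JSON parser.

```javascript
// Hypothetical helper: quote bare integer values of keys named "id" or ending
// in "_id", so every ID reaches JSON.parse as a string with all digits intact.
function quoteIds(jsonText) {
  return jsonText.replace(/"((?:\w+_)?id)"\s*:\s*(\d+)/g, '"$1":"$2"');
}

const rawStatus = '{"id": 10765432100123456789, "in_reply_to_user_id": 777925, "geo": null}';
const tweet = JSON.parse(quoteIds(rawStatus));
console.log(tweet.id); // prints 10765432100123456789
console.log(typeof tweet.in_reply_to_user_id); // prints string
```

A regular expression cannot reliably tell keys from text inside string values, so this is a stopgap; reading the _str fields directly is the sturdier route.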

Summary
-
1) If you develop in Javascript, know that you will have to update your code
to read the string version instead of the integer version.

2) If you use a JSON decoder, validate that the example JSON, above, decodes
without throwing exceptions. If exceptions are thrown, you will need to
pre-parse the data. Please let us know the name, version, and language of
the parser which throws the exception so we can investigate.

Timeline
---
by 22nd October 2010 (Friday): String versions of ID numbers will start
appearing in the API responses
4th November 2010 (Thursday) : Snowflake will be turned on but at ~41bit

[twitter-dev] Re: Snowflake: An update and some very important information

2010-10-19 Thread Craig Hockenberry

This approach feels wrong to me. The red flag is the duplication of
data within the payload: in 30+ years of professional development,
I've never seen that work out well.

The root of the problem is that you've chosen to deliver data in a
format (JSON) that can't support integers with a value greater than
2^53. And some of your data uses values up to 2^64.

The result is that you're working around the problem in a language by
using a string. Avoiding the root problem will encumber you with
legacy that you'll regret later.

Look at your proposed solution from a different point-of-view: say you
have a language that can't handle Unicode well (e.g. BASIC or Ruby.)
Would you solve this problem by adding another field called
text_ascii?

text: @themattharris hey how are things in København?.
text_ascii: @themattharris hey how are things in Kobenhavn?.

Seems silly, yet that is exactly what you're doing for Javascript and
long integers.

A part of this legacy in your payload is future confusion for
developers. Someone new to the Twitter API is going to be confused as
to why your ID values have both numeric and string representations.
And smart developers are going to lean towards the numeric
representation:

* 8 bytes of storage for 10765432100123456789 instead of 20 bytes.
* Faster sorting (less data to compare.)
* Correct sorting: 011 and 10 have different order depending on
whether you're sorting the string or numeric representation.

They'll eventually pay the price for choosing incorrectly.
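
For code that does end up holding IDs as strings, the ordering concern can be handled with a small comparator. This is only a sketch: compareIdStr is a made-up name, and it assumes non-negative decimal IDs without leading zeros (so it would not cover the 011 case above).

```javascript
// Compare two decimal ID strings by numeric value without converting them to
// doubles. Assumes non-negative IDs with no leading zeros.
function compareIdStr(a, b) {
  if (a.length !== b.length) return a.length - b.length; // fewer digits sorts first
  if (a === b) return 0;
  return a < b ? -1 : 1; // same length: lexicographic order equals numeric order
}

const idStrings = ["10765432100123456789", "9", "10586268426842688951", "10"];
console.log(idStrings.slice().sort());             // plain string sort puts "9" last
console.log(idStrings.slice().sort(compareIdStr)); // numeric order, "9" first
```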

Every ID in the API is going to need documentation as a result. For
example, are place IDs affected by this change? And what about the IDs
returned by the Search API? (there's no mention of since_id_str and
max_id_str above.)

Losing consistency with the XML format is also a problem. Unless
you're planning on adding _str elements to the XML payload, you're
presenting developers with a one-way street. A consumer of JSON
id_str can't easily change the format of data they want to consume.

In my mind, you really only have two good choices at this point:

1) Limit Snowflake's ID space to 53 bits. Easier for developers,
harder for Twitter.

2) Make all Twitter IDs into strings. Easier for Twitter, harder for
developers.

The second choice is obviously more disruptive, but if you really need
the ID space, it's the right one. Even if it means I need to make
major changes to my code.


On Oct 18, 5:19 pm, Matt Harris thematthar...@twitter.com wrote:
 Last week you may remember Twitter planned to enable the new Status ID
 generator - 'Snowflake' but didn't. The purpose of this email is to explain
 the reason why this didn't happen, what we are doing about it, and what the
 new release plan is.

 So what is Snowflake?
 --
 Snowflake is a service we will be using to generate unique Tweet IDs. These
 Tweet IDs are unique 64bit unsigned integers, which, instead of being
 sequential like the current IDs, are based on time. The full ID is composed
 of a timestamp, a worker number, and a sequence number.

 The problem
 -
 Before launch it came to our attention that some programming languages such
 as Javascript cannot support numbers with >53bits. This can be easily
 examined by running a command similar to: (90071992547409921).toString() in
 your browser's console or by running the following JSON snippet through your
 JSON parser.

     {"id": 10765432100123456789, "id_str": "10765432100123456789"}

 In affected JSON parsers the ID will not be converted successfully and will
 lose accuracy. In some parsers there may even be an exception.

 The solution
 
 To allow javascript and JSON parsers to read the IDs we need to include a
 string version of any ID when responding in the JSON format. What this means
 is Status, User, Direct Message and Saved Search IDs in the Twitter API will
 now be returned as an integer and a string in JSON responses. This will
 apply to the main Twitter API, the Streaming API and the Search API.

 For example, a status object will now contain an id and an id_str. The
 following JSON representation of a status object shows the two versions of
 the ID fields for each data point.

 [
   {
     "coordinates": null,
     "truncated": false,
     "created_at": "Thu Oct 14 22:20:15 +0000 2010",
     "favorited": false,
     "entities": {
       "urls": [
       ],
       "hashtags": [
       ],
       "user_mentions": [
         {
           "name": "Matt Harris",
           "id": 777925,
           "id_str": "777925",
           "indices": [
             0,
             14
           ],
           "screen_name": "themattharris"
         }
       ]
     },
     "text": "@themattharris hey how are things?",
     "annotations": null,
     "contributors": [
       {
         "id": 819797,
         "id_str": "819797",
         "screen_name": "episod"
       }
     ],
     "id": 12738165059,
     "id_str": "12738165059",
     "retweet_count": 0,
     "geo": null,
     "retweeted": false,
     "in_reply_to_user_id": 777925,

Re: [twitter-dev] Re: Snowflake: An update and some very important information

2010-10-19 Thread Hayes Davis

I did some investigation into the snowflake algorithm recently and yes, it's
safe for 64bit signed longs. Even if Twitter moved away from using
scala/java longs internally (which are definitely signed), you'd still have
something like 65 years from now before the algorithm rolled past the 2^63-1
barrier.

I've posted a gist[1] in ruby with a few little methods for playing with
the time part of a snowflake id if you're interested.
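
For readers without Ruby to hand, here is a rough JavaScript sketch of the same idea. It is not the gist's code. The field widths (41-bit timestamp, 10-bit worker, 12-bit sequence) and the epoch constant are assumptions taken from the open-source Snowflake code and are not stated in this thread; the function names are made up.

```javascript
// ASSUMPTION: layout and epoch as published in the open-source Snowflake code
// (41-bit timestamp, 10-bit worker, 12-bit sequence); not stated in this thread.
const SNOWFLAKE_EPOCH_MS = 1288834974657n;
const WORKER_BITS = 10n;
const SEQUENCE_BITS = 12n;

// Build an ID from its parts (illustrative inverse of decodeSnowflake).
function encodeSnowflake(timestampMs, worker, sequence) {
  const elapsed = BigInt(timestampMs) - SNOWFLAKE_EPOCH_MS;
  return (elapsed << (WORKER_BITS + SEQUENCE_BITS)) |
    (BigInt(worker) << SEQUENCE_BITS) |
    BigInt(sequence);
}

// Split an ID string into timestamp, worker number and sequence number.
function decodeSnowflake(idStr) {
  const id = BigInt(idStr);
  return {
    timestampMs: Number((id >> (WORKER_BITS + SEQUENCE_BITS)) + SNOWFLAKE_EPOCH_MS),
    worker: Number((id >> SEQUENCE_BITS) & ((1n << WORKER_BITS) - 1n)),
    sequence: Number(id & ((1n << SEQUENCE_BITS) - 1n)),
  };
}

// Round trip with made-up values.
const exampleId = encodeSnowflake(Date.UTC(2010, 10, 5), 3, 7);
console.log(exampleId.toString(), decodeSnowflake(exampleId.toString()));

// Years until a 41-bit millisecond counter is exhausted.
console.log(2 ** 41 / (1000 * 60 * 60 * 24 * 365.25)); // about 69.7
```

Under these assumptions the timestamp and the 22 low bits fill 63 bits, so the IDs fit a signed 64-bit long until the 41-bit counter runs out.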

Hayes

1 http://gist.github.com/634586

On Tue, Oct 19, 2010 at 6:27 AM, Dan Checkoway dchecko...@gmail.com wrote:

I'm also patiently awaiting a response from twitter about this. Are the
ids sane for 64-bit *signed* long?

Dan

On Mon, Oct 18, 2010 at 9:08 PM, jon jonhoff...@gmail.com wrote:

Hi,

You wrote that the IDs are unsigned 64 bit ints, but the IdWorker is
pumping out java Longs which are signed. I'm assuming that was a
typo, but please clarify.

http://github.com/twitter/snowflake/blob/master/src/main/scala/com/twitter/service/snowflake/IdWorker.scala

Thanks,

- Jon

On Oct 18, 8:19 pm, Matt Harris thematthar...@twitter.com wrote:
Last week you may remember Twitter planned to enable the new Status ID
generator - 'Snowflake' but didn't. The purpose of this email is to
explain
the reason why this didn't happen, what we are doing about it, and what
the
new release plan is.

So what is Snowflake?
--
Snowflake is a service we will be using to generate unique Tweet IDs.
These
Tweet IDs are unique 64bit unsigned integers, which, instead of being
sequential like the current IDs, are based on time. The full ID is
composed
of a timestamp, a worker number, and a sequence number.

The problem
-
Before launch it came to our attention that some programming languages such
as Javascript cannot support numbers with >53bits. This can be easily
examined by running a command similar to: (90071992547409921).toString() in
your browser's console or by running the following JSON snippet through your
JSON parser.

{"id": 10765432100123456789, "id_str": "10765432100123456789"}

In affected JSON parsers the ID will not be converted successfully and
will
lose accuracy. In some parsers there may even be an exception.

The solution

To allow javascript and JSON parsers to read the IDs we need to include
a
string version of any ID when responding in the JSON format. What this
means
is Status, User, Direct Message and Saved Search IDs in the Twitter API
will
now be returned as an integer and a string in JSON responses. This will
apply to the main Twitter API, the Streaming API and the Search API.

For example, a status object will now contain an id and an id_str. The
following JSON representation of a status object shows the two versions
of
the ID fields for each data point.

What should you do - RIGHT NOW
--
The first thing you should do is attempt to decode the JSON snippet
above
using your production code parser. Observe the output to confirm the ID
has
not lost accuracy.

What you do next depends on what happens:

* If your code converts the ID successfully without losing accuracy you
are
OK but should consider converting to the _str versions of IDs as soon as
possible.
* If your code has lost accuracy, convert your code to using the _str
version immediately. If you do not do this your code will be unable to
interact with the Twitter API reliably.
* In some language parsers, the JSON may throw an exception when reading
the
ID value. If this happens in your parser you will need to ‘pre-parse’
the
data, removing or replacing ID parameters with their _str versions.

Summary
-
1) If you develop in Javascript, know that you will have to update your
code
to

Re: [twitter-dev] Re: Snowflake: An update and some very important information

2010-10-19 Thread Tom van der Woerdt

I wouldn't blame this on JSON, because it's not JSON that has the
problems, but JavaScript. All of my Objective-C apps that communicate
use JSON as well, and they don't have the limitation. The issue does not
apply to XML either - there's no type specification in XML.

As far as I know, this issue will only cause trouble for a few
applications that work with JavaScript and depend on the IDs a lot.

My suggestion to solve this issue would be to introduce an additional
parameter (just like include_rts, just with a different name) that turns
all IDs into strings. No extra fields, just an additional optional
parameter. Won't cause trouble for the applications that can't parse it
and requires minimal implementation effort for developers.

I hope I'm not too late with my suggestion :-)

Tom



Re: [twitter-dev] Re: Snowflake: An update and some very important information

2010-10-19 Thread M. Edward (Ed) Borasky

With all due respect, the root of the problem is that computer  
scientists think in terms of abstract machines with infinitely-wide  
registers, infinitely many addressable RAM cells, etc., and business  
people think in terms of human populations and their tweet rates  
growing geometrically for all time. Journalists believe neither of  
these. ;-) And neither assumption is realistic, which is why we have  
to make decisions like this from time to time, and why sometimes we  
predict disasters like Y2K or 32-bit machines crumbling in 2038 that  
don't actually happen. ;-)


So - for Twitter: what is your *realistic* projection for when a  
53-bit integer ID will overflow? What are the underlying assumptions  
about human population growth, spread of Twitter, revenue models,  
competition, etc.? I know this is all highly confidential, so for sake  
of argument, assume current tweet rates per user and the goal your  
executives have stated of a billion users, with a plateau at that  
point. The question I'm asking is whether you *really* need 64-bit  
integer IDs for tweets or for users. ;-)


By the way, I ask similar questions of all the big data geeks out  
there - so many naked emperors, so little time. ;-)


--
M. Edward (Ed) Borasky
http://borasky-research.net http://twitter.com/znmeb

A mathematician is a device for turning coffee into theorems. - Paul Erdos



Re: [twitter-dev] Re: Snowflake: An update and some very important information

2010-10-19 Thread Tom van der Woerdt

You normally don't need all 53 bits - but as snowflake simply skips a
few IDs every now and then, you get a lot more IDs.

Chance of it actually reaching 53 bits? I'd say that it happens at the
end of November... Friday the 26th?
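
A rough way to sanity-check a date like this, sketched in JavaScript. It assumes the millisecond timestamp sits above a 22-bit worker-plus-sequence field and starts counting from the moment Snowflake is switched on; neither detail is stated in this thread.

```javascript
// ASSUMPTION: the timestamp occupies the bits above a 22-bit worker+sequence
// field, counted in milliseconds from the moment Snowflake is switched on.
const LOW_BITS = 22;
const msUntil53Bits = 2 ** (53 - LOW_BITS); // elapsed ms at which an ID reaches 2^53
const daysUntil53Bits = msUntil53Bits / (24 * 60 * 60 * 1000);
console.log(daysUntil53Bits.toFixed(1)); // prints "24.9"
```

Counted from a 4th November switch-on, that lands around the end of November under these assumptions.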

Tom



[twitter-dev] Re: Snowflake: An update and some very important information

2010-10-19 Thread Craig Hockenberry

This still gives you some API legacy that you need to maintain, but
it's a much cleaner approach than what is currently being proposed.

-ch


[twitter-dev] Re: Snowflake: An update and some very important information

2010-10-19 Thread twittelator

It's kind of cool that Craig has been programming professionally
longer than some people on this list have even been alive!

I've thought about the problem, and at least came up with a catchy
name for it to go with the crises of yore:  Bitpocalypse.

Since Twittelator uses the JSON library YAJL, my approach will be to
correctly populate the id with an unsigned long long created from
the id_str field at the point of data arrival. Then it's done and no
other code needs changing. But I'll find out Friday!
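
The same idea can be sketched in JavaScript terms. This is not YAJL code: it uses the reviver argument of JSON.parse, with BigInt standing in for an unsigned long long, and the function name preferStringIds is made up.

```javascript
// Reviver for JSON.parse: whenever an object carries "<name>_str" holding
// decimal digits, overwrite "<name>" with an exact BigInt built from it.
function preferStringIds(key, value) {
  if (value !== null && typeof value === "object" && !Array.isArray(value)) {
    for (const name of Object.keys(value)) {
      if (name.endsWith("_str") && typeof value[name] === "string" && /^\d+$/.test(value[name])) {
        value[name.slice(0, -4)] = BigInt(value[name]);
      }
    }
  }
  return value;
}

const incoming = '{"id": 10765432100123456789, "id_str": "10765432100123456789", "user": {"id": 6253282, "id_str": "6253282"}}';
const fixed = JSON.parse(incoming, preferStringIds);
console.log(fixed.id === 10765432100123456789n); // prints true
console.log(fixed.user.id === 6253282n); // prints true
```

Doing the substitution once at the point of arrival means the rest of the code only ever sees exact IDs.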


[twitter-dev] Re: Snowflake: An update and some very important information

2010-10-19 Thread Matt Harris

Hey everyone,

Thank you to all of you for your questions, patience and contributions to
this thread. Hearing your views and knowing how you use the API helps us
provide more information where there wasn't enough, and clarify details
where there was ambiguity.

I've collated the questions I've received from you directly, over Twitter to
@twitterapi and through this list. I hope the comments below provide enough
information to answer those questions and explain the reasoning behind our
decisions.

Thanks for your support and patience,
@themattharris

1) Will search.twitter.com also include id_str and
in_reply_to_status_id_str?
Yes, Search will include the String representations of those IDs.

2) Which fields are affected by this change?
All IDs which are transmitted as Integers will have a String representation
in the API response. Only Tweet IDs (which includes mentions and retweets)
will be moving to new Snowflake IDs. Messages (DMs), Saved Searches and
Users may change to a Snowflake ID scheme in the future but this isn’t
planned for this year.

We are adding String representations of the Integer IDs now so you can
update all of your code to use the String representations throughout. This
allows developers to make the change now for all the ID fields and be
prepared should any other IDs break the 53bit boundary.

3) Which fields will have String representations?
The fields which will have String representations are:

id (DM, Saved Search, User, List)
in_reply_to_status_id
in_reply_to_user_id
new_id (Streaming only. Will be removed when Snowflake is enabled)
current_user_retweet_id (When include_my_retweet=1 is passed)
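
A minimal sketch of how a client might read any of the fields listed above, preferring the _str form and falling back to the Integer form. The helper name read_id is hypothetical and not part of any Twitter library; the sample values are taken from the test Tweet in this thread, and a null ID is passed through as None.

```python
import json


def read_id(obj, field="id"):
    """Return the ID stored in `field` as a string, or None when it is null.

    Hypothetical helper: prefers the `<field>_str` member and only falls
    back to the Integer member when no string form is present.
    """
    as_str = obj.get(field + "_str")
    if as_str is not None:
        return as_str
    as_int = obj.get(field)
    return None if as_int is None else str(as_int)


# Trimmed-down payload using values from the sample Tweet in this thread.
tweet = json.loads(
    '{"id": 10765432100123456789, "id_str": "10765432100123456789",'
    ' "in_reply_to_status_id": null, "in_reply_to_status_id_str": null,'
    ' "in_reply_to_user_id": 777925}'
)

print(read_id(tweet))                           # 10765432100123456789
print(read_id(tweet, "in_reply_to_status_id"))  # None
print(read_id(tweet, "in_reply_to_user_id"))    # 777925
```

Keeping IDs as strings everywhere in application code means the same helper works whether or not a given ID type has moved to Snowflake.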

4) Can you provide a complete Tweet example with Snowflake ID to test?

[{"coordinates":null,"truncated":false,"created_at":"Thu Oct 14 22:20:15 + 2010",
"favorited":false,"entities":{"urls":[],"hashtags":[],"user_mentions":[{"name":"Matt Harris",
"id":777925,"id_str":"777925","indices":[0,14],"screen_name":"themattharris"}]},
"text":"@themattharris hey how are things?","annotations":null,
"contributors":[{"id":819797,"id_str":"819797","screen_name":"episod"}],
"id":10765432100123456789,"id_str":"10765432100123456789","retweet_count":0,"geo":null,
"retweeted":false,"in_reply_to_user_id":777925,"in_reply_to_user_id_str":"777925",
"in_reply_to_screen_name":"themattharris","user":{"id":6253282,"id_str":"6253282"},
"source":"web","place":null,"in_reply_to_status_id":10586268426842688951,
"in_reply_to_status_id_str":"10586268426842688951"}]

5) What is happening with new_id in the Streaming API?
new_id and new_id_str will be switched off when or soon after Snowflake is
enabled on November 4th.

6) Why not restrict IDs to 53bits?

A Snowflake ID is composed of:
* 41bits for millisecond precision time (69 years)
* 10bits for a configured machine identity (1024 machines)
* 12bits for a sequence number (4096 per machine)

The factor influencing the length of the ID is the time. For a 53bit ID this
would mean only 31bits are available for the time. 31bits is only enough for
24 days (2147483648/(1000*60*60*24)) of time.

Reducing the resolution of the timestamp would prevent a K-sorted resolution
of 1 second or less.

Reducing the configured machine identity or sequence number by 1bit would
mean we couldn’t scale Twitter, or operate our infrastructure in an
uncoordinated, highly available way.
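
The arithmetic in answer 6 can be checked with a short sketch. The field widths (41, 10, 12) come from the answer above; the field order (time in the high bits, then machine, then sequence) is an assumption that matches the "timestamp, worker number, sequence number" description, and the timestamp is treated as milliseconds since an unspecified epoch. The function names are hypothetical.

```python
TIME_BITS, MACHINE_BITS, SEQUENCE_BITS = 41, 10, 12
MS_PER_DAY = 1000 * 60 * 60 * 24


def compose(timestamp_ms, machine, sequence):
    """Pack the three fields into one integer (assumed order: time, machine, sequence)."""
    fields = ((timestamp_ms, TIME_BITS), (machine, MACHINE_BITS), (sequence, SEQUENCE_BITS))
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    return (timestamp_ms << (MACHINE_BITS + SEQUENCE_BITS)) | (machine << SEQUENCE_BITS) | sequence


def decompose(snowflake):
    """Split an ID back into (timestamp_ms, machine, sequence)."""
    sequence = snowflake & ((1 << SEQUENCE_BITS) - 1)
    machine = (snowflake >> SEQUENCE_BITS) & ((1 << MACHINE_BITS) - 1)
    timestamp_ms = snowflake >> (MACHINE_BITS + SEQUENCE_BITS)
    return timestamp_ms, machine, sequence


# Time bits left over if the whole ID had to fit in 53 bits.
time_bits_in_53 = 53 - MACHINE_BITS - SEQUENCE_BITS            # 31
days_with_31_bits = (1 << time_bits_in_53) // MS_PER_DAY       # 24
years_with_41_bits = (1 << TIME_BITS) // MS_PER_DAY // 365     # 69

print(time_bits_in_53, days_with_31_bits, years_with_41_bits)
# The first ID at or above 2^53 appears once the timestamp reaches 2^31 ms.
print(compose(1 << 31, 0, 0) == 1 << 53)                       # True
```

Under these assumptions the timestamp field alone decides when an ID crosses the 53bit boundary, which is why shrinking the machine or sequence fields is the only other way to buy time.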

7) When will the 53bit Integer overflow happen?
24 days after Snowflake starts counting.

8) Is it safe to parse and store IDs as signed 64bit Integers?
Yes.

9) Why offer both the String and Integer versions of the ID?
The String representation is needed to ensure languages which cannot handle
Integers larger than 53bits can still use the ID in other API requests.

The Integer value is being retained for languages which can handle numbers
larger than 53bits and to prevent applications which have not converted from
being cut off from Twitter.

10) When ID is null what will the _str representation be?
The _str representation will also be null.

11) Did you really mean ‘unsigned’ 64bit Integer?
Strictly speaking the Snowflake is a signed 64bit long under the hood. That
being said, we will never use the negative bit and won’t require the full
64bits for positive numbers for about 69 years:

http://www.google.com/search?q=%282**41%29+%2F+%2860*60*24*1000%29+%2F+365
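
A quick check of the signed-versus-unsigned point, assuming the 41 + 10 + 12 bit layout from answer 6. The helper name is hypothetical. Note that the test value used in the sample JSON in this thread appears to be deliberately oversized: it needs all 64 bits, so it fits an unsigned 64bit type but not a signed one, whereas an ID built from 63 bits does fit.

```python
MAX_SNOWFLAKE = (1 << (41 + 10 + 12)) - 1   # every one of the 63 layout bits set
TEST_ID = 10765432100123456789              # test value from the sample JSON


def fits_signed_64(value):
    """True when `value` lies in the signed 64-bit range."""
    return -(1 << 63) <= value < (1 << 63)


def fits_unsigned_64(value):
    """True when `value` lies in the unsigned 64-bit range."""
    return 0 <= value < (1 << 64)


print(MAX_SNOWFLAKE.bit_length(), fits_signed_64(MAX_SNOWFLAKE))  # 63 True
print(TEST_ID.bit_length(), fits_signed_64(TEST_ID))              # 64 False
print(fits_unsigned_64(TEST_ID))                                  # True
```

If you test storage with the sample value as well as with real IDs, an unsigned 64bit column or a string column avoids the overflow that the sample would cause in a signed one.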

12) Why not make the strings opt-in?
We did consider this as an option but decided against it for a number of
reasons. The first reason is that the ID is fundamental to being able to
work with the data from the API so receiving the correct ID shouldn’t be
something you have to opt into. The second, more influential reason is that
making the _str representations opt-in would create significant
performance-affecting issues for the API.

On Mon, Oct 18, 2010 at 5:34 PM, themattharris thematthar...@twitter.com wrote:

Thanks to @gotwalt for spotting the missing commas.

Fixed JSON sample ...

[
{
coordinates: null,
truncated: false,
created_at: Thu Oct 14 22:20:15 + 2010,
favorited: false,
entities: {
urls: [
],

[twitter-dev] Re: Snowflake: An update and some very important information

2010-10-18 Thread themattharris

Thanks to @gotwalt for spotting the missing commas.

Fixed JSON sample ...

[
  {
    "coordinates": null,
    "truncated": false,
    "created_at": "Thu Oct 14 22:20:15 + 2010",
    "favorited": false,
    "entities": {
      "urls": [
      ],
      "hashtags": [
      ],
      "user_mentions": [
        {
          "name": "Matt Harris",
          "id": 777925,
          "id_str": "777925",
          "indices": [
            0,
            14
          ],
          "screen_name": "themattharris"
        }
      ]
    },
    "text": "@themattharris hey how are things?",
    "annotations": null,
    "contributors": [
      {
        "id": 819797,
        "id_str": "819797",
        "screen_name": "episod"
      }
    ],
    "id": 12738165059,
    "id_str": "12738165059",
    "retweet_count": 0,
    "geo": null,
    "retweeted": false,
    "in_reply_to_user_id": 777925,
    "in_reply_to_user_id_str": "777925",
    "in_reply_to_screen_name": "themattharris",
    "user": {
      "id": 6253282,
      "id_str": "6253282"
    },
    "source": "web",
    "place": null,
    "in_reply_to_status_id": 12738040524,
    "in_reply_to_status_id_str": "12738040524"
  }
]

Best,
@themattharris

On Oct 18, 5:19 pm, Matt Harris thematthar...@twitter.com wrote:
 Last week you may remember Twitter planned to enable the new Status ID
 generator - 'Snowflake' but didn't. The purpose of this email is to explain
 the reason why this didn't happen, what we are doing about it, and what the
 new release plan is.

 So what is Snowflake?
 ---------------------
 Snowflake is a service we will be using to generate unique Tweet IDs. These
 Tweet IDs are unique 64bit unsigned integers, which, instead of being
 sequential like the current IDs, are based on time. The full ID is composed
 of a timestamp, a worker number, and a sequence number.

 The problem
 -----------
 Before launch it came to our attention that some programming languages such
 as Javascript cannot support numbers with more than 53bits. This can be easily
 examined by running a command similar to: (90071992547409921).toString() in
 your browser's console or by running the following JSON snippet through your
 JSON parser.

     {"id": 10765432100123456789, "id_str": "10765432100123456789"}

 In affected JSON parsers the ID will not be converted successfully and will
 lose accuracy. In some parsers there may even be an exception.
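
The loss of accuracy can be reproduced outside a browser. This is a sketch in Python, which keeps integers exact by default; the double-precision behaviour of Javascript is emulated here by passing parse_int=float to json.loads, so it illustrates the effect rather than any particular Javascript parser. The snippet is the one above with its JSON quoting written out.

```python
import json

raw = '{"id": 10765432100123456789, "id_str": "10765432100123456789"}'

# Python integers are arbitrary precision, so the default parse is exact.
exact = json.loads(raw)

# Emulate a parser that stores every number as an IEEE 754 double
# by converting each integer literal through float.
lossy = json.loads(raw, parse_int=float)

print(exact["id"] == int(exact["id_str"]))       # True
print(int(lossy["id"]) == int(lossy["id_str"]))  # False

# The same rounding that the browser console example shows.
print(int(float(90071992547409921)))             # 90071992547409920
```

In both cases the string form survives untouched, which is why the _str fields are the safe ones to read.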

 The solution
 ------------
 To allow javascript and JSON parsers to read the IDs we need to include a
 string version of any ID when responding in the JSON format. What this means
 is Status, User, Direct Message and Saved Search IDs in the Twitter API will
 now be returned as an integer and a string in JSON responses. This will
 apply to the main Twitter API, the Streaming API and the Search API.

 For example, a status object will now contain an id and an id_str. The
 following JSON representation of a status object shows the two versions of
 the ID fields for each data point.

 [
   {
     coordinates: null,
     truncated: false,
     created_at: Thu Oct 14 22:20:15 + 2010,
     favorited: false,
     entities: {
       urls: [
       ],
       hashtags: [
       ],
       user_mentions: [
         {
           name: Matt Harris,
           id: 777925,
           id_str: 777925,
           indices: [
             0,
             14
           ],
           screen_name: themattharris
         }
       ]
     },
     text: @themattharris hey how are things?,
     annotations: null,
     contributors: [
       {
         id: 819797,
         id_str: 819797,
         screen_name: episod
       }
     ],
     id: 12738165059,
     id_str: 12738165059,
     retweet_count: 0,
     geo: null,
     retweeted: false,
     in_reply_to_user_id: 777925,
     in_reply_to_user_id_str: 777925,
     in_reply_to_screen_name: themattharris,
     user: {
       id: 6253282
       id_str: 6253282
     },
     source: web,
     place: null,
     in_reply_to_status_id: 12738040524
     in_reply_to_status_id_str: 12738040524
   }
 ]

 What should you do - RIGHT NOW
 ------------------------------
 The first thing you should do is attempt to decode the JSON snippet above
 using your production code parser. Observe the output to confirm the ID has
 not lost accuracy.

 What you do next depends on what happens:

 * If your code converts the ID successfully without losing accuracy you are
 OK but should consider converting to the _str versions of IDs as soon as
 possible.
 * If your code has lost accuracy, convert your code to using the _str
 version immediately. If you do not do this your code will be unable to
 interact with the Twitter API reliably.
 * In some language parsers, the JSON may throw an exception when reading the
 ID value. If this happens in your parser you will need to ‘pre-parse’ the
 data, removing or replacing ID parameters with their _str versions.
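
The check and the 'pre-parse' step described above might look like the following sketch. All names are hypothetical. The lossy parser stands in for an affected production parser, and the regular expression assumes each Integer ID is immediately followed by its _str sibling, as in the sample payloads in this thread; a regular expression over raw JSON is fragile and is shown only to illustrate the idea of removing the Integer members before parsing.

```python
import json
import re

SNIPPET = '{"id": 10765432100123456789, "id_str": "10765432100123456789"}'


def parser_keeps_ids(parse=json.loads, raw=SNIPPET):
    """Return True when `parse` reads the Integer id without losing accuracy."""
    data = parse(raw)
    return str(data["id"]) == data["id_str"]


def lossy_parse(raw):
    """Stand-in for an affected parser: every integer becomes a double."""
    return json.loads(raw, parse_int=float)


# Matches `"<name>id": <digits>,` only when `"<name>id_str":` comes next.
_INT_ID = re.compile(r'"(\w*id)"\s*:\s*\d+\s*,\s*(?="\1_str"\s*:)')


def drop_integer_ids(raw):
    """Pre-parse step: remove Integer ID members, leaving the _str members."""
    return _INT_ID.sub("", raw)


print(parser_keeps_ids())                      # True
print(parser_keeps_ids(lossy_parse))           # False
print(drop_integer_ids(SNIPPET))               # {"id_str": "10765432100123456789"}
print(lossy_parse(drop_integer_ids(SNIPPET)))  # {'id_str': '10765432100123456789'}
```

Running the first check against your own production parser tells you which of the three cases above applies to you.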

 Summary
 -------
 1) If you develop in Javascript, know that you will have to update your code
 to read the string version instead of the integer version.

 2)
