Re: [twitter-dev] Re: Historical user information
The short answer is convenience for developers. In most cases, showing information about a tweet isn't enough -- it needs to be augmented with user data. Rather than requiring a separate call to retrieve that user data (or building a client-side cache of user objects), we provide that data inline for all statuses. In places like home_timeline and friends_timeline, including full user objects is extremely useful, and from an efficiency standpoint probably not too bad in the common case. For something like user_timeline, it is inefficient. There are three possible approaches:

1) Never populate user objects
2) Sometimes populate user objects
3) Always populate user objects

We've decided on 3 as the best blend of usefulness and efficiency for now. In the future we may revisit providing more compact forms of this data.

---Mark
http://twitter.com/mccv

On Tue, Mar 16, 2010 at 5:32 PM, MaDeuce kheaus...@gmail.com wrote:
Twitter newb question:
Mark, I want to be clear that I'm not questioning the accuracy of your answer at all; I'm sure you're correct. However, I can't fathom why there would be this much redundancy in the data returned by the Twitter API. If I understand correctly, if I get the 20 most recent statuses posted by a user, then that user's data will be returned 20 times, once with each returned status. Ed's original reply sounded plausible (and interesting), as it at least provided a reason for the repeated inclusion of user objects, inefficient though it might have been. Even a basic normalization would include just the id in the status as opposed to the entire user object. I have to assume that the user object is there for a reason, but it's one that I can't figure out. Can you or someone else educate me about why this data is present?
Thanks very much,
Kenneth

On Mar 15, 3:48 pm, Mark McBride mmcbr...@twitter.com wrote:
This is incorrect. The user object returned with a status is intended to represent the current user object, not a historical one. However, there are currently several bugs open around this, so the user object currently represents a snapshot of the user at some time in the fairly recent past.
---Mark
http://twitter.com/mccv

On Mon, Mar 15, 2010 at 1:04 PM, M. Edward (Ed) Borasky zzn...@gmail.com wrote:
Yep ... but you don't get all of their tweets. You get the most recent 3200 of their *original* tweets. If they used the built-in retweet, those retweets won't show up in the pages. Try it on @znmeb (me), who built-in-retweets a lot. ;-)
--
M. Edward (Ed) Borasky
borasky-research.net/m-edward-ed-borasky/
A mathematician is a device for turning coffee into theorems. ~ Paul Erdős

Quoting Raymond Camden rcam...@gmail.com:
Hmm. So if the API for getting a user's tweets allows you to get _all_ of them (via paging, not in one request), that would be a way to trend their data over time, right? I'd rather not use twitalyzer - I want to use the Twitter API natively if I can.

On Mar 15, 2:01 pm, M. Edward (Ed) Borasky zzn...@gmail.com wrote:
Well, you can retrieve the user's most recent tweets via statuses/user_timeline. Each returned tweet will have a created_at date/time stamp and an embedded user object. Inside this embedded user object will be the number of friends and followers the user had when the tweet was created. Plot the date/time stamps on the X axis and the friends on the Y axis and do a kernel regression. ;-) Or, you could go to twitalyzer.com, key in the user's Twitter screen name, then use the Trends menu item to display the user's metrics. ;-)

Quoting Raymond Camden rcam...@gmail.com:
Is it possible to get information about a user based on a certain time? For example, the number of friends for an account can easily be returned, but only as of the time of the call itself. Is there a way to get those values for arbitrary dates and times?
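Mark contrasts inline user objects with making a separate call for user data or keeping a client-side cache of user objects. A minimal sketch of that client-side alternative (his option 1, never populating user objects), assuming statuses that carry only a `user_id`. `UserCache`, `hydrate`, and `fetch_user` are hypothetical names, not part of any Twitter API or library, and the users are made up.

```python
from typing import Callable, Dict, List


class UserCache:
    """Client-side cache of user objects keyed by user id.

    `fetch_user` is a hypothetical stand-in for the separate per-user API call
    a client would need if statuses carried only a user id. Entries never
    expire here, so cached users go stale; a real cache needs a refresh policy.
    """

    def __init__(self, fetch_user: Callable[[int], dict]) -> None:
        self._fetch_user = fetch_user
        self._users: Dict[int, dict] = {}
        self.fetches = 0  # number of (hypothetical) API calls actually made

    def get(self, user_id: int) -> dict:
        if user_id not in self._users:
            self._users[user_id] = self._fetch_user(user_id)
            self.fetches += 1
        return self._users[user_id]


def hydrate(statuses: List[dict], cache: UserCache) -> List[dict]:
    """Return copies of id-only statuses with a full 'user' object attached."""
    return [{**status, "user": cache.get(status["user_id"])} for status in statuses]


# Toy data: three id-only statuses from two made-up users.
users = {1: {"id": 1, "screen_name": "alice"}, 2: {"id": 2, "screen_name": "bob"}}
cache = UserCache(lambda user_id: users[user_id])
timeline = [{"id": 10, "user_id": 1}, {"id": 11, "user_id": 2}, {"id": 12, "user_id": 1}]
hydrated = hydrate(timeline, cache)
print(cache.fetches)  # 2
```

The cache saves the repeated lookup for user 1, but the client still pays one extra round trip per distinct user. That cost is what inlining the user object avoids.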
[twitter-dev] Re: Historical user information
Mark, thanks for the reply. I understand it from the aspect of convenience; however, I still don't understand it in terms of actual behavior. I performed an experiment with a fairly active timeline. I grabbed 20 recent (less than one week old) status entries and 20 older (about 11 months old) status entries from the timeline. A Python/tweepy code snippet and the resulting output are at the bottom of this post.

In the recent status entries, the user data is invariant, as I would expect from your original response in this thread. However, the user data is different in almost all of the older status entries. In the latter case, for example, the statuses_count, followers_count, and friends_count all increase monotonically. This leads me to conclude that, at one time, the user information associated with a status entry was the user information that was current when that entry was created. I assume that, at some point, the implementation changed to the behavior you describe, namely that the current user information is always returned for statuses created after the change. I also conclude that the behavior I see w.r.t. the older statuses is actually a bug. Is that correct?

Sorry to be so picky about this, but this behavior is relevant to an application that I'm working on.

Thanks for your help,
Kenneth

    import tweepy

    # `api` must be an authenticated tweepy.API instance; its setup is not shown.

    def info(status):
        # One CSV-style line: status fields, then the embedded user's fields.
        dfmt = '%Y%m%d:%H%M%S'
        s = status
        u = status.user
        i = '%d, %s, %.5s..., ' % (s.id, s.created_at.strftime(dfmt), s.text)
        i += '%d, %d, %s, %d, %d' % (u.id, u.statuses_count,
                                     u.created_at.strftime(dfmt),
                                     u.followers_count, u.friends_count)
        return i

    uid = 'aplusk'
    mids = ['1613771842', '10482537035']  # max_id for each block of 20
    columns = 'StatusID, StatusDate, StatusText'
    columns += ', UserID, UserStatusCount, UserDate, UserFollowersCount, UserFriendsCount'
    print(columns)
    for mid in mids:
        for page in tweepy.Cursor(api.user_timeline, id=uid, max_id=mid).pages(1):
            for status in page:
                print(info(status))

    StatusID, StatusDate, StatusText, UserID, UserStatusCount, UserDate, UserFollowersCount, UserFriendsCount
    1613771842, 20090425:164237, @Heat..., 19058681, 4910, 20090116:074006, 4658089, 331
    1613761492, 20090425:164103, @DACh..., 19058681, 4910, 20090116:074006, 4658089, 331
    1613685079, 20090425:162946, I don..., 19058681, 4910, 20090116:074006, 4658089, 331
    1613669960, 20090425:162730, @alun..., 19058681, 4910, 20090116:074006, 4658089, 331
    1613665415, 20090425:162651, Need ..., 19058681, 4910, 20090116:074006, 4658089, 331
    1613566238, 20090425:161154, 3,000..., 19058681, 4910, 20090116:074006, 4658089, 331
    1613515652, 20090425:160421, RT @b..., 19058681, 4910, 20090116:074006, 4658089, 331
    1613440226, 20090425:155309, @botn..., 19058681, 4910, 20090116:074006, 4658089, 331
    1613413880, 20090425:154904, @KJR1..., 19058681, 4910, 20090116:074006, 4658089, 331
    1613360994, 20090425:154051, We ca..., 19058681, 4910, 20090116:074006, 4658089, 331
    1610749076, 20090425:050224, a hot..., 19058681, 4910, 20090116:074006, 4658089, 331
    1610687590, 20090425:045058, Every..., 19058681, 4910, 20090116:074006, 4658089, 331
    1610289051, 20090425:034332, My da..., 19058681, 4910, 20090116:074006, 4658089, 331
    1610187822, 20090425:032751, Feeli..., 19058681, 4910, 20090116:074006, 4658089, 331
    1606835090, 20090424:195513, Have ..., 19058681, 4910, 20090116:074006, 4656357, 331
    1606563785, 20090424:192317, send..., 19058681, 4910, 20090116:074006, 4657011, 331
    1606167603, 20090424:183626, way t..., 19058681, 4910, 20090116:074006, 4658089, 331
    1604521571, 20090424:152440, twitt..., 19058681, 4910, 20090116:074006, 4658089, 331
    1604292683, 20090424:145837, this ..., 19058681, 4910, 20090116:074006, 4658089, 331
    1604224716, 20090424:145044, @mimi..., 19058681, 4910, 20090116:074006, 4658089, 331
    10482537035, 20100314:195924, @usta..., 19058681, 4884, 20090116:074006, 4648824, 329
    10482253077, 20100314:195136, Just ..., 19058681, 4883, 20090116:074006, 4648825, 329
    10481776533, 20100314:193841, @ciar..., 19058681, 4882, 20090116:074006, 4648804, 329
    10481646817, 20100314:193508, looki..., 19058681, 4881, 20090116:074006, 4648804, 329
    10480208987, 20100314:185739, how i..., 19058681, 4880, 20090116:074006, 4648735, 329
    10463774628, 20100314:093345, Don't..., 19058681, 4905, 20090116:074006, 4653754, 330
    10453515324, 20100314:035050, ok th..., 19058681, 4878, 20090116:074006, 4646941, 329
    10434802987, 20100313:192140, This ..., 19058681, 4878, 20090116:074006, 4647352, 329
    10432787211, 20100313:182511, go #t..., 19058681, 4876, 20090116:074006, 4645494, 329
    10410464418, 20100313:060546, At th..., 19058681, 4875, 20090116:074006, 4644465, 329
    10408735854, 20100313:051144, @Will..., 19058681, 4874, 20090116:074006, 4644057, 329
    10407563511, 20100313:043821, doing..., 19058681, 4873, 20090116:074006, 4643987, 329
    10403562181, 20100313:025442, RT @k..., 19058681, 4872, 20090116:074006, 4643809, 329
    10388019450, 20100312:201421,
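Kenneth's comparison comes down to counting how many distinct user snapshots each block of statuses carries. A minimal sketch, using (status id, statuses_count, followers_count, friends_count) tuples copied from the first five rows of each block of his output. `distinct_snapshots` is a hypothetical helper; in a live run the values would come from `status.user`.

```python
from typing import Iterable, Set, Tuple

# (status_id, statuses_count, followers_count, friends_count), copied from the
# first five rows of each block of the output above.
APRIL_2009 = [
    (1613771842, 4910, 4658089, 331),
    (1613761492, 4910, 4658089, 331),
    (1613685079, 4910, 4658089, 331),
    (1613669960, 4910, 4658089, 331),
    (1613665415, 4910, 4658089, 331),
]
MARCH_2010 = [
    (10482537035, 4884, 4648824, 329),
    (10482253077, 4883, 4648825, 329),
    (10481776533, 4882, 4648804, 329),
    (10481646817, 4881, 4648804, 329),
    (10480208987, 4880, 4648735, 329),
]


def distinct_snapshots(rows: Iterable[Tuple[int, int, int, int]]) -> Set[Tuple[int, int, int]]:
    """Return the distinct (statuses, followers, friends) triples embedded in the rows."""
    return {(statuses, followers, friends) for _, statuses, followers, friends in rows}


for label, rows in (("Apr 2009", APRIL_2009), ("Mar 2010", MARCH_2010)):
    print(label, len(distinct_snapshots(rows)))
# Apr 2009 1
# Mar 2010 5
```

A count of 1 means every status embedded the same user object. A count equal to the number of rows means each status carried its own. In the rows sampled here, the April 2009 block gives 1 and the March 2010 block gives 5.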
Re: [twitter-dev] Re: Historical user information
This is a rare case. Without going into the gory details of caching and the horrors that lie within, suffice it to say that looking at a very, very heavily trafficked timeline will show different behavior than looking at less heavily trafficked ones.

---Mark
http://twitter.com/mccv

On Wed, Mar 17, 2010 at 12:31 PM, MaDeuce k...@east.fm wrote:
I also conclude that the behavior I see w.r.t. the older statuses is actually a bug. Is that correct?
[twitter-dev] Re: Historical user information
Twitter newb question:
Mark, I want to be clear that I'm not questioning the accuracy of your answer at all; I'm sure you're correct. However, I can't fathom why there would be this much redundancy in the data returned by the Twitter API. If I understand correctly, if I get the 20 most recent statuses posted by a user, then that user's data will be returned 20 times, once with each returned status. Ed's original reply sounded plausible (and interesting), as it at least provided a reason for the repeated inclusion of user objects, inefficient though it might have been. Even a basic normalization would include just the id in the status as opposed to the entire user object. I have to assume that the user object is there for a reason, but it's one that I can't figure out. Can you or someone else educate me about why this data is present?
Thanks very much,
Kenneth

On Mar 15, 3:48 pm, Mark McBride mmcbr...@twitter.com wrote:
This is incorrect. The user object returned with a status is intended to represent the current user object, not a historical one.
[twitter-dev] Re: Historical user information
Hmm. So if the API for getting a user's tweets allows you to get _all_ of them (via paging, not in one request), that would be a way to trend their data over time, right? I'd rather not use twitalyzer - I want to use the Twitter API natively if I can.

On Mar 15, 2:01 pm, M. Edward (Ed) Borasky zzn...@gmail.com wrote:
Well, you can retrieve the user's most recent tweets via statuses/user_timeline. Each returned tweet will have a created_at date/time stamp and an embedded user object. Inside this embedded user object will be the number of friends and followers the user had when the tweet was created. Plot the date/time stamps on the X axis and the friends on the Y axis and do a kernel regression. ;-)
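Ed's suggestion (plot created_at against the embedded friends count and do a kernel regression) could look like the sketch below. It uses a Nadaraya-Watson estimator with a Gaussian kernel, which is one common form of kernel regression. The sketch assumes each status has already been parsed into a dict with a timezone-aware `datetime` under `created_at` and an embedded `user` dict. The helper names and toy values are hypothetical. Per Mark's correction in this thread, the embedded user object reflects a recent snapshot rather than the user at tweet time, so this approach would not recover a genuine history from the API.

```python
import math
from datetime import datetime, timezone
from typing import List, Sequence, Tuple


def friends_series(statuses: List[dict]) -> Tuple[List[float], List[float]]:
    """Return (days since the Unix epoch, friends_count) for each status.

    Assumes 'created_at' is already a timezone-aware datetime and 'user' is the
    embedded user dict; raw API JSON would need its date strings parsed first.
    """
    xs = [s["created_at"].timestamp() / 86400.0 for s in statuses]
    ys = [float(s["user"]["friends_count"]) for s in statuses]
    return xs, ys


def nadaraya_watson(xs: Sequence[float], ys: Sequence[float],
                    x0: float, bandwidth: float) -> float:
    """Gaussian-kernel weighted average of ys around x0 (Nadaraya-Watson)."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no observations within reach of x0; widen the bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total


# Toy, made-up observations one day apart.
toy = [
    {"created_at": datetime(2010, 3, 12, tzinfo=timezone.utc), "user": {"friends_count": 329}},
    {"created_at": datetime(2010, 3, 13, tzinfo=timezone.utc), "user": {"friends_count": 330}},
    {"created_at": datetime(2010, 3, 14, tzinfo=timezone.utc), "user": {"friends_count": 331}},
]
xs, ys = friends_series(toy)
smoothed = [nadaraya_watson(xs, ys, x, bandwidth=1.0) for x in xs]
print(round(smoothed[1], 6))  # 330.0
```

The bandwidth (in days here) controls how much neighboring tweets pull each estimate. Larger values flatten the curve toward the overall mean.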
Re: [twitter-dev] Re: Historical user information
Yep ... but you don't get all of their tweets. You get the most recent 3200 of their *original* tweets. If they used the built-in retweet, those retweets won't show up in the pages. Try it on @znmeb (me), who built-in-retweets a lot. ;-)
--
M. Edward (Ed) Borasky
borasky-research.net/m-edward-ed-borasky/
A mathematician is a device for turning coffee into theorems. ~ Paul Erdős

Quoting Raymond Camden rcam...@gmail.com:
Hmm. So if the API for getting a user's tweets allows you to get _all_ of them (via paging, not in one request), that would be a way to trend their data over time, right?
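Raymond's paging question and Ed's 3200-tweet limit can be illustrated with max_id paging. The client requests a page, then asks for statuses at or below one less than the oldest id it has seen. A minimal sketch, assuming max_id is inclusive and pages arrive newest first. `fetch_page` is a hypothetical wrapper around one statuses/user_timeline request, and the toy timeline is made up. Per Ed, the walk still ends at the most recent 3200 original tweets, whatever the code does.

```python
from typing import Callable, List, Optional

# Ed's figure for how far back statuses/user_timeline reaches.
MAX_TIMELINE_DEPTH = 3200


def fetch_all(fetch_page: Callable[[Optional[int]], List[dict]],
              limit: int = MAX_TIMELINE_DEPTH) -> List[dict]:
    """Walk a timeline from newest to oldest using max_id paging.

    `fetch_page(max_id)` is a hypothetical wrapper around one timeline request.
    It should return statuses (dicts with an integer 'id') newest first, all with
    id <= max_id, or the newest page when max_id is None.
    """
    collected: List[dict] = []
    max_id: Optional[int] = None
    while len(collected) < limit:
        page = fetch_page(max_id)
        if not page:
            break  # ran off the end of what the API will return
        collected.extend(page)
        # max_id is inclusive, so resume just below the oldest status seen.
        max_id = min(status["id"] for status in page) - 1
    return collected[:limit]


# Toy timeline of 50 statuses, served 20 per page, newest first.
TOY_IDS = list(range(50, 0, -1))


def toy_fetch(max_id: Optional[int]) -> List[dict]:
    eligible = [i for i in TOY_IDS if max_id is None or i <= max_id]
    return [{"id": i} for i in eligible[:20]]


print(len(fetch_all(toy_fetch)))            # 50
print(len(fetch_all(toy_fetch, limit=30)))  # 30
```

Paging on max_id rather than on page numbers keeps the walk stable when new tweets arrive mid-crawl, because the cursor is anchored to an id rather than an offset from the top.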
Re: [twitter-dev] Re: Historical user information
This is incorrect. The user object returned with a status is intended to represent the current user object, not a historical one. However, there are currently several bugs open around this, so the user object currently represents a snapshot of the user at some time in the fairly recent past.
---Mark
http://twitter.com/mccv

On Mon, Mar 15, 2010 at 1:04 PM, M. Edward (Ed) Borasky zzn...@gmail.com wrote:
Yep ... but you don't get all of their tweets.

On Mar 15, 2:01 pm, M. Edward (Ed) Borasky zzn...@gmail.com wrote:
Each returned tweet will have a created_at date/time stamp and an embedded user object. Inside this embedded user object will be the number of friends and followers the user had when the tweet was created.
Re: [twitter-dev] Re: Historical user information
Oh? I thought the embedded user was saved in your tweet database. Thanks for the correction. That explains why one of the sites I visited doesn't work - they were reporting a follower history for me that was oscillating. So their site can't work at all once you fix these bugs. Dang! I forget who it is, so I can't tell them.

Also - if the embedded user object is supposed to be constant across all the returned tweets, could it be factored out of the tweet array and only transmitted once? You could send back the *whole* user object once and then pages of arrays of tweets. You'd need to give the API call a new name, of course, but it would be vastly more efficient.
--
M. Edward (Ed) Borasky
borasky-research.net/m-edward-ed-borasky/
A mathematician is a device for turning coffee into theorems. ~ Paul Erdős

Quoting Mark McBride mmcbr...@twitter.com:
This is incorrect. The user object returned with a status is intended to represent the current user object, not a historical one. However, there are currently several bugs open around this, so the user object currently represents a snapshot of the user at some time in the fairly recent past.
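Ed's proposal to send the user object once per page can be sketched as a transformation on an existing page. The response shape, the function name, and the toy user are hypothetical. The sketch refuses to factor out the user when the embedded copies differ, because then a single copy cannot stand in for all of them.

```python
import json
from typing import Dict, List


def factor_out_user(statuses: List[dict]) -> Dict[str, object]:
    """Return the user object once plus statuses that reference it by id.

    Hypothetical response shape for the call Ed proposes. Raises ValueError if
    the embedded user objects differ, because then one copy cannot stand in
    for all of them.
    """
    if not statuses:
        return {"user": None, "statuses": []}
    user = statuses[0]["user"]
    slim = []
    for status in statuses:
        if status["user"] != user:
            raise ValueError(f"status {status['id']} carries a different user snapshot")
        trimmed = {k: v for k, v in status.items() if k != "user"}
        trimmed["user_id"] = user["id"]
        slim.append(trimmed)
    return {"user": user, "statuses": slim}


# Toy page: 20 statuses that all embed the same made-up user object.
user = {"id": 42, "screen_name": "example_user", "followers_count": 1000,
        "friends_count": 100, "statuses_count": 5000}
page = [{"id": 100 + i, "text": f"toy status {i}", "user": user} for i in range(20)]

embedded_size = len(json.dumps(page))
factored_size = len(json.dumps(factor_out_user(page)))
print(embedded_size > factored_size)  # True
```

The size check only compares serialized JSON lengths. It says nothing about parsing cost or about the client work needed to re-join users and statuses.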
RE: [twitter-dev] Re: Historical user information
Also - if the embedded user object is supposed to be constant across all the returned tweets, could it be factored out of the tweet array and only transmitted once? You could send back the *whole* user object once and then pages of arrays of tweets. You'd need to give the API call a new name, of course, but it would be vastly more efficient.

Send an "Accept-Encoding: gzip, deflate" header with your request, and decompress the response before parsing it. The compression will take care of the redundant user data and more.
- Brian
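Brian's point can be checked offline. The sketch serializes a toy user_timeline-style page whose 20 statuses embed the same user object, then gzips it. Python's urllib does not decompress response bodies on its own, so the sketch also includes a decode helper for gzip and deflate bodies. The helper names, URL handling, and toy data are assumptions. Deflate is handled as both zlib-wrapped and raw streams, since some servers send the raw form.

```python
import gzip
import json
import urllib.request
import zlib


def build_request(url: str) -> urllib.request.Request:
    """Ask the server for a compressed response body."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip, deflate"})


def decode_body(body: bytes, content_encoding: str) -> bytes:
    """Undo the Content-Encoding the server reports; urllib will not do this for us."""
    encoding = content_encoding.strip().lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        try:
            return zlib.decompress(body)  # zlib-wrapped deflate
        except zlib.error:
            return zlib.decompress(body, -zlib.MAX_WBITS)  # raw deflate
    return body  # identity or unknown encoding: pass through unchanged


# Toy page: 20 statuses embedding one made-up user object.
user = {
    "id": 42,
    "screen_name": "example_user",
    "name": "Example User",
    "description": "A made-up account used only for this size comparison.",
    "location": "Nowhere in particular",
    "followers_count": 1000,
    "friends_count": 100,
    "statuses_count": 5000,
}
page = [{"id": 100 + i, "text": f"toy status {i}", "user": user} for i in range(20)]

raw = json.dumps(page).encode("utf-8")
compressed = gzip.compress(raw)
print(len(raw), len(compressed))
print(decode_body(compressed, "gzip") == raw)  # True
```

Compression shrinks the bytes on the wire, but the client still parses 20 copies of the user object after decompressing. That is why Ed's factoring proposal and Brian's suggestion address different costs.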