On Wed, Apr 25, 2012 at 3:10 PM, Jared Winick <[email protected]> wrote:
> I am not exactly sure how to answer the question about storage size per
> tweet as I am not actually storing the original tweet and if a counter
> already exists for an n-gram/time period, then incrementing that counter
> doesn't increase the storage size. I can follow up with the current storage
> I am using though.
>

I see. I can make some estimates based on the information in your talk.
The slides are awesome, btw.

Using the information you provided:

Dec 24 - March 12... that's 88 days.
2.6e9 entries, 3 million-ish tweets per day:

  2.6e9 / (3e6 * 88)

~10 entries per tweet.

Also, you report disk usage of 72G, which I will interpret as
72 * (1024 ** 3) bytes. So each tweet, on average, occupies:

  72G / (88 * 3e6)

or ~300 bytes.

-Eric
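
P.S. For anyone who wants to redo the arithmetic, here is a minimal Python sketch of the two estimates. The inputs are the figures quoted in this message (88 days, 3 million-ish tweets per day, 2.6e9 entries, 72G read as 72 * 1024**3 bytes). The function and variable names are made up for illustration and are not from the talk or from any code in this thread.

```python
# Back-of-the-envelope per-tweet estimates, using the numbers from this email.
# All names here are illustrative; the inputs are rough, so the outputs are too.

DAYS = 88                      # day count as stated in the email
TWEETS_PER_DAY = 3e6           # "3 million-ish" tweets per day
ENTRIES = 2.6e9                # total entries reported
DISK_BYTES = 72 * (1024 ** 3)  # 72G interpreted as binary gigabytes


def per_tweet_estimates(entries, disk_bytes, tweets_per_day, days):
    """Return (entries per tweet, bytes per tweet) averaged over all tweets."""
    total_tweets = tweets_per_day * days
    return entries / total_tweets, disk_bytes / total_tweets


entries_per_tweet, bytes_per_tweet = per_tweet_estimates(
    ENTRIES, DISK_BYTES, TWEETS_PER_DAY, DAYS
)
print(f"{entries_per_tweet:.1f} entries per tweet")
print(f"{bytes_per_tweet:.0f} bytes per tweet")
```

With these inputs it comes out a bit under 10 entries and a bit under 300 bytes per tweet, in line with the rounded figures above. Changing the tweets-per-day guess shifts both numbers proportionally.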
