On Wed, Apr 25, 2012 at 3:10 PM, Jared Winick <[email protected]> wrote:

> I am not exactly sure how to answer the question about storage size per
> tweet as I am not actually storing the original tweet and if a counter
> already exists for an n-gram/time period, then incrementing that counter
> doesn't increase the storage size. I can follow up with the current storage
> I am using though.
>

I see I can make some estimates based on the information in your talk. The
slides are awesome, btw.

Using the information you provided: Dec 24 - March 12... that's 88 days,
2.6e9 entries, and 3 million-ish tweets per day:

2.6e9 / (3e6 * 88)

~10 entries per tweet.

Also, you report disk usage of 72G, which I will interpret as 72 * (1024
** 3) bytes.

So each tweet, on average, occupies 72G / (88 * 3e6), or ~300 bytes.
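
The arithmetic above can be checked with a short Python sketch. The variable
names are illustrative only, and the inputs are just the figures quoted from
the talk, with 72G read as 72 * (1024 ** 3) bytes as assumed above.

```python
# Back-of-the-envelope check of the estimates in this message.
# All inputs are the figures quoted from the talk; names are illustrative.
DAYS = 88                    # Dec 24 - March 12, as counted above
TWEETS_PER_DAY = 3e6         # "3 million-ish"
ENTRIES = 2.6e9              # total entries reported
DISK_BYTES = 72 * 1024 ** 3  # assumes "72G" means 72 GiB

total_tweets = DAYS * TWEETS_PER_DAY

# Average number of entries written per tweet.
entries_per_tweet = ENTRIES / total_tweets

# Average on-disk footprint attributable to each tweet.
bytes_per_tweet = DISK_BYTES / total_tweets

print(f"{entries_per_tweet:.1f} entries per tweet")
print(f"{bytes_per_tweet:.0f} bytes per tweet")
```

Both results round to the figures given above: roughly 10 entries and roughly
300 bytes per tweet.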

-Eric
