On Sat, Aug 1, 2009 at 2:54 AM, Brian<[email protected]> wrote:
> On Sat, Aug 1, 2009 at 12:47 AM, Gregory Maxwell <[email protected]> wrote:
>> On Sat, Aug 1, 2009 at 12:13 AM, Michael Dale<[email protected]> wrote:
>> Once you factor in the ratio of video to non-video content for the
>> foreseeable future this comes off looking like a time-wasting
>> boondoggle.
> I think you vastly underestimate the amount of video that will be uploaded.
> Michael is right in thinking big and thinking distributed. CPU cycles are
> not *that* cheap.

Really rough back-of-the-napkin numbers:

My desktop has an X3360 CPU. You can build systems all day using this
processor for $600 (I think I spent $500 on it 6 months ago). There
are processors with better price/performance available now, but this
is the one I can benchmark on.

Commons is getting roughly 172,076 uploads per month now across all
media types: scans of single pages, photographs copied from Flickr,
audio pronunciations, videos, etc.

If everyone switched to uploading 15-minute SD videos instead of
other things, there would be 154,868,400 seconds of video uploaded to
Commons per month. Truly a staggering amount. Assuming a 40-hour work
week, it would take roughly 250 people working full time just to *view*
all of it.
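For anyone who wants to check the napkin, here is a rough Python sketch of the arithmetic above. The 15-minute length is the hypothetical from the text; how many work weeks fit in a month is my own assumption, so it computes both a flat four-week month and an average (365.25/12-day) month, which puts the viewer count somewhere between about 247 and 269 people.

```python
# Rough check of the "every upload becomes a 15-minute SD video" scenario.
UPLOADS_PER_MONTH = 172_076        # Commons uploads per month, all media types
SECONDS_PER_VIDEO = 15 * 60        # hypothetical: every upload is a 15-minute video
WORK_SECONDS_PER_WEEK = 40 * 3600  # one full-time viewer, 40 hours a week

video_seconds_per_month = UPLOADS_PER_MONTH * SECONDS_PER_VIDEO

# Assumption: the viewer count depends on how many work weeks a month holds.
weeks_per_month = {"4-week month": 4.0, "average month": 365.25 / 12 / 7}
viewers_needed = {
    label: video_seconds_per_month / (WORK_SECONDS_PER_WEEK * weeks)
    for label, weeks in weeks_per_month.items()
}

print(f"{video_seconds_per_month:,} seconds of video per month")
for label, viewers in viewers_needed.items():
    print(f"{label}: about {viewers:.0f} full-time viewers")
```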

That works out to an average of 58.9 seconds of video uploaded for
every second of the month.
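The 58.9 figure falls out if "a month" means an average month of 365.25/12 days (about 2.63 million seconds); a flat 30-day month would give about 59.7 instead. A quick sketch, with the month length as my assumption:

```python
# Convert monthly upload volume into an average real-time rate.
VIDEO_SECONDS_PER_MONTH = 154_868_400        # from the 15-minute-video scenario
SECONDS_PER_MONTH = 365.25 / 12 * 24 * 3600  # assumption: average month, 2,629,800 s

upload_rate = VIDEO_SECONDS_PER_MONTH / SECONDS_PER_MONTH
print(f"{upload_rate:.1f} seconds of video per wall-clock second")
```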

Using all four cores, my desktop encodes video at >16x real time (for
moderate-motion standard-definition input, using the latest Theora 1.1
SVN).

So you'd need fewer than four of those systems (58.9 / 16 ≈ 3.7) to
keep up with the entire Commons upload rate switched to 15-minute
videos. Okay, it would be slow at peak hours, and you might wish to
produce a couple of versions at different resolutions, so multiply
that by a couple.
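The machine count is just the upload rate divided by the encoder speed. A sketch, treating the >16x figure as exactly 16x (so it slightly overstates the hardware needed) and taking "multiply that by a couple" as a factor of two, which is my assumption:

```python
import math

UPLOAD_RATE = 58.9   # seconds of video uploaded per wall-clock second
ENCODE_SPEED = 16.0  # one quad-core X3360 desktop: >16x real time, taken as 16x
MACHINE_COST = 600   # dollars per system
MARGIN = 2           # assumption: x2 for peak hours and extra resolutions

machines_needed = UPLOAD_RATE / ENCODE_SPEED  # fractional machines, no margin
machines_bought = math.ceil(machines_needed * MARGIN)

print(f"{machines_needed:.2f} machines to keep up on average")
print(f"{machines_bought} machines with margin, about ${machines_bought * MACHINE_COST:,}")
```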

This is what I meant by processing being cheap.

If the uploads were all compressed at a bitrate of 4 Mbit/s, users
were kind enough to spread their uploads out through the day, the
distributed system were perfectly efficient (only needing to send one
copy of each upload out), and Wikimedia were paying only
$10/Mbit/s/month for transit out of their primary datacenter... we'd
find that the bandwidth cost of sending that source material out
again would be $2,356/month (58.9 seconds per second * 4 Mbit/s *
$10/Mbit/s/month).
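The same bandwidth estimate in code form, under the same assumptions listed above (4 Mbit/s sources, one copy sent per upload, $10 per Mbit/s per month, uploads spread evenly so the average rate is what gets billed):

```python
UPLOAD_RATE = 58.9    # seconds of video per wall-clock second, monthly average
BITRATE_MBPS = 4.0    # assumed source bitrate, Mbit/s
TRANSIT_PRICE = 10.0  # assumed transit price, dollars per Mbit/s per month
COPIES_SENT = 1       # perfectly efficient: each upload goes out exactly once

# Sustained outbound rate needed to ship every upload to remote transcoders.
outbound_mbps = UPLOAD_RATE * BITRATE_MBPS * COPIES_SENT
monthly_cost = outbound_mbps * TRANSIT_PRICE

print(f"{outbound_mbps:.1f} Mbit/s sustained, about ${monthly_cost:,.0f}/month")
```

Anything that makes the distributed system less than perfectly efficient (resending after failures, or sending the same upload to several volunteers) multiplies COPIES_SENT, and the bill with it.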

(Since transit is billed on the 95th-percentile five-minute average of
the greater of inbound or outbound traffic, uploads are basically free,
but sending data out to the 'cloud' costs like anything else.)
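To make the billing point concrete, here is a minimal sketch of 95th-percentile billing as it's commonly described: take each five-minute average, keep the greater of inbound and outbound, throw away the top 5% of samples, and bill the highest remaining one. Providers differ in the details, and the traffic numbers in the example are made up for illustration.

```python
def billable_mbps(inbound, outbound):
    """Billable rate under a simple 95th-percentile model.

    inbound/outbound are equal-length lists of 5-minute average rates in Mbit/s.
    """
    # For each interval, only the larger direction counts.
    samples = sorted(max(i, o) for i, o in zip(inbound, outbound))
    # Keep the lowest 95% of samples (integer ceiling) and bill the largest kept one.
    keep = (95 * len(samples) + 99) // 100
    return samples[keep - 1]


# Hypothetical outbound-heavy site: 200 Mbit/s in, 1000 Mbit/s out, all month.
n = 100
site_in, site_out = [200.0] * n, [1000.0] * n
video = 235.6  # Mbit/s of 4 Mbit/s video arriving or leaving (58.9 * 4)

base = billable_mbps(site_in, site_out)
uploads_in = billable_mbps([x + video for x in site_in], site_out)
copies_out = billable_mbps(site_in, [x + video for x in site_out])
print(base, uploads_in, copies_out)
```

Adding the upload stream on the inbound side leaves the bill at 1000 Mbit/s, while pushing the same stream back out raises it to 1235.6 Mbit/s.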

So under these assumptions, sending out compressed video for
re-encoding is likely to cost roughly as much *each month* as buying
the hardware for local transcoding outright. ... And processing speed
seems to be improving significantly faster than bandwidth prices are
falling.
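Putting the two estimates side by side, as a rough sketch: the hardware is a one-time purchase (four or, with margin, eight $600 boxes), while the bandwidth bill recurs every month. Power, hosting, and admin time are ignored here, which is my simplification.

```python
MACHINE_COST = 600            # dollars per X3360-class system
BANDWIDTH_PER_MONTH = 2356.0  # dollars/month to ship 4 Mbit/s sources out once

payback_months = {}
for machines in (4, 8):       # bare minimum vs. a x2 margin
    payback_months[machines] = machines * MACHINE_COST / BANDWIDTH_PER_MONTH
    print(f"{machines} machines (${machines * MACHINE_COST:,}): "
          f"equal to {payback_months[machines]:.2f} months of bandwidth")
```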

This is also what I meant by processing being cheap.

Because uploads won't be uniformly spaced, you'll need some extra
resources to keep things from getting bogged down at peak hours. But
the poor peak-to-average ratio also works against the bandwidth costs.
You can't win: unless you assume that uploads are going to be at very
low bitrates, local transcoding will always be cheaper, with very
short payoff times.
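The "you can't win" point can be made a bit more precise. If both approaches are provisioned for the same peaks, the peak-to-average ratio and the upload rate scale the hardware and the transit bill equally, so they cancel, leaving a break-even source bitrate that depends only on machine cost, encoder speed, transit price, and how long the hardware is amortized. A sketch; the amortization periods are my assumption, and fractional machines are allowed:

```python
def breakeven_bitrate_mbps(machine_cost, encode_speed, transit_price, amortize_months):
    """Source bitrate (Mbit/s) at which remote and local transcoding cost the same.

    With upload rate r (video seconds per second) and peak-to-average ratio p:
      local:  p * r / encode_speed machines, each machine_cost / amortize_months per month
      remote: p * r * bitrate Mbit/s of transit at transit_price dollars per Mbit/s per month
    Setting these equal, p and r cancel out.
    """
    return machine_cost / (encode_speed * transit_price * amortize_months)


for months in (12, 36):  # hypothetical amortization periods
    rate = breakeven_bitrate_mbps(600, 16, 10, months)
    print(f"amortized over {months} months: break-even near {rate * 1000:.0f} kbit/s")
```

Both break-even points (about 310 and 100 kbit/s) sit far below the 4 Mbit/s assumed above; only uploads at bitrates that low would make shipping the source out competitive.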

I don't know how to figure out how much it would 'cost' to have human
contributors spot embedded penises snuck into transcodes, figure out
which of several contributing transcoders is doing it, and block
them, only to have the bad user switch IPs and begin again.
... but it seems impossibly expensive, even though it's not an actual
dollar cost.


> There is a lot of free video out there and as soon as we
> have a stable system in place wikimedians are going to have a heyday
> uploading it to Commons.

I'm not saying that there won't be video; I'm saying there won't be
video if development time is spent on fanciful features rather than
desperately needed short-term functionality. We have tens of
thousands of videos, many of which don't stream well for most people
because they need thumbnailing.

Firefogg was useful upload lubrication. But user-powered cloud
transcoding?  I believe the analysis I provided above demonstrates
that resources would be better applied elsewhere.

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
