Re: mirrors, release publishing...again

2012-04-19 Thread Mark Ramm
 A few ways, some worse than others:

 1) Offer several download links:  Download from Apache, from
 SourceForge, from MirrorBrain. Of course that doesn't balance the
 load, but maybe it would if we randomized the order that they are
 listed.

 2) Have a single link, but it is JavaScript that then directs to one
 of the three mirrors systems.  This is easy to distribute the load
 according to a defined schedule.  Marcus prototyped an approach like
 this. It looked like it was working.  I'm not sure, however, whether
 it handled fallbacks.  For example, you randomly select to use the
 Apache mirror, but the particular operator chosen is down.  User
 experience for backing out of that and repeating was not as nice as it
 could be.

 3) Some variation on 2 where we handle the fallbacks better, or at
 least handle failures better, so the user just needs to click again.
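As a rough illustration of options 2 and 3, here is a minimal Python sketch of a weighted pick with an explicit fallback order. It is not Marcus's prototype (which was JavaScript and is not shown in this thread). The weights, the system labels, and the `is_up` health check are all assumptions made for the example.

```python
import random

# Hypothetical weights for the three mirror systems named above;
# the thread does not specify actual values.
WEIGHTS = {"apache": 1, "sourceforge": 1, "mirrorbrain": 1}


def ordered_attempts(weights, rng=random):
    """Return every mirror system in a weighted-random order.

    The first entry is the primary pick; the rest are fallbacks to try
    if the primary turns out to be unreachable. Weights must be positive.
    """
    remaining = dict(weights)
    order = []
    while remaining:
        names = list(remaining)
        pick = rng.choices(names, weights=[remaining[n] for n in names])[0]
        order.append(pick)
        del remaining[pick]
    return order


def first_working(order, is_up):
    """Return the first system in `order` for which is_up(name) is true."""
    for name in order:
        if is_up(name):
            return name
    return None
```

Computing the whole order up front means a failed primary never needs a second random draw: the caller simply moves to the next entry.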

I would be in favor of a fourth option suggested by Andreas in another thread:

* Route autoupdater traffic through one system (MirrorBrain)
* Route web based traffic through another (SF as primary, and Apache
mirrors as secondary)

This eliminates the "which mirror network is having a problem?" kind
of debugging, which would be particularly pernicious if we randomized
anything about the process. It also has
the benefit of most closely matching Joe's original suggestion of how
to use SF.net, and provides a clear accountability/support chain for
users when downloads fail.
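
The fixed split described above could be expressed as a small routing table. This is only a sketch: the client kinds and backend labels are hypothetical names chosen for the example, not identifiers used by any of the systems involved.

```python
# Hypothetical labels; the thread names the systems but not any identifiers.
ROUTES = {
    "updater": ["mirrorbrain"],
    "web": ["sourceforge", "apache"],  # primary first, then secondary
}


def backends_for(client_kind):
    """Return the ordered list of mirror systems for a kind of client."""
    try:
        return list(ROUTES[client_kind])
    except KeyError:
        raise ValueError(f"unknown client kind: {client_kind!r}") from None
```

Because nothing is randomized, a failed download can be traced to exactly one system per client kind.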

SF.net will, as previously mentioned, provide an API to collect stats on
downloads from our system. We'd also be happy to help host a bouncer
that forwards requests to a MirrorBrain server, so that updater stats
can be collected as well, if that helps the team measure the release
download volume more effectively.

--Mark Ramm

This e-mail message is intended only for the named recipient(s) above. It may
contain confidential and privileged information. If you are not the intended 
recipient you are hereby notified that any dissemination, distribution or 
copying of this e-mail and any attachment(s) is strictly prohibited. If you 
have received this e-mail in error, please immediately notify the sender by 
replying to this e-mail and delete the message and any attachment(s) from your 
system. Thank you.



Re: Live testing of SF download mirrors

2012-04-18 Thread Mark Ramm
Hi Joost,

 the download numbers are impressive and they reflect mostly the activity we
 had between releases. Directly after a release the numbers will have peak
 values. I'm not sure if sf.net's load balancers will be able to handle this
 kind of load alone. MirrorBrain was quite stable and the service had to be
 restarted occasionally.

Thanks for bringing up some good technical questions. I definitely
respect what MirrorBrain has done for the OO project, and think we
should all work together to best serve the AOO users.

Let me start out by saying that we have ops support 24/7, contracts
with partners and all the human resources needed to fully support the
AOO  release and manage any issues that might come up.

On the load side I am confident that our mirror redirectors will
handle an order of magnitude higher than current total load (more than
5 million downloads/day). And most of that load is not AOO related,
and is very stable and predictable.  We have had no trouble handling
peaks from VLC and other projects that get millions of downloads per
month with the current hardware.

Additionally the redirectors and associated load balancers are
horizontally scalable, and should greater capacity be needed we can
quickly deploy more virtual servers to increase capacity.

So, I don't believe there is any reason to be concerned about risk at
the mirror redirector level, and we should not see any need for
restarts, or any instability there.

 How many mirrors does sf.net use ?

I don't believe there is any reason to be concerned at the individual
mirror provider level either.

We have 30+ mirror partners -- but many of those are organizations
with multiple delivery points. For example, one mirror provider has
more than 20 global delivery points, and enough bandwidth committed to
us to handle 8x current AOO download traffic.

 Does it provide a bandwidth comparable to the current OOo mirror network
 (AFAIR it was 2.5 TBit/s) ? For more information about the mirrors see
 http://download.services.openoffice.org/mirrors/

Total mirror network bandwidth is an elusive number, and somewhat
beside the point, because other factors like the number of concurrent
connections per mirror can quickly become the choke point for a
smaller mirror provider.  And of course individual mirror bandwidth
can create a local choke point even while there is plenty of available
global bandwidth, which is why we needed a test to get more detailed
insight into the traffic patterns. Now that the test is complete, we
are sure that we have a strong enough global network of well
provisioned mirrors, with more available bandwidth, more concurrent
connection capacity, and more resources in general than we believe the
AOO release could possibly consume.

With that said, because we don't know the exact burst levels, and we
want to be ready for anything, we have also engaged a partnership with
a large global CDN provider to give us enough additional burst
capacity to handle several terabits per second on the CDN alone.

So, the combination of the existing mirror network and the ability to
handle burst loads using additional CDN services puts us in very solid
shape to be able to handle whatever traffic gets sent our way.

Because we are deploying some resources which are expensive, we
have just asked that we be notified in advance of any changes to plan,
so that we don't end up incurring costs that aren't necessary to create
value for the open source community and our users.

But our main goal is always to help open source projects be
successful. And in this case we want to help the AOO team to serve
their users, and to have the freedom not to have to worry about
bandwidth, concurrent connections, or any of the other details of
providing reliable download infrastructure, and to focus on promoting
this release and growing the AOO user and developer communities.

--Mark Ramm




Re: Live testing of SF download mirrors

2012-04-06 Thread Mark Ramm

 2) I have not done any timings, but the download speed seems slower,
 at least the time taken until the download is actually started.


We are working through some issues where the CDN edge servers are not
pre-populated and the traffic is overwhelming the origin server. We are
resolving the issue with the vendor right now.

In the short term we'll be putting some changes in place to improve the
file download throughput, and should have the CDN service pre-populated
soon, so we've got both short and long term solutions for this in place.

-- 
*Mark Ramm*
Director of Engineering,
SourceForge Developer Experience
phone: 734-707-7266
email: m...@geek.net
skype: geekmark




Re: Feedback Requested: Proposed SourceForge Mirror of AOO 3.4

2012-03-25 Thread Mark Ramm
   - SourceForge.net would be the “recommended default download” on the 
 website.

 What would that look like?  On what page do we make this branch?   In
 most of our communications we will point the public to this URL:

 http://download.openoffice.org

 (That then redirects to http://www.openoffice.org/download/)

 The download link then provided to the user is matched to their
 platform and language, based on their request headers.

My thoughts would be that we split based on user preference at this
page, by showing two links.  One for the sf.net download, and another
for the apache mirror network based download.

 Some subset (and we don't know what % since we're not running Google
 Analytics here) don't want the default and click through to the full
 matrix of downloads available:

 http://www.openoffice.org/download/other.html

We can handle that however you want.   We can create a sf.net page
matching that matrix, put sf.net links in the matrix along with normal
mirror network links, or just leave it as is. We are open to whatever
helps the project the most.

 I'm assuming that we want to avoid duplicating effort maintaining the
 logic for automatically matching users to the right download, as well
 as avoid SF needing to track in detail a large matrix of downloads,
 availability of new translations, etc.  You just want to mirror our
 dist/incubator/ooo directory.

Sourceforge.net already has user agent string + file name heuristics
to figure out the right platform for the user and the best match
download, which should work automatically.   We also allow projects to
manually choose the best release for any given platform.  So, I
think a simple link to
sourceforge.net/projects/AOO/files/download/latest (for example) would
be enough.
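
To make the idea of "user agent string + file name heuristics" concrete, here is a deliberately crude Python sketch. It is not SourceForge's actual logic, which is not described in this thread. The substring rules, platform labels, and filename hints are all assumptions for illustration.

```python
def guess_platform(user_agent):
    """Very rough platform guess from a User-Agent string (illustrative only)."""
    ua = user_agent.lower()
    # Order matters: Android user agents also contain "linux".
    if "android" in ua:
        return "android"
    if "windows" in ua:
        return "windows"
    if "mac os x" in ua or "macintosh" in ua:
        return "mac"
    if "linux" in ua:
        return "linux"
    return "unknown"


def best_file(user_agent, filenames):
    """Pick the first filename whose name hints at the guessed platform."""
    # Hypothetical hints; plain substring checks are crude and would
    # misfire on names such as "darwin" (which contains "win").
    hints = {
        "windows": ("win", ".exe"),
        "mac": ("mac", ".dmg"),
        "linux": ("linux", ".rpm", ".deb"),
    }
    platform = guess_platform(user_agent)
    for name in filenames:
        lowered = name.lower()
        if any(h in lowered for h in hints.get(platform, ())):
            return name
    return None
```

A real service would also need the manual per-platform override mentioned above for cases the heuristics get wrong.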

So it should be easy enough for that page to display both the sf.net
link and one going to the Apache mirror network, and those can be
displayed in whatever way makes the most sense for marketing the
release and managing download traffic.

Mirroring more files is not a problem for us at all as long as we can
use rsync or some other automated mechanism to keep the files up to
date as there are changes.

Maintaining an alternative version/platform matrix page would take a
little bit more work, but if it's helpful we could certainly create
something that matches that experience on the sf.net side.

 Ideally (and this is my opinion.  others may have better opinions), we
 would check the user's request header, get the language and platform
 from that, determine the recommended download, and pass that request
 onto either of the mirror networks, along with the IP address for
 locating the nearest mirror. The branch between Apache and SF mirrors
 could be done randomly, based on a tune-able parameter.  if
 rand() < 0.25 doApache() else doSF() would send 25% of the download
 requests to Apache, and the remaining 75% to SF.
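
The tunable random split described in the quoted paragraph can be written as a few lines of runnable Python. This is a minimal sketch: `pick_network`, its return labels, and the `rng` parameter are names invented for the example, and the 0.25 default simply mirrors the 25%/75% figures in the quoted text.

```python
import random


def pick_network(apache_share=0.25, rng=random):
    """Send roughly `apache_share` of requests to Apache, the rest to SF."""
    if not 0.0 <= apache_share <= 1.0:
        raise ValueError("apache_share must be between 0 and 1")
    # rng.random() is uniform on [0, 1), so the comparison is true with
    # probability apache_share.
    return "apache" if rng.random() < apache_share else "sf"


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make the demo repeatable
    picks = [pick_network(0.25, rng) for _ in range(10_000)]
    print(picks.count("apache") / len(picks))
```

Changing `apache_share` is the only adjustment needed to shift load between the two networks.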

We can certainly do this as well.  Either approach is fine, but the
approach outlined above has the advantage of requiring almost no
integration work on either side -- so it would be my preference.
That said, the approach you describe here could be implemented on the
sf.net side in a day or two, so if it's your preference we're more
than happy to accommodate that.

 The nice thing about this approach is it allows each mirror network to
 do their own geographic optimization, while allowing the OpenOffice
 project to control how users are recommended a particular version of
 AOO. It allows us to maintain the matrix of downloads in one place.
 And it does not introduce any new mouse clicks for the user.

I agree that we should try to maintain the current number of clicks.
I also agree that we should give the OO project control of how the
options are presented, and I like this idea.

But the downside is that people might randomly get sf.net one time
and Apache mirrors the next, and have an inconsistent user experience.
And I also think users should have some control over what download
experience they get.

So, overall I think Joe's suggestion of a recommended download link
that states that it's going to sourceforge.net, and a second
alternate link that goes to the Apache mirrors would probably
provide a better user experience.

 Is it technically feasible?

Absolutely.  I think I speak for Roberto and the rest of the sf.net
team when I say we are open to whatever solution works best for the
AOO project, and are more than willing to be guided by the PPMC's
opinion on this.

--
Mark Ramm
Director of Engineering,
SourceForge Developer Experience
email: m...@geek.net


Re: Sourceforge and AOO 3.4 distribution

2012-03-22 Thread Mark Ramm
We have finally assessed the capacity and capabilities needed to serve the
surge of Apache OpenOffice 3.4 release-time traffic.  Before we could
commit to delivering the full download volume, we wanted to produce a
vetted plan, including a clear timeline and backing technical
implementation plans.

First let me quickly recap my understanding of the problems we are trying
to solve for:

   - Apache OpenOffice 3.4 will be released in mid April and we want to
   assure capacity to handle that traffic both in terms of bandwidth and
   simultaneous connections.
   - The Apache OpenOffice project would benefit from being able to promote the
   release heavily without worrying about capacity.


Given those needs and the fact the Apache Infrastructure team said they’d
welcome our assistance, we at SourceForge think we can help and that there
would be mutual benefit.
What we are proposing is an elaboration of Joe’s ‘hybrid’ approach:

   - Both AOO and SF.net mirror networks would be used to provide download
   capacity for the 3.4 release.
   - SourceForge.net would be the “recommended default download” on the
   website.
   - Apache Mirror network would be an alternate download option.
   - Apache OpenOffice team and Infrastructure team will maintain control
   of the auto-update URLs and possibly follow Rob’s suggestion to
   stagger automatic updates.


SourceForge.net will manage the full burst capacity for web-based downloads
through our global network of OSS mirrors, global CDN network(s) and cloud
file server providers.   Using these resources, we anticipate our capacity
is well above the expected delivery requirements for the upcoming release.

In addition to basic download capacity, SourceForge will provide detailed
download statistics, which will support future product, infrastructure and
marketing plans.  We will commit to make stats available on the
SourceForge.net website and provide stats delivery APIs.  We are able to
capture initiated downloads, not just page views, and will provide them
split by geography and operating system.  We’re also willing to consider
additional stats needs.

Proposed Timeline:

   - Immediately: SourceForge sets up Apache Infra team with credentials on
   an AOO mirror project in sf.net
   - First week:  SourceForge updates contracts with CDN and other
   providers to handle full AOO peak release traffic
   - Second Week: AOO Infra team works with sf.net operations team to ramp
   traffic to sf.net in a controlled way in order to gather statistical
   data, verify assumptions, and give the Apache infrastructure team time to
   verify our capacity.
   - 1-2 days post test:  SF.net analyzes traffic data, assures that our
   assumptions about geographic mix, and interactive vs automated download
   mix, are valid and we can do this in a fiscally responsible way.
   - 1-2 days post test: AOO infrastructure team analyses traffic data,
   lets sf.net team know any additional data needs, and validates that the
   system will work for them


Once everything is tested and vetted on both sides, we will need to make a
CDN bandwidth commit, and would like the AOO team to commit to notifying us
30 days prior to shutting down the flow of traffic, so that we can update
our contracts and avoid penalties.

We believe that the combination of SF.net mirrors, and CDN based burst
capacity will provide a fast and stable download experience for AOO users,
and will allow the AOO team to publicize the release in an aggressive
manner.

On Wed, Mar 21, 2012 at 10:55 AM, Mark Ramm m...@geek.net wrote:

 And finally: would you have any objection to us using a mix of fixed
 mirrors, elastic file delivery services (like s3), and commercial CDN
 service to handle spikes in download gracefully and assure that global
 users get good download performance when local mirrors are overloaded
 or not available?


 No, we may even be willing to budget some amount for this purpose.
 Cost estimates would be appreciated as our budget numbers for FY2012
 need to be finalized next week.


 Sorry that it's taken a bit to get back to you.   We are working on
 getting pricing from a variety of providers, and my personal goal is to
 find a way for us to fund the CDN and S3 costs, and to provide this to the
 community as a free (as in beer) service.

 Thanks everybody who provided anecdotal information on historical traffic
 peaks, and particularly for the steady state run rate information.   That
 has been invaluable as we talk with vendors about the supplemental capacity
 we need to acquire to handle peak loads.

 There's one key input to figuring out if I can pay for all of this out of
 ad revenue, which is what percentage of the daily downloads are expected to
 come from auto-updater software or other non-browser scripts?   Would that
 traffic still be pointed primarily at AOO owned domains and mirrors, or
 would we be handling some of that from the sf.net service?

 And finally, I'd also be interested in finding out

Re: Sourceforge and AOO 3.4 distribution

2012-03-21 Thread Mark Ramm

 And finally: would you have any objection to us using a mix of fixed
 mirrors, elastic file delivery services (like s3), and commercial CDN
 service to handle spikes in download gracefully and assure that global
 users get good download performance when local mirrors are overloaded
 or not available?


 No, we may even be willing to budget some amount for this purpose.
 Cost estimates would be appreciated as our budget numbers for FY2012
 need to be finalized next week.


Sorry that it's taken a bit to get back to you.   We are working on getting
pricing from a variety of providers, and my personal goal is to find a way
for us to fund the CDN and S3 costs, and to provide this to the community
as a free (as in beer) service.

Thanks everybody who provided anecdotal information on historical traffic
peaks, and particularly for the steady state run rate information.   That
has been invaluable as we talk with vendors about the supplemental capacity
we need to acquire to handle peak loads.

There's one key input to figuring out if I can pay for all of this out of
ad revenue, which is what percentage of the daily downloads are expected to
come from auto-updater software or other non-browser scripts?   Would that
traffic still be pointed primarily at AOO owned domains and mirrors, or
would we be handling some of that from the sf.net service?

And finally, I'd also be interested in finding out if you know what
percentage of traffic is from North America vs the rest of the world,
because some providers give very different rates for different locations.
For example, Cloudfront publishes $0.02/gb US and $0.12/gb in South America.
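
To show why the regional mix matters, here is a small Python sketch of a blended daily cost. The two per-GB prices are the ones quoted above. The traffic volume and the 80/20 split in the usage note are made-up numbers for illustration, not data from this thread.

```python
# Per-GB prices quoted in the message above (US dollars).
PRICE_PER_GB = {"us": 0.02, "south_america": 0.12}


def daily_cost(total_gb, share_by_region, price_per_gb=PRICE_PER_GB):
    """Blended daily CDN cost for traffic split across regions."""
    if abs(sum(share_by_region.values()) - 1.0) > 1e-9:
        raise ValueError("regional shares must sum to 1")
    return sum(total_gb * share * price_per_gb[region]
               for region, share in share_by_region.items())
```

For a hypothetical 1,000 GB/day with 80% served in the US and 20% in South America, `daily_cost(1000, {"us": 0.8, "south_america": 0.2})` comes to about $40, of which the South American fifth accounts for $24.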

Thanks again to everybody who helped with data so far!

--Mark Ramm




Re: Sourceforge and AOO 3.4 distribution

2012-03-20 Thread Mark Ramm
On Mon, Mar 19, 2012 at 4:04 PM, Joe Schaefer joe_schae...@yahoo.com wrote:

 FWIW the ballpark figures we have today Roberto
 are roughly 12GB worth of release artifacts and
 about 100TB / day worth of download traffic.

Thanks for the information.

I'm working with Roberto to make sure all the right technical
resources are aligned behind him, and that we have the resources to
provide a great experience to your users. So, I'm here to help out,
and validate everything to make sure we are prepared to handle AOO's
peak load.

Based on the file size data in the previous e-mail, and this bandwidth
information, I believe we are talking about something around 700k
downloads per day.
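
The file size data from the earlier e-mail is not reproduced here, but the two figures that are available imply an average download size, which can be checked with a short calculation. The use of decimal units (1 TB = 10^12 bytes) is an assumption, since the thread does not say which convention applies.

```python
TB = 10**12  # decimal terabyte in bytes (assumed convention)
MB = 10**6   # decimal megabyte in bytes

daily_traffic_bytes = 100 * TB  # "about 100TB / day" from Joe's message
downloads_per_day = 700_000     # the estimate given above

# Average size per download implied by the two figures.
implied_avg_download_mb = daily_traffic_bytes / downloads_per_day / MB
print(round(implied_avg_download_mb))  # prints 143
```

In other words, the 700k estimate corresponds to an average download of roughly 143 MB.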

Is that peak load, or is that sustained load? If it's sustained, do
you have any ideas about what peak load would look like?  If not, do
you have any ideas about what sustained load would look like?

And finally: would you have any objection to us using a mix of fixed
mirrors, elastic file delivery services (like s3), and commercial CDN
service to handle spikes in download gracefully and assure that global
users get good download performance when local mirrors are overloaded
or not available?

I'm looking forward to working with all of you to make sure that users
have a reliable and fast download source for the upcoming Apache
OpenOffice release.   Let me know if there are any questions I can answer
for you, or anything else I can do to help.

--Mark Ramm
