Re: mirrors, release publishing...again
A few ways, some worse than others:

1) Offer several download links: Download from Apache, from SourceForge, from MirrorBrain. Of course that doesn't balance the load, but maybe it would if we randomized the order in which they are listed.

2) Have a single link, but make it JavaScript that then directs to one of the three mirror systems. This makes it easy to distribute the load according to a defined schedule. Marcus prototyped an approach like this. It looked like it was working. I'm not sure, however, whether it handled fallbacks. For example, you randomly select the Apache mirror, but the particular operator chosen is down. The user experience for backing out of that and repeating was not as nice as it could be.

3) Some variation on 2 where we handle the fallbacks better, or at least handle failures better, so the user just needs to click again.

I would be in favor of a fourth option suggested by Andreas in another thread:

* Route autoupdater traffic through one system (MirrorBrain)
* Route web based traffic through another (SF as primary, and Apache mirrors as secondary)

This eliminates the "which mirror network is having a problem" kind of debugging, which would be particularly pernicious if we randomized anything about the process. It also has the benefit of most closely matching Joe's original suggestion of how to use SF.net, and provides a clear accountability/support chain for users when downloads fail.

SF.net will, as previously mentioned, provide an API to collect stats on downloads from our system, and we'd be happy to help host a bouncer that forwards requests to a MirrorBrain server so that updater stats can be collected as well, if that helps the team measure the release download volume more effectively.

--Mark Ramm

This e-mail message is intended only for the named recipient(s) above. It may contain confidential and privileged information.
If you are not the intended recipient you are hereby notified that any dissemination, distribution or copying of this e-mail and any attachment(s) is strictly prohibited. If you have received this e-mail in error, please immediately notify the sender by replying to this e-mail and delete the message and any attachment(s) from your system. Thank you.
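For illustration only, the fourth option in the message above (updater traffic through MirrorBrain, web traffic through SF with the Apache mirrors as secondary) could be sketched roughly as below. The backend names, the `ROUTES` table and the `is_up` health check are hypothetical placeholders, not anything that exists on the Apache or sf.net side.

```python
from typing import Callable, Optional

# Hypothetical routing table: for each traffic type, backends are listed
# in priority order (primary first, fallbacks after).
ROUTES = {
    "updater": ["mirrorbrain"],
    "web": ["sourceforge", "apache"],
}


def pick_backend(traffic_type: str, is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy backend for this traffic type, or None."""
    for backend in ROUTES.get(traffic_type, []):
        if is_up(backend):
            return backend
    return None


# Example: SF is down, so web traffic falls back to the Apache mirrors.
print(pick_backend("web", lambda name: name != "sourceforge"))
```

Because nothing here is randomized, a failed download can always be traced to one known system, which is the debugging benefit the message argues for.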
Re: Live testing of SF download mirrors
Hi Joost, the download numbers are impressive and they reflect mostly the activity we had between releases. Directly after a release the numbers will have peak values. I'm not sure if sf.net's load balancers will be able to handle this kind of load alone. MirrorBrain was quite stable and the service had to be restarted occasionally.

Thanks for bringing up some good technical questions. I definitely respect what MirrorBrain has done for the OO project, and think we should all work together to best serve the AOO users.

Let me start out by saying that we have ops support 24/7, contracts with partners and all the human resources needed to fully support the AOO release and manage any issues that might come up.

On the load side I am confident that our mirror redirectors will handle an order of magnitude more than the current total load (more than 5 million downloads/day). Most of that load is not AOO related, and is very stable and predictable. We have had no trouble handling peaks from VLC and other projects that get millions of downloads per month with the current hardware. Additionally, the redirectors and associated load balancers are horizontally scalable, and should greater capacity be needed we can quickly deploy more virtual servers.

So, I don't believe there is any reason to be concerned about risk at the mirror redirector level, and we should not see any need for restarts, or any instability there.

How many mirrors does sf.net use?

I don't believe there is any reason to be concerned at the individual mirror provider level either. We have 30+ mirror partners -- but many of those are organizations with multiple delivery points. For example, one mirror provider has more than 20 global delivery points, and enough bandwidth committed to us to handle 8x current AOO download traffic.

Does it provide a bandwidth comparable to the current OOo mirror network (AFAIR it was 2.5 TBit/s)?
For more information about the mirrors see http://download.services.openoffice.org/mirrors/

Total mirror network bandwidth is an elusive number, and somewhat beside the point, because other factors like the number of concurrent connections per mirror can quickly become the choke point for a smaller mirror provider. And of course individual mirror bandwidth can create a local choke point even while there is plenty of available global bandwidth. That is why we needed a test to get more detailed insight into the traffic patterns.

Given that the test is complete, we are now sure that we have a strong enough global network of well provisioned mirrors, with more available bandwidth, more concurrent connection capacity, and more resources in general than we believe the AOO release could possibly consume.

With that said, because we don't know the exact burst levels, and we want to be ready for anything, we have also entered a partnership with a large global CDN provider to give us enough additional burst capacity to handle several terabits per second on the CDN alone.

So, the combination of the existing mirror network and the ability to handle burst loads using additional CDN services puts us in very solid shape to handle whatever traffic gets sent our way.

Because some of the resources we are deploying are expensive, we have just asked that we be notified in advance of any changes to the plan, so that we don't end up incurring costs that don't necessarily create value for the open source community and our users.

But our main goal is always to help open source projects be successful. In this case we want to help the AOO team serve their users, give them the freedom not to have to worry about bandwidth, concurrent connections, or any of the other details of providing reliable download infrastructure, and let them focus on promoting this release and growing the AOO user and developer communities.
--Mark Ramm
Re: Live testing of SF download mirrors
2) I have not done any timings, but the download speed seems slower, at least the time taken until the download is actually started.

We are working through an issue where the CDN edge servers are not pre-populated, so traffic is overwhelming the origin server; we are resolving it with the vendor right now. In the short term we'll be putting some changes in place to improve file download throughput, and we should have the CDN service pre-populated soon, so we have both short-term and long-term solutions in place.

--
Mark Ramm
Director of Engineering, SourceForge Developer Experience
phone: 734-707-7266 email: m...@geek.net skype: geekmark
Re: Feedback Requested: Proposed SourceForge Mirror of AOO 3.4
- SourceForge.net would be the “recommended default download” on the website.

What would that look like? On what page do we make this branch? In most of our communications we will point the public to this URL: http://download.openoffice.org (That then redirects to http://www.openoffice.org/download/) The download link then provided to the user is matched to their platform and language, based on their request headers.

My thoughts would be that we split based on user preference at this page, by showing two links: one for the sf.net download, and another for the Apache mirror network based download.

Some subset (and we don't know what % since we're not running Google Analytics here) don't want the default and click through to the full matrix of downloads available: http://www.openoffice.org/download/other.html

We can handle that however you want. We can create a sf.net page matching that matrix, put sf.net links in the matrix along with normal mirror network links, or just leave it as is. We are open to whatever helps the project the most.

I'm assuming that we want to avoid duplicating effort maintaining the logic for automatically matching users to the right download, as well as avoid SF needing to track in detail a large matrix of downloads, availability of new translations, etc. You just want to mirror our dist/incubator/ooo directory.

Sourceforge.net already has user agent string + file name heuristics to figure out the right platform for the user and the best match download, which should work automatically. We also allow projects to manually choose the best release for any given platform. So, I think a simple link to sourceforge.net/projects/AOO/files/download/latest (for example) would be enough.

So it should be easy enough for that page to display both the sf.net link and one going to the Apache mirror network, and those can be displayed in whatever way makes the most sense for marketing the release and managing download traffic.
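As a rough sketch of the kind of request-header matching discussed above (platform from the User-Agent string, language from Accept-Language), something like the following would do. This is not sf.net's or openoffice.org's actual heuristic; the function names, the substring rules and the fallback language are assumptions made for illustration.

```python
def detect_platform(user_agent: str) -> str:
    """Very coarse platform guess from a User-Agent string (illustrative only)."""
    ua = user_agent.lower()
    if "windows" in ua:
        return "windows"
    if "macintosh" in ua or "mac os x" in ua:
        return "macos"
    if "linux" in ua:
        return "linux"
    return "unknown"


def parse_accept_language(header: str) -> list:
    """Return language tags from an Accept-Language header, best first."""
    prefs = []
    for part in header.split(","):
        tag, _, params = part.strip().partition(";")
        if not tag:
            continue
        q = 1.0  # a missing q-value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        prefs.append((tag.strip().lower(), q))
    # Stable sort: equal q-values keep the order the browser sent them in.
    prefs.sort(key=lambda p: p[1], reverse=True)
    return [tag for tag, q in prefs if q > 0]


def pick_language(header: str, available: list, default: str = "en-US") -> str:
    """Pick the best available language: exact match first, then same primary subtag."""
    by_lower = {name.lower(): name for name in available}
    for tag in parse_accept_language(header):
        if tag in by_lower:
            return by_lower[tag]
        primary = tag.split("-")[0]
        for low, original in by_lower.items():
            if low.split("-")[0] == primary:
                return original
    return default


print(detect_platform("Mozilla/5.0 (Windows NT 6.1; WOW64)"))
print(pick_language("de-DE,de;q=0.8,en;q=0.5", ["en-US", "de", "fr"]))
```

Keeping this logic in one place is what lets the download matrix be maintained once while either mirror network serves the file.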
Mirroring more files is not a problem for us at all as long as we can use rsync or some other automated mechanism to keep the files up to date as there are changes. Maintaining an alternative version/platform matrix page would take a little bit more work, but if it's helpful we could certainly create something that matches that experience on the sf.net side.

Ideally (and this is my opinion; others may have better opinions), we would check the user's request header, get the language and platform from that, determine the recommended download, and pass that request on to either of the mirror networks, along with the IP address for locating the nearest mirror. The branch between Apache and SF mirrors could be done randomly, based on a tunable parameter. if rand() < 0.25 doApache() else doSF() would send 25% of the download requests to Apache, and the remaining 75% to SF.

We can certainly do this as well. Either approach is fine, but the approach outlined above has the advantage of requiring almost no integration work on either side -- so it would be my preference. That said, the approach you describe here could be implemented on the sf.net side in a day or two, so if it's your preference we're more than happy to accommodate that.

The nice thing about this approach is that it allows each mirror network to do its own geographic optimization, while allowing the OpenOffice project to control how users are recommended a particular version of AOO. It allows us to maintain the matrix of downloads in one place. And it does not introduce any new mouse clicks for the user.

I agree that we should try to maintain the current number of clicks. I also agree that we should give the OO project control of how the options are presented, and I like this idea. But the downside is that people might randomly get sf.net one time and the Apache mirrors the next, and have an inconsistent user experience. And I also think users should have some control over what download experience they get.
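The random branch described above (`if rand() < 0.25 doApache() else doSF()`) can be written out as a small runnable sketch. The function name `choose_network` and the returned labels are made up for this example; only the 25%/75% split comes from the thread.

```python
import random


def choose_network(apache_share: float = 0.25, rng=random) -> str:
    """Send roughly `apache_share` of requests to Apache and the rest to SF."""
    if not 0.0 <= apache_share <= 1.0:
        raise ValueError("apache_share must be between 0 and 1")
    # rng.random() is uniform on [0, 1), so this is true with
    # probability apache_share.
    return "apache" if rng.random() < apache_share else "sf"


# Quick check of the split with a seeded generator (reproducible).
rng = random.Random(0)
draws = [choose_network(0.25, rng) for _ in range(10_000)]
print(draws.count("apache") / len(draws))
```

Note that each request is decided independently, which is exactly the source of the inconsistent-experience concern raised above: the same user may land on a different network on a second visit.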
So, overall I think Joe's suggestion of a recommended download link that states that it's going to sourceforge.net, and a second alternate link that goes to the Apache mirrors, would probably provide a better user experience.

Is it technically feasible?

Absolutely. I think I speak for Roberto and the rest of the sf.net team when I say we are open to whatever solution works best for the AOO project, and are more than willing to be guided by the PPMC's opinion on this.

--
Mark Ramm
Director of Engineering, SourceForge Developer Experience
email: m...@geek.net
Re: Sourceforge and AOO 3.4 distribution
We have finally assessed the capacity and capabilities needed to serve the surge of Apache OpenOffice 3.4 release-time traffic. Before we could commit to delivering the full download volume, we wanted to produce a vetted plan, including a clear timeline and backing technical implementation plans.

First let me quickly recap my understanding of the problems we are trying to solve for:

- Apache OpenOffice 3.4 will be released in mid April and we want to assure capacity to handle that traffic both in terms of bandwidth and simultaneous connections.
- The Apache OpenOffice project would benefit from being able to promote the release heavily without worrying about capacity.

Given those needs and the fact the Apache Infrastructure team said they’d welcome our assistance, we at SourceForge think we can help and that there would be mutual benefit.

What we are proposing is an elaboration of Joe’s ‘hybrid’ approach:

- Both AOO and SF.net mirror networks would be used to provide download capacity for the 3.4 release.
- SourceForge.net would be the “recommended default download” on the website.
- Apache Mirror network would be an alternate download option.
- Apache OpenOffice team and Infrastructure team will maintain control of the auto-update URLs and possibly follow Rob’s suggestion to stagger automatic updates.

SourceForge.net will manage the full burst capacity for web-based downloads through our global network of OSS mirrors, global CDN network(s) and cloud file server providers. Using these resources, we anticipate our capacity is well above the expected delivery requirements for the upcoming release.

In addition to basic download capacity, SourceForge will provide detailed download statistics, which will support future product, infrastructure and marketing plans. We will commit to make stats available on the SourceForge.net website and provide stats delivery APIs. We are able to capture initiated downloads, not just page views, and will provide them split by geography and operating system. We’re also willing to consider additional stats needs.

Proposed Timeline:

- Immediately: SourceForge sets up Apache Infra team with credentials on an AOO mirror project in sf.net
- First week: SourceForge updates contracts with CDN and other providers to handle full AOO peak release traffic
- Second week: AOO Infra team works with sf.net operations team to ramp traffic to sf.net in a controlled way in order to gather statistical data, verify assumptions, and give the Apache infrastructure team time to verify our capacity.
- 1-2 days post test: SF.net analyzes traffic data, assures that our assumptions about geographic mix, and interactive vs automated download mix, are valid and we can do this in a fiscally responsible way.
- 1-2 days post test: AOO infrastructure team analyzes traffic data, lets sf.net team know any additional data needs, and validates that the system will work for them

Once everything is tested and vetted on both sides, we will need to make a CDN bandwidth commit, and would like the AOO team to commit to notifying us 30 days prior to shutting down the flow of traffic, so that we can update our contracts and avoid penalties.

We believe that the combination of SF.net mirrors and CDN based burst capacity will provide a fast and stable download experience for AOO users, and will allow the AOO team to publicize the release in an aggressive manner.

On Wed, Mar 21, 2012 at 10:55 AM, Mark Ramm m...@geek.net wrote:

And finally: would you have any objection to us using a mix of fixed mirrors, elastic file delivery services (like s3), and commercial CDN service to handle spikes in download gracefully and assure that global users get good download performance when local mirrors are overloaded or not available?

No, we may even be willing to budget some amount for this purpose.
Re: Sourceforge and AOO 3.4 distribution
And finally: would you have any objection to us using a mix of fixed mirrors, elastic file delivery services (like s3), and commercial CDN service to handle spikes in download gracefully and assure that global users get good download performance when local mirrors are overloaded or not available?

No, we may even be willing to budget some amount for this purpose. Cost estimates would be appreciated as our budget numbers for FY2012 need to be finalized next week.

Sorry that it's taken a bit to get back to you. We are working on getting pricing from a variety of providers, and my personal goal is to find a way for us to fund the CDN and S3 costs, and to provide this to the community as a free (as in beer) service.

Thanks to everybody who provided anecdotal information on historical traffic peaks, and particularly for the steady state run rate information. That has been invaluable as we talk with vendors about the supplemental capacity we need to acquire to handle peak loads.

There's one key input to figuring out if I can pay for all of this out of ad revenue: what percentage of the daily downloads are expected to come from auto-updater software or other non-browser scripts? Would that traffic still be pointed primarily at AOO owned domains and mirrors, or would we be handling some of that from the sf.net service?

And finally, I'd also be interested in finding out if you know what percentage of traffic is from North America vs the rest of the world, because some providers give very different rates for different locations; for example, Cloudfront publishes $0.02/GB in the US and $0.12/GB in South America.

Thanks again to everybody who helped with data so far!

--Mark Ramm
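To show why the geographic mix matters for the budget question above, here is a small cost sketch using the two per-GB rates quoted in the message. The 80/20 traffic mix and the 1,000 GB volume are invented purely for the arithmetic; real rates for other regions and the real mix are not given in the thread.

```python
# Per-GB rates as quoted in the message; other regions are not given there.
RATES_USD_PER_GB = {"us": 0.02, "south_america": 0.12}


def daily_cdn_cost(total_gb: float, mix: dict, rates: dict) -> float:
    """Cost of serving `total_gb` split across regions by fractional `mix`."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("regional shares must sum to 1")
    return sum(total_gb * share * rates[region] for region, share in mix.items())


# Hypothetical example: 1,000 GB/day, 80% US and 20% South America.
# 1000 * 0.8 * 0.02 = 16 USD, plus 1000 * 0.2 * 0.12 = 24 USD.
print(daily_cdn_cost(1000, {"us": 0.8, "south_america": 0.2}, RATES_USD_PER_GB))
```

In this made-up mix the 20% of traffic served at the higher rate costs more than the other 80%, which is why the regional percentage is a key input.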
Re: Sourceforge and AOO 3.4 distribution
On Mon, Mar 19, 2012 at 4:04 PM, Joe Schaefer joe_schae...@yahoo.com wrote:

FWIW the ballpark figures we have today Roberto are roughly 12GB worth of release artifacts and about 100TB / day worth of download traffic.

Thanks for the information. I'm working with Roberto to make sure all the right technical resources are aligned behind him, and that we have the resources to provide a great experience to your users. So, I'm here to help out, and validate everything to make sure we are prepared to handle AOO's peak load.

Based on the file size data in the previous e-mail, and this bandwidth information, I believe we are talking about something around 700k downloads per day.

Is that peak load, or is that sustained load? If it's sustained, do you have any ideas about what peak load would look like? If not, do you have any ideas about what sustained load would look like?

And finally: would you have any objection to us using a mix of fixed mirrors, elastic file delivery services (like s3), and commercial CDN service to handle spikes in download gracefully and assure that global users get good download performance when local mirrors are overloaded or not available?

I'm looking forward to working with all of you to make sure that users have a reliable and fast download source for the upcoming Apache OpenOffice release. Let me know if there are any questions I can answer for you, or anything else I can do to help.

--Mark Ramm
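As a sanity check on the figures in this message, the two numbers it states (about 100TB/day and roughly 700k downloads/day) imply an average download size. The sketch below assumes decimal units (1 TB = 10^12 bytes); the per-file sizes from the earlier e-mail are not reproduced in this thread, so only the implied average is computed.

```python
TB = 10**12  # decimal terabyte, an assumption; the thread does not say which unit
MB = 10**6


def implied_avg_size_mb(daily_bytes: float, downloads_per_day: float) -> float:
    """Average bytes per download, in MB, implied by volume and count."""
    return daily_bytes / downloads_per_day / MB


def implied_downloads(daily_bytes: float, avg_size_mb: float) -> float:
    """Downloads per day implied by total volume and an average file size."""
    return daily_bytes / (avg_size_mb * MB)


# 100 TB/day over 700k downloads/day works out to about 143 MB each.
print(round(implied_avg_size_mb(100 * TB, 700_000)))
```

Run the other way, `implied_downloads(100 * TB, 143)` gives just under 700,000, so the two figures in the message are consistent with an installer of roughly that size.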