Hey, my only role with Apache infrastructure is as court jester, so I'm not the one to help here: folks will need to organize specific proposals and work with the Apache infra team.

Note this is *not* about which is the fastest way to serve content. The real question is how the Apache infra team - working within a non-profit org, with limited physical and admin resources - can effectively and reliably manage all services our many projects ask for.

Information about Apache infrastructure is available:

  https://www.apache.org/dev/machines.html
  https://people.apache.org/~henkp/  (lots of links)
  https://people.apache.org/~vgritsenko/stats/daily.html
  https://www.apache.org/mirrors/

- Shane

P.S. The CMS by default stores both the mdtext source and the final html output in SVN.

The mdtext is in your working SVN area, with diffs sent to the appropriate project lists, so that the team can see what changes are being made to the content people develop.

The final html is in the staging/production website SVN areas, mainly for backup recovery and security. If the server's lost, just install httpd somewhere, and then svn checkout the html tree.
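
(As a rough illustration of how little that recovery involves: the production site is just a static html tree in SVN, so any box that can check it out and serve files will do. The sketch below uses Python's built-in static server in place of httpd, and the repository URL is made up; it's only meant to show the shape of it, not how infra would actually rebuild things.)

  # Rough recovery sketch only. The SVN URL is hypothetical, and a real
  # rebuild would install httpd; Python's stdlib server just stands in here.
  import functools
  import http.server
  import subprocess

  SITE_REPO = "https://svn.example.org/repos/site/production"   # hypothetical URL
  CHECKOUT_DIR = "site-html"

  # Pull the published html tree back out of version control.
  subprocess.run(["svn", "checkout", SITE_REPO, CHECKOUT_DIR], check=True)

  # Serve the checked-out tree as plain static files on port 8080.
  handler = functools.partial(http.server.SimpleHTTPRequestHandler,
                              directory=CHECKOUT_DIR)
  http.server.ThreadingHTTPServer(("", 8080), handler).serve_forever()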

On 8/12/2011 12:19 PM, Terry Ellison wrote:
Rob,

I support your general point. Using static HTML files to achieve this
might have been a sound argument in the 1990s, but it isn't really
credible with today's platform technologies. What are the transaction
rates for the Apache site? How many requests per second, even just
roughly?

Taking your example of the MediaWiki engine, this is scaled to meet the
transactional and data volume demands of wikipedia.org, one of the
busiest websites on the planet. (There are typically ~100 updates per
second and goodness knows how many pageviews.) See
http://www.mediawiki.org/wiki/Cache and the few dozen subsidiary pages.
There are many high-performance caching products that address this issue
-- Apache even does one: http://trafficserver.apache.org/ -- and the
MediaWiki engine already integrates with a couple of the leaders: Squid
and Varnish.

Apache's "heartland" is its "number one HTTP server on the Internet".
Are we really saying that the best way to manage content is through
static HTML files? This is just daft IMHO. Has anyone ever heard of
current CMS technology?

How many content editors and contributors can read HTML these days?

One other point: yes, SVN or any equivalent versioning repository can
store most types of content, but versioning should take place at the
highest level of abstraction and language that the content providers
work in. Take an extreme example to emphasise this point. svn can store
object modules, but does that mean we should use those as the master
copies and disassemble back to assembly code to update programs? Of
course not. But to many editors, HTML is little more than binary
machine code.

Non-functional (infrastructure) requirements help drive the design and
implementation cycles, but they shouldn't unnecessarily limit the true
functional requirements of the system. To do so is madness. Is this
really an approach Apache wants to advocate?

Regards
Terry
On Fri, Aug 12, 2011 at 10:10 AM, Shane Curcuru <[email protected]> wrote:
(To provide a little context while Gav may be asleep)

On 8/12/2011 9:26 AM, Rob Weir wrote:
On Fri, Aug 12, 2011 at 3:41 AM, Gavin McDonald <[email protected]> wrote:
On Thu, Aug 11, 2011 at 12:12 PM, Kay Schenk<[email protected]>
...snip snip snip...

Just a thought: Could you do the entire website in MediaWiki, with only
exception cases (download page, etc.) done in HTML?

Just to put a blocker on this right away, we will not be using the wiki
as the main website or the main entrance into the OOo world.

Since it is not self-evident to me why a wiki would be a problem for
the main website, could you explain this a little further? Is there a
technical problem? Remember, the wiki already comprises several
thousand pages of website content, so in a very real sense the "main"
website is already the wiki.
Performance. As I understand it, the bulk of all apache.org content is
served statically as html files. Putting a major project's homepage
website like the future office.a.o (or whatever name) up as a wiki would
add a significant amount of load to our servers, even for a highly
efficient wiki engine.

Thanks, that gives some context. So "main" in this case is not
necessarily only the top-level page, i.e., an eventual
openoffice.apache.org or the current www.openoffice.org. Certainly
those pages would be some of the most highly-trafficked pages. But we
probably have some others that are as well: FAQs, release notes, the
download page, etc.

But that still leaves the long tail of the thousands of other pages
that are individually accessed rarely, but may add up to significant
load.

I'm surprised there is no caching mechanism for MediaWiki to simply
write out static versions of pages and then invalidate the cache for a
particular page when it is changed. In theory you could have the
rarely-changed pages be just as efficient as static HTML. Plugins exist
that do this for WordPress, for example.
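
(To sketch the idea, rather than how MediaWiki or any particular
WordPress plugin actually implements it: a page view serves a saved
.html file if one exists, renders and saves it if not, and an edit just
deletes the file so the next view regenerates it. render_page() below is
a made-up stand-in for the wiki's real rendering.)

  # Sketch of the idea only; render_page() is a hypothetical stand-in for
  # whatever the wiki engine really does (database lookups, parsing, etc.).
  import os

  CACHE_DIR = "static-cache"

  def render_page(title):
      # Hypothetical expensive render step.
      return "<html><body><h1>" + title + "</h1></body></html>"

  def cache_path(title):
      return os.path.join(CACHE_DIR, title + ".html")

  def serve_page(title):
      # Cheap path: a plain file read, no wiki code runs at all.
      path = cache_path(title)
      if os.path.exists(path):
          with open(path) as f:
              return f.read()
      # Cache miss: render once, save the static copy for later hits.
      html = render_page(title)
      os.makedirs(CACHE_DIR, exist_ok=True)
      with open(path, "w") as f:
          f.write(html)
      return html

  def page_edited(title):
      # Edit hook: drop the stale copy so the next view re-renders it.
      try:
          os.remove(cache_path(title))
      except FileNotFoundError:
          pass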


The beauty of the CMS is that while it's easy to work on the pages
(either via SVN or browser), the final result is simply checked into SVN
and then the resulting .html file is just stuck on the production
webserver site. Some projects use a wiki to manage their homepages (i.e.
project.a.o, separate from any community wiki they may have), but the
physical homepage that end-users see is typically static html that's
been exported from their wiki site.

Gav or infra folk can provide more details, but you should plan on
adhering to whatever performance restrictions the infra team requires
for the main website.

- Shane

