Sorry to top post, but this week I am at my work HQ and am busy. I think that we should create a 404 page and then ask infra to point to that.
Sent from my iPhone

On Aug 1, 2012, at 7:45 PM, Kay Schenk <[email protected]> wrote:

> On 08/01/2012 04:29 PM, Rob Weir wrote:
>> On Wed, Aug 1, 2012 at 7:06 PM, Kay Schenk <[email protected]> wrote:
>>> Hello all --
>>>
>>> I am exploring the www.openoffice.site using the Google Webmaster tool that
>>> Rob told us about on Jul 19.
>>>
>>> I am ONLY getting started by looking at the 62,962 404 errors (!!!!!)
>>>
>>> Many of these are links to VERY old docs which we no longer have -- like
>>> source trees for 1.0.1, 1.0.2 etc. -- or have to do with the OLD
>>> architecture -- servlet references etc.
>>>
>>
>> If I understand this correctly, Google is looking at links on
>> webpages, not just our webpages, but also links from 3rd party
>> websites, and if they point to an openoffice.org page that doesn't
>> exist, it shows up on this list. This could happen for any reason.
>> In some cases the original link might have had a typo.
>
> yes, this is correct, and you are right about this too...some of the 404s
> reference pages we probably NEVER had.
>
>>
>>> Some of these issues could be solved with rather extensive use of sym links
>>> (yes, you can actually use these in svn -- kind of) and of course some not
>>> -- many missing old security bulletins.
>>>
>>
>> For the security bulletins, I wonder if this is actually a redirection
>> error. We have many of them here:
>>
>> http://www.openoffice.org/security/bulletin.html
>
> ah...yes, they are there...the problem is we would need to construct a LOT of
> just "redirect" pages to right some of these since they all seem to have the
> form
>
> "/security/cvs-bulletin-number".html
>
>>
>> But we're redirecting security.openoffice.org to
>> http://incubator.apache.org/openofficeorg/security.html
>>
>> So if there are outstanding URLs that are of the form
>> security.openoffice.org/foo.html then they might be broken now.
>
> see above...it's the actual placement of the bulletins within the tree that's
> the problem I think
>
>>
>>> So, to those of you using this tool, I may mark many of these as "fixed".
>>> Of course they are not -- and they may show up again. Some of them only
>>> show up in BZ issues!! (Google is amazingly thorough).
>>>
>>> I don't know how long it will take for them to "show up" again. The problem
>>> is some of these are very very very old references, and not likely we can
>>> do anything about at this point in time.
>>> If you're not using this tool, you probably don't care about this. If you
>>> are using it, and have another opinion before I start chunking away at
>>> hiding these, please weigh in.
>>>
>>
>> The way I understand it the links at the top of the list are the ones
>> Google considers the most important. I think this is based on the
>> number of links to that page. Maybe they factor in other things as
>> well. So I'd recommend looking more at the top 100 or so broken
>> links, make this a manageable task.
>
> Well the problem is "how" to make it manageable... :(
>
>>
>> Or -- and here is a challenge for the algorithm experts -- maybe there
>> is an easy way to take that entire list of 62,962 links and determine
>> what the top base paths are that are broken.
>
> if only this were so :( They're all over the place.
>
>> In other words, if the links are:
>>
>> foo.openoffice.org/bar/baz1
>> foo.openoffice.org/bar/baz2
>> foo.openoffice.org/bar/baz2
>> foo.openoffice.org/bar2/baz1
>> foo2.openoffice.org/bar1/baz1
>>
>> Then this would tell us that foo.openoffice.org/bar/* was a top source
>> of broken links. This might indicate important patterns of where the
>> most broken links are.
>>
>> It seems like this could be done via a prefix tree (a "trie"):
>> http://en.wikipedia.org/wiki/Trie
>>
>> Maybe other (simpler) ways as well.
>
> I'll look at this article. It's a daunting task any way you look at it.
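
A rough sketch of the prefix-counting idea quoted above, for anyone who wants to try it on the exported list. It uses a flat counter keyed by host/path prefix rather than an explicit trie, so it is one of the "simpler ways", not the trie itself. The function name prefix_counts and the max_depth parameter are made up for illustration, and it assumes the 404 list is available as plain URL strings, one per entry.

```python
from collections import Counter
from urllib.parse import urlsplit


def prefix_counts(urls, max_depth=3):
    """Count how many URLs fall under each host/path prefix (hypothetical helper)."""
    counts = Counter()
    for url in urls:
        # urlsplit only fills in the host when a scheme or "//" is present,
        # so bare "foo.openoffice.org/bar" entries get a "//" prepended.
        parts = urlsplit(url if "//" in url else "//" + url)
        segments = [parts.netloc] + [s for s in parts.path.split("/") if s]
        # Count proper prefixes only: the last segment is the missing page itself.
        for depth in range(1, min(len(segments), max_depth + 1)):
            counts["/".join(segments[:depth])] += 1
    return counts


# The example links from the quoted message, duplicates included.
links = [
    "foo.openoffice.org/bar/baz1",
    "foo.openoffice.org/bar/baz2",
    "foo.openoffice.org/bar/baz2",
    "foo.openoffice.org/bar2/baz1",
    "foo2.openoffice.org/bar1/baz1",
]
for prefix, n in prefix_counts(links).most_common(2):
    print(n, prefix)
# prints:
# 4 foo.openoffice.org
# 3 foo.openoffice.org/bar
```

On the real list, sorting by count and reading the top of the output should show which hosts and directories account for most of the broken links, which is the pattern Rob was after.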
>
>>
>> Regards,
>
> What happens when things get moved a LOT with no regard for the end user.
> Don't get me started on the ways I've had to deal with this in the past.
>
>>
>> -Rob
>>
>>>
>>>
>>> --
>>> ----------------------------------------------------------------------------------------
>>> MzK
>>>
>>> "I'm just a normal jerk who happens to make music.
>>> As long as my brain and fingers work, I'm cool."
>>> -- Eddie Van Halen
>
> --
> ------------------------------------------------------------------------
> MzK
>
> "I'm just a normal jerk who happens to make music.
> As long as my brain and fingers work, I'm cool."
> -- Eddie Van Halen
