Re: [Wikitech-l] Wikitech-l Digest, Vol 65, Issue 34

Ján Uličný Sun, 28 Dec 2008 05:27:27 -0800

SOS...SOS...SOS...HELP...Slovakia (Slovensko): thank you for the e-mail, but I 
do not know what it says, because I do not speak your language. Please write in 
Slovak or Czech...


______________________________________________________________
> From: [email protected]
> To: [email protected]
> Date: 28.12.2008 04:16
> Subject: Wikitech-l Digest, Vol 65, Issue 34
>
>Send Wikitech-l mailing list submissions to
> [email protected]
>
>To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>or, via email, send a message with subject or body 'help' to
> [email protected]
>
>You can reach the person managing the list at
> [email protected]
>
>When replying, please edit your Subject line so it is more specific
>than "Re: Contents of Wikitech-l digest..."
>
>
>Today's Topics:
>
>   1. Data center move in Amsterdam: expect some downtime (Mark Bergsma)
>   2. Re: IBM DB2 patch for MediaWiki (Jesús Quiroga)
>   3. Re: Anchors haven't id attribute (Danny B.)
>   4. Re: Anchors haven't id attribute (Brion Vibber)
>   5. Re: IBM DB2 patch for MediaWiki (Aryeh Gregor)
>   6. Re: Anchors haven't id attribute (Aryeh Gregor)
>   7. Re: Anchors haven't id attribute (Danny B.)
>   8. Re: Anchors haven't id attribute (Aryeh Gregor)
>
>
>----------------------------------------------------------------------
>
>Message: 1
>Date: Fri, 26 Dec 2008 22:05:17 +0100
>From: Mark Bergsma <[email protected]>
>Subject: [Wikitech-l] Data center move in Amsterdam: expect some
> downtime
>To: Wikimedia developers <[email protected]>, Wikimedia
> Foundation Mailing List <[email protected]>
>Message-ID: <[email protected]>
>Content-Type: text/plain; charset=ISO-8859-1
>
>In the upcoming days until new years we will be moving our servers and
>other equipment in the Amsterdam data center location to a new data
>center. Unfortunately this might result in some downtime and hiccups of
>certain web sites & services, although we will try to keep this to a
>minimum.
>
>On Sunday the 28th, between 09:00 and 11:00 UTC we will migrate our
>network in Amsterdam to new equipment. All services located there will
>be unreachable for a brief period. Traffic for the main wikis will be
>rerouted to the Florida cluster however, and should remain unaffected.
>
>In the days after we will be moving the servers themselves. Some
>services, such as the mailing lists server, the subversion server and
>the toolserver cluster, will be down for a number of hours while the
>equipment is being moved. Traffic for the wikis should again remain
>largely unaffected.
>
>We hope to have the entire migration finished before we enter the last
>few hours of 2008... and start 2009 with a clean sheet. Happy Holidays
>everyone!
>
>-- 
>Mark Bergsma <[email protected]>
>System & Network Administrator, Wikimedia Foundation
>
>
>
>------------------------------
>
>Message: 2
>Date: Sat, 27 Dec 2008 07:23:00 +0100
>From: Jesús Quiroga <[email protected]>
>Subject: Re: [Wikitech-l] IBM DB2 patch for MediaWiki
>To: Wikimedia developers <[email protected]>
>Message-ID: <[email protected]>
>Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
>
>Hello.
>
>After a few days of pondering the issues, I would like to explain what I 
>suggested in my previous message, in more detail and (hopefully) more 
>clearly.
>
>What I'm about to say is pretty abstract, so it's difficult to convey 
>the right meaning. Please forgive me if I say something you already 
>know, or just nonsense :-)
>
>
>Jesús Quiroga wrote:
>> I believe a better solution is to design a domain-specific language, an 
>> idea not very different from your first one.
>> This DSL would model the interaction between the application and the DB 
>> as it is now, and would be designed to evolve. That's it.
>>   
>
>The problem I discuss is how to best access the data store from an 
>application. I believe the right answer is different for each project, 
>but it's not difficult to evaluate the alternatives, one by one, in a 
>given context. I think it is worthwhile to do that in the context of 
>MediaWiki.
>
>I will refer to wiki modules and databases as if they were 'hosts' 
>connected to a 'network', to highlight the role of languages in the 
>operation of the system at runtime.
>
>
>The first way to access the data store is the 'direct' one:
>
>    [polyglot wiki] <--- mysDataL ---> [mysql]
>    [polyglot wiki] <--- posDataL ---> [postgresql]
>    [polyglot wiki] <--- db2DataL ---> [db2]
>
>Here, the polyglot wiki module talks to every database using the proper 
>languages. 'mysDataL' means 'the data language understood by MySQL', 
>'posDataL' means 'the data language understood by PostgreSQL', etc.
>
>The polyglot wiki promises to learn several languages and to speak them 
>correctly forever, so, if a new database comes along or any of their 
>data languages evolves, the polyglot wiki is forced to adapt at a 
>potentially great cost. Besides, any change to the database schema can 
>trigger lots of updates to the wiki code, and be very costly too.
>
>The advantages of this way are well known: it is fast, no need to do 
>design, easy to understand.
>The drawbacks are apparently few, but devastating: verbose and complex 
>code in multiple places in the wiki module, very costly to maintain, 
>even more costly to evolve. All changes cost a lot, in time and effort.
>
>
>
>The second way to access the data store that is usually considered is 
>the 'indirect' one:
>
>    [wiki] <--- wikiDataL ---> [polyglot translator]
>
>    [polyglot translator] <--- mysDataL ---> [mysql]
>    [polyglot translator] <--- posDataL ---> [postgresql]
>    [polyglot translator] <--- db2DataL ---> [db2]
>
>Here, wikiDataL means 'some relational data definition and manipulation 
>language suitable for use by the wiki'.
>
>The polyglot translator promises to learn wikiDataL and the other 
>dialects and to evolve with them, so it has all the problems the wiki 
>had in the direct way, but now the cost is lower because a lot of 
>complexity is 'hidden' inside the translator and can't reach the wiki. 
>As a result, wiki code is not updated as much, and it's much cleaner and 
>less verbose.
>
>The advantages of this way are: wiki module code is simpler, cost of 
>evolution is reduced.
>The drawbacks are apparently many: it's slower, design is needed, harder 
>to understand, a new language (wikiDataL), translator can be very 
>complex. However, the need to reduce the cost to achieve change is 
>usually so great that these inconveniences are minor in comparison.
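
The 'indirect' way described above can be illustrated with a minimal Python sketch. All class and function names here are hypothetical and invented for illustration; they are not MediaWiki's Database classes. The only facts relied on are that MySQL and PostgreSQL accept `LIMIT n`, while DB2 spells row limiting as `FETCH FIRST n ROWS ONLY`.

```python
# Hypothetical sketch of the 'indirect' way: the wiki speaks one relational
# language (wikiDataL) and a translator renders it for each backend.
# None of these names exist in MediaWiki; they are illustrative only.


class Translator:
    """Renders a backend-neutral 'select the first N rows' request as SQL."""

    def limit_clause(self, count):
        # Default dialect: MySQL and PostgreSQL both accept LIMIT n.
        return "LIMIT {}".format(count)

    def select_limited(self, table, columns, count):
        cols = ", ".join(columns)
        return "SELECT {} FROM {} {}".format(cols, table, self.limit_clause(count))


class MySqlTranslator(Translator):
    pass


class PostgresTranslator(Translator):
    pass


class Db2Translator(Translator):
    def limit_clause(self, count):
        # DB2 uses a different clause for row limiting.
        return "FETCH FIRST {} ROWS ONLY".format(count)


def render_all(table, columns, count):
    """Render the same wikiDataL-style request for every backend."""
    backends = {
        "mysql": MySqlTranslator(),
        "postgresql": PostgresTranslator(),
        "db2": Db2Translator(),
    }
    return {name: t.select_limited(table, columns, count)
            for name, t in backends.items()}
```

The caller still thinks in tables, columns and row limits; only the spelling of each dialect is hidden inside the translator.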
>
>
>
>Now the interesting bit begins. A third possible way to access the data 
>store, the 'interpreted' one:
>
>    [wiki] <--- wikiNeedL ---> [polyglot interpreter]
>
>    [polyglot interpreter] <--- mysDataL ---> [mysql]
>    [polyglot interpreter] <--- posDataL ---> [postgresql]
>    [polyglot interpreter] <--- db2DataL ---> [db2]
>
>Here, wikiNeedL means 'some language adequate for the wiki to express 
>its data access needs and nothing else'.
>
>wikiNeedL is the domain-specific language I wrote about in my previous 
>message.
>
>The differences between wikiDataL and wikiNeedL are mainly these:
>   - wikiNeedL would contain just enough wiki concepts to express the 
>wiki's needs, so it's effectively confined to that domain. wikiDataL 
>belongs to the relational data model domain, which is quite different.
>   - in general, wikiNeedL would have different semantics than the 
>dialects understood by the databases, so the translation step becomes 
>more like interpretation, rather than just syntactic transformations. 
>wikiDataL usually has the same semantics as the dialects.
>   - wikiNeedL would contain just enough concepts to satisfy current 
>needs, and will be open to extension. wikiDataL aims to be 
>general-purpose and to fulfill current and future needs.
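
To make the contrast in the list above concrete, here is a rough Python sketch of what a wikiNeedL-style request might look like. It is an assumption-laden illustration, not a proposal from the thread: `RecentRevisions` and `interpret` are hypothetical names, the table and column names are used only as examples, and `?` is just a stand-in placeholder marker.

```python
# Hypothetical sketch of the 'interpreted' way: the wiki states a need in
# wiki terms, and the interpreter decides which statement satisfies it.

from dataclasses import dataclass


@dataclass(frozen=True)
class RecentRevisions:
    """A wikiNeedL-style request: 'the newest N revisions of this page'."""
    page_id: int
    count: int


def interpret(need, backend):
    """Turn a wiki-level need into (sql_text, parameters) for one backend."""
    if not isinstance(need, RecentRevisions):
        raise TypeError("unsupported need: {!r}".format(need))
    base = ("SELECT rev_id FROM revision WHERE rev_page = ? "
            "ORDER BY rev_timestamp DESC")
    if backend in ("mysql", "postgresql"):
        return base + " LIMIT {}".format(need.count), (need.page_id,)
    if backend == "db2":
        return base + " FETCH FIRST {} ROWS ONLY".format(need.count), (need.page_id,)
    raise ValueError("unknown backend: {}".format(backend))
```

Here the caller never mentions tables or columns; it only names the need, and supporting a new need means adding a new request type to the interpreter.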
>
>The main reason to consider the 'interpreted' way is, of course, that it 
>helps reduce even more the cost to achieve change.
>
>
>
>So that's what I was talking about. I will say more about the 
>differences between the indirect and the interpreted ways in a future 
>message.
>
>
>
>Thanks for your attention.
>
>
>
>
>
>------------------------------
>
>Message: 3
>Date: Sat, 27 Dec 2008 13:05:53 +0100 (CET)
>From: Danny B.<[email protected]>
>Subject: Re: [Wikitech-l] Anchors haven't id attribute
>To: Wikimedia developers<[email protected]>
>Message-ID: <[email protected]>
>Content-Type: text/plain; charset="iso-8859-2"
>
>> ------------ Original message ------------
>> From: Brion Vibber <[email protected]>
>> Subject: Re: [Wikitech-l] Anchors haven't id attribute
>> Date: 26.12.2008 06:30:00
>> ----------------------------------------
>> On 12/25/08 4:32 AM, Danny B. wrote:
>> > I have reverted both revisions in r45021 and r45022 because it caused 
>> > massive invalidity of pages.
>> 
>> Given that we've been outputting these as "id" attributes for the last 
>> few years already (as output by Tidy), I have reverted your revert in 
>> r45044 pending further discussion.
>> 
>> -- brion
>
>Well, the id was added _only_ to those tags where the name was transferable to 
>id, and thus had to start with an ASCII letter. _Never_ to those which did not 
>conform to this rule (the regexp mentioned in my previous post). This is easily 
>proved by either running an older revision of MediaWiki or testing in Tidy 
>directly:
>
>Take this code excerpt (and wrap it with minimal XHTML document stuff) and run 
>it through Tidy:
>
><a name="X"></a><h2> <span class="mw-headline"> X </span></h2>
><a name="1X"></a><h2> <span class="mw-headline"> 1X </span></h2>
><a name=".C3.81X"></a><h2> <span class="mw-headline"> ÁX </span></h2>
><a name="-X"></a><h2> <span class="mw-headline"> -X </span></h2>
>
>The result will be:
>
><a name="X" id="X"></a><h2><span class="mw-headline">X</span></h2>
><a name="1X"></a><h2><span class="mw-headline">1X</span></h2>
><a name=".C3.81X"></a><h2><span class="mw-headline">ÁX</span></h2>
><a name="-X"></a><h2><span class="mw-headline">-X</span></h2>
>
>Now, let me repeat how the "id" is defined:
>
>1: XHTML is a reformulation of HTML 4 as an XML 1.0 application.
>2: That means it takes every single definition from HTML 4 and keeps it unless 
>it is overridden in XHTML.
>3: The id and name have been defined in HTML 4 as /[A-Za-z][A-Za-z0-9:_.-]*/  
>[1] [2]
>4: The name has been redefined to NMTOKEN  [2] [3]
>5: The id has never been redefined and thus stays on the definition mentioned 
>in point 3 above.
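
The rule quoted in point 3 can be checked mechanically against the four anchor names from the Tidy example. This is only a small Python sketch of that regexp; `is_html4_id` is a hypothetical helper name, and the sketch says nothing about which reading of the XHTML specification is correct.

```python
import re

# The HTML 4 ID/NAME token rule exactly as quoted in point 3.
HTML4_ID = re.compile(r"[A-Za-z][A-Za-z0-9:_.-]*")


def is_html4_id(value):
    """Return True if the whole string matches the quoted regexp."""
    return HTML4_ID.fullmatch(value) is not None


# The four name values used in the Tidy example.
names = ["X", "1X", ".C3.81X", "-X"]
results = {name: is_html4_id(name) for name in names}

for name, ok in results.items():
    print(name, "matches" if ok else "does not match")
```

Only "X" matches, which is consistent with Tidy copying the name to an id for the first anchor only.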
>
>This is how the id in XHTML has always been handled since XHTML came out. I 
>also think that something as important as the handling of id would have been 
>fixed in the validator over so many years if it were not correct.
>
>So currently, all non-Latin-character wikis are totally invalid according to 
>the W3C validator. Major parts of non-ASCII-character wikis are invalid as 
>well. It is therefore very hard to find other validity errors in the code when 
>there are worthless positives on every other page. :-(
>
>Also one thing at the end: I think that the current rendering with 
>controversial ids brought more negatives (such as greatly reducing the 
>ability to find the really invalid parts of the code) than positives. It 
>was working correctly before, so what benefit did it actually bring? On the 
>other hand, it brought this controversy.
>
>I take the point that I (and the majority of people all over the world, the 
>validator, Tidy and so many other tools etc.) _may_ be wrong in the 
>interpretation of the definition of id. But I guess that until the 
>authoritative tools, such as the validator or Tidy, are fixed in this respect, 
>so that it can be proved that we render the page correctly, we should not 
>render that way. As I mentioned above, it was working correctly before, so 
>there is no urgency to force the new rendering, since it is not correcting any 
>mistake or malfunction.
>
>[1] http://www.w3.org/TR/html401/types.html#type-name
>[2] http://www.w3.org/TR/xhtml1/#C_8
>[3] http://www.w3.org/TR/2000/WD-xml-2e-20000814#NT-Nmtoken
>
>
>Kind regards
>
>
>Danny B.
>
>
>
>------------------------------
>
>Message: 4
>Date: Sat, 27 Dec 2008 12:14:33 -0800
>From: Brion Vibber <[email protected]>
>Subject: Re: [Wikitech-l] Anchors haven't id attribute
>To: Wikimedia developers <[email protected]>
>Message-ID: <[email protected]>
>Content-Type: text/plain; charset=ISO-8859-2; format=flowed
>
>[snip]
>
>Maybe we should just fix the normalization function the way we'd already 
>planned to, so that it'll work right the way we'd already planned to?
>
>-- brion
>
>
>
>------------------------------
>
>Message: 5
>Date: Sat, 27 Dec 2008 18:25:10 -0500
>From: "Aryeh Gregor" <[email protected]>
>Subject: Re: [Wikitech-l] IBM DB2 patch for MediaWiki
>To: "Wikimedia developers" <[email protected]>
>Message-ID:
> <[email protected]>
>Content-Type: text/plain; charset=UTF-8
>
>On Sat, Dec 27, 2008 at 1:23 AM, Jesús Quiroga <[email protected]> wrote:
>> The second way to access the data store that is usually considered is
>> the 'indirect' one:
>>
>>    [wiki] <--- wikiDataL ---> [polyglot translator]
>>
>>    [polyglot translator] <--- mysDataL ---> [mysql]
>>    [polyglot translator] <--- posDataL ---> [postgresql]
>>    [polyglot translator] <--- db2DataL ---> [db2]
>>
>> Here, wikiDataL means 'some relational data definition and manipulation
>> language suitable for use by the wiki'.
>
>This is what we currently use, and I don't think we're going to
>seriously consider changing it without some very compelling arguments
>being presented.  Incremental improvements to our current way of doing
>things (cutting back on raw queries, moving MySQL-specific stuff from
>Database to DatabaseMySql, defining more clearly what Database methods
>mean and avoiding undefined behavior) seem entirely sufficient to
>allow support for any number of additional database backends.
>
>> The differences between wikiDataL and wikiNeedL are mainly these:
>>   - wikiNeedL would contain just enough wiki concepts to express the
>> wiki's needs, so it's effectively confined to that domain. wikiDataL
>> belongs to the relational data model domain, which is quite different.
>>   - in general, wikiNeedL would have different semantics than the
>> dialects understood by the databases, so the translation step becomes
>> more like interpretation, rather than just syntactic transformations.
>> wikiDataL usually has the same semantics as the dialects.
>>   - wikiNeedL would contain just enough concepts to satisfy current
>> needs, and will be open to extension. wikiDataL aims to be
>> general-purpose and to fulfill current and future needs.
>
>In practice, wikiNeedL would be drastically more complicated, if I
>understand you correctly.  Its basic semantic units would be things
>like articles, users, revisions, etc., instead of rows, columns, and
>tables.  We *have* a wikiNeedL, in fact: it's called "calling the
>appropriate Article method" or whatever.  Most code doesn't have to
>manually do queries.  Further abstraction of the database queries
>would be possible, but I question its usefulness.
>
>------------------------------
>
>Message: 6
>Date: Sat, 27 Dec 2008 19:06:24 -0500
>From: "Aryeh Gregor" <[email protected]>
>Subject: Re: [Wikitech-l] Anchors haven't id attribute
>To: "Wikimedia developers" <[email protected]>
>Message-ID:
> <[email protected]>
>Content-Type: text/plain; charset=UTF-8
>
>On Sat, Dec 27, 2008 at 3:14 PM, Brion Vibber <[email protected]> wrote:
>> [snip]
>>
>> Maybe we should just fix the normalization function the way we'd already
>> planned to, so that it'll work right the way we'd already planned to?
>
>Done in r45109.  I notice, by the way, that HTML5 allows any string
>not containing whitespace for id's . . . yet another case where it
>clearly wins the "don't gratuitously cause pain to developers"
>contest.
>
>
>
>------------------------------
>
>Message: 7
>Date: Sun, 28 Dec 2008 03:02:26 +0100 (CET)
>From: Danny B.<[email protected]>
>Subject: Re: [Wikitech-l] Anchors haven't id attribute
>To: Wikimedia developers<[email protected]>
>Message-ID: <[email protected]>
>Content-Type: text/plain; charset="iso-8859-2"
>
>> ------------ Original message ------------
>> From: Aryeh Gregor <[email protected]>
>> Subject: Re: [Wikitech-l] Anchors haven't id attribute
>> Date: 28.12.2008 01:07:08
>> ----------------------------------------
>> On Sat, Dec 27, 2008 at 3:14 PM, Brion Vibber <[email protected]> wrote:
>> > [snip]
>> >
>> > Maybe we should just fix the normalization function the way we'd already
>> > planned to, so that it'll work right the way we'd already planned to?
>> 
>> Done in r45109.  I notice, by the way, that HTML5 allows any string
>> not containing whitespace for id's . . . yet another case where it
>> clearly wins the "don't gratuitously cause pain to developers"
>> contest.
>
>*sigh*
>
>Why do we have to hunt for some other solution when we have a fully working, 
>fully valid and fully intuitive one?
>
>OK, let's make some summary about three versions we have:
>
>Terms used:
>- old version - the for-many-years used version until r44896
>- mid version - r44896 way
>- new version - r45109 way
>
>The old version was used for many years. It was fully valid: ids were present 
>only where they could be copied from name AND complied with the regexp 
>mentioned in previous posts. This was done automatically by Tidy. And it was 
>fully intuitive: you just wrote [[#Foo]] and it linked to the section named 
>Foo. Or you added #Foo to the URL in the address bar and you got to the proper 
>section as well. And it worked properly.
>
>The mid version brought the "feature" that all name attributes were 
>duplicated to ids. That caused massive invalidity of pages, especially 
>non-Latin and non-ASCII ones. However, the intuitiveness of anchor creation 
>was still kept.
>
>The new version prepends x to all anchors to solve the problem which was 
>introduced in the mid version: the massive invalidity of pages. So it solved 
>one problem (which actually would not have needed solving if we had kept the 
>old version) but brought at least two other major ones:
>The first major problem is that this change breaks millions of existing 
>links to sections: links used on pages on wikis, links used on external sites, 
>links in people's bookmarks, in emails, forum threads etc. Well, OK, let's 
>discount all external stuff, since we don't have any influence on it, but we 
>still have millions of links left on our own wikis which won't work anymore 
>since r45109.
>The other major problem is that from this point on the anchor links are 
>no longer intuitive: we are now pushing people to constantly think about 
>prepending x when creating anchor links. No more simple copy-pasting of the 
>headline.
>As a side effect, we are now adding unnecessary work for people from non-Latin 
>wikis by pushing them to always switch to a Latin keyboard, or to click on 
>edittools or whatever, just to get the one "x" character into the edit box to 
>create the anchor link.
>
>So let me summarize in points:
>* First we did not have any problem at all.
>* Second we had one problem.
>* Third we "solved" the problem but created at least two new.
>I am pretty scared what's coming next... :-/
>
>One question for the end: what is the benefit of either the mid or the new 
>version over the old one? What new functionality or feature does it bring, or 
>which existing bug does it fix?
>
>
>Kind regards
>
>
>Danny B.
>
>
>
>------------------------------
>
>Message: 8
>Date: Sat, 27 Dec 2008 22:15:24 -0500
>From: "Aryeh Gregor" <[email protected]>
>Subject: Re: [Wikitech-l] Anchors haven't id attribute
>To: "Wikimedia developers" <[email protected]>
>Message-ID:
> <[email protected]>
>Content-Type: text/plain; charset=UTF-8
>
>2008/12/27 Danny B. <[email protected]>:
>> *sigh*
>>
>> Why do we have to hunt for some other solution when we have fully working, 
>> fully valid and fully intuitive one?
>
>Because:
>
>1) Our previous behavior arguably violated the XHTML 1 specification
>by allowing name attributes to begin with nonletters.  Please don't
>ignore this argument because you think it's wrong.  I think you're
>wrong on this issue too, but I don't just ignore your opinion when
>discussing what the software that we *both* develop should do.  Note
>"arguably" in the first sentence here -- your opinion counts as much
>as mine.
>
>2) It's not arguable at all that the XHTML 1 specification strongly
>recommends that <a> elements with a name attribute also have an id
>attribute.  In fact, section 4.10 states: "In order to ensure that
>XHTML 1.0 documents are well-structured XML documents, XHTML 1.0
>documents MUST use the id attribute when defining fragment identifiers
>on the elements listed above [including <a>]."
>
>I'm not saying these reasons outweigh the reasons against, but those
>are the reasons it was done.  In particular, I don't think I've seen
>an argument from you against (2).
>
>> Old version was used for many years. It was fully valid
>
>Could you *please* stop pretending that a debate doesn't even exist
>here?  It's obnoxious and uncivil, and you keep on doing it.
>
>> First major problem is, that this change is breaking millions of existing 
>> links to sections. Links used on pages on wikis, links used on external 
>> sites, links in people's bookmarks, in emails, forum threads etc. Well, OK, 
>> let's discount all external stuff, since we don't have any influence on it, 
>> but we still have millions of links left on our own wikis which won't work 
>> anymore since r45109.
>
>First of all, all auto-generated internal links (in TOCs) will
>automatically switch to the new format.  Second of all, it should be
>one extra line of code to fix up all manually-created internal links
>as well, so that the x is automatically added as part of the encoding
>process.  (I didn't find where this needed to be done at a quick
>glance.)  So we're only talking about external links here.
>
>This is a one-time cost and I don't think it's a big problem -- at
>worst, a few users will end up on the wrong part of the page.  It
>should be pointed out that this will affect *all* section links on
>non-Latin wikis (since they get encoded to begin with dots and then
>need to start with a letter), but again, only as a one-time cost, and
>only external links (links from external sites or links using external
>link syntax), and it will still get viewers to almost the right place.
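
The encoding step discussed above can be sketched in Python. This is not the code from r45109: `encode_anchor` is a hypothetical name, replacing spaces with underscores is an assumption, and the sketch assumes the "x" is prepended only when the encoded anchor does not already start with an ASCII letter (the thread itself is not conclusive on whether every anchor is prefixed). What does come from the thread is the dot-encoding, which yields `.C3.81X` for the headline "ÁX".

```python
import re
from urllib.parse import quote


def encode_anchor(headline):
    """Build an id-safe anchor from a section headline (illustrative only)."""
    # Percent-encode the UTF-8 bytes, then use '.' instead of '%',
    # so that "ÁX" becomes ".C3.81X" as in the Tidy example.
    text = headline.strip().replace(" ", "_")  # assumption: spaces -> underscores
    encoded = quote(text, safe="").replace("%", ".")
    # Assumption: prefix only when the first character is not an ASCII letter.
    if not re.match(r"[A-Za-z]", encoded):
        encoded = "x" + encoded
    return encoded


for headline in ["X", "1X", "\u00c1X", "-X"]:
    print(repr(headline), "->", encode_anchor(headline))
```

If internal [[#Foo]] links were passed through the same function when rendered, editors would not have to type the prefix themselves, which is the fix described in the message above.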
>
>> The other major problem is, that since this point further the anchor links 
>> are no longer intuitive - we are now pushing people to constantly think 
>> about prepending x when creating anchor links. No more simple copy pasting 
>> of the headline.
>> As a side effect we are now adding unnecessary work to people from non-latin 
>> wikis by pushing them to always switch to latin keyboard, or to click on 
>> edittools or whatever just to get the one "x" character in editbox to create 
>> the anchor link.
>
>Again, not an issue if internal links are fixed to work correctly.  I
>didn't think about that aspect, but it should be very simple to fix
>(I'd do it now except I'm going to bed).
>
>It seems to me that there are only weak reasons in favor (following
>recommended best practice with no practical effect) and only weak
>reasons against (small one-time transition cost -- unless you're
>correct that there will be longer-term costs, in which case please
>clarify why you think this).  Normally I would say that standards
>compliance by itself (as opposed to standards compliance that brings
>concrete benefit) is worth small one-time costs, although not large
>enough one-time costs and probably not even fairly small recurring
>costs.  So as it stands, without further arguments, I'd still be
>weakly in favor of keeping the current state of trunk, of course with
>the fix for anchors on internal links.
>
>
>
>------------------------------
>
>_______________________________________________
>Wikitech-l mailing list
>[email protected]
>https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
>
>End of Wikitech-l Digest, Vol 65, Issue 34
>******************************************
> 

