Re: [Wikitech-l] File licensing information support

2011-01-25 Thread Dmitriy Sintsov
* Michael Dale <md...@wikimedia.org> [Mon, 24 Jan 2011 13:18:00 -0600]:
> We should focus on APIs for template editing.
> Extension:Page_Object_Model seemed like a step in the right direction,
> but not ... Something that let you edit structured data across nested
> template objects, and that we could stack validation on top of, would
> let us leverage everything that has been done and keep things wide open
> for what's done in the future.
>
> Most importantly, we need clean high-level APIs that we can build GUIs
> on, so that the flexibility of the system does not hurt usability and
> functionality.

Michael is correct - an API module to extract data from already existing 
nested templates, and to replace that data when needed, is probably the 
only thing required to make Wikipedia more structured and semantic. The 
whole job of collecting and analyzing triples can then be off-loaded to 
external bots and tools. Great idea, imho.
Dmitriy
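
As a rough sketch of what such template-data access could look like from
the client side, here is a Python example using the third-party
mwparserfromhell parsing library - the template and parameter names are
made-up illustrations, and this is an assumption of shape, not an existing
MediaWiki API:

  # Sketch only: assumes the mwparserfromhell library; the template
  # "Infobox settlement" and its parameters are invented for illustration.
  import mwparserfromhell

  wikitext = "{{Infobox settlement|name=Springfield|population=30720}}"
  code = mwparserfromhell.parse(wikitext)

  # Extract data from (possibly nested) templates.
  for tpl in code.filter_templates():
      if tpl.name.matches("Infobox settlement"):
          print(tpl.get("population").value)    # 30720

  # Replace the data when needed.
  for tpl in code.filter_templates():
      if tpl.name.matches("Infobox settlement"):
          tpl.add("population", "30958")        # edits the parameter in place

  print(str(code))   # wikitext serialized back with the new value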



[Wikitech-l] Simple Page Object model using #lst

2011-01-25 Thread Alex Brollo
I'd like to share an idea. If you think that I don't know what I am
speaking of, probably you're right; nevertheless I'll try.

Labeled section transclusion, I presume, simply runs a substring search
over the raw wiki code of a page; it gives back a piece of the page as it
is (but removing any <section ... /> tag inside). Imagine that this copy
and paste of chunks of wiki code were the first parsing step, the result
being a new wiki text, which is then parsed for template code and other
wiki code.

If this happened, I imagine that the original page could be considered
an object, i.e. a collection of attributes (fragments of text) and
methods (template chunks). So, you could write template pages with
collections of different template functions, or pages with collections of
different data, or mixed pages with both data and functions, any of them
accessible from any wiki page of the same project (while waiting for
interwiki transclusion).

Then, by carefully adding a self-transclusion permission - letting a page
use chunks of its own code - the conversion of a page into a true, even if
simple, object would be complete.

Alex


Re: [Wikitech-l] Simple Page Object model using #lst

2011-01-25 Thread Jesse (Pathoschild)
On Tue, Jan 25, 2011 at 8:14 AM, Alex Brollo <alex.bro...@gmail.com> wrote:
> If this happened, I imagine that the original page could be considered
> an object, i.e. a collection of attributes (fragments of text) and
> methods (template chunks).

Labeled Section Transclusion can be used this way, but it's not very
efficient for this. Internally it uses generated regular expressions
to extract sections; you can peek at its source code at
http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/LabeledSectionTransclusion/lst.php?view=markup.

--
Yours cordially,
Jesse (Pathoschild)
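
For a feel of the approach, here is a rough Python equivalent of that
regex-based extraction; the pattern below is a simplified assumption, not
the extension's actual generated regex:

  # Simplified illustration of regex-based labeled-section extraction;
  # LST itself does this in PHP with generated patterns.
  import re

  def get_sections(wikitext, name):
      pattern = re.compile(
          r'<section\s+begin="?{0}"?\s*/>(.*?)<section\s+end="?{0}"?\s*/>'
          .format(re.escape(name)),
          re.DOTALL,
      )
      return [m.group(1) for m in pattern.finditer(wikitext)]

  page = 'intro <section begin=1 />first chunk<section end=1 /> outro'
  print(get_sections(page, "1"))   # ['first chunk']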



Re: [Wikitech-l] Simple Page Object model using #lst

2011-01-25 Thread Alex Brollo
2011/1/25 Jesse (Pathoschild) <pathosch...@gmail.com>

> Labeled Section Transclusion can be used this way, but it's not very
> efficient for this. Internally it uses generated regular expressions
> to extract sections; you can peek at its source code at
> http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/LabeledSectionTransclusion/lst.php?view=markup


Thanks, but I'm far from understanding such PHP code, nor do I have any
idea about the whole exotic business of wiki code parsing and HTML
generation. But if I were to write something like #lst, I'd index the text
using the section tags simply as delimiters, building something hidden
like this into the wiki code or into another field of the database:

<!-- sections
s1[0:100]
s2[120:20]
s3[200:150]
 -->

where s1, s2, s3 are the section names and the numbers are the
offset/length of the text between section tags in the wiki page string; or
something similar to this, built to be extremely simple and fast to parse
and to give back substrings of the page in the fastest, most efficient
way. Such data would only need to be recalculated when the page content
changes. I guess the efficiency of sections would increase a lot,
encouraging a larger use of #lst.
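
A minimal sketch of such an index builder, in Python for illustration (the
tag syntax and the hidden-comment format are the assumptions from the
example above):

  # Build a hidden offset/length index for labeled sections, meant to be
  # recomputed whenever the page content changes.
  import json
  import re

  TAG = re.compile(r'<section\s+(begin|end)="?([^"/>\s]+)"?\s*/>')

  def build_index(wikitext):
      spans, open_at = {}, {}
      for m in TAG.finditer(wikitext):
          kind, name = m.group(1), m.group(2)
          if kind == "begin":
              open_at[name] = m.end()          # content starts after the tag
          elif name in open_at:
              start = open_at.pop(name)
              spans.setdefault(name, []).append((start, m.start() - start))
      return "<!--SECTIONS:" + json.dumps(spans) + "-->"

  text = "xx<section begin=s1 />hello<section end=s1 />yy"
  print(build_index(text))   # <!--SECTIONS:{"s1": [[22, 5]]}-->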

If such parsing of section text were the first step of page parsing, even
segments of text delimited by <noinclude> tags could be retrieved.

Alex


Re: [Wikitech-l] MATH markup question

2011-01-25 Thread phoebe ayers
On Sun, Jan 23, 2011 at 7:44 AM, Platonides <platoni...@gmail.com> wrote:
> Aryeh Gregor wrote:
>> When I load their homepage, the formulas don't appear for about two
>> seconds of 100% CPU usage, on Firefox 4b9.  And that's for two small
>> formulas.  I'm not impressed.  IMO, the correct way forward is to work
>> on native MathML support -- Gecko and WebKit both support it these
>> days, and Opera somewhat does too.  I'm sure the support is a bit
>> spotty, but if Wikipedia used it (even as an off-by-default option)
>> that would surely drive a lot of progress.  These days (with the
>> deployment of HTML5 parsers) it can be embedded directly into HTML,
>> it's not limited to XML.
>
> Looking at http://www.mathjax.org/demos/tex-samples/ it may indeed take
> a couple of seconds to convert from TeX to the graphical view, but
> without 100% CPU usage or looking blocked. I'm not using 4b9 but
> 3.6.12, though. I see a similar result in Chromium.
> A disadvantage is that showing the formula needs to reposition the
> content, instead of reserving the space in advance.

Delurking to say that, while I don't know if it's useful for us at all,
MathJax is getting lots of buzz in other settings (like publishing and the
science library world); and also, just today I came across this:
http://detexify.kirelabs.org/classify.html

It's not directly applicable, but it is a fun usability idea for turning
hand-drawn symbols into LaTeX (and by extension I can imagine symbols to
markup, letters to Unicode, etc.)

-- phoebe

-- 
* I use this address for lists; send personal messages to phoebe.ayers
at gmail.com *



Re: [Wikitech-l] Simple Page Object model using #lst

2011-01-25 Thread Alex Brollo
2011/1/25 Alex Brollo <alex.bro...@gmail.com>

Just to test the effectiveness of such a strange idea, I added some formal
section tags to a 6 kB text file, section.txt, then wrote a simple script
to create a data area. This is the result (a Python dictionary inside an
HTML comment) appended to the section.txt file:

<!--SECTIONS:{'<section begin=1 />': [(152, 990), (1282, 2406), (4078,
4478)], '<section begin=6 />': [(19, 115)], '<section begin=2 />': [(2443,
2821), (2859, 3256)], '<section begin=4 />': [(1555, 1901)], '<section
begin=5 />': [(171, 477)], '<section begin=3 />': [(3704, 4042)]}-->

Then I ran these lines from the Python IDLE shell:

for i in range(1000):
    f = open("section.txt").read()
    indici = eval(find_stringa(f, "<!--SECTIONS:", "-->"))
    t = ""
    for j in indici["<section begin=1 />"]:
        t += f[j[0]:j[1]]

As you can see, the code, 1000 times over:
opens the file and loads it;
selects the data area (find_stringa is a personal string-search tool that
returns the text between two delimiters) and converts it into a dictionary;
retrieves all the text inside the multiple sections named 1 (the worst
case in the list: section 1 has three instances: [(152, 990), (1282,
2406), (4078, 4478)]).
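
find_stringa itself isn't shown above; a plausible minimal version - an
assumption, not Alex's actual helper - would be:

  # Hypothetical reconstruction of the find_stringa helper described
  # above: returns the text between the first occurrences of the two
  # delimiter strings.
  def find_stringa(text, start, end):
      i = text.index(start) + len(start)
      j = text.index(end, i)
      return text[i:j]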

Time to run 1000 cycles: more or less 3 seconds, on a far from powerful
PC. :-)
Fast, in my opinion!

So, it can be done, and it runs in an effective way too. Doesn't it?

Alex


Re: [Wikitech-l] Simple Page Object model using #lst

2011-01-25 Thread Platonides
Had LST used <section name=foo>...</section> to mark sections,
instead of <section begin=foo />content<section end=foo />, it
would be as easy as traversing the preprocessor output, which would
already have the sections split.

Alex Brollo wrote:
> <snip>
> So, it can be done, and it runs in an effective way too. Doesn't it?

It can obviously be done. But you should compare it against the original
implementation; 3 seconds by itself isn't meaningful.
Another thing to test would be using stripos() instead of those regexes,
in case it is faster.
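
The analogous measurement is easy to sketch in Python, with str.find
standing in for PHP's stripos (absolute timings are machine-dependent):

  # Quick benchmark sketch: plain substring search vs. a compiled regex.
  import re
  import timeit

  text = "x" * 5000 + "<section begin=1 />payload<section end=1 />"
  rx = re.compile(r"<section begin=1 />")

  print(timeit.timeit(lambda: text.find("<section begin=1 />"), number=10000))
  print(timeit.timeit(lambda: rx.search(text), number=10000))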




Re: [Wikitech-l] Simple Page Object model using #lst

2011-01-25 Thread Brion Vibber
On Tue, Jan 25, 2011 at 10:27 AM, Platonides <platoni...@gmail.com> wrote:

> Had LST used <section name=foo>...</section> to mark sections,
> instead of <section begin=foo />content<section end=foo />, it
> would be as easy as traversing the preprocessor output, which would
> already have the sections split.


It was done this way in order to allow overlapping sections: LST was
created so that arbitrary parts of a document on Wikisource can be quoted
while retaining a direct link to the original document as it continues to
be edited.

Basically, the section markers are permanent markers for the source of a
copy-and-paste operation. One person might be copying from paragraph 1 to
paragraph 4; another might copy from paragraph 3 to paragraph 5; your page
structure looks like this:

  [page]
    [section-open 1/]
    [para 1/] <!-- in section 1 only -->
    [para 2/] <!-- in section 1 only -->
    [section-open 2/]
    [para 3/] <!-- in both section 1 and 2 -->
    [para 4/] <!-- in both section 1 and 2 -->
    [section-close 1/]
    [para 5/] <!-- in section 2 only -->
    [section-close 2/]
  [/page]

Since the LST sections overlap, they don't really fit well into the
hierarchical structures that the preprocessor deals in, except as
standalone start/end markers.

*BUT* ... it's probably possible to actually redo things to use the above
structure in a sensible way, instead of doing text regexes:

  iterate through the node tree:
    if found desired section start node:
      start saving our spot
    if found desired section end node:
      if start node was at same level:
        grab everything in between
        RETURN that to upstream parser
      else:
        find the closest common parent node of start and end
        build a node tree that has the parts of the start's parent before
        the start trimmed, and the parts of the end's parent after the end
        trimmed
        RETURN that to upstream parser
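
A toy Python version of the simple same-level case (the node-tree shape
here is an invented stand-in, not MediaWiki's actual preprocessor DOM):

  # Toy model: nodes are ("start"/"end", name) marker tuples or plain
  # text; collect text between matching markers, ignoring other markers.
  def extract(nodes, name):
      out, saving = [], False
      for node in nodes:
          if node == ("start", name):
              saving = True                  # start saving our spot
          elif node == ("end", name):
              break                          # grab everything in between
          elif saving and not isinstance(node, tuple):
              out.append(node)               # keep text, skip other markers
      return out

  page = [("start", "1"), "para 1", "para 2", ("start", "2"),
          "para 3", "para 4", ("end", "1"), "para 5", ("end", "2")]
  print(extract(page, "1"))   # ['para 1', 'para 2', 'para 3', 'para 4']
  print(extract(page, "2"))   # ['para 3', 'para 4', 'para 5']

Overlap falls out naturally, because each extraction tracks only its own
start/end markers.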

One could also pull the markers out of the original text and store them
as separate metadata in some way, which seems to be part of the
suggestions earlier in the thread. The main problem here is that we could
easily end up losing track of the markers during editing; we have no
persistent identity for pieces of text, so if there's not a visible node
in there for editors to move and copy along with their alterations, the
markers may not be able to persist automatically.

-- brion


Re: [Wikitech-l] user email validation ready

2011-01-25 Thread Maciej Jaros
Brion Vibber (2011-01-25 02:51):
> On Mon, Jan 24, 2011 at 4:02 PM, Platonides <platoni...@gmail.com> wrote:
>> The original spec had feedback based precisely on enwiki numbers.
>> http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-August/00.html
>>
>> So about 100? Note that there are invalid addresses marked as confirmed
>> in wikipedia.
>
> Ok so from the breakdown at
> http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-August/022237.html
> with 202 email address records that were marked as confirmed, but failed
> the proposed validation check at the time and couldn't be corrected by
> stripping whitespace:
> [...]

Could you check for validated addresses containing commas in the user
name part? The RegExp from mediawiki.util.js did/does allow them.

Regards,
Nux.
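
The check itself is trivial to sketch (illustrative only; the real work is
running it over the table of confirmed addresses):

  # Flag addresses whose local part (before the @) contains a comma.
  addresses = ["good.user@example.org", "odd,user@example.org"]
  for addr in addresses:
      local = addr.rsplit("@", 1)[0]
      if "," in local:
          print("comma in local part:", addr)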



[Wikitech-l] This template has to be warmed up before it can be used, for some reason

2011-01-25 Thread jidanni
Innocently browsing today, I encountered this HTML comment that gets
rendered because it uses <-- instead of <!-- ... I couldn't even find the
template that caused it, and leave it in your hands, as I've got to go, as
Dana Dane said.

$ w3m -dump http://en.wikipedia.org/wiki/Flatworm |head
Flatworm

From Wikipedia, the free encyclopedia
Jump to: navigation, search
Good article
Flatworm<-- This template has to be warmed up before it can be used, for
some reason -->

   Platyhelminth worms
Fossil range: 40–0 Ma^[1]


Re: [Wikitech-l] This template has to be warmed up before it can be used, for some reason

2011-01-25 Thread MZMcBride
jida...@jidanni.org wrote:
> $ w3m -dump http://en.wikipedia.org/wiki/Flatworm |head
> Flatworm

Simple typo in a template, fixed by OverlordQ:
http://en.wikipedia.org/w/index.php?diff=410094043&oldid=408536727

Valid HTML comments in wikitext do not appear in the page source of rendered
pages. It _might_ be nice if HTML Tidy caught this error (omitting an
exclamation point), though.

MZMcBride





Re: [Wikitech-l] user email validation ready

2011-01-25 Thread Ashar Voultoiz
On 25/01/11 23:37, Maciej Jaros wrote:
<snip>
> Could you check for validated addresses containing commas in the user
> name part? The RegExp from mediawiki.util.js did/does allow them.
>
> Regards,
> Nux.

Nux opened bug 26948 for the comma issue (I assigned it to myself).

   https://bugzilla.wikimedia.org/26948

-- 
Ashar Voultoiz

