Re: [Wikitech-l] Using wiki pages as databases

2013-02-22 Thread Johnuniq
On Feb 20, 2013 at 3:54 pm, Tim Starling wrote:
 The idea of storing a database in a large string literal could
 be made to be fairly efficient and user-friendly if a helper
 module was written to do parsing and a binary search.

I have implemented the above suggestion with some promising results.
Packing a large table in a string and unpacking it on demand appears
to work well, and the data is accessed as if it were stored in a
standard table. Using the table from Wiktionary Module:Languages
mentioned earlier in this thread, testing shows that accessing the
packed data is 20 times faster. Info is at

http://test2.wikipedia.org/wiki/User_talk:Johnuniq#Big_tables

Johnuniq

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Using wiki pages as databases

2013-02-22 Thread Brad Jorsch
On Fri, Feb 22, 2013 at 4:41 AM, Johnuniq wp.johnu...@gmail.com wrote:
 On Feb 20, 2013 at 3:54 pm, Tim Starling wrote:
 The idea of storing a database in a large string literal could
 be made to be fairly efficient and user-friendly if a helper
 module was written to do parsing and a binary search.

 I have implemented the above suggestion with some promising results.
 Packing a large table in a string and unpacking it on demand appears
 to work well, and the data is accessed as if it were stored in a
 standard table. Using the table from Wiktionary Module:Languages
 mentioned earlier in this thread, testing shows that accessing the
 packed data is 20 times faster. Info is at

 http://test2.wikipedia.org/wiki/User_talk:Johnuniq#Big_tables

Note that https://gerrit.wikimedia.org/r/#/c/50299/ added a
mw.loadData() function that should solve the problem for normal
tables. It works like require, with three differences: it can only
handle simple data (no functions, no tables with metatables, and no
tables with tables as keys), the returned data structure is made
read-only, and it avoids having to re-execute the module chunk on
every #invoke.
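
To illustrate what "made read-only" can mean in Lua, here is a minimal plain-Lua sketch of a read-only view built with metatables. It is not Scribunto's implementation; readOnly and the sample language data are names invented for this example. In a real module the call would look like mw.loadData( 'Module:Example/data' ), where the module name is hypothetical.

```lua
-- Sketch only: a plain-Lua imitation of a read-only view.
-- This is NOT how Scribunto implements mw.loadData().
local function readOnly( t )
    local proxy = {}  -- stays empty, so every access goes through the metatable
    setmetatable( proxy, {
        __index = function ( _, k )
            local v = t[k]
            if type( v ) == 'table' then
                return readOnly( v )  -- nested tables get their own read-only view
            end
            return v
        end,
        __newindex = function ()
            error( 'table is read-only', 2 )
        end
    } )
    return proxy
end

-- Hypothetical sample data standing in for a large data module.
local data = readOnly( { en = { name = 'English' }, fr = { name = 'French' } } )

-- Reads work as usual; writes raise an error, which pcall catches here.
local ok = pcall( function () data.en.name = 'changed' end )
print( data.en.name, ok )
```

Because the proxy itself never holds any keys, generic iteration with pairs() would need extra work on Lua 5.1; the sketch leaves that out.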

Speaking of which, I need to update the documentation.

-- 
Brad Jorsch
Software Engineer
Wikimedia Foundation


Re: [Wikitech-l] Using wiki pages as databases

2013-02-20 Thread Ori Livneh



On Tuesday, February 19, 2013 at 4:27 AM, Tim Starling wrote:

 On 19/02/13 21:11, MZMcBride wrote:
  Hi.
  
  In the context of https://bugzilla.wikimedia.org/show_bug.cgi?id=10621,
  the concept of using wiki pages as databases has come up. We're already
  beginning to see this:
  
  * https://en.wiktionary.org/wiki/Module:languages (over 30,000 lines)
  * https://en.wikipedia.org/wiki/Module:Convertdata (over 7,400 lines)
  
  At large enough sizes, the in-browser syntax highlighting is currently
  problematic.
  
 
 
 We can disable syntax highlighting over some size.

https://gerrit.wikimedia.org/r/#/c/49985/ disables the highlighting of symbols 
if it looks like there may be a lot of them. Patched in SyntaxHighlight_GeSHi 
since the problem is not specific to Lua or Scribunto.

--
Ori Livneh


Re: [Wikitech-l] Using wiki pages as databases

2013-02-19 Thread Johnuniq
On Feb 19, 2013 at 9:11 PM, MZMcBride wrote:
 https://en.wikipedia.org/wiki/Module:Convertdata

I'm guilty of that, and what's been worrying me is that there are
hundreds more units to add. Some guidance on using Lua as a database
would be very desirable.

Quick tests suggest that if {{convert}} is used 100 times on a page
(where that template invokes Module:Convert, which requires
Module:Convertdata), then Convertdata is loaded 100 times. I've
wondered if there might be a pragma in a module like that to set read
only (at least a promise of read only, even if it were not enforced),
then more aggressively cache the bytecode so it is loaded once only
per page render, or even once only until the cache memory is flushed.

Or, if performance due to such module abuse is a problem, the data
could be split into, say, ten modules, and the code accessing the data
could work out which of the smaller data modules needed to be
required. I'm not going to worry about that until I have to, but some
guidance would be good.

I just had a quick look at one test page which invokes the module 66
times, and the NewPP limit report in the html source says Lua time
usage: 0.324s (5 ms/invoke).
http://en.wikipedia.org/wiki/Template:Convert/testcases/bytype/time

Johnuniq


Re: [Wikitech-l] Using wiki pages as databases

2013-02-19 Thread Tim Starling
On 19/02/13 21:11, MZMcBride wrote:
 Hi.
 
 In the context of https://bugzilla.wikimedia.org/show_bug.cgi?id=10621,
 the concept of using wiki pages as databases has come up. We're already
 beginning to see this:
 
 * https://en.wiktionary.org/wiki/Module:languages (over 30,000 lines)
 * https://en.wikipedia.org/wiki/Module:Convertdata (over 7,400 lines)
 
 At large enough sizes, the in-browser syntax highlighting is currently
 problematic.

We can disable syntax highlighting over some size.

 But it's also becoming clear that the larger underlying
 problem is that using a single request wiki page as a database isn't
 really scalable or sane.

The performance of #invoke should be OK for modules up to
$wgMaxArticleSize (2MB). Whether the edit interface is usable at such
a size is another question.

 (ParserFunction #switch's performance used to prohibit most ideas of using
 a wiki page as a database, as I understand it.)

Both Lua and #switch have O(N) time order in this use case, but the
constant you multiply by N is hundreds of times smaller for Lua.

 Has any thought been given to what to do about this? Will it require
 manually paginating the data over collections of wiki pages? Will this be
 something to use Wikidata for?

Ultimately, I would like it to be addressed in Wikidata. In the
meantime, multi-megabyte datasets will have to be split up, for
$wgMaxArticleSize if nothing else.

-- Tim Starling



Re: [Wikitech-l] Using wiki pages as databases

2013-02-19 Thread Tyler Romeo
So unfortunately I don't have a clear idea of what the problem is,
primarily because I don't know anything about the Parser and its inner
workings, but as far as having all the data in one page, here's something.
Maybe this is a bad idea, but how about having a PHP-array content type? In
other words, MyNamespace:MyPage would render the entire data structure, but
MyNamespace:MyPage/index/test/0 would take $arr['index']['test'][0]. In the
database, it would be stored as individual sub-pages, and leaf sub-pages
would render exactly like a normal page would, but non-leaf pages would
build the array from all child sub-pages and display it to the user. Would
this solve the problem? Because if so, I've put some thought into it and
would be willing to maybe draft an extension giving such a capability.
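
The path-to-element mapping proposed above can be sketched in a few lines. This is only an illustration of the idea, written in Lua rather than PHP to match the rest of the thread; resolve and the sample table are hypothetical names, not part of any existing extension.

```lua
-- Sketch only: resolve a sub-page style path such as 'index/test/0'
-- against a nested table, mirroring $arr['index']['test'][0].
local function resolve( root, path )
    local node = root
    for part in string.gmatch( path, '[^/]+' ) do
        if type( node ) ~= 'table' then
            return nil  -- the path goes deeper than the data does
        end
        -- Numeric path segments are treated as numeric keys.
        local key = tonumber( part ) or part
        node = node[key]
    end
    return node
end

-- Hypothetical data: one leaf stored under index -> test -> 0.
local arr = { index = { test = { [0] = 'leaf value' } } }

print( resolve( arr, 'index/test/0' ) )
```

A non-leaf path such as 'index/test' returns the whole sub-table, which corresponds to a non-leaf page aggregating its children.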

*--*
*Tyler Romeo*
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerro...@gmail.com



Re: [Wikitech-l] Using wiki pages as databases

2013-02-19 Thread Denny Vrandečić
2013/2/19 Tim Starling tstarl...@wikimedia.org

 On 19/02/13 21:11, MZMcBride wrote:
  Has any thought been given to what to do about this? Will it require
  manually paginating the data over collections of wiki pages? Will this be
  something to use Wikidata for?

 Ultimately, I would like it to be addressed in Wikidata. In the
 meantime, multi-megabyte datasets will have to be split up, for
 $wgMaxArticleSize if nothing else.



I expect that, in time, Wikidata will be able to serve some of those
use cases, e.g. the one given by the languages module on Wiktionary. I am
quite excited about the possibilities that access to Wikidata together with
Lua will enable within a year or so... :)

Obviously, not all use cases should or will be handled by Wikidata,
but some of those huge switches can definitely be stored in Wikidata items.

Re: [Wikitech-l] Using wiki pages as databases

2013-02-19 Thread Brad Jorsch
In the long term, Wikidata is probably the way to go on something like this.

In the short term, as far as dividing things up, note that you can
implement on-demand loading in Lua easily enough using the __index
metamethod.

  local obj = {}

  setmetatable( obj, {
      __index = function ( t, k )
          -- This will get called on access of obj[k] if it is not
          -- already set.
          -- Do whatever you might need, e.g. require() a submodule,
          -- assign things to t for future lookups, then return the
          -- requested k.
      end
  } )

  return obj
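
As one possible way to fill in that skeleton, the sketch below splits the data into shards keyed by the first letter of the lookup key and loads a shard only on first access. The shard loaders are stand-ins for require() calls on hypothetical sub-modules, and loadCount exists only to show how often loading happens.

```lua
-- Sketch only: on-demand loading of data shards through __index.
local loadCount = 0

-- Hypothetical loaders; in a wiki module each could be something like
-- function () return require( 'Module:Example/data/e' ) end
local shards = {
    e = function () loadCount = loadCount + 1; return { en = 'English', eo = 'Esperanto' } end,
    f = function () loadCount = loadCount + 1; return { fr = 'French' } end,
}

local loaded = {}  -- shard prefixes that have already been loaded
local obj = {}

setmetatable( obj, {
    __index = function ( t, k )
        if type( k ) ~= 'string' then
            return nil
        end
        local prefix = string.sub( k, 1, 1 )
        local loader = shards[prefix]
        if not loader or loaded[prefix] then
            return nil  -- no such shard, or the key is absent from a loaded shard
        end
        loaded[prefix] = true
        -- Copy the shard into t so later lookups bypass __index entirely.
        for name, value in pairs( loader() ) do
            rawset( t, name, value )
        end
        return rawget( t, k )
    end
} )

print( obj.en, obj.eo, loadCount )
```

Both lookups in the last line hit the same shard, so the loader runs once; a key from the other shard triggers one further load.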

Also note that you can save space at the expense of code complexity by
using the expression obj.us_name or obj.name rather than storing the
same string in both fields. Remember that in Lua only nil (unset) and
boolean false are considered false; the number 0 and the empty string
are both considered true.
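
A small sketch of that fallback, with made-up unit entries; usName is a hypothetical helper, not something from Module:Convertdata:

```lua
-- Sketch only: store us_name only where it differs from name.
local units = {
    metre = { name = 'metre', us_name = 'meter' },
    gram  = { name = 'gram' },  -- us_name omitted because it equals name
}

local function usName( unit )
    -- Falls back to name when us_name is nil (unset).
    return unit.us_name or unit.name
end

print( usName( units.metre ), usName( units.gram ) )

-- The caveat about truthiness: only nil and false trigger the fallback.
local zero  = 0 or 5        -- 0 is true in Lua, so this stays 0
local empty = '' or 'x'     -- '' is true in Lua, so this stays ''
local unset = nil or 'x'    -- nil is false, so this becomes 'x'
```

The fallback therefore cannot be used to replace an empty string or a zero with a default.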

Re: [Wikitech-l] Using wiki pages as databases

2013-02-19 Thread Platonides
On 19/02/13 13:56, Tyler Romeo wrote:
 So unfortunately I don't have a clear idea of what the problem is,
 primarily because I don't know anything about the Parser and its inner
 workings, but as far as having all the data in one page, here's something.
 Maybe this is a bad idea, but how about having a PHP-array content type. In
 other words, MyNamespace:MyPage would render the entire data structure, but
 MyNamespace:MyPage/index/test/0 would take $arr['index']['test'][0]. In the
 database, it would be stored as individual sub-pages, and leaf sub-pages
 would render exactly like a normal page would, but non-leaf pages would
 build the array from all child sub-pages and display it to the user. Would
 this solve the problem? Because if so, I've put some thought into it and
 would be willing to maybe draft an extension giving such a capability.

You can already use subpages to store data. Access is then O(1). The
problem is that then you have one page per entry.



Re: [Wikitech-l] Using wiki pages as databases

2013-02-19 Thread Tim Starling
I wrote:
 The performance of #invoke should be OK for modules up to
 $wgMaxArticleSize (2MB). Whether the edit interface is usable at such
 a size is another question.

The Wiktionary folk are gnashing their teeth today, having
discovered that loading a 742KB module 1200 times in a single
page does in fact take a long time, and that it trips the CPU limit
after about 450 invocations. So, sorry for raising expectations about that.

-- Tim Starling




Re: [Wikitech-l] Using wiki pages as databases

2013-02-19 Thread Tyler Romeo

 You can already use subpages to store data. Access is then O(1). The
 problem is that then you have one page per entry.

I know. What I'm suggesting is an interface where the sub-pages aggregate
up the hierarchy, meaning you can still edit the main top-level page, and
the backend will simply update the sub-pages as appropriate.

*--*
*Tyler Romeo*
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerro...@gmail.com



Re: [Wikitech-l] Using wiki pages as databases

2013-02-19 Thread Victor Vasiliev
On 02/19/2013 06:21 PM, Tim Starling wrote:
 The Wiktionary folk are gnashing their teeth today when they
 discovered that in fact, loading a 742KB module 1200 times in a single
 page does in fact take a long time, and it trips the CPU limit after
 about 450 invocations . So, sorry for raising expectations about that.
 
 -- Tim Starling
 

Aren't modules which are already loaded cached, so if they load it 1200
times on a single page, how does it manage to affect CPU time that badly?

-- Victor.


Re: [Wikitech-l] Using wiki pages as databases

2013-02-19 Thread Tim Starling
On 20/02/13 15:07, Victor Vasiliev wrote:
 On 02/19/2013 06:21 PM, Tim Starling wrote:
 The Wiktionary folk are gnashing their teeth today when they
 discovered that in fact, loading a 742KB module 1200 times in a single
 page does in fact take a long time, and it trips the CPU limit after
 about 450 invocations . So, sorry for raising expectations about that.

 -- Tim Starling

 
 Aren't modules which are already loaded cached, so if they load it 1200
 times on a single page, how does it manage to affect CPU time that badly?

Execution of the module chunk seems to be the main reason. I
benchmarked it locally at 10.6ms, so 450 of those would be 4.8s.

Lua has a lot of O(N) work to do when a large table literal is
executed. I'm experimenting with using large string literals instead:

https://en.wiktionary.org/w/index.php?title=Module:Languages_string_db&action=edit

That module takes about 2us for module chunk execution, when I run it
locally, and around 30us for each lookup in a tight loop on the server
side. But when I use it in a large article, it seems to use about
1.4ms per #invoke, so maybe there's still some overhead that needs to
be tracked down.

The idea of storing a database in a large string literal could be made
to be fairly efficient and user-friendly if a helper module was
written to do parsing and a binary search.
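
A minimal sketch of what such a helper might look like, assuming a record format of one "key=value" pair per line with the lines sorted by key. The format, the lookup function and the sample data are assumptions made for this example, not the format used by the module linked above.

```lua
-- Sketch only: binary search over a packed string of sorted records.
-- Assumptions: every record is "key=value" followed by "\n", keys contain
-- neither "=" nor "\n", values contain no "\n", there are no empty lines,
-- and records are sorted in the order used by Lua's string comparison.
local db = table.concat( {
    'de=German',
    'en=English',
    'fr=French',
    'ja=Japanese',
    'zh=Chinese',
}, '\n' ) .. '\n'

local function lookup( packed, key )
    -- lo is always the first byte of a record, hi the last byte of one.
    local lo, hi = 1, #packed
    while lo <= hi do
        local mid = math.floor( ( lo + hi ) / 2 )
        -- Walk back to the first byte of the record that contains mid.
        local start = mid
        while start > lo and string.sub( packed, start - 1, start - 1 ) ~= '\n' do
            start = start - 1
        end
        local eq = string.find( packed, '=', start, true )
        local nl = string.find( packed, '\n', start, true )
        local k = string.sub( packed, start, eq - 1 )
        if k == key then
            return string.sub( packed, eq + 1, nl - 1 )
        elseif k < key then
            lo = nl + 1        -- continue in the records after this one
        else
            hi = start - 1     -- continue in the records before this one
        end
    end
    return nil
end

print( lookup( db, 'en' ), lookup( db, 'xx' ) )
```

Nothing is parsed up front: each lookup inspects only a logarithmic number of records, so the cost of executing the module chunk stays that of a single string literal.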

-- Tim Starling

