Re: [Wikitech-l] Need a way to modify text before indexing (was SearchUpdate)
FWIW, we do index the full text of (PDF and?) DjVu files on Commons (because it's stored in img_metadata). It's probably the biggest improvement CirrusSearch brought to Commons. And we also index office documents via Tika (*.doc and similar).

And I think this should not be a feature of the search engine at all! It's a separate feature that's completely independent of which search engine is used (that's how it's implemented in my TikaMW). So, is there any replacement for the SearchUpdate hook that lets you modify the indexed text? Of course I can just bring SearchUpdate back by including a patch in our distribution, mediawiki4intranet, but I would prefer that TikaMW didn't require patching...
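For reference, this is roughly the kind of thing TikaMW does with the old hook (a simplified sketch from memory - the parameter list is the pre-1.22 one, and extractFileText() is a made-up placeholder for the actual Tika call):

    $wgHooks['SearchUpdate'][] = 'onTikaSearchUpdate';

    function onTikaSearchUpdate( $id, $namespace, $titleText, &$text ) {
        if ( $namespace == NS_FILE ) {
            $file = wfLocalFile( Title::makeTitle( $namespace, $titleText ) );
            if ( $file && $file->exists() ) {
                // append the extracted document text so it is indexed
                // together with the file description page
                $text .= ' ' . extractFileText( $file->getLocalRefPath() );
            }
        }
        return true;
    }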
[Wikitech-l] Need a way to modify text before indexing (was SearchUpdate)
I wrote about my problem ~2 years ago: http://wikitech-l.wikimedia.narkive.com/6G0YPmWQ/need-a-way-to-modify-text-before-indexing-was-searchupdate It seems I've lost the latest message in that thread, so I want to answer it now:

> With lsearchd and Elasticsearch, we absolutely wouldn't want to munge file text into page content (with sql-backed search, you might maybe).

Why? Aren't these also just full-text search backends? As I understand it, they're much faster than SQL-backed search engines. What would prevent them from storing file text? Personally I use Sphinx (http://sphinxsearch.com) with TikaMW, and of course everything works fine.
[Wikitech-l] Need a way to modify text before indexing (was SearchUpdate)
Hi! Change https://gerrit.wikimedia.org/r/#/c/79025/ , which was merged into 1.22, breaks my TikaMW extension - I used that hook to extract the contents of binary files so that users can then search them. Maybe you can add some other hook for this purpose? See also https://github.com/mediawiki4intranet/TikaMW/issues/2
Re: [Wikitech-l] On your python vs php talk
It's not bad design. It's bad only in theory, and just different from strongly typed languages. I even like its inconsistent function names - a lot of them are similar to C, and in most cases they're very easy to remember, as opposed to some other languages, including Python (!!). Of course there are some nuances, but there are nuances in any language. And I personally think the string "10" is semantically equal to the number 10 in most cases, so loose comparison is not a problem either. You just need to be slightly more careful while writing things. My main idea is that only a statically typed language should try to be strict. Python very oddly tries to be strict in some places while being dynamically typed. Look, it doesn't even concatenate a string and a long - even Java does that!
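For example (this is the PHP side; the equivalent concatenation in Python raises a TypeError):

    <?php
    // loose comparison treats the numeric string as a number
    var_dump( "10" == 10 );   // bool(true)
    var_dump( "10" === 10 );  // bool(false) - strict comparison is still there if you need it
    // and PHP happily concatenates a string with an integer
    echo "value: " . 10;      // prints "value: 10"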
Re: [Wikitech-l] ???!!! ResourceLoader loading extension CSS DYNAMICALLY?!!
Hi! Sorry for not answering via a normal Reply - I'm getting messages in digests. But I want to say thanks for the clarification and for the position=top advice - it works fine with position=top. Thanks :)
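For reference, the fix amounts to something like this in the extension's setup file (the module name and file are made up):

    $wgResourceModules['ext.myExtension.styles'] = array(
        'styles' => 'myExtension.css',
        'localBasePath' => __DIR__,
        'remoteExtPath' => 'MyExtension',
        // load the styles via a <link> in <head> instead of dynamically via JS,
        // so the page doesn't flicker while extension styles are applied
        'position' => 'top',
    );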
[Wikitech-l] ???!!! ResourceLoader loading extension CSS DYNAMICALLY?!!
Hello! I've got a serious issue with ResourceLoader. Why is it made to load extension styles DYNAMICALLY using JavaScript? It's a very bad idea: it leads to page style flickering during load. I.e. first the page is displayed using only the skin CSS, and then you see the extension styles being dynamically applied to it. Of course it's still rather fast, but it's definitely noticeable, even in Chrome. Why didn't you just output <link rel="stylesheet" href="load.php?...ALL MODULES..."> ?? Am I free to implement it and submit a patch? -- With best regards, Vitaliy Filippov
[Wikitech-l] Publish-staying-in-editmode feature for WikiEditor
Hello! I have implemented an idea for the WikiEditor extension: replace the step-by-step publish feature with another one - publishing while staying in edit mode, via AJAX. You can see a demo at http://wiki.4intra.net/ if you want. It works simply by sending an API "save article" request while NOT closing the article being edited. It also handles section edits correctly by re-requesting the section content after saving, so the edit form stays consistent even if you add sections. The idea is to give authors the ability to save intermediate results.

My question is: does anyone really need the step-by-step publishing feature that is in WikiEditor? I think it's useless because it just duplicates existing functionality - it just submits the form using a normal POST request - and makes editing harder, as you have to do more clicks. I would submit a patch to Gerrit if you're interested in replacing it with publish-staying-in-edit-mode. -- With best regards, Vitaliy Filippov
Re: [Wikitech-l] Removing the Hooks class
> You can't cache program state and loaded code like that in PHP. We explicitly have to abuse the autoloader and develop other patterns to avoid loading unused portions of code because if we don't our initialization is unreasonably long.

Yeah, I understand that; the idea was to serialize globals like $wgHooks, $wgAutoloadClasses etc. and load them at the beginning of each request... So each extension would be split into two parts: (1) metadata, executed once and then cached, and (2) classes, cached by the opcode cacher and loaded by a slim autoloader. With this approach you get rid of executing even the main file of each extension; the downside, of course, is that it would require some extension rewriting. I'm curious whether such a feature would result in any performance benefit or not :)
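Roughly, the idea is something like this (just a sketch - the cache path and the exact set of globals are made up):

    // first request: run the extension setup files as usual, then snapshot
    // the registration globals they populated; later requests just restore them
    $cacheFile = "$IP/cache/extension-registration.ser"; // hypothetical location

    if ( is_file( $cacheFile ) ) {
        $meta = unserialize( file_get_contents( $cacheFile ) );
        $wgHooks = $meta['hooks'];
        $wgAutoloadClasses += $meta['autoload'];
        $wgExtensionCredits = $meta['credits'];
    } else {
        require_once "$IP/extensions/SomeExtension/SomeExtension.php";
        file_put_contents( $cacheFile, serialize( array(
            'hooks'    => $wgHooks,
            'autoload' => $wgAutoloadClasses,
            'credits'  => $wgExtensionCredits,
        ) ) );
    }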
Re: [Wikitech-l] Removing the Hooks class
> Hey, I'm curious what the list thinks of deprecating and eventually removing the Hooks class. Some relevant info:
>
>   /**
>    * Hooks class.
>    *
>    * Used to supersede $wgHooks, because globals are EVIL.
>    *
>    * @since 1.18
>    */
>
> https://github.com/wikimedia/mediawiki-core/blob/master/includes/Hooks.php#L30
>
> I personally find the comment hilarious and hope you see why when looking at the class. Looks like usage in core and extensions is not too extensive, so switching to something more sane seems quite feasible.

I second that! Also I have an idea: maybe it would be good for MediaWiki if the initialisation state - along with all constants, global variables, extension metadata, a pre-initialised parser and preloaded PHP files - could be cached somewhere as a whole and just loaded on each request, instead of being sequentially initialised? (Extension metadata = hooks / special pages / i18n files / resource modules / credits / etc.) If that's a good idea, then I think sequential setting of hooks via a method call is not that good? (Because hooks become even less declarative?) Or is it not worth it and the initialisation overhead is small?
Re: [Wikitech-l] WebRequest and PHP bug 31892 fixed 6 years ago
> fixing bug 32621 is a todo. The first attempt failed and some tweaks are needed to use the PathRouter to fix that bug. PathRouter allows for the use of custom paths to expand. NamespacePaths is an example of one thing you can do (say giving Help: pages a /help/ path) but you could also apply that to special pages, etc... whatever. It's also the precursor to MW being able to handle 404s natively. The plan is in the future you'll just be able to throw everything that's not a file right at index.php and pretty urls, 404 pages, 404 thumbnail handlers, etc... will all just work natively without any special configuration. And by 404, I don't mean standard 404 pages like this: http://wiki.commonjs.org/404 I mean nice in-site 404 pages that actually help visitors find what they were looking for: http://www.dragonballencyclopedia.com/404
>
> Not sure how PATH_INFO being unmangled fixes anything. There are other servers where PATH_INFO won't easily be outputted. REQUEST_URI handling works better in every case. And ?title=$1 in rewrite rules are evil. Determining what urls run what code has always been the job of the application in every good language, not the webserver. And we can do it using REQUEST_URI much more reliably than some webservers. Anyways, I wish I could just get rid of the PATH_INFO code. I have yet to hear of someone actually using it now that practically every webserver there is outputs REQUEST_URI meaning the PATH_INFO code is never reached.

Thanks for answering! But wasn't all of that possible just by using something like $wgActionPaths? Unmangled PATH_INFO allows for a single rewrite rule like (.*) -> index.php/$1 (see the example below), and you won't need to strip the base path from URIs (though of course that's not hard). And you say PATH_INFO is unavailable on some configurations - can you please clarify which configurations those are?
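A sketch of the single rule I mean, for Apache (assuming MediaWiki lives at the document root):

    RewriteEngine On
    # pass everything that isn't an existing file or directory to MediaWiki
    # as extra path info, e.g. /Main_Page -> /index.php/Main_Page
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^(.*)$ /index.php/$1 [L]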
Re: [Wikitech-l] WebRequest and PHP bug 31892 fixed 6 years ago
And what is the point of making pretty URLs in the case of MediaWiki? I think they're already pretty enough in MediaWiki :) /edit/$1 is slightly prettier than ?action=edit, but as I understand it, that doesn't affect anything, not even SEO. And I don't think /help/$1 is any better than /Help:$1 at all...
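For reference, the /edit/$1 style is already available with the stock settings plus matching webserver rewrites; the values here are just examples:

    # LocalSettings.php
    $wgUsePathInfo = true;
    $wgArticlePath = '/wiki/$1';
    $wgActionPaths = array(
        'edit'    => '/edit/$1',     # /edit/Main_Page instead of ?action=edit
        'history' => '/history/$1',
    );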
[Wikitech-l] WebRequest and PHP bug 31892 fixed 6 years ago
Hello! WebRequest::getPathInfo() still works around PHP bug 31892, which was fixed 6 years ago. I.e. WebRequest uses REQUEST_URI instead of the "mangled" PATH_INFO, which has not actually been mangled since PHP 5.2.4. Yes, Apache still collapses multiple /// into a single /, but AFAIK that's done for REQUEST_URI as well as PATH_INFO. Maybe that part of the code should be removed? Also, I don't understand the need for PathRouter - my humble opinion is that it's just unnecessary sophistication. As I understand it, EVERYTHING worked without it, and there is no feature in MediaWiki that depends on a router. Am I correct?
Re: [Wikitech-l] WebRequest and PHP bug 31892 fixed 6 years ago
> I doubt Daniel would have introduced it if it was un-necessary or pointless, I believe from memory it was to improve the handling of paths over a wide range of set-ups and environments (where sometimes it would fail). You would need to git blame the file and find the revision where it was introduced to confirm if that is truly the case (or if i'm mistaking it for other code)

I've looked at the annotations, and what I've seen is that PathRouter only fixes https://bugzilla.wikimedia.org/show_bug.cgi?id=32621 by using path weights. Actually, I started looking at the routing code after hitting this same bug with the img_auth.php action path. But as I understand it, it could be fixed much more simply, just by reordering two parts of the existing code and examining $wgArticlePath after $wgActionPaths :) And the single extension using the PathRouter is http://www.mediawiki.org/wiki/Extension:NamespacePaths ...

Of course I support new features - there are some features that I myself would want to see in MW core :-) And I'm sure my point of view may be incorrect :-) but MW trunk (i.e. master) slightly frightens me compared to previous versions - the codebase seems to grow and grow and grow, having more and more and more different helpers... And it becomes more and more complex with no simplification effort... (or maybe I'm just not aware of it)
Re: [Wikitech-l] Seemingly proprietary Javascript
I would just like to note that while it may be silly or useless to insert licenses into minified JavaScript, it is nonetheless *legally required* to do so, regardless of the technical aspect of it.

My 2 cents - during my own research into free licenses, I decided that for JS a good license is MPL 2.0: http://www.mozilla.org/MPL/2.0/ Its advantages are:

1) It's strong file-level copyleft. File-level is good for JS, because it eliminates any problem of deciding whether a *.js file is or is not part of a derivative work, and any problem of using it together with differently licensed JS.

2) It's explicitly compatible with GPLv2+, LGPLv2.1+ and AGPLv3+. The incompatibility problem of MPL 1.1 is what caused the triple licensing of Firefox (GPL/LGPL/MPL).

3) It does not require you to include long notices in every file. You only must inform recipients "that the Source Code Form of the Covered Software is governed by the terms of this License, and how they can obtain a copy of this License". You may even omit the notice from the files themselves, provided that you include it "in some place where a recipient would be likely to look for such a notice".

Also, what I understood is that CC-BY-SA is not good for source code at all, at least because it's incompatible with the GPL. So CC-BY-SA licensed JS may be a problem.
Re: [Wikitech-l] WikiEditor caching (??)
> It's also annoying that while the toolbar (normal or advanced) loads I can't type in the header (for section=new) or the edit area, at least on Firefox (*). Is this the same problem?
> (*) Might also be a recent regression: https://bugzilla.mozilla.org/show_bug.cgi?id=795232

Maybe... It's also annoying anyway that the toolbar jumps down a moment after the page loads... If WikiEditor weren't implemented in _pure_ JS, the panel would be generated by PHP, so this problem wouldn't exist...
Re: [Wikitech-l] Creating WOFF files -- sub-setting larger fonts
By the way, I've just tried to use ttf2woff from fontutils to convert the Ubuntu TTF font to WOFF format for use in one of my projects. The resulting WOFF produced by this utility is not usable in any Linux browser (I tried Firefox, Chrome and Opera). I don't know if it works on Windows. At the same time, some random online font converter produced a normal WOFF from the same TTF. I've reported this bug at CPAN: https://rt.cpan.org/Public/Bug/Display.html?id=83377

Links to the font files, for reference:
* Source TTF: http://vmx.yourcmc.ru/var/ttf2woff-bug/ubuntu.ttf
* Bad WOFF (by ttf2woff): http://vmx.yourcmc.ru/var/ttf2woff-bug/ubuntu-bad.woff
* Good WOFF (by online converter): http://vmx.yourcmc.ru/var/ttf2woff-bug/ubuntu-good.woff
Re: [Wikitech-l] Creating WOFF files -- sub-setting larger fonts
> Fontforge has an option to export the fonts to WOFF format.

Thanks, FontForge worked even better than the online converter - a usable WOFF, and the size is 50 KB instead of 54 KB :-)

> [1] http://code.google.com/p/sfntly/
> [2] http://code.google.com/p/sfntly/wiki/MicroTypeExpress

As I understand it, sfntly is just a library, so do you use some utility of your own? Is it available somewhere?
Re: [Wikitech-l] WikiEditor caching (??)
vita...@yourcmc.ru wrote 2013-02-14 21:38:
> Hello Wiki Developers! I have a question: I think it's slightly annoying that WikiEditor shows up only a moment after the edit page loads, and that the textarea gets moved down (because WikiEditor is only built dynamically via JS). Do you think it's possible to cache the generated WikiEditor HTML code in some way to speed up loading?

Anyone?
Re: [Wikitech-l] Corporate needs are different (RE: How can we help Corporations use MW?)
> There are so many extensions useful to the enterprise, but probably also so many which are not useful at all or not maintained, and if I wanted to start a corporate wiki right now I would probably be very lost as to what to look at and how people do things, so it seemed like a good idea to list the extensions that ARE actually used. Also, I guess one team solved a certain problem one way, while another solved it differently, using a different extension or set of extensions, so writing this out might help everybody get new ideas / avoid reinventing the wheel. But I guess I either asked on the wrong list or there is not much interest at all.

So, you're talking about some basic set of extensions that are thought to be definitely useful for ALL people? It may be useful, but I think it would still require testing of a complete distribution (MW version X + all these extensions) before recommending it to companies... And that brings us back to the idea of a pre-built distribution like ours :-))
Re: [Wikitech-l] Corporate needs are different (RE: How can we help Corporations use MW?)
> I guess this would not directly solve any of the problems listed, but would it be helpful to bring back to life https://www.mediawiki.org/wiki/Enterprise_hub ? It was started by somebody a year or two ago but seems to have been abandoned at a draft stage. I am thinking if everybody adds some information about extensions/pages they find particularly useful in the enterprise world, it will help future users but also help current enterprise wikis exchange experience. Does this seem worthwhile?

IMHO there are so many useful extensions that it could be a little too much for that page. For example, if I edited that article I would put almost all the extensions from our distribution there... so instead I'm documenting them at http://wiki.4intra.net/Category:Mediawiki4Intranet_extensions :-)
[Wikitech-l] WikiEditor caching (??)
Hello Wiki Developers! I have a question: I think it's slightly annoying that WikiEditor shows up only a moment after the edit page loads, and that the textarea gets moved down (because WikiEditor is only built dynamically via JS). Do you think it's possible to cache the generated WikiEditor HTML code in some way to speed up loading? -- With best regards, Vitaliy Filippov
Re: [Wikitech-l] Stable PHP API for MediaWiki ?
> I understand from your comments that keeping things stable and preserving compatibility HAS been a priority for core developers at least since Daniel's email. Is this really the case? If this is the case, it makes me wonder why I hear some complaints about it.

Mariya, did you really hear that many complaints? :-)
Re: [Wikitech-l] Stable PHP API for MediaWiki ?
> 1) removal of global $action
> 2) removal of Xml::hidden()
> 3) broken Output::add() (had to migrate to ResourceLoader)
> 4) various parser tag bugs
> 5) removal of MessageCache::addMessage()
> 6) removal of ts_makeSortable() (JavaScript)
> 7) breakage of a WikiEditor adaptation
> 8) MediaWiki:Common.js no longer loading by default (security)
> 9) addHandler() JavaScript broken in IE8

Most of these were deprecations, am I correct?
Re: [Wikitech-l] Corporate needs are different (RE: How can we help Corporations use MW?)
> 1. A desire for a department to have their own space on the wiki.

In our organisation (CUSTIS, Russia) we easily solve this by creating one primary wiki plus separate ones for different departments. It's just a normal wiki family with shared code - a very simple solution without any extensions. The main disadvantage is the inability to search all the wikis with a single request, but in practice I've had very few requests for this feature, so it's probably not needed that often. I'm not talking about access control here - for access control we also have IntraACL (forked from HaloACL). Still not an ideal solution, but we'll probably improve it further.

> 2. Hierarchy. Departments want not only their own space, they want subspaces beneath it. For example, Human Resources wiki area with sub-areas of Payroll, Benefits, and Recruiting. I realize Confluence supports this... but we decided against Confluence because you have to choose an article's area when you create it (at least when we evaluated Confluence years ago). This is a mental barrier to creating an article, if you don't know where you want to put it yet. MediaWiki is so much better in this regard -- if you want an article, just make it, and don't worry where it goes since the main namespace is flat. I've been thinking about writing an extension that superimposes a hierarchy on existing namespaces, and what the implications would be for the rest of the MediaWiki UI. It's an interesting problem. Anyone tried it?

> 3. Tools for organizing large groups of articles. Categories and namespaces are great, and the DPL extension helps a lot. But when (say) the Legal department creates 700 articles that all begin with the words "Legal department" (e.g., Legal department policies, Legal department meeting 2012-07-01, Legal department lunch, etc.), suddenly the AJAX auto-suggest search box becomes a real pain for finding Legal department articles. This is SO COMMON in a corporate environment with many departments, as people try to game the search box by titling all their articles with "Legal department"... until suddenly it doesn't scale and they're stuck. I'd like to see tools for easily retitling and recategorizing large numbers of articles at once.

Recategorising is very simple with global search-and-replace. Our implementation is called BatchEditor: https://github.com/mediawiki4intranet/BatchEditor

> 4. Integration with popular corporate tools like MS Office, MS Exchange, etc. We've spent thousands of hours doing this: for example, an extension that embeds an Excel spreadsheet in a wiki page (read-only, using a $10,000 commercial Excel-to-HTML translator as a back-end), and we're looking at embedding Exchange calendars in wiki pages next.

O_O A $10,000 Excel-to-HTML converter? O_OOO Why not just copy-paste into, for example, wikEd (google://wikEd)? :-))) Not as beautiful, but it works.

> 5. Corporate reorganizations and article titles. In any company, the names and relationships of departments change. What do you do when 10,000 wiki links refer to the old department name? Sure, you can move the article Finance department to Global Finance department and let redirects handle the rest: now your links work. But they still have the old department name, and global search-and-replace is truly scary when wikitext might get altered by accident. Also, there's the category called Finance department. You can't rename categories easily. I know you can do it with Pywikipedia, but it's slow and risky (e.g., Pywikipedia used to have a bug that killed noinclude tags around categories it changed). Categories should be fully first-class so renames are as simple as article title changes.

Mass editing tool = BatchEditor, as I've already said. But I agree that MediaWiki needs better mass editing, page selection and page exchange (import/export) tools. In our distribution (mediawiki4intranet) we partially solve this by implementing page selection on Special:Export; BatchEditor uses that implementation when it's available. (You can see examples at http://wiki.4intra.net/Special:Export and http://wiki.4intra.net/Special:BatchEditor.) (We also have improved import/export functionality, but unfortunately it's a code bomb and reworking it to get it into trunk will take a lot of time...) But it's only a partial solution, because there is no standard interface for it. So we also have a variation of DPL, and we also have Semantic MediaWiki. And all of them have partially the same - but not totally the same - functionality. It would be good if there were a single, standardized, optimized and cacheable method of page selection.
Re: [Wikitech-l] Corporate needs are different (RE: How can we help Corporations use MW?)
> In practice, we have found this doesn't work well for us (with thousands of employees).

Yeah, our company doesn't have thousands of employees :-)

> Each department winds up writing its own wiki page about the same topic (say, Topic X), and they're all different.

So it means most of your departments work on something very similar? We probably don't have this problem because our departments and projects differ strongly, so everyone just writes their specific articles in their own wikis and general information in the primary CustisWiki. We have ~7 wikis for the whole company (~200 employees).

> Users don't know which one is the real or right article. We find it better to have one central wiki with one definitive article per topic. No redundancy, no coupling, and no version skew between wikis.

>> Just an idea - you can also set up a replication process between the wikis to ease fighting...

> Thanks, I'll check it out.

> Categorization can get very complicated on a MediaWiki system though. Consider this fairly simple template example: {{#if:{{{department|}}} | [[Category:{{{department}}} projects]]}} I would be amazed if any global search-and-replace could handle this!

Such examples are of course much harder, but if there is not much chaos, you can handle them with regexps (see the sketch at the end of this mail)... Not a task for an average user, but they can ask someone who knows regexps to do it :-)

> With our extension, the Excel spreadsheet is rendered live in the wiki page.

Ooh, I see, of course that's a big feature! Another question - didn't you try to use some automation with Excel itself to save the .xls as HTML?

> We started looking into Semantic MediaWiki - it has impressive features. But we got scared off by stories that it slows down the wiki too much. Maybe we should give it another look.

As someone already said, it should not affect performance noticeably if you don't abuse it. And even if you do abuse it, it has a very good feature: concept caching, i.e. caching of semantic query results with correct invalidation (as I understand it, with some limitations though). (http://semantic-mediawiki.org/wiki/Help:Concept_caching)

Overall, it's very nice to see that a big company like yours has successful MediaWiki usage experience (I assume it's successful, yeah? :)) Do you have any extensions or modifications that you would like to make public as free open source? Or maybe you already did that with something? :-)
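For example, the simple (non-template) renaming case could be handled with a regexp like this (illustrative only):

    // rename [[Category:Finance department]] (with or without a sort key)
    // to [[Category:Global Finance department]] in a batch of pages
    $newText = preg_replace(
        '/\[\[Category:Finance department(\|[^\]]*)?\]\]/',
        '[[Category:Global Finance department$1]]',
        $wikitext
    );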
Re: [Wikitech-l] Why are we still using captchas on WMF sites?
> Per the previous comments in this post, anything over 1% precision should be regarded as failure, and our Fancy Captcha was at 25% a year ago. So yeah, approximately all, and our captcha is well known to actually suck.

Maybe you could just use reCAPTCHA instead of FancyCaptcha?
Re: [Wikitech-l] Why are we still using captchas on WMF sites?
> The problem is that reCaptcha (a) used as a service, would pass private user data to a third party, (b) is closed source, so we can't just put up our own instance. Has anyone reimplemented it or any of it? There's piles of stuff on Wikisource we could feed it, for example.

OK, then we could take KCaptcha and integrate it as an extension. It's a Russian project; I've used it many times and it seems to be rather strong. http://www.captcha.ru/en/kcaptcha/
Re: [Wikitech-l] Why are we still using captchas on WMF sites?
Luke Welling WMF wrote 2013-01-22 21:59:
> Even ignoring openness and privacy, exactly the same problems are present with reCAPTCHA as with Fancy Captcha. It's often very hard or impossible for humans to read, and is a big enough target to have been broken by various people.

It's all very good to discuss, but what are the other options for minimizing spam?
Re: [Wikitech-l] Why are we still using captchas on WMF sites?
> It's very good to discuss, but what are the other options to minimize spam?

(Maybe I know one: find the XRumer authors and tear their arms off... :-))
Re: [Wikitech-l] MediaWiki Extension Bundles and Template Bundles
On 01/14/2013 10:20 AM, Yuvi Panda wrote:
> Is there a sort of 'Extension Bundle' that gets you baseline stuff that people who are used to wikipedia 'expect'? ParserFunctions and Cite come to mind, but I'm sure there are plenty others.

I don't know if this is relevant to your question, but I have to say that in our company we maintain and use our own MediaWiki distribution, Mediawiki4Intranet (http://github.com/mediawiki4intranet, http://wiki.4intra.net/), for all our MW installations. It includes ~75 extensions; the set is not quite the same as the WMF one, but we think it's good for corporate (intranet) usage. You can try it out if you want, though some extensions are documented only in Russian :-)
Re: [Wikitech-l] Fwd: Re: How to speed up the review in gerrit?
> Actually registration is open to everyone now by simple form submission. So actually, any one developer could get any change they wanted merged. All they need to do is trivially register a second labs account.

Okay, but the current situation is also a problem, because with it reviewing and merging take much more time. And as I've said, I think most extensions aren't as important as the core, and limiting approval for them to core developers is just a waste... Maybe you should add some group similar to the previous (SVN) commit access to extensions, so that a wider group of people could merge changes to extensions?
[Wikitech-l] Fwd: Re: How to speed up the review in gerrit?
Sorry, I replied to Sumana directly instead of the mailing list, so now I'm duplicating it to the mailing list.

Sumana Harihareswara wrote 2012-12-19 22:30:
> Try these tips: https://www.mediawiki.org/wiki/Git/Code_review/Getting_reviews

Sumana, that's all very good, but:

1) I think it's not so comfortable to push other developers personally by adding them as reviewers... And I don't know whom to add as a reviewer, so I just choose randomly. But what if that person doesn't want to review that extension? For example, what if he is already very busy working on MediaWiki _core_ and I ask him to review a trivial extension?

2) Who can verify changes in extensions? There is no CI. So, are the people who can verify changes and the people who can put +2 the same people? That again short-circuits all the work to the core people, and aren't they already busy? (I assume they are, as they don't review all the changes.)

3) As a solution, I think it would be good if - at least in extensions that are not as important as the core - changes were merged automatically after getting, for example, two +1s... Or would you end up with changes reviewed but not merged by anyone? Also, maybe it would be good if the system automatically added some reviewers - randomly, or based on some ownership rules...
[Wikitech-l] How to speed up the review in gerrit?
Hello! On 28 SEPTEMBER I pushed some minor changes to Gerrit, to the Drafts extension. Since then I've corrected two of them (uploaded patch set 2), but after that nobody reviewed them. As I understand it, Gerrit will abandon changes after a month of inactivity, and that happens tomorrow... The changes are really simple. How can I ask someone to actually do the review? Does Gerrit have such a function? Thanks in advance, Vitaliy Filippov
Re: [Wikitech-l] How to speed up the review in gerrit?
Matma Rex wrote 2012-12-19 15:01:
> You could add people as reviewers, or personally ask someone to review, preferably someone who worked on the extension in the past.

Okay, I've just done that... So, do you mean all committers just add random reviewers when they see no reaction?
Re: [Wikitech-l] How to speed up the review in gerrit?
Antoine Musso wrote 2012-12-19 16:19:
> Le 19/12/12 11:57, vita...@yourcmc.ru wrote:
>> Hello! On 28 SEPTEMBER I pushed some minor changes to Gerrit, to the Drafts extension. Since then I've corrected two of them (uploaded patch set 2), but after that nobody reviewed them. As I understand it, Gerrit will abandon changes after a month of inactivity, and that happens tomorrow... The changes are really simple. How can I ask someone to actually do the review? Does Gerrit have such a function?
>
> And the changes are:
> https://gerrit.wikimedia.org/r/#/c/39369/ - add a dependency on mediawiki.legacy.wikibits
> https://gerrit.wikimedia.org/r/#/c/25629/ - Fix a bug: drafts didn't show up when creating new pages
> https://gerrit.wikimedia.org/r/#/c/25628/ - Always display user's drafts on the edit form
> https://gerrit.wikimedia.org/r/#/c/25627/ - Fix for PHP 5.4: add to function prototype

Yes, exactly! I've just added the first one (the added dependency). The others are older.
Re: [Wikitech-l] Question about 2-phase dump
> Page history structure isn't quite immutable; revisions may be added or deleted, pages may be renamed, etc etc. Shelling out to an external process means when that process dies due to a dead database connection etc, we can restart it cleanly.

Brion, thanks for clarifying that. Also, I want to ask you and the other developers about the idea of packing the export XML file along with all exported uploads into a ZIP archive (instead of putting them into the XML as base64) - what do you think about it? We use this in our MediaWiki installations (mediawiki4intranet) and find it quite convenient. Actually, ZIP was Tim Starling's idea; before ZIP we used very strange multipart/related archives (I don't know why we did that :)). I want to try to get this change reviewed at last... What do you think about it?

Other improvements include advanced page selection (based on namespaces, categories, dates, imagelinks, templatelinks and pagelinks) and an advanced import report (including some sort of conflict detection). I should probably split them into separate patches in Gerrit for ease of review?

Also, do all the archiving methods (7z) really need to be built into Export.php as dump filters (especially when using ZIP)? I.e. with simple XML dumps you could just pipe the output to the compressor. Or are they really needed to save temporary disk space during export? I ask because my version of import/export does not build the archive on the fly - it puts all the contents into a temporary directory and then archives it as a whole. Is that an acceptable method? -- With best regards, Vitaliy Filippov
[Wikitech-l] Question about 2-phase dump
Hello! While working on my improvements to MediaWiki import/export, I've discovered a feature that is totally new to me: the 2-phase backup dump. I.e. the first-pass dumper creates an XML file without page texts, and the second-pass dumper adds the page texts. I have several questions about it - what is it intended for? Is it a sort of optimisation for large databases, and why was this method of optimisation chosen? Also, does anyone use it? (Does Wikimedia use it?)
Re: [Wikitech-l] Question about 2-phase dump
Brion Vibber wrote 2012-11-21 23:20:
> While generating a full dump, we're holding the database connection open for a long, long time. Hours, days, or weeks in the case of English Wikipedia. There's two issues with this:
> * the DB server needs to maintain a consistent snapshot of data since when we started the connection, so it's doing extra work to keep old data around
> * the DB connection needs to actually remain open; if the DB goes down or the dump process crashes, whoops! you just lost all your work.
> So, grabbing just the page and revision metadata lets us generate a file with a consistent snapshot as quickly as possible. We get to let the databases go, and the second pass can die and restart as many times as it needs while fetching actual text, which is immutable (thus no worries about consistency in the second pass). We definitely use this system for Wikimedia's data dumps!

Oh, thanks, now I understand! But the revisions themselves are also immutable - isn't it simpler just to select the maximum revision ID at the beginning of the dump and discard newer page and image revisions during dump generation? Also, I have the same question about the 'spawn' feature of backupTextPass.inc :) What is it intended for? :)