Re: [Wikitech-l] Need a way to modify text before indexing (was SearchUpdate)

2015-10-14 Thread vitalif

FWIW, we do index the full text of (PDF and?) DjVu files on Commons
(because it's stored in img_metadata). It's probably the biggest
improvement CirrusSearch brought for Commons.


And we also index office documents via Tika (*.doc and similar).

And I think it should not be a feature of the search engine at all! It's 
a separate feature that's completely independent of the search engine 
used (that's how it's implemented in my TikaMW).


So, is there any replacement for the SearchUpdate hook to modify the 
indexed text?


Of course I could just bring SearchUpdate back by including a patch in our 
mediawiki4intranet distribution, but I would prefer TikaMW not to require 
core patching...
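
(For context, this is roughly how TikaMW used the removed hook. A minimal 
sketch only - the exact parameter list of the old SearchUpdate hook is from 
memory, and extractWithTika() is just a placeholder for the real Tika call:)

  $wgHooks['SearchUpdate'][] = 'TikaMWHooks::onSearchUpdate';

  class TikaMWHooks {
      // Append the text extracted by Tika so it gets indexed together with
      // the normal wikitext of the file description page.
      public static function onSearchUpdate( $id, $namespace, $title, &$text ) {
          $extracted = self::extractWithTika( $title ); // placeholder helper
          if ( $extracted !== '' ) {
              $text .= "\n" . $extracted;
          }
          return true; // let other handlers run
      }

      private static function extractWithTika( $title ) {
          // In the real extension this shells out to Tika for the file behind
          // $title; stubbed here only to keep the sketch self-contained.
          return '';
      }
  }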


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Need a way to modify text before indexing (was SearchUpdate)

2015-10-14 Thread vitalif

I wrote about my problem ~2 years ago:

http://wikitech-l.wikimedia.narkive.com/6G0YPmWQ/need-a-way-to-modify-text-before-indexing-was-searchupdate

It seems I lost the latest message in that thread, so I want to answer it now:

With lsearchd and Elasticsearch, we absolutely wouldn't want to munge 
file text into page content (with sql-backed search, you might maybe).


Why? Aren't these also just fulltext search backends? As I understand it, 
they're much faster than SQL-backed search engines. What would prevent them 
from storing file texts?


Personally, I use Sphinx (http://sphinxsearch.com) with TikaMW, and 
everything works fine.


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Need a way to modify text before indexing (was SearchUpdate)

2014-01-14 Thread vitalif

Hi!

Change https://gerrit.wikimedia.org/r/#/c/79025/, which was merged into 1.22, 
breaks my TikaMW extension - I used that hook to extract the contents of 
binary files so that users can search them.


Maybe you can add some other hook for this purpose?

See also https://github.com/mediawiki4intranet/TikaMW/issues/2

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] On your python vs php talk

2013-07-28 Thread vitalif
It's not bad design. It's bad only in theory, and just different 
from strongly-typed languages. I even like its "inconsistent" function names 
- many of them are similar to C, and in most cases they're very easy to 
remember, as opposed to some other languages, including Python (!!).


Of course there are some nuances, but every language has them. And I 
personally think "10" is semantically equal to 10 in most cases, so loose 
comparison is not a problem either. You just need to be slightly more 
careful when writing things.


And my main point is that only a statically typed language should try to be 
strict. Python very oddly tries to be strict in some places while being 
dynamically typed. Look, it won't even concatenate a string and a long - 
even Java does that!
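
To illustrate with plain PHP (nothing MediaWiki-specific here):

  // PHP treats the numeric string "10" and the integer 10 as equal under
  // loose comparison, and happily concatenates numbers into strings:
  var_dump( "10" == 10 );      // bool(true)  - types are juggled
  var_dump( "10" === 10 );     // bool(false) - strict comparison still exists
  echo "Count: " . 10 . "\n";  // prints "Count: 10"
  // Python, although dynamically typed, raises a TypeError for "Count: " + 10
  // - that's exactly the odd strictness I mean.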


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] ???!!! ResourceLoader loading extension CSS DYNAMICALLY?!!

2013-06-07 Thread vitalif

Hi!

Sorry for not answering via a normal Reply - I receive the list in digest 
form.


But I want to say thanks for the clarification and for the position=top 
advice - everything is fine with position=top.
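
For anyone else who hits the same flicker, a minimal sketch of the fix 
(module and file names are made up for the example): register the extension 
styles as a module with 'position' => 'top', so ResourceLoader emits them in 
the <head> instead of applying them from JavaScript.

  $wgResourceModules['ext.myExtension.styles'] = array(
      'styles' => array( 'myExtension.css' ),
      'position' => 'top', // emitted before first paint -> no flicker
      'localBasePath' => __DIR__,
      'remoteExtPath' => 'MyExtension',
  );

  // And add the styles to the output, e.g. from a BeforePageDisplay hook:
  $wgHooks['BeforePageDisplay'][] = function ( $out, $skin ) {
      $out->addModuleStyles( 'ext.myExtension.styles' );
      return true;
  };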


Thanks :)

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] ???!!! ResourceLoader loading extension CSS DYNAMICALLY?!!

2013-06-05 Thread vitalif

Hello!

I've got a serious issue with ResourceLoader.

WHY is it made to load extension _styles_ DYNAMICALLY using 
JavaScript?


It's a very bad idea: it leads to page style flickering during load, 
i.e. first the page is displayed using only the skin CSS, and then you see 
the extension styles being applied to it dynamically. Of course it's still 
rather fast, but it's definitely noticeable, even in Chrome.


Why didn't you just output <link rel="stylesheet" href="load.php?...ALL 
MODULES..." /> ??


Am I free to implement it and submit a patch?

--
With best regards,
  Vitaliy Filippov

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Publish-staying-in-editmode feature for WikiEditor

2013-05-24 Thread vitalif

Hello!

I have implemented an idea for the WikiEditor extension: replace the 
step-by-step "publish" feature with a different one - "publish staying in 
edit mode" via AJAX. You can see a demo at http://wiki.4intra.net/ if 
you want. It works simply by sending an API "save article" request while 
NOT closing the edit form. It also handles section edits correctly by 
re-requesting the section content after saving, so the edit form stays 
consistent even if you add sections.


The idea is to give authors the ability to save intermediate results.
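
(For the curious, the save request is just the standard action=edit API 
call. Shown below as a PHP sketch purely for illustration - in the extension 
it is of course sent from JavaScript, and the parameter values are 
placeholders:)

  $params = array(
      'action'  => 'edit',
      'format'  => 'json',
      'title'   => 'Some page',
      'section' => '2',                 // omitted when editing the whole page
      'text'    => 'New text of the section',
      'summary' => 'Saved without leaving edit mode',
      'token'   => $editToken,          // edit token obtained earlier via the API
  );
  $ch = curl_init( 'http://wiki.example.org/api.php' );
  curl_setopt( $ch, CURLOPT_POST, true );
  curl_setopt( $ch, CURLOPT_POSTFIELDS, http_build_query( $params ) );
  curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
  $result = json_decode( curl_exec( $ch ), true );
  curl_close( $ch );
  // The edit form is NOT closed; after a successful save the extension
  // re-requests the (possibly renumbered) section text and stays in edit mode.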

My question is: does anyone really need the step-by-step publishing 
feature that is in WikiEditor? I think it's useless, because it just 
duplicates the existing functionality (it submits the form using a normal 
POST request) and makes editing harder, since you have to do more clicks. 
I would submit a patch to Gerrit if you're interested in replacing it with 
publish-staying-in-editmode.


--
With best regards,
  Vitaliy Filippov

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Removing the Hooks class

2013-04-05 Thread vitalif

You can't cache program state and loaded code like that in PHP. We
explicitly have to abuse the autoloader and develop other patterns to
avoid loading unused portions of code because if we don't our
initialization is unreasonably long.


Yeah, I understand that; the idea was to serialize globals like $wgHooks, 
$wgAutoloadClasses etc. and load them at the beginning of each request...
Each extension would then be split into two parts: (1) metadata, executed 
once and then cached, and (2) classes, cached by the opcode cacher and 
loaded by a slim autoloader.
With this approach you wouldn't even execute the main file of each 
extension; the downside, of course, is that it would require some extension 
rewriting.
I'm curious whether such a feature would result in any performance benefit 
or not :)
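
A very rough sketch of what I mean (the cache file name, the 
$wgExtensionSetupFiles list and the exact set of globals are all made up - 
real extensions would have to expose their metadata declaratively for this 
to be safe):

  $cacheFile = "$IP/cache/extension-registration.ser";

  if ( is_readable( $cacheFile ) ) {
      // Fast path: restore the pre-computed registration data.
      $data = unserialize( file_get_contents( $cacheFile ) );
      $wgHooks           = array_merge_recursive( $wgHooks, $data['hooks'] );
      $wgAutoloadClasses += $data['autoload'];
      $wgSpecialPages    += $data['specialPages'];
  } else {
      // Slow path: execute every extension's setup file once...
      foreach ( $wgExtensionSetupFiles as $file ) {
          require $file;
      }
      // ...and cache the resulting declarative globals for later requests.
      file_put_contents( $cacheFile, serialize( array(
          'hooks'        => $wgHooks,
          'autoload'     => $wgAutoloadClasses,
          'specialPages' => $wgSpecialPages,
      ) ) );
  }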


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Removing the Hooks class

2013-04-04 Thread vitalif

Hey,

I'm curious what the list thinks of deprecating and eventually removing the 
Hooks class. Some relevant info:

/**
 * Hooks class.
 *
 * Used to supersede $wgHooks, because globals are EVIL.
 *
 * @since 1.18
 */


https://github.com/wikimedia/mediawiki-core/blob/master/includes/Hooks.php#L30

I personally find the comment hilarious and hope you see why when looking 
at the class. Looks like usage in core and extensions is not too extensive, 
so switching to something more sane seems quite feasible.


I second that!

Also I have an idea: maybe it would be good for MediaWiki if the 
initialisation state - all the constants, global variables, extension 
metadata, the preinitialised parser and the preloaded PHP files - could be 
cached somewhere as a whole and simply loaded on each request, instead of 
being initialised sequentially? (Extension metadata = hooks, special pages, 
i18n files, resource modules, credits, etc.)
If that's a good idea, then sequentially registering hooks via a method 
call is probably not so good, because hooks become even less declarative?

Or is it not worth it because the initialisation overhead is small?


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] WebRequest and PHP bug 31892 fixed 6 years ago

2013-03-15 Thread vitalif

fixing bug 32621 is a todo. The first attempt failed and some tweaks
are needed to use the PathRouter to fix that bug.

PathRouter allows for the use of custom paths to expand. NamespacePaths is
an example of one thing you can do (say giving Help: pages a /help/ path)
but you could also apply that to special pages, etc... whatever. It's also
the precursor to MW being able to handle 404s natively. The plan is in the
future you'll just be able to throw everything that's not a file right at
index.php and pretty urls, 404 pages, 404 thumbnail handlers, etc... will
all just work natively without any special configuration.

And by 404, I don't mean standard 404 pages like this:
http://wiki.commonjs.org/404
I mean nice in-site 404 pages that actually help visitors find what
they were looking for:
http://www.dragonballencyclopedia.com/404

Not sure how PATH_INFO being unmangled fixes anything. There are
other servers where PATH_INFO won't easily be outputted. REQUEST_URI
handling works better in every case. And ?title=$1 in rewrite rules
are evil. Determining what urls run what code has always been the job
of the application in every good language, not the webserver. And we
can do it using REQUEST_URI much more reliably than some webservers.
Anyways, I wish I could just get rid of the PATH_INFO code. I have
yet to hear of someone actually using it now that practically every
webserver there is outputs REQUEST_URI meaning the PATH_INFO code is
never reached.


Thanks for answering!
But wasn't all that already possible just by using something like 
$wgActionPaths?
Unmangled PATH_INFO allows a single rewrite rule like (.*) -> index.php/$1, 
and you don't need to strip the base path from URIs (though of course 
that's not hard).
And you say PATH_INFO is unavailable in some configurations - could you 
please clarify which configurations those are?
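
Just to make the comparison concrete, a toy sketch (not MediaWiki's actual 
code) of extracting the page path both ways; the '/wiki' base path is an 
assumption:

  // With unmangled PATH_INFO (PHP >= 5.2.4) and a rewrite rule like
  // "(.*) -> index.php/$1", the page path arrives ready to use:
  $path = isset( $_SERVER['PATH_INFO'] ) ? $_SERVER['PATH_INFO'] : '';

  // With REQUEST_URI you get the full original URI and have to strip the
  // base path and the query string yourself:
  $uri = preg_replace( '/\?.*$/', '', $_SERVER['REQUEST_URI'] );
  $base = '/wiki';
  if ( strpos( $uri, $base . '/' ) === 0 ) {
      $path = substr( $uri, strlen( $base ) );
  }
  echo urldecode( $path ), "\n";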



___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] WebRequest and PHP bug 31892 fixed 6 years ago

2013-03-15 Thread vitalif

And what is the point of making pretty URLs in the case of MediaWiki?
I think they're already pretty enough in MediaWiki :)
/edit/$1 is slightly prettier than ?action=edit, but as I understand it, 
that doesn't affect anything, not even SEO.

And I don't think /help/$1 is any better than /Help:$1 at all...

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] WebRequest and PHP bug 31892 fixed 6 years ago

2013-03-13 Thread vitalif

Hello!

WebRequest::getPathInfo() still works around PHP bug 31892, which was fixed 
6 years ago. I.e. WebRequest uses REQUEST_URI instead of PATH_INFO, which 
hasn't been mangled since PHP 5.2.4. Yes, Apache still collapses multiple 
/// into a single /, but AFAIK it does that for REQUEST_URI as well as for 
PATH_INFO.

Maybe that part of the code should be removed?

Also, I don't understand the need for PathRouter - IMHO it's just 
unnecessary sophistication. As I understand it, EVERYTHING worked without 
it, and there is no feature in MediaWiki that depends on a router. Am I 
correct?


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] WebRequest and PHP bug 31892 fixed 6 years ago

2013-03-13 Thread vitalif

I doubt Daniel would have introduced it if it was un-necessary or
pointless, I believe from memory it was to improve the handling of
paths over a wide range of set-ups and environments (where sometimes
it would fail). You would need to git blame the file and find the
revision where it was introduced to confirm if that is truly the case
(or if i'm mistaking it for other code)


I've looked at the annotations, and what I've seen is that PathRouter only 
fixes https://bugzilla.wikimedia.org/show_bug.cgi?id=32621 by using path 
weights. Actually, I started looking at the routing code after hitting this 
same bug with the img_auth.php action path. But as I understand it, the bug 
could be fixed much more simply just by reordering two parts of the existing 
code and examining $wgArticlePath after $wgActionPaths :)
And the single extension using PathRouter is 
http://www.mediawiki.org/wiki/Extension:NamespacePaths ...


Of course I support new features - there are some features that I myself 
would like to see in MW core :-)


And I'm sure my point of view may be wrong :-) but MW trunk (i.e. master) 
slightly frightens me compared to previous versions - the codebase seems to 
grow and grow, gaining more and more different helpers... And it becomes 
more and more complex, with no simplification effort... (or maybe I'm just 
not aware of one)


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Seemingly proprietary Javascript

2013-03-05 Thread vitalif
I would just like to note that while it may be silly or useless to insert 
licenses into minified JavaScript, it is nonetheless *legally required* to 
do so, regardless of the technical aspect of it.


My 2 cents: during my own research on free licenses, I decided that a good 
license for JS is MPL 2.0: http://www.mozilla.org/MPL/2.0/


Its advantages are:
1) It's a strong file-level copyleft. File-level is good for JS, because it 
eliminates any problem of deciding whether a *.js file is or is not part of 
a derivative work, and any problem of using it together with differently 
licensed JS.
2) It's explicitly compatible with GPLv2+, LGPLv2.1+ and AGPLv3+. The 
incompatibility problem of MPL 1.1 caused the triple licensing of Firefox 
(GPL/LGPL/MPL).
3) It does not require you to include long notices in every file. You only 
have to inform recipients "that the Source Code Form of the Covered 
Software is governed by the terms of this License, and how they can obtain 
a copy of this License". You may even omit the notice from the files 
themselves, provided you include it in some place "where a recipient would 
be likely to look for such a notice".


Also, what I understood is that CC-BY-SA is not good for source code at 
all, at least because it's incompatible with the GPL. So CC-BY-SA-licensed 
JS may be a problem.


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] WikiEditor caching (??)

2013-02-18 Thread vitalif

It's also annoying that while the toolbar (normal or advanced) loads
I can't type in the header (for section=new) or the edit area, at
least on Firefox:* is this the same problem?
(*) Might also be a recent regression:
https://bugzilla.mozilla.org/show_bug.cgi?id=795232


Maybe...
In any case, it's also annoying that the toolbar jumps down a moment after 
the page loads...


If WikiEditor weren't implemented in _pure_ JS, the toolbar would be 
generated by PHP, so this problem wouldn't exist...


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Creating WOFF files -- sub-setting larger fonts

2013-02-17 Thread vitalif
By the way, I've just tried to use ttf2woff from fontutils to convert the 
Ubuntu TTF font to WOFF format for use in one of my projects.
The resulting WOFF produced by this utility is not usable in any Linux 
browser (I tried Firefox, Chrome and Opera); I don't know whether it works 
on Windows.
At the same time, some random online font converter produced a working WOFF 
from the same TTF.
I've reported this bug on CPAN: 
https://rt.cpan.org/Public/Bug/Display.html?id=83377

Links to font files, for the reference:
* Source TTF: http://vmx.yourcmc.ru/var/ttf2woff-bug/ubuntu.ttf
* Bad WOFF (by ttf2woff): 
http://vmx.yourcmc.ru/var/ttf2woff-bug/ubuntu-bad.woff
* Good WOFF (by online converter): 
http://vmx.yourcmc.ru/var/ttf2woff-bug/ubuntu-good.woff



___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Creating WOFF files -- sub-setting larger fonts

2013-02-17 Thread vitalif
By the way, I've just tried to use ttf2woff from fontutils to convert the 
Ubuntu TTF font to WOFF format for use in one of my projects.
The resulting WOFF produced by this utility is not usable in any Linux 
browser (I tried Firefox, Chrome and Opera); I don't know whether it works 
on Windows.
At the same time, some random online font converter produced a working WOFF 
from the same TTF.
I've reported this bug on CPAN: 
https://rt.cpan.org/Public/Bug/Display.html?id=83377
The files are attached for reference - the source TTF, the WOFF created by 
ttf2woff (ubuntu-bad.woff) and the working WOFF created by the online 
converter (ubuntu-good.woff).
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Creating WOFF files -- sub-setting larger fonts

2013-02-17 Thread vitalif

Fontforge has an option to export the fonts to WOFF format.


Thanks - FontForge worked even better than the online converter: a usable 
WOFF, and the size is 50 KB instead of 54 KB :-)



[1] http://code.google.com/p/sfntly/
[2] http://code.google.com/p/sfntly/wiki/MicroTypeExpress


As I understand it, sfntly is just a library, so do you use some utility of 
your own? Is it available somewhere?


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] WikiEditor caching (??)

2013-02-16 Thread vitalif

vita...@yourcmc.ru wrote 2013-02-14 21:38:

Hello Wiki Developers!

I have a question: I think it's slightly annoying that WikiEditor
shows up only some moment after the editing page loads and that the
textarea gets moved down (because WikiEditor is only built dynamically
via JS).

Do you think it's possible to cache the generated WikiEditor HTML
code in some way to speed up loading?


Anyone?

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Corporate needs are different (RE: How can we help Corporations use MW?)

2013-02-15 Thread vitalif
There are so many extensions useful to the enterprise but probably also so 
many which are not useful at all or not maintained, and if I wanted to start 
a corporate wiki right now I would probably be very lost as to what to look 
at and how people do things, so it seemed like a good idea to list the 
extensions that ARE actually used. Also, I guess one team solved a certain 
problem one way, while another solved it differently, using a different 
extension or set of extensions, so writing this out might help everybody 
get new ideas / avoid reinventing the wheel. But I guess I either asked on 
the wrong list or there is not much interest at all.


So, you're talking about some basic set of extensions that are considered 
definitely useful for ALL users?

It may be useful, but I think it would still require testing a complete 
distribution (MW version X + all these extensions) before recommending it 
to companies... And that brings us back to the idea of a pre-built 
distribution like ours :-))


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Corporate needs are different (RE: How can we help Corporations use MW?)

2013-02-14 Thread vitalif
I guess this would not directly solve any of the problems listed, but would 
it be helpful to bring back to life 
https://www.mediawiki.org/wiki/Enterprise_hub ? It was started by somebody 
a year or two ago but seems to have been abandoned at a draft stage. I am 
thinking if everybody adds some information about extensions/pages they 
find particularly useful in the enterprise world, it will help future users 
but also help current enterprise wikis exchange experience. Does this seem 
worthwhile?


IMHO there are so many useful extensions that it could be a bit much for 
that page.

For example, if I edited that article I would put almost all the extensions 
from our distribution there... so instead I'm documenting them at 
http://wiki.4intra.net/Category:Mediawiki4Intranet_extensions :-)


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] WikiEditor caching (??)

2013-02-14 Thread vitalif

Hello Wiki Developers!

I have a question: I find it slightly annoying that WikiEditor shows up 
only a moment after the edit page loads, and that the textarea gets moved 
down (because WikiEditor is built dynamically via JS).

Do you think it's possible to cache the generated WikiEditor HTML in some 
way to speed up loading?


--
With best regards,
  Vitaliy Filippov

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Stable PHP API for MediaWiki ?

2013-02-12 Thread vitalif
I understand from your comments that keeping things stable and preserving 
compatibility HAS been a priority for core developers at least since 
Daniel's email. Is this really the case? If this is the case, it makes me 
wonder why I hear some complaints about it.


Mariya, but did you hear that many complaints? :-)


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Stable PHP API for MediaWiki ?

2013-02-11 Thread vitalif

1) removal of global $action
2) removal of Xml::hidden()
3) broken Output::add() (had to migrate to resource loader)
4) various parser tag bugs
5) removal of MessageCache::addMessage()
6) removal of ts_makeSortable() (javascript)
7) breakage of WikiEditor adaptation
8) MediaWiki:common.js no more loading by default (security)
9) addHandler() javascript broken in IE8


Most of these were deprecations, am I correct?
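
Most of them do have simple replacements, as far as I can tell - for 
example (illustrative only; the exact deprecation and removal versions are 
in the release notes):

  // 2) Xml::hidden() -> Html::hidden()
  $field = Html::hidden( 'wpSomething', $value );

  // 3) the old OutputPage script/style helpers -> ResourceLoader modules
  //    ($out is the OutputPage, e.g. inside a BeforePageDisplay hook)
  $out->addModules( 'ext.myExtension' );
  $out->addModuleStyles( 'ext.myExtension.styles' );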

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Corporate needs are different (RE: How can we help Corporations use MW?)

2013-02-08 Thread vitalif

1. A desire for a department to have their own space on the wiki.


In our organisation (CUSTIS, Russia) we solve this easily by creating one 
primary wiki + separate ones for the different departments.
It's just a normal wiki family with shared code.
A very simple solution, without any extensions.
The main disadvantage is the inability to search all wikis with a single 
search request, but in practice I've had very few requests for this 
feature, so it's probably not needed that often.



I'm not talking about access control


And we also have IntraACL for access control (forked from HaloACL).
Still not an ideal solution, but we'll probably improve it more.


2. Hierarchy. Departments want not only their own space, they want
subspaces beneath it. For example, Human Resources wiki area with
sub-areas of Payroll, Benefits, and Recruiting. I realize Confluence
supports this... but we decided against Confluence because you have to
choose an article's area when you create it (at least when we
evaluated Confluence years ago). This is a mental barrier to creating
an article, if you don't know where you want to put it yet. MediaWiki
is so much better in this regard -- if you want an article, just make
it, and don't worry where it goes since the main namespace is flat.

I've been thinking about writing an extension that superimposes a
hierarchy on existing namespaces, and what the implications would be
for the rest of the MediaWiki UI. It's an interesting problem. Anyone
tried it?


3. Tools for organizing large groups of articles. Categories and
namespaces are great, and the DPL extension helps a lot. But when
(say) the Legal department creates 700 articles that all begin with
the words "Legal department" (e.g., "Legal department policies",
"Legal department meeting 2012-07-01", "Legal department lunch",
etc.), suddenly the AJAX auto-suggest search box becomes a real pain
for finding Legal department articles. This is SO COMMON in a
corporate environment with many departments, as people try to game the
search box by titling all their articles with "Legal department..."
until suddenly it doesn't scale and they're stuck. I'd like to see
tools for easily retitling and recategorizing large numbers of
articles at once.


Recategorising is very simple with global search-and-replace.
Our implementation is called BatchEditor 
https://github.com/mediawiki4intranet/BatchEditor



4. Integration with popular corporate tools like MS Office, MS
Exchange, etc. We've spent thousands of hours doing this: for example,
an extension that embeds an Excel spreadsheet in a wiki page
(read-only, using a $10,000 commercial Excel-to-HTML translator as a
back-end), and we're looking at embedding Exchange calendars in wiki
pages next.


O_O A $10,000 Excel-to-HTML translator? O_O
Why not just copy-paste into, for example, wikEd (google://wikEd)? :-)))
Not as beautiful, but it works.


5. Corporate reorganizations and article titles. In any company, the
names and relationships of departments change. What do you do when
10,000 wiki links refer to the old department name? Sure, you can
move the article "Finance department" to "Global Finance department"
and let redirects handle the rest: now your links work. But they still
have the old department name, and global search-and-replace is truly
scary when wikitext might get altered by accident. Also, there's the
category called "Finance department". You can't rename categories
easily. I know you can do it with Pywikipedia, but it's slow and risky
(e.g., Pywikipedia used to have a bug that killed <noinclude> tags
around categories it changed). Categories should be fully first-class
so renames are as simple as article title changes.


Mass editing tool = BatchEditor, as I've already said.
But I agree that MediaWiki needs better mass editing, page selection and 
page exchange (import/export) tools.


In our distribution (mediawiki4intranet) we partially solve this by 
implementing page selection on Special:Export; BatchEditor uses that 
implementation when it's available (you can see examples at 
http://wiki.4intra.net/Special:Export and 
http://wiki.4intra.net/Special:BatchEditor).
We also have improved import/export functionality, but unfortunately it's a 
code bomb, and reworking it to get it into trunk will take a lot of time...


But it's only a partial solution, because it has no standard interface. So 
we also have a variation of DPL, and we also have Semantic MediaWiki. All 
of them have partially - but not completely - overlapping functionality. It 
would be good if there were a single, standardised, optimised and cacheable 
method of page selection.



___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Corporate needs are different (RE: How can we help Corporations use MW?)

2013-02-08 Thread vitalif

In practice, we have found this doesn't work well for us (with
thousands of employees).


Yeah, our company doesn't have thousands of employees :-)


Each department winds up writing its own wiki page about the same
topic (say, Topic X), and they're all different.


So that means most of your departments work on very similar things?
We probably don't have this problem because our departments and projects 
differ strongly, so everyone just writes their specific articles in their 
own wikis and the general information in the primary CustisWiki.

We have ~7 wikis for the whole company (~200 employees).


Users don't know which one is the real or right article.
We find it better to have one central wiki with one definitive
article per topic.
No redundancy, no coupling, and no version skew between wikis.


Just an idea - you could also set up replication between the wikis to make 
fighting the duplication easier.



Thanks, I'll check it out. Categorization can get very complicated on
a MediaWiki system though.
Consider this fairly simple template example:

{{#if:{{{department|}}} | [[Category:{{{department}}} projects]]}}

I would be amazed if any global search-and-replace could handle this!


Such examples are of course much harder, but if there isn't too much chaos, 
you can handle them with regexps... Not a task for an average user, but 
they can ask someone who knows regexps to do it :-)
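
For a flat rename (no template magic involved) the regexp really is 
trivial; something along these lines is what BatchEditor effectively runs 
over each selected page (a sketch, not the actual extension code):

  $wikitext = "[[Category:Finance department]]\n[[Category:Finance department|Budgets]]";

  // Rename the exact category, keeping an optional sort key after "|":
  $wikitext = preg_replace(
      '/\[\[\s*Category\s*:\s*Finance department\s*([|\]])/i',
      '[[Category:Global Finance department$1',
      $wikitext
  );

  echo $wikitext, "\n";
  // Templates that *construct* the category name, like the {{#if:...}}
  // example above, can't be caught this way - there you have to edit the
  // template or the pages that pass the parameter.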


With our extension, the Excel spreadsheet is rendered live in the 
wiki page.


Ooh, I see - of course that's a big feature!
Another question: did you try automating Excel itself to save the XLS as 
HTML?


We started looking into Semantic MediaWiki - it has impressive features.
But we got scared off by stories that it slows down the wiki too much.
Maybe we should give it another look.


As someone already said, it should not affect performance noticeably if you 
don't abuse it.
And even if you do abuse it, it has a very good feature: concept caching, 
i.e. caching of semantic query results with correct invalidation (as I 
understand it, with some limitations though): 
http://semantic-mediawiki.org/wiki/Help:Concept_caching


Overall, it's very nice to see that a big company like yours has had a 
successful MediaWiki experience (I assume it's successful, yeah? :))


Do you have any extensions or modifications that you would like to make 
public / free / open source? Or maybe you've already done that with some of 
them? :-)


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Why are we still using captchas on WMF sites?

2013-01-22 Thread vitalif

Per the previous comments in this post, anything over 1% precision
should be regarded as failure, and our Fancy Captcha was at 25% a year
ago. So yeah, approximately all, and our captcha is well known to
actually suck.


Maybe you could just use reCAPTCHA instead of FancyCaptcha?

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Why are we still using captchas on WMF sites?

2013-01-22 Thread vitalif

The problem is that reCaptcha (a) used as a service, would pass
private user data to a third party, (b) is closed source, so we can't
just put up our own instance. Has anyone reimplemented it or any of
it? There's piles of stuff on Wikisource we could feed it, for
example.


OK, then we could take KCAPTCHA and integrate it as an extension.
It's a Russian project; I've used it many times and it seems to be rather 
strong.

http://www.captcha.ru/en/kcaptcha/


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Why are we still using captchas on WMF sites?

2013-01-22 Thread vitalif

Luke Welling WMF wrote 2013-01-22 21:59:
Even ignoring openness and privacy, exactly the same problems are present 
with reCAPTCHA as with Fancy Captcha. It's often very hard or impossible 
for humans to read, and is a big enough target to have been broken by
various people.


It's all well and good to discuss this, but what are the other options for 
minimizing spam?


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Why are we still using captchas on WMF sites?

2013-01-22 Thread vitalif
It's very good to discuss, but what are the other options to minimize 
spam?


(maybe I know one: find XRumer authors and tear their arms off... :-))

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] MediaWiki Extension Bundles and Template Bundles

2013-01-14 Thread vitalif

On 01/14/2013 10:20 AM, Yuvi Panda wrote:
Is there a sort of 'Extension Bundle' that gets you baseline stuff that 
people who are used to wikipedia 'expect'? ParserFunctions and Cite come to 
mind, but I'm sure there are plenty of others.


I don't know if this is relevant to your question, but I should say that in 
our company we maintain and use our own MediaWiki distribution, 
Mediawiki4Intranet (http://github.com/mediawiki4intranet, 
http://wiki.4intra.net/), for all MW installations. It includes ~75 
extensions; the set is not exactly the same as the WMF one, but we think 
it's good for corporate (intranet) use. You can try it out if you want, 
though some extensions are documented only in Russian :-)


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Fwd: Re: How to speed up the review in gerrit?

2012-12-26 Thread vitalif

Actually registration is open to everyone now by simple form
submission.  So actually, any one developer could get any change they
wanted merged.  All they need to do is trivially register a second
labs account.


Okay, but the current situation is also a problem, because with it 
reviewing and merging take much more time.
And as I've said, I think most extensions aren't as important as the core, 
and limiting approval for them to core developers is just a waste...


Maybe you should add a group similar to the previous (SVN) "commit access 
to extensions", so that a wider circle of people could merge changes to 
extensions?


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] Fwd: Re: How to speed up the review in gerrit?

2012-12-22 Thread vitalif
Sorry, I replied to Sumana directly instead of to the mailing list, so I'm 
now duplicating my reply to the list.


Sumana Harihareswara wrote 2012-12-19 22:30:

Try these tips:
https://www.mediawiki.org/wiki/Git/Code_review/Getting_reviews


Sumana, that's all very good, but:
1) I don't think it's very comfortable to push other developers personally 
by adding them as reviewers... And I don't know whom to add as a reviewer, 
so I just choose randomly. But what if that person doesn't want to review 
that extension? For example, what if they are already very busy working on 
the MediaWiki _core_, and I ask them to review a trivial extension?
2) Who can verify changes in extensions? There is no CI. So, are the people 
who can verify changes and the people who can give +2 the same people? That 
again short-circuits all the work to the "core" people - and aren't they 
already busy? (I assume they are, since they don't review all the changes.)
3) As a solution, I think it would be good if - at least for 
not-as-important-as-core extensions - changes were merged automatically 
after getting, for example, two +1s... Or will you end up with changes that 
are reviewed but not merged by anyone? It might also be good if the system 
automatically added some reviewers, either randomly or based on some 
ownership rules...



___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] How to speed up the review in gerrit?

2012-12-19 Thread vitalif

Hello!

On 28 SEPTEMBER I pushed some minor changes to Gerrit, for the Drafts 
extension.
Since then I've corrected two of them (uploaded patch set 2), but after 
that nobody reviewed them. As I understand it, Gerrit will abandon changes 
after a month of inactivity, and that will happen tomorrow...

The changes are really simple.
How do I get someone to actually do the review? Does Gerrit have such a 
function?


Thanks in advance,
Vitaliy Filippov

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] How to speed up the review in gerrit?

2012-12-19 Thread vitalif

Matma Rex wrote 2012-12-19 15:01:

You could add people as reviewers, or personally ask someone to
review, prefereably someone who worked on the extension in the past.


Okay, I've just done it...
So, do you mean all committers just add random reviewers when they see 
no reaction?


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] How to speed up the review in gerrit?

2012-12-19 Thread vitalif

Antoine Musso wrote 2012-12-19 16:19:

On 19/12/12 11:57, vita...@yourcmc.ru wrote:

Hello!

28 SEPTEMBER I've pushed minor changes to the gerrit, to the Drafts
extensions.
Since then I've corrected two of them (uploaded patch set 2), but
after that, nobody did the review. As I understand, Gerrit will abandon
changes after a month of inactivity, and it will come tomorrow...
The changes are really simple.
How to ask someone to really do the review? Does Gerrit have such
function?


And the changes are:

https://gerrit.wikimedia.org/r/#/c/39369/
 add a dependency on mediawiki.legacy.wikibits.

https://gerrit.wikimedia.org/r/#/c/25629/
 Fix a bug: drafts didn't show up when creating new pages

https://gerrit.wikimedia.org/r/#/c/25628/
 Always display user's drafts on the edit form

https://gerrit.wikimedia.org/r/#/c/25627/
 Fix for PHP 5.4: add & to function prototype


Yes, exactly!
I've just added the first one (adding the dependency); the others are 
older.


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Question about 2-phase dump

2012-11-25 Thread vitalif
Page history structure isn't quite immutable; revisions may be added or
deleted, pages may be renamed, etc etc.

Shelling out to an external process means when that process dies due to a
dead database connection etc, we can restart it cleanly.


Brion, thanks for clarifying that.

Also, I want to ask you and the other developers about the idea of packing 
the export XML file together with all exported uploads into a ZIP archive 
(instead of embedding them in the XML as base64) - what do you think about 
it? We use this in our MediaWiki installations (mediawiki4intranet) and 
find it quite convenient. Actually, ZIP was Tim Starling's idea; before ZIP 
we used very strange multipart/related archives (I don't know why we did 
that :))
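
In case it helps the discussion, the idea in sketch form, using PHP's 
ZipArchive (the file names and the layout inside the archive are just how 
we happen to do it, not a proposed standard):

  $zip = new ZipArchive();
  $zip->open( '/tmp/export.zip', ZipArchive::CREATE | ZipArchive::OVERWRITE );

  // The normal Special:Export XML, written to a temporary file first:
  $zip->addFile( '/tmp/export.xml', 'Revisions.xml' );

  // One entry per exported upload; $files maps archive names to paths on disk:
  $files = array(
      'files/Example.png' => '/var/www/wiki/images/a/ab/Example.png',
  );
  foreach ( $files as $nameInArchive => $pathOnDisk ) {
      $zip->addFile( $pathOnDisk, $nameInArchive );
  }

  $zip->close();
  // On import the XML is read from the archive, and each upload entry refers
  // to its file by archive path instead of carrying base64-encoded data.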


I want to try to get this change reviewed at last... What do you think 
about it?


Other improvements include advanced page selection (based on namespaces, 
categories, dates, imagelinks, templatelinks and pagelinks) and an advanced 
import report (including some sort of conflict detection). I should 
probably split them into separate patches in Gerrit for ease of review?


Also, do all the archiving methods (7z) really need to be built into 
Export.php as dump filters (especially when using ZIP)? I.e. with plain XML 
dumps you could just pipe the output to a compressor.


Or are they really needed to save temporary disk space during export? I ask 
because my version of import/export does not build the archive on the fly - 
it puts all the contents into a temporary directory and then archives the 
whole thing. Is that an acceptable approach?


--
With best regards,
  Vitaliy Filippov


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] Question about 2-phase dump

2012-11-21 Thread vitalif

Hello!

While working on my improvements to MediaWiki import/export, I've 
discovered a feature that was totally new to me: the 2-phase backup dump, 
i.e. the first-pass dumper creates an XML file without page texts, and the 
second-pass dumper then adds the page texts.

I have several questions about it: what is it intended for? Is it a sort of 
optimisation for large databases, and why was this method of optimisation 
chosen?

Also, does anyone use it? (Does Wikimedia use it?)


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Question about 2-phase dump

2012-11-21 Thread vitalif

Brion Vibber wrote 2012-11-21 23:20:

While generating a full dump, we're holding the database connection
open for a long, long time. Hours, days, or weeks in the case of
English Wikipedia.

There's two issues with this:
* the DB server needs to maintain a consistent snapshot of data since when
we started the connection, so it's doing extra work to keep old data around
* the DB connection needs to actually remain open; if the DB goes down or
the dump process crashes, whoops! you just lost all your work.

So, grabbing just the page and revision metadata lets us generate a file
with a consistent snapshot as quickly as possible. We get to let the
databases go, and the second pass can die and restart as many times as it
needs while fetching actual text, which is immutable (thus no worries about
consistency in the second pass).

We definitely use this system for Wikimedia's data dumps!


Oh, thanks, now I understand!
But revisions are also immutable - wouldn't it be simpler just to select 
the maximum revision ID at the beginning of the dump and discard newer page 
and image revisions during dump generation?
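
I.e. something like this, using the DB abstraction layer (just to 
illustrate the idea - not tested against the actual dump scripts, and 
$pageId is a placeholder):

  $dbr = wfGetDB( DB_SLAVE );

  // Pin the snapshot boundary once, at the start of the dump:
  $maxRevId = (int)$dbr->selectField( 'revision', 'MAX(rev_id)', '', __METHOD__ );

  // Every later query simply ignores anything newer than the boundary:
  $res = $dbr->select(
      'revision',
      array( 'rev_id', 'rev_page', 'rev_timestamp' ),
      array( 'rev_page' => $pageId, 'rev_id <= ' . $maxRevId ),
      __METHOD__
  );
  // No long-lived consistent-read transaction is needed, because rows with
  // rev_id <= $maxRevId never change (deletions and renames aside, as you noted).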


Also, I have the same question about the 'spawn' feature of 
backupTextPass.inc :) What is it intended for? :)


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l