[Wikitech-l] Re: [Proposal] Disable setting the "Lowest" Priority value in Phabricator

2023-02-27 Thread Jaime Crespo
Hi,


On Fri, Feb 24, 2023 at 4:32 PM Brian Wolff  wrote:

> If i could change the priority field i would probably change it to be only:
> * unbreak now
> * work on next (or "urgent")
> * normal
> * no intention to fix (aka lowest)
>

I support this, but we should be careful about the wording. The problem
with "lowest" is that it has negative connotations that I would like to
avoid; that is why I proposed to name the above levels:
* unbreak now
* very high ("urgent" works too)
* high (equivalent to work on next)
(* normal) this would be optional, but in case we want to keep 5 categories
* low (no intention to fix, old lowest)

While this may look like an inflation of the names, the original issue,
and the one I still have sometimes, is having to say something is "very
low", which in my experience doesn't sit well with the casual reader, even
if it has a relatively well-established meaning, and leads to people
complaining and needing an explanation, etc.
-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] Re: Can't claim on Phabricator

2023-02-02 Thread Jaime Crespo
Bináris,

To claim or assign a task, you have to go to a particular task, then at the
end click on the "Add action..." list, then "Assign / Claim", and write your
account name or someone else's (it will autocomplete). Please note that
assigning a task to someone without any previous interaction or context is
generally considered "bad etiquette" by most people or teams, unless so
documented.

A good place to search for Phabricator help is [0], and to ask questions,
[1] (there are versions in other languages); I encourage you to ask any
further questions there. :-)

Regards,

[0] https://www.mediawiki.org/wiki/Phabricator/Help
[1] https://www.mediawiki.org/wiki/Talk:Phabricator/Help

On Thu, Feb 2, 2023 at 4:53 PM Bináris  wrote:

> Hi,
> my account is https://phabricator.wikimedia.org/p/binbot/
> I don't see either claim or assign command at the tasks.
> Where is the problem?
>
> --
> Bináris
> ___
> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
> To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/



-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] Re: Gerrit upgrade November 17th 9:00 UTC

2022-11-17 Thread Jaime Crespo
Thiemo:

I wasn't involved in the upgrade, and I also saw a slowdown, but once it
fully deauthenticated me and I logged back in, the errors disappeared for me
(although I haven't checked that everything works yet). Could you try that
and see if it helps?

On Thu, Nov 17, 2022 at 10:45 AM Thiemo Kreuz 
wrote:

> I think the upgrade broke something.
>
> * Gerrit loads very slow. Not on all actions, but on some of them.
> * Half of the patches appear empty, as if there are no changes. I
> suspect this is some timeout. The list of files never loads.
> * Whenever I try to add or edit a comment it fails with an error
> message "An error occurred. Error 500 (Server Error): Internal server
> error. Endpoint: /changes/*~*/revisions/*/drafts/*. Alert: Unable to
> save draft".
>
> Better rollback?
>
> Best
> Thiemo
> ___
> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
> To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
>


-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] Re: ListFiles special page

2021-10-15 Thread Jaime Crespo
s/they/emojis/

On Fri, Oct 15, 2021 at 2:12 PM Jaime Crespo  wrote:

> I don't want to defend MySQL development decisions- in fact PHP made some
> similarly bad ones, but it would be unfair to judge them too harshly with
> the "power of hindsight" [0]- but... /pedantic on
>
> On Thu, Oct 14, 2021 at 7:37 PM Roy Smith  wrote:
>
>> What part of "universal" did they not understand?
>>
>
> ... several years ago, around the end of one century and the start of a
> new one, no one used UTF-8 [1] and PHP didn't even support multi-byte
> strings. The original spec for UTF-8 called for up to 6 bytes per character
> [2], while the BMP (3 bytes) already contained characters for most modern
> languages [3]. Supporting the full range would have been a waste of space
> and performance, because at the time MySQL worked much faster with
> fixed-width columns, and fixed-width 6-byte columns would have doubled the
> storage. My guess is that someone said "this is probably good enough", and
> would it be too outrageous to think that we might not need as many extra
> characters as stars in our galaxy, when fewer than 65K were needed in
> practice?
>
> Three things changed after that:
> * Unicode limited UTF-8 to encoding 21 bits in 2003 [4], requiring only 4
> bytes, just one more than MySQL's utf8
> * Apple wanted to sell iPhones in Japan, so they were added to Unicode in
> 2010, with all their subsequent popularity
> * MySQL/InnoDB has been highly optimized for fast handling of
> variable-length strings
>
> However, you cannot just arbitrarily break backwards compatibility and
> redefine the meaning of existing configuration- especially with storage
> software that has been continuously supporting incremental upgrades for as
> long as I can remember. All you can do is support the new standard and
> encourage its usage, make it the default, etc.
>
> This is a bit off-topic here (feel free to PM me to continue the
> conversation), and just to be clear, I am _not fully justifying the
> decisions_, just giving historical context, but I want to end with some
> lessons relevant to the list:
>
> * It is very difficult to build future-proof applications- PHP, MySQL,
> MediaWiki all have a long history and we should be gentle when we judge
> them from the future. My work on backups sometimes makes storing data
> unchanged for over 5 years challenging, because encryption algorithms are
> found to be weak, or end up being unsupported/unavailable within just 2
> releases of the operating system!
> * Standards also change; they are not as "universal" as we may want to
> believe (there have been 32 additional Unicode versions since 1991). I also
> expect new collations that are currently not implemented to be needed in
> the future.
> * It is ok to make "mistakes", as long as we learn from them and improve
> upon them :-)
>
> Sorry for the text block.
>
> [0] https://powerlisting.fandom.com/wiki/Hindsight
> [1] https://commons.wikimedia.org/wiki/File:Utf8webgrowth.svg
> [2] https://www.rfc-editor.org/rfc/rfc2279
> [3] https://en.wikipedia.org/wiki/Plane_(Unicode)#Basic_Multilingual_Plane
> [4] https://www.rfc-editor.org/rfc/rfc3629
>
>

-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] Re: ListFiles special page

2021-10-15 Thread Jaime Crespo
I don't want to defend MySQL development decisions- in fact PHP made some
similarly bad ones, but it would be unfair to judge them too harshly with
the "power of hindsight" [0]- but... /pedantic on

On Thu, Oct 14, 2021 at 7:37 PM Roy Smith  wrote:

> What part of "universal" did they not understand?
>

... several years ago, around the end of one century and the start of a new
one, no one used UTF-8 [1] and PHP didn't even support multi-byte strings.
The original spec for UTF-8 called for up to 6 bytes per character [2],
while the BMP (3 bytes) already contained characters for most modern
languages [3]. Supporting the full range would have been a waste of space
and performance, because at the time MySQL worked much faster with
fixed-width columns, and fixed-width 6-byte columns would have doubled the
storage. My guess is that someone said "this is probably good enough", and
would it be too outrageous to think that we might not need as many extra
characters as stars in our galaxy, when fewer than 65K were needed in
practice?

Three things changed after that:
* Unicode limited UTF-8 to encoding 21 bits in 2003 [4], requiring only 4
bytes, just one more than MySQL's utf8
* Apple wanted to sell iPhones in Japan, so they were added to Unicode in
2010, with all their subsequent popularity
* MySQL/InnoDB has been highly optimized for fast handling of
variable-length strings
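
To make the 3-byte vs. 4-byte difference concrete, here is a minimal sketch
(hypothetical table names, and assuming a recent MySQL/MariaDB with strict
mode on and a utf8mb4 client connection):

CREATE TABLE t_utf8    (s VARCHAR(10) CHARACTER SET utf8);
CREATE TABLE t_utf8mb4 (s VARCHAR(10) CHARACTER SET utf8mb4);
-- U+20AC (€) is 3 bytes in UTF-8, so it fits in both:
INSERT INTO t_utf8    VALUES ('€');
INSERT INTO t_utf8mb4 VALUES ('€');
-- U+1F600 (😀) is 4 bytes in UTF-8: it is rejected by the utf8 (3-byte)
-- column with "Incorrect string value", but accepted by utf8mb4:
INSERT INTO t_utf8    VALUES ('😀');
INSERT INTO t_utf8mb4 VALUES ('😀');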

However, you cannot just arbitrarily break backwards compatibility and
redefine the meaning of existing configuration- especially with storage
software that has been continuously supporting incremental upgrades for as
long as I can remember. All you can do is support the new standard and
encourage its usage, make it the default, etc.

This is a bit off-topic here (feel free to PM me to continue the
conversation), and just to be clear, I am _not fully justifying the
decisions_, just giving historical context, but I want to end with some
lessons relevant to the list:

* It is very difficult to build future-proof applications- PHP, MySQL,
MediaWiki all have a long history and we should be gentle when we judge
them from the future. My work on backups sometimes makes storing data
unchanged for over 5 years challenging, because encryption algorithms are
found to be weak, or end up being unsupported/unavailable within just 2
releases of the operating system!
* Standards also change; they are not as "universal" as we may want to
believe (there have been 32 additional Unicode versions since 1991). I also
expect new collations that are currently not implemented to be needed in
the future.
* It is ok to make "mistakes", as long as we learn from them and improve
upon them :-)

Sorry for the text block.

[0] https://powerlisting.fandom.com/wiki/Hindsight
[1] https://commons.wikimedia.org/wiki/File:Utf8webgrowth.svg
[2] https://www.rfc-editor.org/rfc/rfc2279
[3] https://en.wikipedia.org/wiki/Plane_(Unicode)#Basic_Multilingual_Plane
[4] https://www.rfc-editor.org/rfc/rfc3629
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] Re: ListFiles special page

2021-10-14 Thread Jaime Crespo
I agree that LOWER doesn't make much sense in binary collation.

Sadly, a utf8 (3-byte UTF-8) conversion may fail for 4-byte characters, so
at the very least it should be utf8mb4 (4-byte UTF-8). I am not familiar
enough with ImageListPager to say whether there could be other issues
arising from that; sending it as a code review would make it easier to give
better context.
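
As an illustration of why the 4-byte charset matters here (a sketch only,
not tested against ImageListPager itself): img_name is a varbinary column,
and converting it with the 3-byte charset can break on supplementary
characters such as emoji:

-- 3-byte utf8 cannot represent 4-byte characters stored in img_name:
SELECT LOWER(CONVERT(img_name USING utf8)) FROM image;
-- utf8mb4 covers the full UTF-8 range:
SELECT LOWER(CONVERT(img_name USING utf8mb4)) FROM image;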

On Thu, Oct 14, 2021 at 5:16 PM Sergey Dorofeev  wrote:

> Hello,
>
> I have got issue with ListFiles page in mediawiki 1.35.1
> Filtering worked not very good, was case-sensitive and not always got
> text in middle of file name.
> I looked in DB and saw that img_name column is varbinary, but
> pagers/ImageListPager.php tries to do case-insensitive select with
> LOWERing both sides of strings. But LOWER does not work for varbinary
> So I think that following change will be reasonable:
>
> --- ImageListPager.php.orig 2021-10-14 16:31:52.0 +0300
> +++ ImageListPager.php  2021-10-14 16:00:10.127694733 +0300
> @@ -90,9 +90,10 @@
>
>         if ( $nt ) {
>             $dbr = wfGetDB( DB_REPLICA );
> -           $this->mQueryConds[] = 'LOWER(img_name)' .
> +           $this->mQueryConds[] = 'LOWER(CONVERT(img_name USING utf8))' .
>                 $dbr->buildLike( $dbr->anyString(),
> -                   strtolower( $nt->getDBkey() ), $dbr->anyString() );
> +                   mb_strtolower( $nt->getDBkey() ), $dbr->anyString() );
> +
>         }
>     }
>
> @@ -161,9 +162,9 @@
>         $nt = Title::newFromText( $this->mSearch );
>         if ( $nt ) {
>             $dbr = wfGetDB( DB_REPLICA );
> -           $conds[] = 'LOWER(' . $prefix . '_name)' .
> +           $conds[] = 'LOWER(CONVERT(' . $prefix . '_name USING utf8))' .
>                 $dbr->buildLike( $dbr->anyString(),
> -                   strtolower( $nt->getDBkey() ), $dbr->anyString() );
> +                   mb_strtolower( $nt->getDBkey() ), $dbr->anyString() );
>         }
>     }
>
>
>
> --
> Sergey
> ___
> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
> To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
>


-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] Re: Why does the train start on Tuesday?

2021-06-23 Thread Jaime Crespo
On Wed, Jun 23, 2021 at 12:10 AM Jon Robson  wrote:

> I understand the Friday is a buffer, but it's not a great buffer,
> particularly now
>

I don't have any suggestions, as I am not super-familiar with the current
situation, but I can understand (and have suffered from) some hidden bugs
(not immediately obvious) taking some time to surface and/or reach the
right people. I do have some questions, though:

* How often are issues surfaced in the group0 -> group1 step vs the group1
-> group2 one? Are there any stats to back the need for a change there?
* Without changing the actual deploy days or the frequency, would there be
any benefit in spreading the deploy over multiple weeks? (Random example:
Tu: group1 -> group2, We: group0 (new branch), Th: group0 -> group1.) Or
would that make things worse?
* You mention commons. I am guessing that Commons, and Wikidata to some
extent, are both large sites with a lot of visibility, but also very
different from most other wikis, which share similar core features, so the
test versions of those on group0 may not be enough to catch all issues. Is
there something that could be improved specifically for those sites?
* Can we do something to improve the speed from "a user notices an issue
with the site" to "the right team/owner is aware of it and acts on it"?

-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

Re: [Wikitech-l] [engineering-all] Deployment calendar format change

2021-03-29 Thread Jaime Crespo
Tyler,

Thank you very much for the change! I don't use the visual editor on
mediawiki.org, but I think the new format will also make editing it easier
for everyone, by making single-day schedules easy to edit.

Thank you a lot!

On Fri, Mar 26, 2021 at 11:19 PM Tyler Cipriani 
wrote:

> tl;dr: The deployment calendar format will change in 2 weeks (2021-04-05)
> to make it easier to edit with visual editor https://w.wiki/a3b
>
> I updated the deployment calendar for the week of 2021-04-05[0] to use a
> different format than in the past (compare to next week[1]). My hope is
> that this new format will make it much easier to schedule deployment
> windows and to schedule patches for backports using Visual Editor.
>
> Also, selfishly, less squinting at Wikitext for me :)
>
> All credit for the new format goes to Timo Tijhof. Thank you Timo!
>
> Thanks all!
> -- Tyler
>
> [0]: <https://wikitech.wikimedia.org/wiki/Deployments#Week_of_April_05>
> [1]: <https://wikitech.wikimedia.org/wiki/Deployments#Week_of_March_29>
>


-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] New DB_REPLICA constant; DB_SLAVE deprecated

2020-07-02 Thread Jaime Crespo
> I don't plan on doing anything to DB_MASTER, since it seems fine by
> itself, like "master copy", "master tape" or "master key"

MySQL has announced they chose to use "source":
https://mysqlhighavailability.com/mysql-terminology-updates/ We could
consider updating too; I think it is a more accurate representation of what
those servers do (sending original data to the replicas, rather than
controlling them).

On Sat, Sep 17, 2016 at 10:18 AM Amir Ladsgroup  wrote:

> Sorry for late question. I guess we should deprecate wfWaitForSlaves() and
> probably some other methods that still use this logic
>
> Best
>
> On Tue, Sep 6, 2016 at 11:22 AM Aaron Schulz 
> wrote:
>
> > As of 950cf6016c, the mediawiki/core repo was updated to use DB_REPLICA
> > instead of DB_SLAVE, with the old constant left as an alias. This is part
> > of a string of commits that cleaned up the mixed use of "replica" and
> > "slave" by sticking to the former. Extensions have not been mass
> > converted. Please use the new constant in any new code.
> >
> > The word "replica" is a bit more indicative of a broader range of DB
> > setups*, is used by a range of large companies**, and is more neutral in
> > connotations.
> >
> > Drupal and Django made similar updates (even replacing the word
> "master"):
> > * https://www.drupal.org/node/2275877
> > * https://github.com/django/django/pull/2692/files &
> >
> >
> https://github.com/django/django/commit/beec05686ccc3bee8461f9a5a02c607a02352ae1
> >
> > I don't plan on doing anything to DB_MASTER, since it seems fine by
> itself,
> > like "master copy", "master tape" or "master key". This is analogous to a
> > master RDBMs database. Even multi-master RDBMs systems tend to have a
> > stronger consistency than classic RDBMs slave servers, and present
> > themselves as one logical "master" or "authoritative" copy. Even in it's
> > personified form, a "master" database can readily be thought of as
> > analogous to "controller",  "governer", "ruler", lead "officer", or
> such.**
> >
> > * clusters using two-phase commit, galera using certification-based
> > replication, multi-master circular replication, ect...
> > **
> >
> >
> https://en.wikipedia.org/wiki/Master/slave_(technology)#Appropriateness_of_usage
> > ***
> >
> >
> http://www.merriam-webster.com/dictionary/master?utm_campaign=sd_medium=serp_source=jsonld
> >
> > --
> > -Aaron
> > ___
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] I wish I would understand the mediawiki architecture and components

2020-01-29 Thread Jaime Crespo
On Wed, Jan 29, 2020 at 9:39 PM Marco A.  wrote:

> > 2. how to get on a daily basis the access count of the wiki pages ?
> >
>
> I suspect you'll need to set up something server-side for that.


Almost every MediaWiki installation I've seen (including Wikipedia) does it
based on HTTP frontend server logs/analytics; for example, you could
analyze HTTP (e.g. Apache) or cache server hit logs. There are many open
source, self-hosted solutions [0] for that.

There are also some MediaWiki extensions for analytics integrations [1][2].

[0] https://medevel.com/best-20-open-source-free-self-hosted-web-analytics/
[1] https://www.mediawiki.org/wiki/Extension:Matomo
[2] https://www.mediawiki.org/wiki/Extension:Google_Analytics_Integration
--
Jaime Crespo
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Storing some values in the DB

2020-01-28 Thread Jaime Crespo
> It complicates extension installation

One could just copy and paste the existing migration code, change the SQL
to the right CREATE TABLE, and be done. If you just have to create a table,
you will only have to do that once, no maintenance required. Trying to fit
values into existing tables, however, will create a lot of maintenance
headaches, as Brian said.

# include/Hooks.php
$base = __DIR__ . "/../sql";
$updater->addExtensionTable( 'extension_value', $base . '/ExtensionValue.sql' );

# sql/ExtensionValue.sql
CREATE TABLE extension_value (
    k varchar(100) PRIMARY KEY,
    v int NOT NULL
);

If you are overwhelmed by SQL or the MediaWiki migration system, please
just ask for help and people here, including me, will be able to help! :-D
MediaWiki core code can be intimidating because of its support for several
database systems and its long history of schema changes, but in your case
adding a table should not take more than 2 files that are never touched
again :-D. update.php will have to be run anyway on core updates. Handling
MySQL and writing the queries is way more difficult than creating the table
in the first place!

Or maybe we are not understanding what you are trying to achieve. 0:-)

On Mon, Jan 27, 2020 at 5:00 PM Jeroen De Dauw 
wrote:

> Hey,
>
> > Why are you so reluctant to create a table?
>
> It complicates extension installation and adds yet another thing to
> maintain. Which is a bit silly for a handful of values. I was hoping
> MediaWiki would have a generic table to put such things already. Page_props
> is exactly that, just bound to specific pages.
>
> Cheers
>
> --
> Jeroen De Dauw | www.EntropyWins.wtf <https://EntropyWins.wtf>
> Professional wiki hosting and services: www.Professional.Wiki
> <https://Professional.Wiki>
> Software Crafter | Speaker | Entrepreneur | Open Source and Wikimedia
> contributor ~=[,,_,,]:3
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Equivalence for using Template:Cite_Web with templates surrounding Wikicode

2019-12-31 Thread Jaime Crespo
Hi,

On Tue, Dec 31, 2019 at 12:42 PM Egbe Eugene  wrote:

> Hi All,
>
> I am implementing template https://en.wikipedia.org/wiki/Template:Cite_web
> in
> VE and after using the template code[1], I get results shown below
>

Sorry, I may not be understanding what you are trying to accomplish; I am
guessing you are reimplementing Cite_web functionality on your own local
MediaWiki installation?
I don't know much about VisualEditor or references-related code, and
someone else may be able to help you better here, but I know it is
implemented on enwiki:
https://phab.wmfusercontent.org/file/data/wz3njgh6k6iwrugvap6f/PHID-FILE-n6el7aiazixa4mwb2rq2/Screenshot_20191231_150008.png
https://en.wikipedia.org/w/index.php?title=User:JCrespo_(WMF)/sandbox=revision=933365112=926096219=visual

So my only suggestion is to look at how enwiki has it configured and try to
adapt it to your needs.

Another suggestion is to use Phaste:
https://phabricator.wikimedia.org/paste/edit/form/14/ for code snippets, as
it will prevent vandals from editing them and allow comments.

Sorry I cannot be of more help,
-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] About to start solve issues since Google Summer of Code 2020 has been launched

2019-12-11 Thread Jaime Crespo
Hello,

On Wed, Dec 11, 2019 at 10:51 AM Fokou Joel  wrote:

> Hello everyone. Please what shall I do from now since Google Summer of Code
> 2020 launched? Thank you.
>

If you are already familiar with
https://www.mediawiki.org/wiki/Google_Summer_of_Code/Participants
and are wondering "what's next?",
https://www.mediawiki.org/wiki/New_Developers and
https://www.mediawiki.org/wiki/Good_first_bugs are both good ways to get
familiar with the several Wikimedia projects and how to start helping
them.

I believe Google Summer of Code 2020 student registration doesn't start
until March 2020 https://summerofcode.withgoogle.com/get-started/ , but you
can start getting familiar with projects to choose something that will
interest you.

Cheers.



-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Outdated dump?

2019-10-28 Thread Jaime Crespo
Also, the string exists in current versions of huwiki. I'll take the
opportunity to mention that there is a more specific XMLDataDumps list,
in case it is useful for future cases:
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l

On Mon, Oct 28, 2019 at 3:37 PM Bináris  wrote:

> Hi,
>
> I use the latest
>
> huwiki-latest-pages-articles.xml.bz2
> <
> https://dumps.wikimedia.org/huwiki/latest/huwiki-latest-pages-articles.xml.bz2
> >
>
> (21 Oct) from here: https://dumps.wikimedia.org/huwiki/latest/
>
> A few times my bot founds candidates for text replacement in the dump, and
> then, when it checks the page in the live wiki, it says:
> No changes were necessary in [[2022-es labdarúgó-világbajnokság]] or other
> page.
> But the string "Résztvevő országok", which was found in the dump, was
> removed from wiki on 7 June:
>
> https://hu.wikipedia.org/w/index.php?title=2022-es_labdar%C3%BAg%C3%B3-vil%C3%A1gbajnoks%C3%A1g=21407770=20630428
> This does not happen too often, but not for the first time.
>
> So is it possible that the dump contains earlier version?
>
> --
> Bináris
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] How large is Wikipedia?

2019-06-04 Thread Jaime Crespo
Hi,

I cannot actually answer this question (it is not easy), but I
sometimes get this kind of question related to our main relational
database storage (MariaDB). I am preparing some slides for a
presentation, and took some numbers and wanted to share those with you
(as of June 2019):

* There is approximately 550 TB of used data in the MariaDB-related
servers across the Wikimedia infrastructure (mostly compressed in some
way: InnoDB, gzip, etc.)
* If we do not account for redundancy, 60 TB of that data is unique (an
average of 9x redundancy, which seems about right)
** Of that, 24 TB is insert-only, highly-compressed content (External Storage)
** The rest is metadata, local content, misc services, disk cache,
analytics, cloud dbs, and backups.

Please note this doesn't take into account storage in other media or
technologies (search, maps, analytics, REST, file storage, etc.). Also,
content compression can be very efficient, so uncompressed data can be
much larger. We are in fact aiming at reducing the storage footprint even
more over the next months.

If someone is interested in seeing the size evolution, you can get the
latest up-to-date metrics on Grafana:
https://grafana.wikimedia.org/d/00607/cluster-overview?orgId=1=eqiad%20prometheus%2Fops=mysql=All

--
Jaime Crespo
<http://wikimedia.org>

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Database schema diagram for MW 1.32

2019-04-23 Thread Jaime Crespo
On Mon, Apr 22, 2019 at 9:57 PM Krinkle  wrote:
> I've updated it yearly, or every other year, since 2011. – hoping to have it
> updated for MediaWiki 1.32. But, I'm currently struggling to find the time.
>
> Are you interested in learning about and creating the next diagram? Let me
> know :)

It would be nice to get that automated; maybe that could be a nice
GSoC project?
I started working on
https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/337390/ as a first
step towards automation, but put it on pause.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Help please! Database::close: mass commit/rollback... on Special Page

2019-04-23 Thread Jaime Crespo
> > For one of our custom wikis we have a Special Page that creates new pages
> > based on form input. This used to work in MW1.23 but in the long overdue
> > update to MW1.31 it stopped working.

I am not familiar with Special:StrainNewPage; I am guessing that is
custom code not available from a standard extension? If it is not
private and you could share that custom code (or at least part of it),
maybe someone can help you better to see what the problem is in your
case. :-)

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Patchsets by new Gerrit contributors waiting for code review and/or merge

2019-04-04 Thread Jaime Crespo
For the couple I am slightly familiar with (apologies if I am mistaken):

On Thu, Apr 4, 2019 at 11:11 AM Andre Klapper  wrote:
> * https://gerrit.wikimedia.org/r/#/c/operations/software/cumin/+/497312/
> ** allow running cumin as a regular user
> ** 2019-March-18
> ** Maintainers/Stewards: WMF SRE?

Cumin is maintained by the software automation subteam of SRE.

> * https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/498773/
> ** db::views: Bring back abuse_filter_history table
> ** 2019-March-25
> ** Maintainers/Stewards: WMF SRE?

Wikireplica views are maintained by the WMCS team (not SRE), although
they may or may not need assistance from SREs/security, depending on the
proposed changes.

I will bring these 2 to their attention.

Thanks for the heads up, Andre

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Unstable search results - possible causes

2019-03-07 Thread Jaime Crespo
> A load balancer sends users to whichever server has the fewest active 
> connections.

That could be it. I don't think there has been a lot of focus on full
read-read stability in MediaWiki, so some stickiness may be needed for
some queries. The thing to worry about is potential non-deterministic
queries, which can return results differently depending on the physical
distribution of the rows. Those are mistakes and are corrected when
detected (I am trying to force strict SQL mode into our CI), but because
not many people run MediaWiki in large clusters AND use the internal
search engine, it could have been missed. To verify it is that: without
any writes, rebuild the tables involved and check whether they return
results in a different order.
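
As a trivial illustration of what I mean by non-deterministic (a generic
sketch, not an actual MediaWiki query):

-- Without ORDER BY, the engine may return rows in whatever order the
-- physical/index layout dictates, so two servers with identical data
-- can legitimately disagree:
SELECT page_title FROM page WHERE page_namespace = 0 LIMIT 10;
-- Deterministic: tie the result to an explicit unique key:
SELECT page_title FROM page WHERE page_namespace = 0
ORDER BY page_id LIMIT 10;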

Please share details about versions, URLs and queries, or contact us
privately on Phabricator if you cannot share those details publicly.

Percona Cluster/Galera is not officially supported as far as I know,
and while it should work without problems because it has the same SQL
interface, there are no specific controls for the distributed nature
of such a cluster (unlike regular replication, which has specific
functionality). Enabling the chronology protector with heartbeat may
work for PXC, but I have not tested it. Another option is to use PXC
in a failover-only mode, being sticky to a primary server and leaving
the other nodes for failover, backups, etc.

On Wed, Mar 6, 2019 at 3:05 AM Hogan (US), Michael C
 wrote:
>
> I'm running a self-hosted wiki, so unfortunately can't provide a URL. The 
> search is performed on the standard [[Special:Search]] page. It may matter 
> that we're running the wiki on two servers, using Percona MySQL for data 
> replication between the servers. A load balancer sends users to whichever 
> server has the fewest active connections.
>
> -Original Message-
> From: Stas Malyshev [mailto:smalys...@wikimedia.org]
> Sent: Tuesday, March 05, 2019 5:39 PM
> To: Wikimedia developers ; Hogan (US), 
> Michael C 
> Subject: Re: [Wikitech-l] Unstable search results - possible causes
>
> Hi!
>
> > I'm using the built-in Mediawiki search engine. We just updated from 1.30.0 
> > to 1.31.0. Since the update, search results are unstable. The same search 
> > term gives different results in different web browsers. We also see 
> > different results across browser sessions. Any advice on how I can 
> > troubleshoot this?
>
> Could you provide more info - which search you're talking about (URL might 
> help, or description of the sequence of actions), which terms, what you're 
> getting in different cases? Some search results might depend on your 
> preferences or user language selected, so there might be difference, e.g. for 
> different logged-in users.
>
> Thanks,
> --
> Stas Malyshev
> smalys...@wikimedia.org
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
Jaime Crespo
<http://wikimedia.org>

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] PHP 7 is now a beta feature

2019-01-31 Thread Jaime Crespo
Dan and others,

On Thu, Jan 31, 2019 at 9:14 AM Dan Garry (Deskana)  wrote:
>
> Cool! Thanks for developing this beta feature, it makes it easy to test.
>
> Is there anything in particular that you might expect to behave
> differently, or break, that you'd like us to test? Are you just looking for
> more general feedback?

I am an outsider to the process, but maybe I can give some examples.
In theory "everything should work"; in reality, these are some of the
issues:

* Configuration and the environment may not be 100% equal (lots of
backend changes), leading to different results, e.g.
* Edge cases may be fixed automagically, or break with the new setup, e.g.
* Performance may be different; while I've been told that in general things
are looking faster, there could also be regressions
* Continuous integration and testing may need double-checking and fixing
* ...

In general they are the same issues that would arise from any major
upgrade (e.g. PHP 5 to PHP 7). Your bug reports and the kind comments here
and in places such as
https://www.mediawiki.org/wiki/Topic:Usx7uerq380mzwq3 are, I believe,
highly motivating for the many teams involved, thanks! Also many thanks to
all the people (developers, testers and people in infrastructure) for
making this possible!

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Engineering] Gerrit now automatically adds reviewers

2019-01-18 Thread Jaime Crespo
On Fri, Jan 18, 2019 at 3:12 PM Tyler Cipriani 
wrote:

> I would like to re-enable this plugin at some point, provided the
> features identified in this thread are added (perhaps also an
> "X-Gerrit-reviewers-by-blame: 1" email header, or subject line to make
> filtering these messages easier).


Let me suggest one workflow that may work with this feature: adding a
button, for example "Suggest reviewers", which you can press to get this
effect. Or doing it automatically only if your history has fewer than X
CRs sent. What do you think?
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Gerrit now automatically adds reviewers

2019-01-18 Thread Jaime Crespo
One member of my team sadly left. Now he is pinged every time I upload a
change, passive-aggressively reminding him that he used to work on this.

Don't get me wrong, I think this is great for getting attention on new
contributors' changes and making sure they move forward, but I would
suggest making it possible to opt out of this.

On Fri, Jan 18, 2019 at 9:47 AM Thiemo Kreuz 
wrote:

> > […] automatically add reviewers to your changes based on who previously
> has committed changes to the file.
>
> I'm already overwhelmed with review requests. I'm also one of the
> latest contributors in sooo many files that I'm worried the plugin
> will add me to dozens per day from now on. This surprising addition
> makes me worry very much about my sanity and the usability of my inbox
> and Gerrit dashboards. Please, please, tune it down heavily.
>
> Until now, I had a process to find reviewers:
>
> 1. For planned changes, it's already obvious who needs to do the
> review: my team members. Often they don't even need to be added as
> reviewers, or don't want to, but use the "review" column on our
> Phabricator board. Automatically adding random *other* people is not
> only useless in this situation, it's counterproductive and frustrating
> for everybody involved. Other people should not waste their time with
> patches like this. When they do, it's frustrating for the one who was
> supposed to do the review, as well as for the "auto-reviewer". His
> review is not helpful and ultimately not appreciated.
>
> 2. For code cleanups in core and codebases my team does not own I open
> the list of merged patches on Gerrit to see if I can tell the names of
> one or two main contributors. Often, the list will contain nothing but
> the Translatewiki bot and a few people doing cross-codebase
> maintainance work. These highly active people should *not* be the
> first pick as potential reviewers for multiple reasons. Most
> importantly, just because someone is very productive it's not ok to
> expect him to accept even more workload. This is
> super-counterproductive and ultimately leads to people burning out.
> Secondly, just because someone updated, let's say, a call to a
> deprecated core function does not mean he is familiar with a codebase.
>
> 3. I look at the files my patch happens to touch in my PHPStorm IDE,
> enable the blame column that shows who touched a line last, and see if
> I can find the one who introduced the code I'm touching.
>
> All steps in this process involve *reading* the commit messages and
> considering what people did, when and why. This can't be automated.
>
> I do not entirely oppose the idea of adding reviewers automatically,
> as long as I (as a reviewer) have a chance to easily tell the
> difference between being added manually vs. automatically. For my
> sanity, I will most probably setup an email filter that auto-deletes
> all automated requests, and only look at these auto-reviews once a
> week via my Gerrit dashboard.
>
> Based on all these arguments this is what I, personally, find acceptable:
> * Make sure no emails are sent for being automatically added, or make
> sure it's possible to turn them off (without also killing the wanted
> emails about manually being added).
> * Make sure tool accounts like the library-upgrader or Translatewiki
> bot are never added as reviewers.
> * Never automatically add reviewers if there are other reviewers
> already. Most importantly, if people have been added via the reviewer
> bot already, that's more than enough.
> * Only add 2 reviewers. 2 people will more likely feel like a team. 3
> people are much more likely to all think "the other 2 will do it" and
> all ignore it.
> * Don't just pick the "most recent" contributor. That's most certainly
> not the person you want (probably one who fixed a typo, or updated a
> library). Implement an algorithm that can either understand who
> touched which places in the code, and assigns them to patches that
> happen to touch the same place again. Or go for an algorithm that (for
> example) analyzes the last year and picks the 2 people who did the
> most changes to a file in that time span.
>
> If more than one of these criteria is not met or not possible, the
> only solution I see is to make the plugin opt-in per codebase, not
> opt-out as of now (because you can't expect me to opt-out from
> literally a thousand codebases).
>
> Thank you for keeping my inbox sane.
>
> Best
> Thiemo
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Non-linear search in an XML dump

2018-09-03 Thread Jaime Crespo
Not that this is off-topic here, but you will probably find more
knowledgeable people and get a quicker response on the specialized
list: https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l

On Mon, Sep 3, 2018 at 3:06 PM Bináris  wrote:

> Hi,
>
> As far as I understand, pages in an XML dump are in the order of their
> original creation.
> This does not correspond to the page ID, because if a page gets a new id
> after deletion and restore or renaming to that title or anything, the order
> still remains the original.
> But this sortkey itself is not stored. In other words, a dump is not sorted
> by any key one could finf in the dump, and behaves as an unosorted
> structure.
>
> Is this true? Can I use any non-linear (e.g. binary) search in a dump?
>
> --
> Bináris
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Datacenter switchover and switchback

2018-08-30 Thread Jaime Crespo
Let me explain the rationale of the below request for clarification:

On Wed, Aug 29, 2018 at 11:30 PM MA  wrote:

> Hello:
>
> >For the duration of the switchover (1 month), deployers are kindly
> >requested to refrain from large db schema changes and avoid deploying
> >any kind of new feature that requires creation of tables.
> >There will be a train freeze in the week of Sept 10th and Oct 8th.


During the failover, some schema changes will be finalized on the current
active datacenter (plus some major server and network maintenance may be
done). Our request is mostly to refrain from quickly enabling those large
new features that get unblocked (e.g. the ongoing comment refactoring,
actor/user refactoring, Multi-Content Revisions, JADE, major Wikidata or
structured Commons structure changes, new extensions never deployed to the
cluster, etc.) at the same time as the ongoing maintenance, to reduce the
number of variables that can go bad. Enabling those features may be
unblocked during the switchover time, but we ask you to hold until we are
back on the current active datacenter. Basically, ask yourself if you are
enabling a large new core feature or want to start a heavy-write
maintenance script and there is a chance you will need DBA/system support.
Sadly, we had some instances of this happening last year and we want to
explicitly discourage it during these 2 weeks.

In my own opinion, enabling existing features on smaller projects (size
here is in terms of server resources, not that they are less important) is
equivalent to a SWAT change, and I am not against it happening. I would ask
contributors to use their best judgement in every case, and to ask people
on the #DBA tag on Phabricator or add me as a reviewer on Gerrit if in
doubt. My plea is to not enable major structural changes during that time
that may affect thousands of edits per minute. SWAT-like changes and
"boring" :-) trains are ok.

For new wiki creations, I would prefer if those were delayed, but CC #DBAs
on the Phabricator task to check with us.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] introduction

2018-08-16 Thread Jaime Crespo
Hi,

Here are some resources I would recommend reading first:

* https://www.mediawiki.org/wiki/Special:MyLanguage/New_Developers
* Slides:
https://commons.wikimedia.org/wiki/File:Wikimedia_Technical_Areas_An_Overview.pdf
*
https://www.mediawiki.org/wiki/Special:MyLanguage/How_to_become_a_MediaWiki_hacker

Other than that, general resources about PHP, JavaScript, web development,
SQL and Git can also be useful, depending on the topic you would be
interested in contributing to.
Maybe someone else can also provide some extra tips and resources.

Thank you and welcome!

On Thu, Aug 16, 2018 at 1:09 PM Nadege Awah  wrote:

>  Dear sir/madame;
>
> I am Awah Nadege Tayebatu, female and a Cameroonian. I am an Applied
> Geology student of the University of Buea, Cameroon. I will love to
> contribute code to MediaWiki and so will appreciate some guide on how to
> start contributing and some learning  resources i can use as a beginner.
>
>   Thank you.
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Wikidata-tech] Normalization of change tag schema

2018-07-30 Thread Jaime Crespo
> --
> >> Amir Sarabadani
> >> Software Engineer
> >>
> >> Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
> >> Tel. (030) 219 158 26-0
> >> http://wikimedia.de
> >>
> >> Stellen Sie sich eine Welt vor, in der jeder Mensch an der Menge allen
> >> Wissens frei teilhaben kann. Helfen Sie uns dabei!
> >> http://spenden.wikimedia.de/
> >>
> >> Wikimedia Deutschland – Gesellschaft zur Förderung Freien Wissens e. V.
> >> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> unter
> >> der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
> >> Körperschaften I Berlin, Steuernummer 27/029/42207.
> >>
> >
> >
> > --
> > Amir Sarabadani
> > Software Engineer
> >
> > Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
> > Tel. (030) 219 158 26-0
> > http://wikimedia.de
> >
> > Stellen Sie sich eine Welt vor, in der jeder Mensch an der Menge allen
> > Wissens frei teilhaben kann. Helfen Sie uns dabei!
> > http://spenden.wikimedia.de/
> >
> > Wikimedia Deutschland – Gesellschaft zur Förderung Freien Wissens e. V.
> > Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> unter
> > der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
> > Körperschaften I Berlin, Steuernummer 27/029/42207.
> >
>
>
> --
> Amir Sarabadani
> Software Engineer
>
> Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
> Tel. (030) 219 158 26-0
> http://wikimedia.de
>
> Stellen Sie sich eine Welt vor, in der jeder Mensch an der Menge allen
> Wissens frei teilhaben kann. Helfen Sie uns dabei!
> http://spenden.wikimedia.de/
>
> Wikimedia Deutschland – Gesellschaft zur Förderung Freien Wissens e. V.
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
> der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
> Körperschaften I Berlin, Steuernummer 27/029/42207.
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] AV1? (was Re: [Multimedia] Video output changing to WebM VP9/Opus soon)

2018-07-23 Thread Jaime Crespo
Thanks to all people involved,

I just read about this new video format in the making/released [0].

Of course, I am not asking for this to be supported, as it seems to be the
future rather than the present, but being a complete noob on video formats
and codecs, I would like to know if someone more knowledgeable has some
insight about it: is it something to keep in mind, has someone tested it
and has experiences to share, and what is client and vendor support like?

--
Jaime

[0] https://blog.mozilla.org/blog/2018/07/11/royalty-free-web-video-codecs/

On Fri, Jun 29, 2018 at 6:46 PM, Brion Vibber  wrote:

> Awesome sauce. Thanks Moritz!
>
> -- brion
>
> On Fri, Jun 29, 2018 at 7:39 AM Moritz Muehlenhoff <
> mmuhlenh...@wikimedia.org> wrote:
>
> > Hi all,
> >
> > On Thu, Jun 28, 2018 at 01:54:18PM -0700, Brion Vibber wrote:
> > > Current state on this:
> > >
> > > * still hoping to deploy the libvpx+ffmpeg backport first so we start
> > with
> > > best performance; Moritz made a start on libvpx but we still have to
> > > resolve ffmpeg (possibly by patching 3.2 instead of updating all the
> way
> > to
> > > 3.4)
> >
> > I've completed this today. We now have a separate repository component
> > for stretch-wikimedia (named component/vp9) which includes ffmpeg 3.2.10
> > (thus allowing us to follow the ffmpeg security updates released in
> Debian
> > with a local rebuild) with backported row-mt support and linked against
> > libvpx 1.7.0.
> >
> > I tested re-encoding
> >
> > https://commons.wikimedia.org/wiki/File:Wall_of_Death_-_
> Pitts_Todeswand_2017_-_Jagath_Perera.webm
> > (which is a nice fast-paced test file) from VP8 to VP9, which results in
> > a size reduction from 48M to 31M.
> >
> > When using eight CPU cores on one of our video scaler servers, enabling
> > row-mt
> > gives a significant performance boost; encoding time went down from 5:31
> > mins
> > to 3:36 mins.
> >
> > All the details can be found at
> > https://phabricator.wikimedia.org/T190333#4324995
> >
> > Cheers,
> > Moritz
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [mailman question] Adding whole domain to ban_list

2018-05-09 Thread Jaime Crespo
Hi, I found these instructions; I am not sure if they are accurate or still
applicable:

https://mail.python.org/pipermail/mailman-users/2007-April/056468.html

On Wed, May 9, 2018 at 9:28 AM, Martin Urbanec <martin.urba...@wikimedia.cz>
wrote:

> If there is a way how to enforce manual adding by a list administrator,
> I'll be happy, but helping how to add whole aol.com to ban_list will be
> useful. I tried to add "^.*@aol.com$" (a regex) to the ban_list variable,
> but this apparently do not help, still some aol addresses are able to sign
> up.
>

-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Introducing Quibble, a test runner for MediaWiki

2018-05-08 Thread Jaime Crespo
I've created https://phabricator.wikimedia.org/T194125. Whether we enable
innodb_large_prefix, migrate to binary only, or reduce the maximum size of
the indexes, there is work (migrations, installer changes, maintenance)
needed for each of the possible solutions.
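
For context, this is the kind of failure being worked around (a sketch; the
exact limit depends on the InnoDB row format and innodb_large_prefix):

-- With utf8mb4, each character may take 4 bytes, so an index on
-- VARCHAR(192) needs up to 768 bytes, which is over the 767-byte limit
-- of the older row formats:
CREATE TABLE t1 (name VARCHAR(192) CHARACTER SET utf8mb4, KEY (name));
-- fails with "Error: 1071 Specified key was too long"
-- VARCHAR(191) (up to 764 bytes), or a prefix index, still fits:
CREATE TABLE t2 (name VARCHAR(191) CHARACTER SET utf8mb4, KEY (name));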

On Mon, Apr 30, 2018 at 5:18 PM, Brad Jorsch (Anomie) <bjor...@wikimedia.org
> wrote:

> On Mon, Apr 30, 2018 at 10:57 AM, Jaime Crespo <jcre...@wikimedia.org>
> wrote:
>
> > > MediaWiki currently doesn't even try to support UTF-8
> >
> > I thought the installer gave the option to choose between binary and
> > utf8 (3-byte)?
>
>
> Hmm. Yes, it looks like it does. But if all fields are varbinary, does it
> matter? Maybe it should be removed from the installer.
>
> There's also a $wgDBmysql5 configuration setting, which controls whether
> MediaWiki does "SET NAMES 'utf8'" or "SET NAMES 'binary'". I don't know
> what difference this makes, maybe none since all the columns are varbinary.
>
>
> > innodb_large_preffix cannot be enabled anymore because it is enabled
> > (hardcoded) automatically on MySQL 8.0.
> >
>
> That's good, once we raise the supported version that far. Currently it
> looks like we still support 5.5.8, which at least has the setting to
> enable.
>
> --
> Brad Jorsch (Anomie)
> Senior Software Engineer
> Wikimedia Foundation
> _______
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Introducing Quibble, a test runner for MediaWiki

2018-04-30 Thread Jaime Crespo
> MediaWiki currently doesn't even try to support UTF-8

I thought the installer gave the option to choose between binary and utf8
(3-byte)? It is ok if we support UTF-8 through binary fields + custom
library collations, but I think the sane approach would be to either move
everything to binary or to support the most complete collation, not the
confusing combination of the 2. Note I don't need utf8mb4 to be enabled; I
just want MediaWiki to work out of the box on any supported MySQL or
MariaDB version, including the latest 2 of each one, even if that means
doing some workarounds.

> While it's not actually part of "strict mode"

It is not; we can delay the GROUP BY change until MariaDB supports it
properly according to the SQL standard and we no longer support older
database versions' behaviour. However, strict mode ("don't add corrupt
data") is available on all versions and is the default in the latest ones,
and it should be enabled at least in testing environments.

innodb_large_prefix cannot be toggled anymore because it is enabled
(hardcoded) automatically on MySQL 8.0.


On Mon, Apr 30, 2018 at 4:40 PM, Brad Jorsch (Anomie) <bjor...@wikimedia.org
> wrote:

> On Mon, Apr 30, 2018 at 9:05 AM, Jaime Crespo <jcre...@wikimedia.org>
> wrote:
>
> > * Support "real" (4-byte) UTF-8: utf8mb4 in MySQL/MariaDB (default in the
> > latest versions) and start deprecating "fake"  (3-byte) UTF-8: utf8
> >
>
> MediaWiki currently doesn't even try to support UTF-8 in MySQL. The core
> MySQL schema specifically uses "varbinary" and "blob" types for almost
> everything.
>
> Ideally we'd change that, but see below.
>
>
> > * Check code works as intended in "strict" mode (default in the latest
> > versions), at least regarding testing
> >
>
> While it's not actually part of "strict mode" (I think), I note that
> MariaDB 10.1.32 (tested on db1114) with ONLY_FULL_GROUP_BY still seems to
> have the issues described in
> https://phabricator.wikimedia.org/T108255#2415773.
>
>
> > Anomie- I think you were thinking on (maybe?) abstracting schema for
> > mediawiki- fixing the duality of binary (defining sizes in bytes) vs.
> UTF-8
> > (defining sizes in characters) would be an interesting problem to solve-
> > the duality is ok, what I mean is being able to store radically different
> > size of contents based on that setting.
> >
>
> That would be an interesting problem to solve, but doing so may be
> difficult. We have a number of fields that are currently defined as
> varbinary(255) and are fully indexed (i.e. not using a prefix).
>
>- Just changing them to varchar(255) using utf8mb4 makes the index
>exceed MySQL's column length limit.
>- Changing them to varchar(191) to keep within the length limit breaks
>content in primarily-ASCII languages that is taking advantage of the
>existing 255-byte limit to store more than 191 codepoints.
>- Using a prefixed index makes ORDER BY on the column filesort.
>- Or the column length limit can be raised if your installation jumps
>through some hoops, which seem to be the default in 5.7.7 but not
> before:
>innodb_large_prefix
><https://dev.mysql.com/doc/refman/5.7/en/innodb-
> parameters.html#sysvar_innodb_large_prefix>
>set to ON, innodb_file_format
><https://dev.mysql.com/doc/refman/5.7/en/innodb-
> parameters.html#sysvar_innodb_file_format>
>set to "Barracuda", innodb_file_per_table
><https://dev.mysql.com/doc/refman/5.7/en/innodb-
> parameters.html#sysvar_innodb_file_per_table>
>set to ON, and tables created with ROW_FORMAT=DYNAMIC or COMPRESSED. I
>don't know what MariaDB might have as defaults or requirements in which
>versions.
>
> The ideal, I suppose, would be to require those hoops be jumped through in
> order for utf8mb4 mode to be enabled. Then a lot of code in MediaWiki would
> have to vary based on that mode flag to enforce limits on bytes versus
> codepoints.
>
> BTW, for anyone reading this who's interested, the task for that schema
> abstraction idea is https://phabricator.wikimedia.org/T191231.
>
> --
> Brad Jorsch (Anomie)
> Senior Software Engineer
> Wikimedia Foundation
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Introducing Quibble, a test runner for MediaWiki

2018-04-30 Thread Jaime Crespo
The fact that MediaWiki (I know, I think it is not core itself, but some
important extensions) doesn't work out of the box on the
latest/distro-available versions of MySQL and MariaDB is worrying to me
(especially when those were supposed to be the best supported systems(?)).

I know there are lots of people involved here, and that each database may
have different degrees of support based on its user base, plus there have
been a lot of changes in those databases since MySQL 5.0, but I have been
crying wolf for over 2 years already: T112637. Supporting MySQL/MariaDB 5.5
doesn't mean we shouldn't also support the latest stable versions, if the
changes are sensible.

Note I am not ranting here, and I am the first who will help anyone fix
their code; I just want to urge everybody to:

* Support "real" (4-byte) UTF-8, utf8mb4 in MySQL/MariaDB (default in the
latest versions), and start deprecating "fake" (3-byte) UTF-8, utf8
* Check that code works as intended in "strict" mode (default in the latest
versions), at least regarding testing
* Check support for the latest Unicode standards ("emojis")
* Avoid unsafe writes to the database (non-deterministic statements)
like UPDATE ... LIMIT without ORDER BY (see the sketch below)
* Add primary keys to all tables
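
To illustrate the "unsafe writes" point, a minimal sketch (job_queue,
job_id and claimed are hypothetical names, not a real MediaWiki table):

```sql
-- Non-deterministic: which 100 rows are updated depends on the scan order,
-- so a statement-based replica may update a different set of rows.
UPDATE job_queue SET claimed = 1 LIMIT 100;

-- Deterministic: ordering by a unique key makes the affected rows the same
-- on the primary and on every replica.
UPDATE job_queue SET claimed = 1 ORDER BY job_id LIMIT 100;
```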

Fixing those will likely reveal many hidden bugs caused by assuming a too
lenient storage system, and will allow better support for clustering
solutions.

Anomie: I think you were thinking of (maybe?) abstracting the schema for
MediaWiki. Fixing the duality of binary (defining sizes in bytes) vs. UTF-8
(defining sizes in characters) would be an interesting problem to solve.
The duality itself is ok; what I mean is being able to store radically
different sizes of content based on that setting.

I am also offering my time to any MediaWiki contributors who do not feel
confident enough with SQL/persistence storage systems to make those fixes,
if you need support.

On Sat, Apr 28, 2018 at 3:58 PM, Brad Jorsch (Anomie) <bjor...@wikimedia.org
> wrote:

> On Fri, Apr 27, 2018 at 5:58 PM, Antoine Musso <hashar+...@free.fr> wrote:
>
> > [T193222] MariaDB on Stretch uses the utf8mb4 character set. Attempting
> > to create a key on VARCHAR(192) or larger would cause:
> >  Error: 1071 Specified key was too long; max key length is 767 bytes
> >
> > Reducing the key length is the obvious solution and some fields could
> > use to be converted to ENUM.
> >
>
> Personally, I'd rather we didn't use more enums. They work inconsistently
> for comparisons and ordering, and they require a schema change any time a
> new value is needed. It'd probably be better to use NameTableStore instead.
>
> --
> Brad Jorsch (Anomie)
> Senior Software Engineer
> Wikimedia Foundation
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] In what order are "Pages that link here" sorted?

2018-04-12 Thread Jaime Crespo
They are ordered by page_id, so roughly older pages (in order of creation)
will appear earlier (it is more complicated than that; look at the source
code for details): includes/specials/SpecialWhatlinkshere.php.
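
Roughly, a simplified sketch (not the exact query the special page builds,
which also handles redirects, template transclusions and paging):

```sql
SELECT page_id, page_namespace, page_title
FROM pagelinks
JOIN page ON page_id = pl_from   -- pl_from is the page_id of the linking page
WHERE pl_namespace = 10          -- namespace of the target (10 = Template)
  AND pl_title = 'Retired'       -- title of the target page
ORDER BY pl_from;                -- hence "roughly in order of creation"
```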

There is no record of the timestamp when a page link relationship was added
to those tables. Also take into account that, I would bet, the table is
filled in asynchronously, so it could take from a few seconds to a few
minutes until that information is added. I guess it could be suggested as a
feature, but I feel it is not how contributors normally use it (e.g.
checking what to fix after moving/deleting a page, wrong links to a
disambiguation page, etc.), though I could be very wrong.

If you want accurate information about those, you will need to analyze each
page to find which edit added such a link. If you just want some recent
ones, I would suggest downloading the list at two points in time and
computing the difference to detect recently added or deleted links. If you
cannot wait, older versions of the pagelinks/templatelinks/imagelinks
tables are available for download with the dumps to perform that analysis.
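
For example, assuming two snapshots of pagelinks were imported locally as
pagelinks_old and pagelinks_new (hypothetical table names), the recently
added links would be, as a sketch:

```sql
SELECT n.pl_from, n.pl_namespace, n.pl_title
FROM pagelinks_new AS n
LEFT JOIN pagelinks_old AS o
  ON  o.pl_from      = n.pl_from
  AND o.pl_namespace = n.pl_namespace
  AND o.pl_title     = n.pl_title
WHERE o.pl_from IS NULL;   -- present in the new snapshot only
-- Swapping the two tables gives the links deleted between the snapshots.
```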

On Thu, Apr 12, 2018 at 2:07 PM, Joseph Reagle <joseph.2...@reagle.org>
wrote:

> In what order are "Pages that link here" sorted? The follow doesn't seem
> to be by date-used or name:
>
>   https://en.wikipedia.org/w/index.php?title=Special:
> WhatLinksHere/Template:Retired
>
> I'd like to get a list of reasonable recent ones for a in-class project
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l




-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] You can now translate Phabricator to your language

2018-03-02 Thread Jaime Crespo
This is incredibly good news! I have seen some contributors hesitate to
file tickets because they were intimidated by the interface. It could also
help with coordinating local projects and communities.
Thanks!

On Fri, Mar 2, 2018 at 4:05 PM, Niklas Laxström 
wrote:

> It's now possible to translate Phabricator in translatewiki.net thanks
> to the Phabricator developers, Mukunda Modell, and many others who
> participated in https://phabricator.wikimedia.org/T225
>
> We are currently in an experimental phase, where these translations
> are only used in https://phabricator.wikimedia.org, but the plan is to
> propose these translations to Phabricator upstream.
>
> A few languages are already available and the language setting can be
> changed at https://phabricator.wikimedia.org/settings
>
> You can help our multilingual user base by translating Phabricator.
> You can find some more info about translating this project at
> https://translatewiki.net/wiki/Translating:Phabricator, and you can
> start translating directly at
> https://translatewiki.net/w/i.php?title=Special:Translate;
> group=phabricator
>
> The whole Phabricator project is large, with over 17,000 strings to
> translate. Some simple and familiar strings can be found under the
> "Maniphest" and "Project" sub-groups. "Maniphest" includes strings for
> creating and searching tasks, and "Project" includes strings for
> managing project workboards with columns. Here are the direct links to
> them
> * https://translatewiki.net/w/i.php?title=Special:Translate;
> group=phabricator-phabricator-maniphest
> * https://translatewiki.net/w/i.php?title=Special:Translate;
> group=phabricator-phabricator-project
>
>   -Niklas
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Recursive Common Table expressions @ Wikimedia [was Fwd: [Wikimedia-l] What's making you happy this week? (Week of 18 February 2018)]

2018-02-28 Thread Jaime Crespo
On Wed, Feb 28, 2018 at 5:26 PM, Brad Jorsch (Anomie) <bjor...@wikimedia.org
> wrote:

> On Wed, Feb 28, 2018 at 8:47 AM, Jaime Crespo <jcre...@wikimedia.org>
> wrote:
>
> > Very recently I have been experimenting with recursive Common Table
> > Expressions [2], which are or will be available on the latest versions of
> > MySQL and MariaDB.
> >
>
> Do the other databases MediaWiki tries to support have that feature?
>

Actually, MySQL/MariaDB is the *last* database to conform to the SQL:1999
WITH standard: https://modern-sql.com/feature/with#compatibility Even
SQLite supported a limited subset of those!

The good news is that, probably because it arrived last, it got a pretty
full-featured implementation:
https://twitter.com/MarkusWinand/status/852862475699707904


>
> > With a single query on can obtain all titles directly or indirectly in a
> > category:
> >
> > WITH RECURSIVE cte (cl_from, cl_type) AS
> > (
> > SELECT cl_from, cl_type FROM categorylinks WHERE cl_to =
> > 'Database_management_systems' -- starting category
> > UNION
> > SELECT categorylinks.cl_from, categorylinks.cl_type FROM cte JOIN
> page
> > ON
> > cl_from = page_id JOIN categorylinks ON page_title = cl_to WHERE
> > cte.cl_type
> > = 'subcat' -- subcat addition on each iteration
> > )
> > SELECT page_title FROM cte JOIN page ON cl_from = page_id WHERE
> > page_namespace = 0 ORDER BY page_title; -- printing only articles in the
> > end
> > , ordered by title
> >
>
> Does that work efficiently on huge categories, or does it wind up fetching
> millions of rows and filesorting?


Needs more testing -- that query worked well enough for me on my laptop to
expose it directly to a web request (<0.1 s), but I had only imported the
categorylinks and page tables, so I was working fully in memory. Obviously,
the more complex the query and the more results it returns, the less likely
it is to be exposable to, e.g., a public API. But at least there are
configurable limits on recursion depth and maximum execution time.
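
For example (a sketch; exact variable names and units differ between
MariaDB and MySQL, so double-check them for your version):

```sql
-- MariaDB: cap recursion depth and total statement runtime (seconds)
SET SESSION max_recursive_iterations = 1000;
SET SESSION max_statement_time = 0.1;

-- MySQL 8.0 rough equivalents: cte_max_recursion_depth and
-- max_execution_time (milliseconds, SELECT statements only).
```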

Honestly, given it is a new feature, I don't expect MediaWiki -- which at
the moment has to support 5.5 -- to embrace it any time soon. However, I
wanted to ask whether it is interesting enough to set up some test hosts
for MediaWiki to "play" with it -- e.g. evaluate performance -- and maybe
(?) some upgraded MariaDB/MySQL servers for WMF labsdb (for long-running
analytics or gadgets that generate reports).

-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Recursive Common Table expressions @ Wikimedia [was Fwd: [Wikimedia-l] What's making you happy this week? (Week of 18 February 2018)]

2018-02-28 Thread Jaime Crespo
I was checking sixdegreesofwikipedia.com [0] and I saw that it implements
an application-driven breadth-first search [1], like many other gadgets
for Wikipedia.

Very recently I have been experimenting with recursive Common Table
Expressions [2], which are or will be available on the latest versions of
MySQL and MariaDB.

With a single query one can obtain all titles directly or indirectly in a
category:

WITH RECURSIVE cte (cl_from, cl_type) AS
(
    SELECT cl_from, cl_type
    FROM categorylinks
    WHERE cl_to = 'Database_management_systems' -- starting category
    UNION
    SELECT categorylinks.cl_from, categorylinks.cl_type
    FROM cte
    JOIN page ON cl_from = page_id
    JOIN categorylinks ON page_title = cl_to
    WHERE cte.cl_type = 'subcat' -- subcat addition on each iteration
)
SELECT page_title
FROM cte
JOIN page ON cl_from = page_id
WHERE page_namespace = 0
ORDER BY page_title; -- printing only articles in the end, ordered by title

(it is more complex than needed because of table denormalization; other
examples would be much simpler)

Thanks to CTEs, we can traverse hierarchies in a single SQL query, much
more efficiently and without the need for an external application or other
external tools.

None of these features are present in the minimum database versions
required by MediaWiki, or in the latest version available on WMF servers --
but I wonder if people (MediaWiki hackers and tool creators) would be
interested in using them?

[0] <https://www.sixdegreesofwikipedia.com>
[1] <
https://github.com/jwngr/sdow/blob/master/sdow/breadth_first_search.py#L36>
[2] <
https://dbahire.com/mysql-8-0-new-features-in-real-life-applications-roles-and-recursive-ctes/
>


-- Forwarded message --
From: mathieu stumpf guntz <psychosl...@culture-libre.org>
Date: Tue, Feb 27, 2018 at 11:17 AM
Subject: Re: [Wikitech-l] [Wikimedia-l] What's making you happy this week?
(Week of 18 February 2018)
To: Wikimedia Mailing List <wikimedi...@lists.wikimedia.org>, Pine W <
wiki.p...@gmail.com>, "wikitech-l@lists.wikimedia.org" <
wikitech-l@lists.wikimedia.org>


What's making me happy this week is joining the "Telegrafo" discussion for
ELISo <https://t.me/joinchat/CQ8tET7pcCXQSBO1ERPJug> and I also just found
Six Degrees of Wikipedia <https://www.sixdegreesofwikip
edia.com/?source=Peace=Epistemology>.


Le 18/02/2018 à 23:12, Pine W a écrit :

> What's making me happy this week is Isarra's persistence in working on the
> Timeless skin. Timeless is based on Winter. [0] [1]
>
> For anyone who would like to try Timeless, it's available in Preferences
> under Appearance / Skin.
>
> What's making you happy this week?
>
> Pine
> ( https://meta.wikimedia.org/wiki/User:Pine )
>
> [0] https://www.mediawiki.org/wiki/Skin:Timeless
> [1] https://www.mediawiki.org/wiki/Winter
> ___
> Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wik
> i/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l
> New messages to: wikimedi...@lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> <mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe>
>

_______
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Special:Export was overloading enwiki databases; limits have been set

2018-02-02 Thread Jaime Crespo
Exporting pages on en.wikipedia may result in a failure; this is being
investigated right now. For stability reasons, the time to export pages has
been temporarily limited to avoid a worse outage affecting regular page
views and edits. While we do not have any advice right now regarding
changing behaviour when exporting pages, we advise checking that any
exports done are successful until this is resolved, especially if they are
done unattended by bots -- a portion of those exports could be failing and
people may not be aware.

Right now probably only a single, not-logged-in user is affected on
en.wikipedia (around 1200 failed requests in the last 12 hours), but it
could affect other users on other wikis in the future, too.

This is publicly tracked on:
https://phabricator.wikimedia.org/T186318

If you use the wiki export function and it is still working for you, or it
has started failing, feel free to provide us feedback on the ticket above.
Pregenerated dumps at https://dumps.wikimedia.org/backup-index.html or the
wiki replicas would almost universally be a better way to get revisions in
bulk.
-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Help test longer edit summaries (and other comments) on Beta Cluster

2017-11-20 Thread Jaime Crespo
This is so cool! This was almost at the top of the wishlist in previous
years! Thank you.

On Fri, Nov 17, 2017 at 8:39 PM, Brad Jorsch (Anomie) <bjor...@wikimedia.org
> wrote:

> I've just now enable the feature flag on the Beta Cluster[1] that allows
> MediaWiki to store comments longer than 255 bytes.
>
> The web UI has not been updated to allow longer comments in places where it
> enforces a limit, such as the edit summary box. But if you use the API to
> edit, or perform page moves or do other things where long comments could be
> entered and were truncated, you should now find that they're truncated at
> 1000 Unicode characters rather than 255 bytes.
>
> Please test it out! If you find errors, or places in core features (not
> comments in extensions such as SecurePoll, AbuseFilter, CheckUser, or Flow)
> where *new* comments are still being truncated to 255 bytes, or places
> where comments aren't showing up at all, please let me know. You can reply
> to this message or post a task in Phabricator and add me as a subscriber.
>
> If things go well, we'll look at rolling this out to production wikis once
> the schema changes to the production databases are complete. See
> https://phabricator.wikimedia.org/T174569 to follow progress there.
>
> If anyone is interested in submitting patches for the web UI to reflect the
> changed length limits, please do. I'll try to review them if you add me as
> a reviewer.
>
>
>  [1]: https://deployment.wikimedia.beta.wmflabs.org/wiki/Main_Page
>
> --
> Brad Jorsch (Anomie)
> Senior Software Engineer
> Wikimedia Foundation
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l




-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Roadmap for continued compatibility with MySQL

2017-10-20 Thread Jaime Crespo
> How many years into the future is it reasonable to expect that MySQL will
remain a recommended database server for "production use" along with
MariaDB now that the Foundation has switched to using MariaDB internally

Given that a migration back to MySQL is not off the table for us WMF DBAs
(and I will complain if we drop compatibility), I would say both MariaDB
and MySQL will probably be supported for as long as the databases
themselves are, or even longer -- we just "upgraded" the requirements from
5.0 to 5.5 for newer MediaWiki versions.

> but the list of variances seems to be slowly growing

That is annoying, and even more worrying to me is MariaDB's roadmap
regarding free software/lock-in: https://mariadb.com/bsl-faq-mariadb
https://mariadb.com/projects-using-bsl-11

> is there anywhere else (wiki pages, Phabricator tasks/tags, etc.) that I
should check for technical roadmap information before emailing wikitech-l?

If you are in charge of a MediaWiki installation and want to know the
ops/DB maintenance side of things, get in contact with me on Phabricator;
I would be glad to share experiences.

On Fri, Oct 20, 2017 at 3:41 AM, Hogan, Michael C <
michael.c.hog...@boeing.com> wrote:

> How many years into the future is it reasonable to expect that MySQL will
> remain a recommended database server for "production use" along with
> MariaDB now that the Foundation has switched to using MariaDB internally? I
> know that MariaDB claims to be drop in compatible with MySQL, but the list
> of variances seems to be slowly growing. References...
> * Compatibility - [[mw:Compatibility]]
> * MariaDB support? - [[mw:Topic:R1hqki3kaytylml4]]
> * https://mariadb.com/kb/en/library/mariadb-vs-mysql-compatibility/
>
> Other than [[mw:Compatibility]] on mediawiki.org is there anywhere else
> (wiki pages, Phabricator tasks/tags, etc.) that I should check for
> technical roadmap information before emailing wikitech-l?
>
>
> Thank you!
> Michael
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l




-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Wikidata recentchanges on commonswiki and ruwiki not shown

2017-10-09 Thread Jaime Crespo
Hi all,

This is a heads up that I asked, and other developers agreed with the
decision, to *temporarily* disable Wikidata-originated recent changes
appearing on commonswiki and ruwiki (Commons and Wikipedia in
Russian): [0]. I cannot disclose all the reasons right now (we will do so
when things are fixed), but please understand that server admins only
do this in case of emergencies, such as things being down or in a very
broken state. Rest assured, this will not affect any Wikidata
functionality and causes no edit or other data loss -- it will only make
Wikidata edits not be *shown* on recent changes and watchlists on other
projects (they still take effect, and updates and wikilinks happen as
usual) during the time it is disabled.

Part of the story can be read at [1]; even with only that information, I
think the rollback is not much worse than the issues that were being
created. We do not rule out extending this action to other wikis, too. We
apologize for the impact this could have on vandalism patrolling -- we will
try to make things better soon, so that errors are minimized but new
features can also be active again soon.

Regards,

[0] 
<url:https://phabricator.wikimedia.org/source/mediawiki-config/browse/master/wmf-config/Wikibase-production.php;d48276756839dc9922ac00aebc87b20d31ad2474$209>
[1] <url:https://phabricator.wikimedia.org/T171027>

-- 
Jaime Crespo
<http://wikimedia.org>

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Can we drop revision hashes (rev_sha1)?

2017-09-19 Thread Jaime Crespo
I am not a MediaWiki developer, but shouldn't sha1 be moved instead of
deleted/not deleted? Moved to the content table, so it is kept
unaltered.

That way it can be used for all the goals that have been discussed
(detecting reversions, XML dumps, etc.) and the values are not altered,
just moved (which is more compatible). It is not as if structural
compatibility is going to be kept anyway: many fields are going to be
"moved" there, so code using the tables directly has to change in any
case; but if the actual content is not altered, the sha1 field can keep
the same value as before. It would also allow detecting a "partial
reversion", meaning the wikitext is set to the same as a previous
revision, which is what I assume it is mostly used for now. However, with
MCR there will be other content that can be reverted individually.

I do not know exactly what MCR is going to be used for, but if (silly
idea) the main article text and the categories are two different content
slots of an article, and user A edits both while user B reverts only the
text, that would get a different revision sha1 value; however, most use
cases here would want to detect the reversion by checking the sha1 of the
text only (i.e. the content). Equally, for backwards compatibility,
storing it on content would mean not having to recalculate it for all
already existing values, literally reducing this to a "trivial" code
change while keeping all old data valid. Keeping the field as is, on
revision, will mean all historical data and old dumps become invalid.
Full revision reversions, if needed, can be checked by comparing each
individual content sha1 or the linked content ids.

If, on the other hand, revision should be kept completely backwards
compatible, some helper views can be created on the cloud wiki replicas,
but other than that, MCR would not be possible.

If, at a later time, text with the same hash is detected (and the content
double-checked), content could be normalized by assigning the same id to
the same content.
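
As a sketch of that last idea, assuming an MCR-style content table with a
per-blob hash column (column names here are only illustrative):

```sql
-- Find hashes stored more than once; rows sharing a hash are candidates to
-- be pointed at a single canonical content row (after byte-comparing them).
SELECT content_sha1,
       COUNT(*)        AS copies,
       MIN(content_id) AS canonical_id
FROM content
GROUP BY content_sha1
HAVING COUNT(*) > 1;
```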

On Mon, Sep 18, 2017 at 8:25 PM, Danny B. <wikipedia.dann...@email.cz> wrote:
>
> -- Původní e-mail --
> Od: Dan Andreescu <dandree...@wikimedia.org>
> Komu: Wikimedia developers <wikitech-l@lists.wikimedia.org>
> Datum: 18. 9. 2017 16:26:18
> Předmět: Re: [Wikitech-l] Can we drop revision hashes (rev_sha1)?
> "So, as things stand, rev_sha1 in the database is used for:
>
> 1. the XML dumps process and all the researchers depending on the XML dumps
> (probably just for revert detection)
> 2. revert detection for libraries like python-mwreverts [1]
> 3. revert detection in mediawiki history reconstruction processes in Hadoop
> (Wikistats 2.0)
> 4. revert detection in Wikistats 1.0
> 5. revert detection for tools that run on labs, like Wikimetrics
> ?. I think Aaron also uses rev_sha1 in ORES, but I can't seem to find the
> latest code for that service
>
> If you think about this list above as a flow of data, you'll see that
> rev_sha1 is replicated to xml, labs databases, hadoop, ML models, etc. So
> removing it and adding it back downstream from the main mediawiki database
> somewhere, like in XML, cuts off the other places that need it. That means
> it must be available either in the mediawiki database or in some other
> central database which all those other consumers can pull from.
> "
>
>
>
> I use rev_sha1 on replicas to check the consistency of modules, templates or
> other pages (typically help) which should be same between projects (either
> within one language or even crosslanguage, if the page is not language
> dependent). In other words to detect possible changes in them and syncing
> them.
>
>
>
>
> Also, I haven't noticed it mentioned in the thread: Flow also notices users
> on reverts, but IDK whether it uses rev_sha1 or not. So I'm rather
> mentioning it.
>
>
>
>
>
>
>
> Kind regards
>
>
>
>
>
>
>
> Danny B.
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
Jaime Crespo
<http://wikimedia.org>

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Fwd: Notes on Mediawiki Hackathon 2017, Vienna, Austria

2017-05-22 Thread Jaime Crespo
On Mon, May 22, 2017 at 3:39 AM, Shrinivasan T  wrote:
> Hi friends,
> Will fix few issues and host it on a server soon. Till then try
> installing in your local computers and try it out.

That is really cool! Remember that you have hosting resources available on
the Wikimedia infrastructure if you want them:
https://wikitech.wikimedia.org/wiki/Help:Cloud_Services_Introduction

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Get Wikipedia Page Titles using API looks Endless

2017-05-07 Thread Jaime Crespo
>
> Looking at https://dumps.wikimedia.org/enwiki/20170501/ you can find:
>
> 2017-05-03 07:26:20 done List of all page titles
> https://dumps.wikimedia.org/enwiki/20170501/enwiki-20170501-all-titles.gz
> (221.7 MB)
> 2017-05-03 07:22:02 done List of page titles in main namespace
> https://dumps.wikimedia.org/enwiki/20170501/enwiki-
> 20170501-all-titles-in-ns0.gz (70.8 MB)
>

If you want to do your own analysis of namespaces and redirects, you can
also use:
https://dumps.wikimedia.org/enwiki/20170501/enwiki-20170501-page.sql.gz
It is larger, but you can filter by the page_is_redirect and
page_namespace columns on your own terms.
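
For example, after importing that dump into a local MySQL/MariaDB
instance, something along these lines (a sketch) reproduces the "main
namespace, no redirects" list:

```sql
SELECT page_title
FROM page
WHERE page_namespace = 0      -- main (article) namespace
  AND page_is_redirect = 0;   -- skip redirects
```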
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Get Wikipedia Page Titles using API looks Endless

2017-05-07 Thread Jaime Crespo
On Sat, May 6, 2017 at 9:12 PM, Abdulfattah Safa 
 wrote:

> I'm trying to get all the page titles in Wikipedia in namespace using the
> API as following:
>
> https://en.wikipedia.org/w/api.php?action=query=
> xml=allpages=0=nonredirects&
> aplimit=max&$continue=-||$apcontinue=BASE_PAGE_TITLE
>
> I keep requesting this url and checking the response if contains continue
> tag. if yes, then I use same request but change the *BASE_PAGE_TITLE *to
> the value in apcontinue attribute in the response.
> My applications had been running since 3 days and number of retrieved
> exceeds 30M, whereas it is about 13M in the dumps.
> any idea?
>

Please do not scrape the web for that kind of request -- it is a waste of
resources for you and for Wikimedia servers (given that there is a faster
and more reliable alternative).

Looking at https://dumps.wikimedia.org/enwiki/20170501/ you can find:

2017-05-03 07:26:20 done List of all page titles
https://dumps.wikimedia.org/enwiki/20170501/enwiki-20170501-all-titles.gz
(221.7 MB)
2017-05-03 07:22:02 done List of page titles in main namespace
https://dumps.wikimedia.org/enwiki/20170501/enwiki-20170501-all-titles-in-ns0.gz
(70.8 MB)

Use one of the above. Not only is it faster, you will also get consistent
results -- by the time you finish your loop, pages have been created and
deleted. The above exports are produced trying to get as consistent a
state as practically possible, and they are actively monitored by WMF
staff.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Historical use of latin1 fields in MySQL

2017-05-02 Thread Jaime Crespo
On Tue, May 2, 2017 at 9:24 PM, Brian Wolff <bawo...@gmail.com> wrote:

> .
> >
> > On the latest discussions, there are proposals to increase the minimum
> > mediawiki requirements to MySQL/MariaDB 5.5 and allow binary or utf8mb4
> > (not utf8, 3 byte utf8), https://phabricator.wikimedia.org/T161232.
> Utf8mb4
> > should be enough for most uses (utf8 will not allow for emojis, for
> > example), although I am not up to date with the latest unicode standard
> > changes and MySQL features supporting them.
> >
>
> I dont know about mysql, but in unicode emojis are like any other astral
> character, and utf-8 can encode them in 4 bytes*.
>

I am sorry I wasn't clear before: MySQL's utf8 IS NOT the international
standard generally known as UTF-8; it is a bastardization limited to
3-byte UTF-8. MySQL's utf8mb4 is real UTF-8:

Proof:

```
mysql> use test
Database changed
mysql> CREATE TABLE test (a char(1) CHARSET utf8, b char(1) CHARSET
utf8mb4, c binary(4));
Query OK, 0 rows affected (0.02 sec)

mysql> SET NAMES utf8mb4;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into test VALUES ('\U+1F4A9', '\U+1F4A9', '\U+1F4A9');
Query OK, 1 row affected, 1 warning (0.00 sec)

mysql> SHOW WARNINGS;
+---------+------+---------------------------------------------------------------------+
| Level   | Code | Message                                                             |
+---------+------+---------------------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\xF0\x9F\x92\xA9' for column 'a' at row 1 |
+---------+------+---------------------------------------------------------------------+
1 row in set (0.01 sec)

mysql> SELECT * FROM test;
+------+------+------+
| a    | b    | c    |
+------+------+------+
| ?    | 💩    | 💩    | -- you will need an emoji-compatible font here
+------+------+------+
1 row in set (0.00 sec)

mysql> SELECT hex(a), hex(b), hex(c) FROM test;
+--------+----------+----------+
| hex(a) | hex(b)   | hex(c)   |
+--------+----------+----------+
| 3F     | F09F92A9 | F09F92A9 |
+--------+----------+----------+
1 row in set (0.00 sec)
```

To avoid truncations:

```
mysql> set sql_mode='TRADITIONAL'; --
https://phabricator.wikimedia.org/T108255
Query OK, 0 rows affected (0.00 sec)

mysql> insert into test VALUES ('\U+1F4A9', '\U+1F4A9', '\U+1F4A9');
ERROR 1366 (22007): Incorrect string value: '\xF0\x9F\x92\xA9' for column
'a' at row 1
```

More info at:
https://dev.mysql.com/doc/refman/5.7/en/charset-unicode-utf8.html vs.
https://dev.mysql.com/doc/refman/5.7/en/charset-unicode-utf8mb4.html


-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Historical use of latin1 fields in MySQL

2017-05-02 Thread Jaime Crespo
Mark,

On Tue, May 2, 2017 at 7:10 PM, Mark Clements (HappyDog) <
gm...@kennel17.co.uk> wrote:

> Hi all,
>
> I seem to recall that a long, long time ago MediaWiki was using UTF-8
> internally but storing the data in 'latin1' fields in MySQL.
>
> I notice that there is now the option to use either 'utf8' or 'binary'
> columns (via the $wgDBmysql5 setting), and the default appears to be
> 'binary'.[1]
>

I can provide you general information about  the MySQL side of things.

'utf8' in MySQL is 3-byte UTF-8. "Real" UTF-8 is called utf8mb4 in MySQL.
While this may sound silly, consider that emojis and characters beyond the
Basic Multilingual Plane were probably more theoretical than practical
10-15 years ago, and variable-length string performance was not good in
MySQL in those early versions.

I know there was some conversion pain in the past, but right now, in order
to be as compatible as possible, binary collation is being used almost
everywhere on WMF servers (there may be some old text not converted, but
this is true for most live data/metadata databases that I have seen).
MediaWiki only requires MySQL 5.0, and using binary strings allows
supporting collations and charsets only available in the latest
MySQL/MariaDB versions.

In the latest discussions, there are proposals to increase the minimum
MediaWiki requirements to MySQL/MariaDB 5.5 and allow binary or utf8mb4
(not utf8, the 3-byte utf8): https://phabricator.wikimedia.org/T161232.
Utf8mb4 should be enough for most uses (utf8 will not allow emojis, for
example), although I am not up to date with the latest Unicode standard
changes and the MySQL features supporting them.

I've come across an old project which followed MediaWiki's lead (literally
> - it cites MediaWiki as the reason) and stores its UTF-8 data in latin1
> tables.  I need to upgrade it to a more modern data infrastructure, but I'm
> hesitant to simply switch to 'utf8' without understanding the reasons for
> this initial implementation decision.
>

I strongly suggest going for utf8mb4 if MySQL >= 5.5, and only binary if
you have some special needs that it doesn't cover. InnoDB variable-length
performance has been "fixed" in the newest InnoDB versions, and it is the
recommended default nowadays.
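
If the old project really stores UTF-8 bytes in latin1-declared columns,
one common approach is to reinterpret the bytes instead of transcoding
them, going through a binary type first. A sketch with hypothetical
table/column names (always test on a copy and adjust lengths and indexes
to your schema first):

```sql
-- Step 1: drop the (wrong) latin1 interpretation, keeping the raw bytes.
ALTER TABLE my_table MODIFY my_column VARBINARY(255);

-- Step 2: declare the bytes as what they really are: utf8mb4.
ALTER TABLE my_table MODIFY my_column VARCHAR(255)
  CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```

A direct latin1-to-utf8mb4 conversion would transcode (and so
double-encode) the already-UTF-8 bytes, which is why the binary
intermediate step matters.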

Cheers,
-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Wikipedia-l] Data centre switchover to Eqiad

2017-04-30 Thread Jaime Crespo
On Sun, Apr 30, 2017 at 9:24 PM, Eddie Greiner-Petter <
wikimedia@eddie-sh.de> wrote:

> That reminds me that we noticed during switch to codfw that the message
> shown when trying to really edit a page (the mediawiki read-only
> message) contains:
>
> The system administrator who locked it offered this explanation:
> MediaWiki is in read-only mode for maintenance. Please try again in a
> few minutes
>
> which isn't quite informative. Is there a task for changing the "offered
> explanation" part? Some hint about the DC switch (and maybe a link to
> the meta page) would be better.


The read-only messages are controlled by these strings:

https://phabricator.wikimedia.org/source/mediawiki-config/browse/master/wmf-config/db-codfw.php;a65f35adbc9d2c8c9a85e956a64661783d2c973d$645
https://phabricator.wikimedia.org/source/mediawiki-config/browse/master/wmf-config/db-eqiad.php;a65f35adbc9d2c8c9a85e956a64661783d2c973d$664

I think that with the pressure of keeping everything up, plus discussing
internally whether we should declare a specific amount of time (given it
doesn't auto-update), we ended up with a very generic message. We are,
however, looking at showing better error messages, as on the ticket I
reported at https://phabricator.wikimedia.org/T163455#3199813

Please send a pull request or file a new ticket on Phabricator with the
#operations and #codfw-rollout tags with a proposal and we can definitely
change it by Wednesday.

Thanks,

-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Potential Spoof] Question about wikidata dump bz2 file

2017-04-06 Thread Jaime Crespo
Trung,

If you do not get an answer on the developers' forum, there is a
dumps-focused mailing list at
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l

Cheers,

On Thu, Apr 6, 2017 at 6:59 AM, Trung Dinh <t...@fb.com> wrote:

> Sorry, I hit enter early by accident.
>
> I realized the dump file for wikidata is no longer in the format
> wikidatawiki-2017-pages-articles.xml.bz2 anymore.
> Now, it is split in to different dumps:
> https://dumps.wikimedia.org/wikidatawiki/latest/
> wikidatawiki-latest-md5sums.txt
>
> I am wondering when did this happen and the rationale behind it. Will it
> be permanent or we will switch back to the original format soon ?
>
> Thank you,
>
> Best regards,
>
> Trung
>
> On 4/5/17, 9:57 PM, "Wikitech-l on behalf of Trung Dinh" <
> wikitech-l-boun...@lists.wikimedia.org on behalf of t...@fb.com> wrote:
>
> Hi everyone,
>
> I realized the dump file for wikidata is no longer in the format
> wikidatawiki-2017-pages-articles.xml.bz2 anymore.
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Engineering] Long running tasks/scripts now included on [[wikitech:Deployments]]

2016-09-22 Thread Jaime Crespo
Let me clarify the reasoning for the idea:

We realized that some schema changes (which used to be scheduled like other
deployments) no longer take 1 hour: they can take 1 month, running
continuously, like https://phabricator.wikimedia.org/T139090 , because it
affects 3 of our largest tables. Also, they no longer require read-only
mode or affect code in any way (unless they are a prerequisite).

On the other hand, a schema change combined with high read or write load
from long-running maintenance jobs, like those of the updateCollation
script or any other (those were just an example), could potentially make
lag a worse problem: a single transaction has to store pending changes
during its lifetime, or long-running reads can block and create pileups due
to metadata locking (see the sketch below). We want to avoid those, which
certainly caused infrastructure issues in the past.
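
A sketch of the kind of interaction meant above, with three concurrent
sessions (the index name is just an example, not a planned change):

```sql
-- Session 1: a long-running read inside a transaction keeps a shared
-- metadata lock on the table until it commits.
START TRANSACTION;
SELECT COUNT(*) FROM categorylinks;

-- Session 2: the schema change now has to wait for that metadata lock.
ALTER TABLE categorylinks ADD KEY cl_example (cl_timestamp);

-- Session 3: every new query on the table queues up behind the waiting
-- ALTER, and that is the pileup we coordinate to avoid.
SELECT * FROM categorylinks WHERE cl_to = 'Example' LIMIT 1;
```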

So, in summary, regular deployments are mutually exclusive; long-running
maintenance work could affect each other. This is a way for me (and others)
to have visibility of those potential negative interactions, and to make
sure we can coordinate: "You are doing work on enwiki? No problem, we will
just run this task on Commons." "You need to do an emergency data recovery?
I will postpone this other task that can wait." Even if only DBAs use it,
it is already useful for not performing incompatible changes at the same
time. But it will be even more useful if everybody uses it!

On Thu, Sep 22, 2016 at 4:27 PM, Alex Monk <am...@wikimedia.org> wrote:

> I had been assuming that puppetised crons were not really relevant...
>
> On 22 September 2016 at 15:19, Guillaume Lederrey <gleder...@wikimedia.org
> > wrote:
>
>> Hello!
>>
>> Increasing visibility sounds like a great idea! How far do we want to
>> go in that direction? In particular, I'm thinking of a few of the
>> crons we have for Cirrus. For example, we do have daily crons on
>> terbium that re-generate the suggester indices. Those can run for >
>> 1h.
>>
>> My understanding is that those kind of crons should not be considered
>> scripts, but standard working parts of the system. Adding them will
>> probably generate more noise than useful information. Is this a
>> reasonable understanding?
>>
>> Thanks!
>>
>>Guillaume
>>
>>
>>
>> On Wed, Sep 21, 2016 at 12:29 AM, Greg Grossmeier <g...@wikimedia.org>
>> wrote:
>> > In an effort to reduce surprises and potential mishaps it is now
>> > required to include any long running tasks in the deployment
>> > calendar[0].
>> >
>> > "Long running tasks" include any script that is run on production 'work
>> > machines' such as terbium that last for longer than ~1 hour. Think:
>> > migration and maintenance scripts.
>> >
>> > This was discussed and proposed in T144661[1].
>> >
>> > Best,
>> >
>> > Greg
>> >
>> > [0] https://wikitech.wikimedia.org/wiki/Deployments
>> > Relevant diff:
>> > https://wikitech.wikimedia.org/w/index.php?diff=850923=850244
>> > [1] https://phabricator.wikimedia.org/T144661
>> >
>> > --
>> > | Greg GrossmeierGPG: B2FA 27B1 F7EB D327 6B8E |
>> > | Release Team ManagerA18D 1138 8E47 FAC8 1C7D |
>> >
>> > ___
>> > Engineering mailing list
>> > engineer...@lists.wikimedia.org
>> > https://lists.wikimedia.org/mailman/listinfo/engineering
>> >
>>
>>
>>
>> --
>> Guillaume Lederrey
>> Operations Engineer, Discovery
>> Wikimedia Foundation
>> UTC+2 / CEST
>>
>> ___
>> Wikitech-l mailing list
>> Wikitech-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>
>
>
>
> --
> Alex Monk
> VisualEditor/Editing team
> https://wikimediafoundation.org/wiki/User:Krenair_(WMF)
>
> ___
> Engineering mailing list
> engineer...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/engineering
>
>


-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Schema migration for 'image' and 'oldimage' tables

2016-08-11 Thread Jaime Crespo
idst of this migration. Seems possible, but is
> it worth the complexity? (We'd need extra code that knows about that
> migration field, and how long do we keep that code? Also complicates
> migration for third-parties using update.php).
>
> Is creating the new tables separately viable for the scale of Wikimedia
> Commons? (and dropping the old ones once finished). Is this a concern from
> a DBA perspective with regards to storage space? (We'd temporarily need
> about twice the space for these tables). So far I understood that it
> wouldn't be a problem per se, but that there are also other options we can
> explore for Wikimedia. For example we could use a separate set of slaves
> and alter those while depooled (essentially using entirely separate set of
> db slaves instead of a separate table within each slave).
>
> Do we create the new table(s) separately and switch over once it's caught
> up? This would require doing multiple passes as we depool slaves one by one
> (we've done that before at Wikimedia). Switch-over could be done by
> migrating before the software upgrade, with a very short read-only period
> after the last pass is finished. It wouldn't require maintaining multiple
> code paths, which is attractive.
>
> Other ideas?
>
> -- Timo
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
Jaime Crespo
<http://wikimedia.org>

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] RESTBase multiple pages with one request

2016-08-04 Thread Jaime Crespo
Sorry, I am not 100% sure. If that is true, maybe creating a feature
request would help suggest its implementation?

On Thu, Aug 4, 2016 at 3:09 PM, Toni Hermoso Pulido <toni...@cau.cat> wrote:
> Thanks Jaime, so it only works with Action (MediaWiki default) API so
> far, doesn't it?
>
> El 08/04/2016 a les 10:07 AM, Jaime Crespo ha escrit:
>> Hi, you can combine multiple pages with the "pipe" sign:
>>
>> Check:
>> <https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=jsonfm&titles=Hillary_Clinton|Donald_Trump>
>> (change 'jsonfm' for 'json' on a real request)
>> There is a limit on the number of pages depending on your account
>> rights, but it is very helpful to avoid round-trip latencies for us in
>> high-latency places.
>>
>>
>> On Thu, Aug 4, 2016 at 9:34 AM, Toni Hermoso Pulido <toni...@cau.cat> wrote:
>>> Hello,
>>>
>>> is it already possible to retrieve data from different pages just by
>>> using one request?
>>>
>>> E.g by combining:
>>> https://en.wikipedia.org/api/rest_v1/page/summary/Electron
>>> and
>>> https://en.wikipedia.org/api/rest_v1/page/summary/Dog
>>>
>>>
>
> --
> Toni Hermoso Pulido
> http://www.cau.cat
> http://www.similis.cc
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
Jaime Crespo
<http://wikimedia.org>

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] RESTBase multiple pages with one request

2016-08-04 Thread Jaime Crespo
Hi, you can combine multiple pages with the "pipe" sign:

Check:
<https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=jsonfm&titles=Hillary_Clinton|Donald_Trump>
(change 'jsonfm' to 'json' in a real request)
There is a limit on the number of pages depending on your account
rights, but it is very helpful for avoiding round-trip latencies for
those of us in high-latency places.


On Thu, Aug 4, 2016 at 9:34 AM, Toni Hermoso Pulido <toni...@cau.cat> wrote:
> Hello,
>
> is it already possible to retrieve data from different pages just by
> using one request?
>
> E.g by combining:
> https://en.wikipedia.org/api/rest_v1/page/summary/Electron
> and
> https://en.wikipedia.org/api/rest_v1/page/summary/Dog
>
> Thanks!
>
> --
> Toni Hermoso Pulido
> http://www.cau.cat
> http://www.similis.cc
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
Jaime Crespo
<http://wikimedia.org>

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] What's the "correct" content model when rev_content_model is NULL?

2016-07-12 Thread Jaime Crespo
On Tue, Jul 12, 2016 at 12:40 PM, Daniel Kinzler
<daniel.kinz...@wikimedia.de> wrote:
> Yea, still something we need to figure out :)

> That was, if I remember correctly, one of the arguments for using readable
> strings there, instead of int values and a config variable, as I originally
> proposed. This was discussed at the last Berlin hackathon, must have been 
> 2012.
> Tim may remember more details. We should probably re-consider the pros and 
> cons
> we discussed back then when planning to change the scham now.

But that was already re-reviewed, discussed, and approved by Tim
himself (among others) in 2015:
<https://tools.wmflabs.org/meetbot/wikimedia-office/2015/wikimedia-office.2015-07-29-20.59.html>.

-- 
Jaime Crespo
<http://wikimedia.org>

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] What's the "correct" content model when rev_content_model is NULL?

2016-07-12 Thread Jaime Crespo
Your last question is a non-issue for me: I do not care whether things are
in the database or in configuration; that is not the issue I have been
complaining about.

What I blocked is having 6000 million rows (x40 due to redundancy) with
the same column value "gzip; version 3 (1-2-3-testing-testing. It seems
to work)" when it can be summarized as an id of 1 byte or less (with that
id explained somewhere else). The difference between both options is
extremely cheap to code, and not only would it save thousands of dollars
in server cost, it would also minimize maintenance cost and dramatically
increase performance (or at least not decrease it) on one of the largest
bottlenecks for large wikis, as the table could fit fully into memory
(yes, we have 515 GB servers now).
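
As an illustration of the kind of normalization meant here (a sketch only;
the names mirror, but are not necessarily identical to, whatever the
refactoring ends up with):

```sql
-- A tiny lookup table holds each distinct model/format name exactly once...
CREATE TABLE content_models (
  model_id   TINYINT UNSIGNED NOT NULL PRIMARY KEY,
  model_name VARBINARY(64)    NOT NULL,
  UNIQUE KEY (model_name)
);

-- ...and the 6000-million-row table stores only the tiny id, e.g.:
--   rev_content_model TINYINT UNSIGNED DEFAULT NULL
-- instead of repeating the full textual name on every row.
```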

To give you an idea of how bad things currently are: WMF's architecture
technically does not store any data on the main database servers (a lot of
asterisks here; allow me to be inexact for the sake of simplicity), only
metadata, as the wiki content is stored on the "external storage"
subsystem. I gave InnoDB compression a try [0] (which has a very low
compression ratio and a very small block size, as it is for real-time
purposes only), yet I was able to reduce the disk usage to less than half
by compressing only the top 10 tables: [1]. If this is not an objective
measurement of how inefficient the MediaWiki schema is, I do not know how
I can convince you otherwise.

Of course there is a lot of history, legacy, and maintenance issues, but
when the guy who would actually spend days of his life running schema
changes so they do not affect production is the one begging for them to
happen, you know there is an issue. And this is not a "MediaWiki is bad"
complaint -- I think MediaWiki is a very good piece of software -- I only
want to make it better with very, very small maintenance-like changes.

> The disadvantage is of course that the model and format are not obvious when
> eyeballing the result of an SQL query.

Are you serious? Because this is super-clear already :-P:

MariaDB  db1057 enwiki > SELECT * FROM revision LIMIT 1000,1\G
*************************** 1. row ***************************
   rev_text_id: 1161 -- what?
[...]
 rev_content_model: NULL -- what?
rev_content_format: NULL
1 row in set (0.00 sec)

MariaDB  db1057 enwiki > SELECT * FROM text WHERE old_id=1161; -- WTF, old_id?
+--------+---------------------+----------------+
| old_id | old_text            | old_flags      |
+--------+---------------------+----------------+
|   1161 | DB://rc1/15474102/0 | external,utf-8 |  -- WTF is this?
+--------+---------------------+----------------+
1 row in set (0.03 sec)

I am joking at this point, but emulating what someone who looks at the
database would say. My point is that MediaWiki is no longer simple.

More recommended reading (not for you, but for the many developers who are
still afraid of them -- I have really found many cases in the wild among
otherwise good contributors):
<https://en.wikipedia.org/wiki/Join_(SQL)>


[0] <https://phabricator.wikimedia.org/T139055>
[1] 
<https://grafana.wikimedia.org/dashboard/db/server-board?panelId=17=1467294350779=1467687175941=db1073=eth0>

On Tue, Jul 12, 2016 at 10:40 AM, Daniel Kinzler
<daniel.kinz...@wikimedia.de> wrote:
> Addendum, after sleeping over this:
>
> Do we really want to manage something that is essentially configuration, 
> namely
> the set of available content models and formats, in a database table? How is 
> it
> maintained?

-- 
Jaime Crespo
<http://wikimedia.org>

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] What's the "correct" content model when rev_content_model is NULL?

2016-07-11 Thread Jaime Crespo
On Mon, Jul 11, 2016 at 2:07 PM, Daniel Kinzler
 wrote:
> It seems there is disagreement about what the correct interpretation of NULL 
> in
> the rev_content_model column is. Should NULL there mean

> What should we write into rev_content_model in the future

Content model handling is pending a refactoring:

Once that happens, they should never be NULL.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Commons read-only period

2016-07-06 Thread Jaime Crespo
Hi,

Yesterday, 5 July 2016, there was a period of read-only mode, starting at
10:22 UTC, affecting mostly Commons, but also some deletions on other
wikis. You have the details of the issue at:

<url:https://wikitech.wikimedia.org/wiki/Incident_documentation/20160705-commons-replication>

Regards,
-- 
Jaime Crespo
<http://wikimedia.org>

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Etherpad outage

2016-07-05 Thread Jaime Crespo
On Fri, Jun 24, 2016 at 12:56 AM, Jaime Crespo <jcre...@wikimedia.org> wrote:
> The -restore url will be available for **a week** until it is deleted.

This was deleted today. Continue using https://etherpad.wikimedia.org
as usual (but remember to backup anything important!).

Regards,
-- 
Jaime Crespo
<http://wikimedia.org>

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Etherpad outage

2016-06-23 Thread Jaime Crespo
On Thu, Jun 23, 2016 at 7:02 PM, Jaime Crespo <jcre...@wikimedia.org> wrote:
> Hi all,
>
> Etherpad [0], our real-time colaborative editing tool suffered an
> outage due to what we only know for now was database corruption. This

> - If possible, recover the last days of edits on a separate location.
> See [1] for progress if you are affected.

Thanks to Alex's incredible work to make it run again, the previous
version of the etherpad database (a few minutes before the crash-
around 13:30 UTC) was recovered, and it is available temporarily on:

https://etherpad-restore.wikimedia.org

If you want to recover some lost text, **you need to copy it manually
from here and paste it into https://etherpad.wikimedia.org (the usual
address)** We will **not** touch the current etherpad, as some of you
have already added/recovered your texts.

The -restore url will be available for **a week** until it is deleted.

Please resend this information to anybody that may find this useful,
so no important data is lost.

Regards,
-- 
Jaime Crespo
<http://wikimedia.org>

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Etherpad outage

2016-06-23 Thread Jaime Crespo
Hi all,

Etherpad [0], our real-time collaborative editing tool, suffered an
outage due to what we only know for now was database corruption. This
was detected shortly after it happened at 14:27 UTC, and we (the ops in
charge of the service and the database) worked to re-establish the
service.

As the service continued crashing despite our efforts, we decided to
restore a database backup from 2016-06-22 01:00:01 UTC. The service has
been back up and working since 16:11 UTC, but that means you may have
lost a day and a half of edits in the currently available Etherpad
[0].

I understand that this may cause a lot of inconvenience, especially
for the people at Wikimania. *We are now trying to recover more than
that*, but as the corruption could come back, not everything may be
recoverable, and people need the service, the plan is the following:

- Keep the current pads as they are; we will not delete or add anything
from now on. You can continue using Etherpad as usual.
- If possible, recover the last days of edits in a separate location.
See [1] for progress if you are affected.

Sorry for the inconvenience. Please, more than ever, follow the
recommendation we added at the beginning of every empty pad:
> "Keep in mind as well that there is no guarantee that a pad's contents will 
> always be available. A pad may be corrupted, deleted or similar. Please keep 
> a copy of important data somewhere else as well"
The reason for this is that wiki content has proper HA and redundancy;
Etherpad does not.

Again, my most sincere apologies,

[0] <https://etherpad.wikimedia.org/>
[1] <https://phabricator.wikimedia.org/T138516>
-- 
Jaime Crespo
<http://wikimedia.org>

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Should we switch the default category collation to uca-default?

2016-06-03 Thread Jaime Crespo
> I agree with having a discussion on Meta-Wiki. I think this type of larger
> undertaking also requires close coordination with a database
> administrator, if it requires running maintenance/updateCollation.php.

A (the?) DBA has been closely following and helping the team make the
latest changes to the script, making sure it executes faster and without
impacting load. He is OK with the current implementation and happy to let
it run (one wiki at a time).

-- 
Jaime Crespo
<http://wikimedia.org>

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] historical trivia: who first picked UTC as Wikimedia time, when and why?

2016-05-10 Thread Jaime Crespo
On Mon, May 9, 2016 at 6:15 PM, Brion Vibber  wrote:
> In 2001 when Magnus was writing the initial attempt at a custom wiki engine
> in PHP backed by MySQL, he chose to use the TIMESTAMP column type.
>
> TIMESTAMPs in MySQL 3 were automatically filled out by the server at INSERT
> time, normalized to UTC, and exposed in the 14-digit MMDDHHMMSS format
> we still know and love today.

By the way, if we had to design this from scratch, TIMESTAMP columns can
now be defined so they are not auto-set:
https://dev.mysql.com/doc/refman/5.6/en/timestamp-initialization.html
This would save 15 - 4 = 11 bytes per row. However, between the code and
migration effort, potential bugs, and backwards compatibility, this is
not worth it.
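
For the record, a sketch of what that would look like (an illustrative
table, not a proposal to actually change the schema):

```sql
CREATE TABLE example_revision (
  rev_id        INT UNSIGNED NOT NULL PRIMARY KEY,
  -- NULL default and no ON UPDATE clause: the server never auto-sets it,
  -- and it needs 4 bytes instead of the current 14-byte textual timestamp.
  rev_timestamp TIMESTAMP NULL DEFAULT NULL
);
```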

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Phabricator was down for a short time today (April 4th)

2016-04-14 Thread Jaime Crespo
On Thu, Apr 14, 2016 at 10:26 PM, Mukunda Modell  wrote:
> Phabricator is gaining improved high-availability support thanks to recent
> work upstream, so it might be possible to have dual-master phabricator
> nodes in the near future. See https://secure.phabricator.com/T10751 for
> upstream progress.

Phabricator has 3 dedicated bare-metal machines on the database side
(including geographical replication and a 24-hour delayed slave).
Currently the slaves are mostly used for backups, maintenance, and
long-running stats.

It would be great if they could be used for the main app, too (read-only
traffic and semi-automatic failover)!

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] X-Wikimedia-Debug, your new secret side-kick

2016-03-31 Thread Jaime Crespo
On Thu, Mar 31, 2016 at 3:32 AM, Ori Livneh <o...@wikimedia.org> wrote:

> Cool? Cool.

Definitely Cool.

-- 
Jaime Crespo
<http://wikimedia.org>

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] pt.wikimedia.org - database naming

2016-03-19 Thread Jaime Crespo
On Fri, Mar 18, 2016 at 2:51 PM, Chad <innocentkil...@gmail.com> wrote:

> Interwiki link existence isn't checked or tracked. File stuff isn't a
> problem (since ptwikimedia wasn't a central repo & existed prior to
> GlobalUsage being a thing). Cross-wiki notifications didn't exist either.
> Banners? Doesn't make sense. Pre-elasticsearch too so no conflicts.

^Am I seeing a volunteer to take ownership of this task (and its
potential problems)?

-- 
Jaime Crespo
<http://wikimedia.org>

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] pt.wikimedia.org - database naming

2016-03-19 Thread Jaime Crespo
On Thu, Mar 17, 2016 at 9:15 PM, Alchimista  wrote:
>   Llegoktm already pointed that there are no entries on CentralAuth, since
> the wiki was shut down on 2011 / 2012, is there any other places to check?

The content of every row of every other database that references
ptwikimedia through links, notifications, file usage, content references,
global permissions or banners, or common content repositories, including
tables that are created or modified by any MediaWiki extension that exists
or existed at some point on the Wikimedia infrastructure and that wasn't
properly cleaned up on uninstall.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] pt.wikimedia.org - database naming

2016-03-19 Thread Jaime Crespo
On Fri, Mar 18, 2016 at 3:00 PM, Chad <innocentkil...@gmail.com> wrote:
> Well looks like nobody else is going to do it, so sure.
>
> -Chad

It is always nice to see deployment issues being handled by deployment people.

-- 
Jaime Crespo
<http://wikimedia.org>

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] pt.wikimedia.org - database naming

2016-02-24 Thread Jaime Crespo
> I would suggest
> to rename on WMF cluster the ptwikimedia DB to something like
> ptwikimedia_old

If you want this, I hope the users will have the patience to wait until
the wiki-renaming bug is solved, sometime around 2034.

Or, you know, we can create a new wiki *now* with a new name that most
users will not notice (it will not affect the domain name,
pt.wikimedia.org).

On Wed, Feb 24, 2016 at 3:18 PM, Antoine Musso <hashar+...@free.fr> wrote:
> On 24/02/2016 14:56, Waldir Pimenta wrote:
>> Assuming there's no easy way to merge the databases, we are fine with
>> dropping the old db. I believe most content was imported to the current
>> wiki at the time of the migration, see
>> https://phabricator.wikimedia.org/T25537. An xml dump was used, not an SQL
>> one, so I suppose stuff like logs may not have been preserved, but in any
>> case it's not critical that we preserve all that historical info. I mean,
>> it would certainly be nice, but we can live without it.
>>
>> Or we could use the pt2wikimedia, as that would allow future archeologists
>> to recover the data from the beginning of the wiki :) Either option is fine.
>
> Hello,
>
> The ptwikimedia database being from 2012, I don't think there is much
> point in attempting to upgrade its schema, much less attempting to merge
> the external db in.
>
> Given most of the useful data/history has been exported, I would suggest
> to rename on WMF cluster the ptwikimedia DB to something like
> ptwikimedia_old or even just archive a dump of it and drop it.
>
> Then create a new ptwikimedia and import the external db there.
>
>
> I am an idealist, but I am afraid introducing a naming scheme like
> <lang>2<project> is asking for various exceptions to be added in
> various code and will cause a nasty technical debt down the road.
>
> My 0.02 €
>
> --
> Antoine "hashar" Musso
>
>
> _______
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
Jaime Crespo
<http://wikimedia.org>

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Scheduled database maintenance (9 Feb)

2016-02-09 Thread Jaime Crespo
Between 23:00 and 23:59 UTC, February 9th 2016 there is a scheduled
maintenance window that will affect some of the wikis hosted by the
Wikimedia Foundation. The maintenance is needed in order to perform
necessary hardware, operating system and database upgrades. During the
upgrade, content on affected wikis will be available at all times, but
edits may fail for approximately 5 minutes within that window
(these wikis will be in "read only mode"). The following wikis will be
affected:

bg.wikipedia.org
bg.wiktionary.org
cs.wikipedia.org
en.wikiquote.org
en.wiktionary.org
eo.wikipedia.org
fi.wikipedia.org
id.wikipedia.org
it.wikipedia.org
nl.wikipedia.org
no.wikipedia.org
pl.wikipedia.org
pt.wikipedia.org
sv.wikipedia.org
th.wikipedia.org
tr.wikipedia.org
zh.wikipedia.org

All other wikis will *not* be affected by this maintenance.

I apologize in advance for this disruption and will try to minimize
the duration of the maintenance work.

I will update 
<https://wikitech.wikimedia.org/wiki/Planned_Maintenance-February_9_2016>
after the maintenance has finished.
-- 
Jaime Crespo
<http://wikimedia.org>

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Scheduled database maintenance (9 Feb)

2016-02-09 Thread Jaime Crespo
The upgrade seems to have gone well. Read-only mode was enabled for a
bit longer than I initially expected: approximately between 23:16 and
23:30, but all within the scheduled maintenance window. Most of the
time was spent monitoring that the switchover was done correctly, and
that traffic was not affected after the fact.

s2 master is now running a modern operating system version (Debian
Jessie) and the latest version of MariaDB 10 on a new host, and no
replication lag or major issues were detected after the failover. I
will continue monitoring that shard and wikis for stability and
performance.
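
For illustration only: at the database layer, the brief read-only window
of such a switchover boils down to something like the following (the
actual procedure has more steps and safety checks, and the host name
below is hypothetical):

  -- on the old master: stop accepting writes
  SET GLOBAL read_only = 1;
  -- on each replica, once it has fully caught up with the old master:
  STOP SLAVE;
  CHANGE MASTER TO MASTER_HOST = 'db-new-master.example.org';
  START SLAVE;
  -- on the new master: start accepting writes again
  SET GLOBAL read_only = 0;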

If you find any further issues, please use Phabricator[0] to
communicate this to us.

[0] <https://phabricator.wikimedia.org/>

On Tue, Feb 9, 2016 at 8:40 PM, Jaime Crespo <jcre...@wikimedia.org> wrote:
> Between 23:00 and 23:59 UTC, February 9th 2016 there is a scheduled
> maintenance window that will affect some of the wikis hosted by the
> Wikimedia Foundation. The maintenance is needed in order to perform
> necessary hardware, operating system and database upgrades. During the
> upgrade, content on affected wikis will be available at all times, but
> edits may fail for approximately 5 minutes within that window
> (these wikis will be in "read only mode"). The following wikis will be
> affected:
>
> bg.wikipedia.org
> bg.wiktionary.org
> cs.wikipedia.org
> en.wikiquote.org
> en.wiktionary.org
> eo.wikipedia.org
> fi.wikipedia.org
> id.wikipedia.org
> it.wikipedia.org
> nl.wikipedia.org
> no.wikipedia.org
> pl.wikipedia.org
> pt.wikipedia.org
> sv.wikipedia.org
> th.wikipedia.org
> tr.wikipedia.org
> zh.wikipedia.org
>
> All other wikis will *not* be affected by this maintenance.
>
> I apologize in advance for this disruption and will try to minimize
> the duration of the maintenance work.
>
> I will update 
> <https://wikitech.wikimedia.org/wiki/Planned_Maintenance-February_9_2016>
> after the maintenance has finished.
> --
> Jaime Crespo
> <http://wikimedia.org>



-- 
Jaime Crespo
<http://wikimedia.org>

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] git.wikimedia.org down?

2015-11-27 Thread Jaime Crespo
Please see my comments on: <https://phabricator.wikimedia.org/T119701#1834962>

And the related ticket: <https://phabricator.wikimedia.org/T83702>

On Fri, Nov 27, 2015 at 11:06 AM, planetenxin <planeten...@web.de> wrote:

> Since yesterday we could not reach git.wikimedia.org any more.
>
> A call like:
>
>
> https://git.wikimedia.org/zip/?r=mediawiki/extensions/AdminLinks.git=2619ed9beede0017f50ed08b20f6ea3a5200a838=gz
>
> fails with a timeout.
>
> Ping is working.
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l




-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] git.wikimedia.org down?

2015-11-27 Thread Jaime Crespo
On Fri, Nov 27, 2015 at 12:34 PM, planetenxin  wrote:

> ... okay, but how to download a specific commit ID as tar.gz from
> Phabricator Diffusion like shown in my example?
> 
>

I cannot answer that, so I would recommend describing what you need on
T83702 or T111465.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Development policy around database/SQL use

2015-09-15 Thread Jaime Crespo
Rob,

Please use https://phabricator.wikimedia.org/T112637 as the RFC and let's
revert the ticket to its original scope.

On Tue, Sep 15, 2015 at 8:00 AM, Rob Lanphier <ro...@wikimedia.org> wrote:

> Hi folks,
>
> Executive summary:
> T108255 is the default option for our Wednesday RfC review (E66)[0].
> As part of improving our database use, we need to start gating our
> code review on better shared norms of SQL correctness.  We need to
> enable strict mode, cleanup/enforce primary keys (T17441), and start
> using row-based replication (T109179).  Let's talk about this on
> Wednesday.
>
> Details:
> We're still not 100% decided what our topic for this week's RfC review
> meeting will be, but I'm leaning pretty heavily toward T108255.  Jaime
> Crespo (Jynus) asked me about it last week, which inspired me to turn
> T108255 into an RfC.  After he cleared up my writeup, I think there's
> something for us to talk about.
>
> In particular, I originally thought this was merely about enabling
> MariaDB's strict mode, and all of the rainbows and unicorns that would
> result from that.  Jaime corrected me, pointing out that there is
> other database related cleanup we would need to do to get the benefits
> of this.
>
> So, as of this writing, T108255 by title still appears to be about
> merely enabling strict mode.  It's tempting to split this ticket into
> two tickets:
> 1.  RfC: Write/enforce SQL correctness guidelines
> 2.  Enable MariaDB/MySQL's Strict Mode
>
> I may make a separate ticket tomorrow unless someone convinces me that
> kittens will die as a result.[1]
>
> Regarding SQL correctness guidelines, we have a mess of stuff on
> mediawiki.org, which doesn't seem to be very discoverable, and also
> doesn't seem to have any teeth to it.  We have a modest number of
> pages marked as "MediaWiki development policies"[2], but of the 5
> pages that were there, only 1 of them was specifically about
> databases, which is weakly called [[Database optimization]][3].  Since
> [[Database optimization]] didn't seem to have gotten the review that
> [[Security for developers]] or [[Gerrit/+2]] had, I changed its status
> to "{{draft}}"
>
> We *do* have something that actually looks more policy-like, which is
> the "#Database patches" section of the [[Development policy]] page[4].
> However, it's not clear that the "Development policy" page gets read,
> and has gotten pretty crufty.  It's tempting to put "{{draft}}" on
> that one too.
>
> It seems there are a number of sources we could/should be pulling from
> to make a database development policy[5]  T108255 (or some
> database-related RfC) should be about pulling all of these together
> into a coherent set of guidelines.  These guidelines should be
> well-known to frequent committers, and should be well-written for a
> beginning developer.
>
> What we need to actually *do* is not merely enable strict mode, but
> also cleanup/enforce primary keys (T17441), and start using row-based
> replication (T109179). Before completing all of this, we need our code
> review gated on actually making this work.
>
> The fact that we have a mess of documentation and norms is the reason
> why I'm leaning toward this topic for the E66 meeting this week.  If
> you believe we should talk about this, please participate at T108255
> and help get this as far along as possible so that we can wrap things
> up at the E66 meeting. If you believe we should be talking about
> something else in our IRC meeting, please say so in E66 on Phab.
>
> Rob
>
> [0]  IRC meeting:
> <https://phabricator.wikimedia.org/E66>
> "RfC: Enable MariaDB/MySQL's Strict Mode"
> <https://phabricator.wikimedia.org/T108255>
>
> [1]  if someone decides to jfdi, I would recommend using T108255 for
> the "Write/enforce SQL correctness guidelines" RfC, and make a new
> ticket for the less important "Enable MariaDB/MySQL's Strict Mode".
> The comments on the ticket seem to relate more to the former than the
> latter, and the subscribers will probably be more interested in the
> former.
>
> [2]
> <https://www.mediawiki.org/wiki/Category:MediaWiki_development_policies>
>
> [3] <https://www.mediawiki.org/wiki/Database_optimization>
>
> [4] <https://www.mediawiki.org/wiki/Development_policy#Database_patches>
>
> [5] Other database-related guidance for developers:
> <https://www.mediawiki.org/wiki/Performance_guidelines>
> <https://www.mediawiki.org/wiki/Manual:Coding_conventions/Database>
> <https://www.mediawiki.org/wiki/Architecture_guidelines>
> <https://wikitech.wikimedia.org/wiki/Schema_changes>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
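
For readers who have not followed the tickets, here is a minimal,
illustrative sketch of the server settings this RfC is ultimately about
(example values, not necessarily the exact production configuration):

  -- strict SQL mode (T108255): reject out-of-range or truncated data
  -- instead of silently adjusting it
  SET GLOBAL sql_mode = 'STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION';
  -- row-based replication (T109179): replicate row images instead of
  -- statements; replicas can only apply them efficiently when every
  -- table has a primary key, hence the cleanup tracked in T17441
  SET GLOBAL binlog_format = 'ROW';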




-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] renaming Wikimedia domains

2015-08-26 Thread Jaime Crespo
(this is not an official response, just my opinion after some research on
the topic)

Due to the internal (and growing) complexity of the MediaWiki software and
of the WMF installation (with its numerous plugins and services/servers),
this is a non-trivial task. It also involves many moving pieces and many
people: network admins (DNS), general operations (load control/downtime),
DBAs (import/export), services, deployment engineers and developers
(MediaWiki configuration changes, patches).

What's worse is that it would almost certainly create downtime for the
wikis involved (not being able to edit), especially given that it is not a
common operation. Some of them are smaller communities, and I would be
worried about annoying or discouraging editing on those wikis (when we want
the opposite!).

It would be great to have someone in contact with the community, so that we
can identify which sites have a broad consensus about renaming the wiki and
are fully informed about the potential problems, yet are still OK to go
forward. Maybe someone in Community Engagement can evaluate risks vs.
return?

On Wed, Aug 26, 2015 at 9:53 AM, Antoine Musso hashar+...@free.fr wrote:

> On 26/08/2015 07:20, Amir E. Aharoni wrote:
>> In the past when requests to rename such domains were raised, the usual
>> replies were along the lines of "it's impossible" or "it's not worth the
>> technical effort", but I don't know the details.
>>
>> Is this still correct in 2015?
>
> As pointed out: https://phabricator.wikimedia.org/T21986
>
> For what it is worth, in 2011 JeLuF wrote a list of actions needed to
> rename a wiki.  It is outdated nowadays but that is sufficient to state
> renaming a wiki is a non-trivial task:
> https://wikitech.wikimedia.org/wiki/Rename_a_wiki
>
> It would surely consume a lot of engineering time to come up with a
> proper migration plan and actually conduct them.  I am not sure it is
> worth the time and money unfortunately.
>
> --
> Antoine "hashar" Musso
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l




-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] renaming Wikimedia domains

2015-08-26 Thread Jaime Crespo
On Wed, Aug 26, 2015 at 3:46 PM, Alex Monk kren...@gmail.com wrote:

> I'm not sure why database changes would be involved in a domain-only
> change? It should be simple enough to get the new domain set up in DNS and
> apache config, tell multiversion how to map it to the old DB name (there's
> an array in setSiteInfoForWiki that does this bit), and once it's all


I'm not a developer, but AFAIK it requires a patch that has been proposed
(by Reedy, I think) but not implemented. And there are other blockers, like
non-main DB dependencies on FlowDB, external storage and X1 plugins, plus
some names inside the rows. But I am not the MediaWiki expert. Feel free to
contribute to the above-mentioned tickets.

-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] renaming Wikimedia domains

2015-08-26 Thread Jaime Crespo
On Wed, Aug 26, 2015 at 3:19 PM, MZMcBride z...@mzmcbride.com wrote:

> That said, creating a wiki is actually fairly easy, so maybe we should
> investigate a large-scale export and import process instead of renaming.


My answer already assumed that as the only way, hence the downtime. :-)
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] http://dumps.wikimedia.org went off

2015-08-03 Thread Jaime Crespo
It should be up now.

On Mon, Aug 3, 2015 at 5:21 PM, Bináris wikipo...@gmail.com wrote:

> About 10 minutes ago it became unreachable.
>
> --
> Bináris
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l




-- 
Jaime Crespo
<http://wikimedia.org>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l