Re: [Wikitech-l] Thank you Tuesday
Hi!

> Any day can be Tuesday if you really want.

Thanks to Antoine and Brad for figuring out the HHVM crash in
https://phabricator.wikimedia.org/T216689. Finding the cause of random
crashes can be very hard and frustrating, but they figured it out and
resolved it quickly. Thanks!

--
Stas Malyshev
smalys...@wikimedia.org

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Question to WMF: Backlog on bugs
Hi!

> Yes, there should always be a response to all bugs. Without a response
> the impression in the reporting wiki-community would be "nobody cares
> about our bug reports".

Would a canned "thank you for your feedback, please stay on the line,
your call is very important to us" response make anybody feel better?
The reality of a project with a huge userbase and limited resources is
that there are more bugs than there is developer capacity to address
them seriously and substantially, rather than with a canned response
that does not solve the issue. It doesn't mean "nobody cares about the
bug reports" - it means some bug reports will be cared for first and
some later (and possibly some, unfortunately, never). This set of
priorities can be influenced by drawing developers' attention to
specific bugs needing addressing, and by existing prioritisation
processes, which very much include community input, but the harsh
reality of having a lot of bugs dictates that giving serious non-canned
attention leading to a satisfactory outcome to 100% of them is IMHO not
realistic.

We could of course institute a policy of "every bug should have a
comment from a developer within X time" - but unless X is very large, I
think it will be unsatisfactory, since getting a "yes, it's a very
important bug, thanks for submitting it" comment without the bug being
fixed is IMHO no better than getting no comment at all.

--
Stas Malyshev
smalys...@wikimedia.org
Re: [Wikitech-l] Question to WMF: Backlog on bugs
Hi!

> 1. My impression is that there's agreement that there is a huge backlog.

Obviously, there is a backlog. As for it being "huge", that's
subjective; for someone who has experience with long-running projects,
having thousands of issues in the bug tracker is nothing out of the
ordinary. Does that make the backlog "huge"? I don't know - it depends
on what "huge" means.

> 2. I think that there's consensus that the backlog is a problem.

I'm not sure where such a consensus came from. Of course, bugs not
being resolved immediately is not the ideal situation - ideally, a bug
would be fixed within hours of submitting it. Nobody thinks that can
really happen. Any large, popular, long-running project has more bugs
and wishlist items than it can implement. There are always more users
than developers and more ideas than time. Thus, the backlog. Of course,
ignoring the backlog completely would be a problem, but I don't think
we have that situation. It's likely we could do better. But I think we
know the backlog exists, and its existence by itself is not a problem,
or at least not a problem that can be realistically solved for such a
huge project.

> Regarding my own opinions only, I personally am frustrated regarding
> multiple issues:
>
> a. that there's been discussion for years about technical debt,

I'm not sure why this is a source of frustration for you. Having
discussion about technical debt is great. Of course, it should also
lead to fixing the said debt - which I think is happening. But
expecting that starting from some magic date we stop having technical
debt, or the need to address it, is as realistic as deciding that
starting today our code won't have bugs anymore.

> b. that WMF's payroll continues to grow, and while I think that more
> features are getting developed, the backlog seems to be continuing to grow,

Of course. How could it be any other way? With more features come more
places that can have bugs (you don't expect WMF to be the only software
development organization in the Multiverse that writes code without
bugs?), and once people start using them, they inevitably ask for
improvements and have ideas on how to extend them, thus adding more
tasks. Expecting that more functionality would lead to fewer issues in
the bug tracker is contrary to what I have experienced over my whole
career - it never happened, unless the project is effectively dead and
the users have moved away.

> f. that I think that some of what WMF does is good and I want to support
> those activities, but there are other actions and inactions of WMF that I
> don't understand or with which I disagree. Conflicts can be time consuming
> and frustrating for me personally, and my guess is that others might feel
> the same, including some people in WMF. I don't know how to solve this.

I don't think it's possible to "solve" this. Unless you are appointed
the Supreme Dictator of WMF, there will always be things that WMF does
with which you disagree. And the same would be the case for every other
person who cares about what WMF does. We just need to find things we
can do that a lot of people can use and that not too many people
disagree with, but there's no way to guarantee you won't disagree with
anything (for any value of "you"). I think we already have processes
for finding this kind of kinda-consensus-even-though-not-completely. As
all processes, they are not ideal. They can be improved with specific
suggestions. But expecting that nobody (and any specific person in
particular) would ever think "WMF is totally wrong in doing this!" is
not realistic. Reasonable people can and do disagree. Reasonable people
also can work through disagreements and find common interests and ways
to contribute to mutual benefit. I think that's what we're trying to do
here.
--
Stas Malyshev
smalys...@wikimedia.org
Re: [Wikitech-l] Question to WMF: Backlog on bugs
Hi!

> In my experience WMF teams usually have a way to distinguish "bugs we're
> going to work on soon" and "bugs we're not planning to work on, but we'd
> accept patches". This is usually public in Phabricator, but not really
> documented.

There's the "Need volunteer" tag that I think can be used for that.

--
Stas Malyshev
smalys...@wikimedia.org
Re: [Wikitech-l] Unstable search results - possible causes
Hi!

> I'm using the built-in Mediawiki search engine. We just updated from 1.30.0
> to 1.31.0. Since the update, search results are unstable. The same search
> term gives different results in different web browsers. We also see different
> results across browser sessions. Any advice on how I can troubleshoot this?

Could you provide more info - which search you're talking about (a URL
might help, or a description of the sequence of actions), which terms,
and what you're getting in the different cases? Some search results
might depend on your preferences or the user language selected, so
there might be differences, e.g. for different logged-in users.

Thanks,

--
Stas Malyshev
smalys...@wikimedia.org
Re: [Wikitech-l] Thank you Thursday
Hi!

> I wanted to say thank you this week to Zoranzoki21 and Matěj Suchánek
> for jumping on the phan migration and helping with upgrading
> extensions to take advantage of newer static analysis tools.

I in turn would like to thank Legoktm for working on this phan upgrade.
I've already found three bugs we totally missed in our code, just by
upgrading to it.

--
Stas Malyshev
smalys...@wikimedia.org
Re: [Wikitech-l] Code sniff for verbose conditionals
Hi!

> I was saying, you can find several examples of wrong and correct code at
> [1].
> Thiemo pointed out that this patch could encourage to write shorter and
> less readable code (you can find his rationale in Gerrit comments), and I
> partly agree with him. My proposal is to only trigger the sniff if the "if"
> conditions are "short" enough. Which could mean either a single condition,
> shorter than xxx characters etc.
> We're looking for some feedback to know whether you would deem this feature
> useful, and how to tweak it in order to avoid encouraging bad code.
> Thanks to you all, and again - sorry for the half message.

This looks useful. I think PHPStorm has this check built in, but having
it in the sniffs too wouldn't be a bad thing. I've seen such things
happen when refactoring code.

--
Stas Malyshev
smalys...@wikimedia.org
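[For readers unfamiliar with the pattern under discussion: the sniff
targets conditionals that only map a condition to a boolean literal.
The actual sniff works on PHP; the sketch below illustrates the idea in
Python with made-up function names, including the readability concern
about long compound conditions.]

```python
# Verbose conditional: an explicit if/else that only maps a condition
# to literal True/False. This is what the sniff would flag.
def is_adult_verbose(age):
    if age >= 18:
        return True
    else:
        return False

# Equivalent short form the sniff would suggest. For a single, short
# condition this is arguably more readable.
def is_adult_short(age):
    return age >= 18

# With a long compound condition, the "short" form can become harder to
# read - which is the concern about encouraging less readable code, and
# why limiting the sniff to "short" conditions was proposed.
def eligible(age, country, consent, not_banned, has_account):
    return (age >= 18 and country in ("DE", "FR") and consent
            and not_banned and has_account)
```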
Re: [Wikitech-l] What would you like to see in gerrit?
Hi!

One thing still missing for me is a better ability to indicate which
kind of attention an item needs from me. Right now, to get this, I use
custom scripts with this colorization scheme:

- paint items with test failures and -1's red (these need work from me
  if they're outgoing, and probably don't require my attention if
  incoming)
- paint items with +1/+2 green - those are ready to merge or being
  merged if outgoing, and already reviewed by somebody if incoming
- paint items with a merge conflict in a different color - these need
  to be rebased or fixed before any further action

This enables me to scan quickly through the dashboard and get a picture
of what's up. Unfortunately, this script is not working very reliably
with PolyGerrit, due to the very complex UI model there.

Another thing that I'd like is being able to fold the "Work in
progress" panel. In about 90% of my Gerrit use it's not relevant, but
it takes valuable real estate on the screen. I need it from time to
time, but I'd rather keep it folded until I do.

Also, if there was some way of sorting incoming patches by whether they
have updates that aren't mine, that'd be useful. This is now conveyed
by the bold font, but while you can see it, you can't order by it,
afaik.

A couple of things for the git review command too:

It'd be nice if git review told me when I am trying to make a patch on
the master branch instead of a feature branch. In 99% of cases, this is
not what I actually wanted - I just forgot to create a branch.

One useful command for me would be "check out a change and put it in a
branch named after the topic of that change, overwriting the branch if
it existed". This allows easy syncing of patches where more than one
person contributes to them.

Thanks,

--
Stas Malyshev
smalys...@wikimedia.org
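[The colorization scheme above can be sketched as a small classifier
over change records. The field names below (`labels`, `mergeable`, and
the `approved`/`rejected`/`recommended`/`disliked` label keys) follow
Gerrit's REST API change/label objects, but this is a rough sketch, not
the author's actual script:]

```python
# A minimal sketch of the dashboard colorization described above.
# `change` is assumed to be a (simplified) record in the shape of
# Gerrit's REST API /changes/ output.

RED, GREEN, YELLOW, NONE = "red", "green", "yellow", ""

def dashboard_color(change):
    """Pick a highlight color for a change row."""
    labels = change.get("labels", {})
    cr = labels.get("Code-Review", {})
    verified = labels.get("Verified", {})

    # Merge conflicts get their own color: needs a rebase first.
    if change.get("mergeable") is False:
        return YELLOW
    # Test failures or -1/-2 reviews: needs work (if outgoing).
    if "rejected" in verified or "rejected" in cr or "disliked" in cr:
        return RED
    # +1/+2: ready to merge, or already reviewed by somebody.
    if "approved" in cr or "recommended" in cr:
        return GREEN
    return NONE
```

A script like this would run over the dashboard's change list and paint
each row; checking the merge conflict first is a design choice, since a
conflicted change needs a rebase before any review state matters.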
Re: [Wikitech-l] [Engineering] Gerrit now automatically adds reviewers
Hi!

> On Fri, Jan 18, 2019 at 10:13 PM Pine W wrote:
>
>> I'm glad that this problematic change to communications was reverted.
>>
>> I would like to suggest that this is the type of change that, when being
>> planned, should get a design review from a third party before coding
>> starts, should go through at least one RFC before coding starts, and be

I think there's no reason to make this a bigger deal than it is. There
was a feature that people thought would be good; it turned out it is
not as good as expected, so it was disabled. Nothing broken, nobody
hurt (well, maybe except some inboxes...). I don't think there's a
reason to describe this situation as "inexcusable". Sometimes something
that we expect to work one way and be useful turns out a different way,
and things that seemed to be an excellent idea turn out to be a very
bad one. And we have to adjust, and this is normal. We don't like such
situations, but we know they happen, and as long as they are handled
properly - and I think in this case it was - there's no reason to make
it more than it is.

>> reasonable likelihood of costing an administrator their tools, and I hope
>> that a similar degree of accountability is enforced in the engineering

I hope not. Expecting unreasonable perfection from people and processes
and overreacting when inevitable problems happen will only lead to
frustration, failure and stagnation. Every failure has some lesson to
learn, but the lesson shouldn't be "let's find somebody to punish". I
am not sure if that was the intent, but it kinda felt this way to me.
And I don't think this is warranted either in general or in this
particular case.

Thanks,

--
Stas Malyshev
smalys...@wikimedia.org
Re: [Wikitech-l] Collecting UI feedback for PolyGerrit - Gerrit
Hi!

Some tweaks that I found useful to make for myself in the new UI (some
can be done with custom styles/scripts etc.) and that might be
interesting to implement if possible:

1. Color coding of changes on the dashboard:
   - one with +1 gets green
   - one with -1/-2 gets red
   - one that has a merge conflict gets its own distinct color
   This makes it possible to quickly see which items can be
   reviewed/merged, which ones need work, which need a rebase, etc.

2. On the dashboard, the Owner column gets too wide sometimes. Some
names are pretty long, and this space would be best used by Subject -
which I do want to see in its entirety, unlike the name, for which the
prefix is almost always OK. In fact, if we could also compress the
"Status" column somehow it'd be even nicer - the "Merge Conflict"
message is useful but takes way too much precious space.

3. In the change view, sometimes the "related changes" box consumes too
much space; you can see it on
https://phabricator.wikimedia.org/F26296242 - it takes almost half the
horizontal space, despite being way less important than the patch
description. It'd be nice to put a constraint on it.

4. It would be nice to be able to add people to WIP tasks without
moving them to Review status. Sometimes several people may need to
cooperate on a WIP patch before it is ready to go. Of course, one can
add a reviewer and then move back to WIP, but it'd be nicer to avoid
the extra actions.

Thanks!

--
Stas Malyshev
smalys...@wikimedia.org
Re: [Wikitech-l] problematic use of "Declined" in Phabricator
Hi!

> I think we are misusing the term "priority" here.

Priority for whom? For whoever is responsible for the planning. Which
in most cases is the WMF team that is tagged, though if it's a project
that belongs to another team (or person), then it's that team's (or
person's) planning.

> Setting something to "lowest" priority implies that users do not care about
> the item.

No, I don't think this is what it means. It should mean the planning
team does not have any immediate plans or resources to dedicate to it.
That's the consequence of using Phabricator as a development tracking
tool. It's the developers' priorities - which are supposed to mirror
users' ones, to a reasonable degree, but are not the same thing.

> I suggest we use dashboard columns for the planning activities, while
> keeping the tasks themselves fully under "requester's" control. E.g. let

I don't think it would help developers' work. If we need a mechanism to
track users' priorities in Phabricator, we should create one, but I
don't think breaking an existing and actively used mechanism for
tracking development priorities is a good way to achieve that. Columns
*are* used for planning, but in a different way.

> the community decide what is more important, but use dashboards for team
> work planning. This way, if a volunteer developer wants to contribute,
> they would look for the highest-demanded bugs that don't have active status
> in any teams.

I recognize that highlighting issues that volunteers should concentrate
on is a valid need. But I don't think reusing the same mechanism that
ongoing development tracking uses now is going to be good. It may get
very confusing. We should try to find another way to specify that.

--
Stas Malyshev
smalys...@wikimedia.org
Re: [Wikitech-l] problematic use of "Declined" in Phabricator
Hi!

> All of which does raise a slightly different question: I am much less
> clear on what the exact difference is between “Invalid” and
> “Declined.” Thoughts?

I usually use Invalid where the description of the task does not really
describe the problem (or a problem) - e.g. it was a user error, a
misconfiguration, a misunderstanding about how it should have worked, a
transient condition that has passed and we can not do anything about
it, something that is outside the realm of possibility for us to do
(e.g. "fix copyright laws"), a task created by mistake, etc. For TODO
tasks, Invalid would be when the task describes something that can not
be done at all (at least by us), or would not produce any desirable
result. Declined is when it describes a valid task in general, but we
are not going to do it because of reasons.

--
Stas Malyshev
smalys...@wikimedia.org
Re: [Wikitech-l] problematic use of "Declined" in Phabricator
Hi!

On 10/2/18 9:57 AM, Brion Vibber wrote:
> *nods* I think the root problem is that the phabricator task system does
> double duty as both an *issue reporting system* for users and a *task
> tracker* for devs.

This is probably the real root cause. But I don't think we are going to
make the split anytime soon, even if we wanted to (which is not
certain), and this mode of operation is very common in many
organizations. Realizing this, I think we need some way of explicitly
saying "we do not have any means to do it now or in the near-term
future, but we don't reject it completely, and if we ever have
resources or ways to do this, we might revisit it". We kinda sorta have
this with the "Stalled" status and the "Need volunteer" tag, but we
might want to make this status more prominent and distinguish "TODO"
items outside of any planning cycle from the ones that are part of
ongoing development. And document it in the lifecycle document.

--
Stas Malyshev
smalys...@wikimedia.org
Re: [Wikitech-l] RFC Discussion Wednesday - new namespace for collaborative judgments about wiki entities
Hi!

> Hi, thanks for pointing this out! Here are the workflows we've identified
> so far, and how JADE might affect them in the long-term:

On a cursory look, there's also some affinity between this and what the
Wikibase Quality Constraints extension is doing. Though that data is
automatically generated rather than human generated, it is still
quality-related data which is linked to page content (and different for
each revision, AFAIU). Currently, if I understand right, it has its own
storage.

--
Stas Malyshev
smalys...@wikimedia.org
Re: [Wikitech-l] My Phabricator account has been disabled
Hi!

> This isn't a he said, she said
> <https://en.wiktionary.org/wiki/he_said,_she_said> type of issue, it's
> based on evidence that is public and difficult (if not impossible) to
> delete.

In this particular case, it was based on a comment that was deleted.
And of course most content in our technical spaces (those managed by
WMF - not sure about Github and such) is deletable by admins.

> If you feel that you would have to defend your behavior, perhaps the
> behavior ought to be self-examined.

This sounds suspiciously like "if you did nothing wrong, you have
nothing to hide". Which I hope everybody knows is not how it works -
after all, that's why we have our privacy policies - so I assume it was
not the intended meaning. We can have disagreements, and we can make
mistakes, and this is why good process is important. Saying "if you're
worried about good process, maybe it's because you're guilty" - that's
how this comment sounded to me - is not right.

--
Stas Malyshev
smalys...@wikimedia.org
Re: [Wikitech-l] My Phabricator account has been disabled
Hi!

> to me that this could easily be used as a shaming and blaming list. If the
> block is over and the person wants to change their behavior, it might be
> hard for them to start with a clean sheet if we keep a backlog public of
> everyone. I'd see it not only as a privacy issue for the people reporting,
> but also the reported.

You have a good point here. Maybe it should not be permanent, but
should expire after the ban is lifted. I can see how that could be
better (nothing that was ever public is ever completely forgotten, but
still, not carrying it around in our spaces might be good). So I'd say
a public record while the ban is active is a must, but after that,
expunging the record is fine.

--
Stas Malyshev
smalys...@wikimedia.org
Re: [Wikitech-l] Fwd: My Phabricator account has been disabled
Hi!

On 8/8/18 1:58 PM, bawolff wrote:
> On Wed, Aug 8, 2018 at 8:29 PM, Amir Ladsgroup wrote:
> [...]
>> 2) the duration of block which is for one week was determined and
>> communicated in the email. You can check the email as it's public now.
>
> Can you be more specific? I'm not sure I see where this is public.

I think it's this one:
https://lists.wikimedia.org/pipermail/wikitech-l/2018-August/090490.html

--
Stas Malyshev
smalys...@wikimedia.org
Re: [Wikitech-l] My Phabricator account has been disabled
Hi!

> 3) not being able to discuss cases clearly also bothers me too as I can't
> clarify points. But these secrecy is there for a reason. We have cases of
> sexual harassment in Wikimedia events, do you want us to communicate those
> too? And if not, where and who supposed to draw the line between public and
> non-public cases? I'm very much for more transparency but if we don't iron
> things out before implementing them, it will end up as a disaster.

True enough, and I agree we should be careful, and I think we can trust
our CoCC to be careful in such matters - we trust them with the cases
themselves, after all. But with all due care, I think we can find a way
to reveal that an admin action was taken and why, without going into
sensitive details. Even some detail would be better than what we have
now, and in the case of a bad comment, saying "This user has been
temporarily banned from date A till date B because of comments
incompatible with the CoC" doesn't seem to hurt anyone.

--
Stas Malyshev
smalys...@wikimedia.org
Re: [Wikitech-l] My Phabricator account has been disabled
Hi!

> This mailing list is not an appropriate forum for airing your grievances
> with the way the Code of Conduct Committee has handled this matter.

That may very well be so, but I think this case involves something that
is, IMHO, very on-topic for this mailing list, as a venue to discuss
the running of this technical project. I think regardless of the merits
of the particular CoCC decision, there's something wrong in how it
happened. Namely:

1. The account was disabled without any indication (except the email to
the person owning it, which is also rather easy to miss - not the
admin's fault, but read on) of what happened and why, as far as I could
see. Note that Phabricator is a collaborative space, and disabling an
account may influence everybody who may have been working with the
person, and even everybody working on a ticket that this person
commented on once. If they submitted a bug and I want to verify
something with them and the account is disabled - what do I do? People
are left guessing - did something happen? Did this user leave the
project? Was it something they said? Something I said? Some bug? An
admin action? What is going on? There's no explanation, there's no
permanent public record, and no way to figure out what it is.

What I would propose to improve this is, on each such action, to have a
permanent public record, in a known place, that specifies:

a. What action it was (ban, temporary ban - with duration, etc.)

b. Who decided on that action and who implemented it - the latter so
that if somebody thinks it's a bug or mistake, they can ask "did you
really mean to ban X" instead of being in the unpleasant and
potentially embarrassing position of trying to guess what happened with
no information.

c. Why this action was taken - if sensitive details are involved,
omitting them, but providing enough context to understand what
happened, e.g. "Banned X for repeated comments in conflict with the
CoC, which we had to delete, e.g. [link], [link] and [link]" or
"Permanently banned Y for conduct unwelcome in Wikimedia spaces", if
revealing any more details would hurt people. It doesn't have to be
100% of the detail, but it has to be something more than people quietly
disappearing.

Establishing such a place and maintaining this record should be one of
the things that the CoCC does.

2. There seems to be no clearly defined venue to discuss and form
consensus about such actions. As must be clear by now, such a venue is
required, and if it is not provided, the first venue that looks
suitable for it will be roped in - much to the annoyance of the people
that wanted to use that venue for other things. I would propose to fix
this by providing such a venue, and clearly specifying it in the same
place where the action is described, as per above. Again, establishing
and advertising such a place should be something that the CoCC does.

It is clear to me - and I think to anybody seeing the volume of
discussion this generated - that we need improvement here. We can do
better, and we should.

--
Stas Malyshev
smalys...@wikimedia.org
Re: [Wikitech-l] Phabricator monthly statistics - 2018-07
Hi!

>> and "high"? The reason that I ask is that a median age of 738 days for
>> "high" priority tasks seems very long. I would hope that we would not take
>> two years to complete "high" priority tasks.
>
> The median age of open priority X tasks is not the same as the median time
> it takes to complete priority X tasks.

Yes, it looks more like a case of "we thought it was a high priority
task but it turned out it's not" rather than "we take a long time to do
high priority tasks". I.e. maybe we need to have some rules around
removing tasks from "High" if it's clear we're not doing them anytime
soon.

--
Stas Malyshev
smalys...@wikimedia.org
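[The distinction drawn above - median age of *open* tasks vs median
time to *complete* tasks - can be made concrete with a toy calculation
(all numbers invented for illustration): tasks that get closed quickly
leave the open pool, so the open pool's age statistics are dominated by
the few stragglers.]

```python
from statistics import median

# Toy data: (age_in_days, still_open) for ten hypothetical tasks.
# Eight were closed within a couple of weeks; two have lingered open.
tasks = [
    (3, False), (5, False), (6, False), (8, False), (10, False),
    (11, False), (12, False), (14, False), (700, True), (900, True),
]

# Median time to complete, over tasks that actually got done:
completion_times = [age for age, is_open in tasks if not is_open]
print(median(completion_times))   # 9.0 - tasks are typically done fast

# Median age of currently *open* tasks - dominated by the stragglers:
open_ages = [age for age, is_open in tasks if is_open]
print(median(open_ages))          # 800.0 - alarming in isolation
```

So a 738-day median age for open "high" tasks is compatible with most
high-priority tasks being resolved promptly.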
Re: [Wikitech-l] Using mwapi:srsearch for composite strings like "goat cheese" in WDQS
Hi!

> in the query examples at
> https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual/MWAPI#Find_all_entities_with_labels_%22cheese%22_and_get_their_types
> , the way to search for "cheese" is
>
> bd:serviceParam mwapi:search "cheese" .
>
> What if the search string I want to use is a composite, e.g. "goat cheese"?

So, that depends on what you're looking for. This specific endpoint
(EntitySearch) is the wbsearchentities API
(https://www.mediawiki.org/wiki/Wikibase/API#wbsearchentities), i.e. a
prefix search. I don't think it has an option for "exact match";
however, it would match only the prefix, i.e. it won't match "This
cheese was made from goat milk".

However, if you want to use full-text search instead, you may need to
use phrase search to capture only the exact phrase. In that case you
probably should write the parameter as:

bd:serviceParam mwapi:search "\"goat cheese\"" .

Probably something like this: http://tinyurl.com/y9fdva5w

--
Stas Malyshev
smalys...@wikimedia.org
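[The escaping above trips people up, so here is a small sketch of how
one might generate that parameter line programmatically. The helper
name `mwapi_search_param` is hypothetical - it is not part of any WDQS
client library - and it simply applies the quoting convention from the
message: multi-word terms get backslash-escaped inner quotes so the
full-text search treats them as an exact phrase.]

```python
def mwapi_search_param(term):
    """Build a bd:serviceParam mwapi:search line for a WDQS query.

    Multi-word terms are wrapped in escaped double quotes inside the
    SPARQL string literal, so full-text search matches the exact
    phrase rather than each word independently.
    """
    if " " in term:
        # Inner quotes are backslash-escaped: "\"goat cheese\""
        literal = '"\\"{}\\""'.format(term)
    else:
        literal = '"{}"'.format(term)
    return "bd:serviceParam mwapi:search {} .".format(literal)

print(mwapi_search_param("cheese"))
print(mwapi_search_param("goat cheese"))
```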
Re: [Wikitech-l] Help remove unused wmf-config code
Hi!

> I wrote a script to detect unused wmf-config code. [1] Specifically, unused
> settings and conditional blocks for configuration that no longer exists in
> MediaWiki.
>
> The script cross-references any wg- identifiers in wmf-config with
> Legoktm's Codesearch tool [2] and reports those that have no matches
> outside wmf-config. [1]

I checked the CirrusSearch ones and most are indeed unused, except this
one: CirrusSearchCrossProjectBlockScoreProfiles seems to have a mention
as CirrusSearchCrossProjectBlockScorerProfiles - I wonder if it's a
typo. Generally CirrusSearch uses some configs without the 'wg' prefix,
but in this case it doesn't seem to be an issue.

> Open for review:
> https://gerrit.wikimedia.org/r/#/q/project:operations/mediawiki-config+topic:cleanup+is:open

This one produces a 404.

--
Stas Malyshev
smalys...@wikimedia.org
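[The first step of a script like the one described - collecting
candidate wg-prefixed identifiers from configuration source - can be
sketched as follows. This is a naive approximation, not the actual
tool; the real script also cross-references each identifier with
codesearch and flags those with no matches outside wmf-config.]

```python
import re

# Match $wgFoo assignments/uses in PHP-style config source. A real
# scanner might parse the PHP instead of using a regex.
WG_RE = re.compile(r"\$(wg[A-Za-z0-9_]+)")

def wg_identifiers(source):
    """Return the set of wg-prefixed identifiers used in `source`."""
    return set(WG_RE.findall(source))

# Hypothetical config snippet for demonstration:
config = """
$wgCirrusSearchEnabled = true;
$wgSitename = 'Example';
if ( $wgCirrusSearchEnabled ) {
    $wgSearchType = 'CirrusSearch';
}
"""
print(sorted(wg_identifiers(config)))
# ['wgCirrusSearchEnabled', 'wgSearchType', 'wgSitename']
```

Note the typo hazard mentioned above: a regex-level comparison treats
CirrusSearchCrossProjectBlockScoreProfiles and
...BlockScorerProfiles as entirely different identifiers.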
Re: [Wikitech-l] FW: Warning on behalf of Code of conduct committee
Hi!

> Wouldn't it be easy to just log out and read any task (or even use
> incognito mode/private browsing in the browser)? It is certainly a small
> inconvenience, but I am not sure how it is very important, given a very
> simple workaround.
>
> Sure, but you have to inform the user somehow about this.

I think the easiest way would be to change the error message and add a
pointer to a page which describes the issue and how to work around it.
I imagine changing an error message in Phab shouldn't be too hard?

--
Stas Malyshev
smalys...@wikimedia.org
Re: [Wikitech-l] FW: Warning on behalf of Code of conduct committee
Hi!

> On the other hand, I also agree also with MZMcBride that new users should
> be able to at least see the tasks, so I don't understand why the priority
> of this bug was lowered. If unapproved users cannot be treated as logged
> out, then there should be another solution. Like getting more information

Wouldn't it be easy to just log out and read any task (or even use
incognito mode/private browsing in the browser)? It is certainly a
small inconvenience, but I am not sure how it is very important, given
a very simple workaround.

--
Stas Malyshev
smalys...@wikimedia.org
Re: [Wikitech-l] Can/should my extensions be deleted from the Wikimedia Git repository?
Hi!

> I agree. I do think that as a community of practice we have many
> unwritten rules and numerous expectations of how we work together. We
> don't explicitly define the expectation of a README.MD file in repos
> either.[0] It's a best practice and cultural expectation in our spaces
> to include one. The code works the same with or with out it.

I think there are several aspects here to consider:

a) In WMF technical spaces, and more widely, in the Mediawiki/Wikimedia
universe, I think there's a universally acknowledged expectation of
certain standards of behavior, which in the Wikimedia space have been
codified in the CoC. The purpose of these expectations, as I understand
them, is to build and maintain an open, welcoming, productive and
inspiring community that would support the development of Mediawiki and
Wikimedia projects. And the CoC is the instrument that we chose to
codify and implement those expectations in the Wikimedia spaces, which
applies to all of them regardless of the technical means chosen to
publish or document it. I do not think there is much disagreement about
that.

b) How exactly the spaces are managed within this wide framework has a
lot of complex and tricky details - some of which may seem trivial to
some people and highly sensitive to others. That includes which files
are placed in which repositories, who is allowed to change which
repository, for which reasons and under which procedures, and so on. I
think having clearer expectations on that would certainly help. But
beyond that, I think when designing and enforcing the rules for these
minute details, we should not lose sight of why it is done, and not
make the process of CoC enforcement go against the goal of having a
CoC - namely, the welcoming community. If that means sometimes being
more flexible, or having a bit more patient discussion and resisting
the urge to force your point through even when you are completely sure
you are correct, I think it is still worth it in the long run.
c) Specifically about the CoC.md file: I personally think having redundant pointers to the documentation (both technical and about societal norms) is highly welcome, as locating the proper docs is a notoriously hard and largely unsolved problem with most code. Having the docs is half of the problem (which we also sometimes fail at ;), having them where people will find them is the other half. So from this point of view, adding the CoC file is a smart move. On the other hand, maintaining a rigid "one size fits all, no exceptions, no discussions, shut up and comply" approach to it feels a bit counter-productive to me. Yes, I foresee the question "if it is a good thing, why not make everybody do it?" - and I could probably easily write a 20-page essay on this topic, but I will limit myself here to this: people have different points of view, and I think being more accommodating in this case is better than having a nice set of checkboxes checked. What does this mean specifically for this file? I admit I don't have a better proposal than "let's have a community discussion on it". But I think making an open and welcoming community includes sometimes being patient in figuring out how exactly to do things. Enforcing the presence of the file in every single repo does not seem to be so pressing a concern that any harm would be done by not bringing repos into compliance right now. So let's see if we can reach some consensus here. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] MediaWiki and OpenID Connect
Hi! > Account verification was initially a feature for public figures and > accounts of public interest, individuals in music, acting, fashion, > government, politics, religion, journalism, media, sports, business > and other key interest areas. Account verification was introduced to > Twitter in June 2009, Google+ in August 2011, Facebook in February > 2012, Instagram in December 2014, and Pinterest in June 2015. Given how much controversy and how many unhealthy dynamics surround verified accounts on Twitter (not sure about Pinterest etc. - I don't have any info there), I do not think it is a good idea to copy it. If there's an individual need to establish a link between somebody's legal identity and their wiki credentials, there are easy ways to do it - e.g. publishing a signed message both on the wiki user page under that account and on a resource known to be controlled by the person. But I don't think a special status like that would be very useful or very welcome. Also note that linking one's legal identity to wiki edits may be very dangerous in some countries. Requiring people to take that risk to edit certain pages is not really a good thing. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [Ops] Changes to SWAT deployment policies, effective Monday April 30th
Hi! > The new policy asks the folks submitting patches to split up patches to > avoid bad intermediate states ahead of time. Thank you Tyler for the explanation! Does this mean this patch needs to be split into several patches? Given the limit on the number of patches, this becomes rather challenging - if this patch becomes, say, three commits, then using the figures from Gergo it is possible to apply it (without blocking the whole SWAT window) in only ~27% of the available windows. Given that many people can't use every window for timezone reasons, it may become tricky. I think we should reconsider how we do both of those things - if we're requiring splitting the patches, the limit should be considered differently: there's no point in performing the whole cycle of checks after merging each component of a multi-component patch, so maybe those should be counted differently. Though, of course, merging patches separately probably makes it slower than before. If we additionally have a limit of four - which is low even by current historic usage - I am concerned this will lead to long wait times and a backlog of patches, which might then block other work. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [Ops] Changes to SWAT deployment policies, effective Monday April 30th
Hi! > First, we now disallow multi-sync patch deployments. See T187761[0]. > This means that the sync order of files is determined by git commit > parent relationships (or Gerrit's "depends-on"). This is to prevent SWAT > deployers from accidentally syncing two patches in the wrong order. Question about this: if there's a patch that requires files to land in a specific order, e.g. one where part of the config is moved into another file (example: https://gerrit.wikimedia.org/r/c/419367), is this handled automatically by scap (i.e. all changes in the same patch land at the same time atomically, and scap takes care that nothing ever sees the intermediate states), or does it have to be managed manually - and if so, how? Thanks, -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Introducing Quibble, a test runner for MediaWiki
Hi! > A second advantage, is one can exactly reproduce the build on a local > computer and even hack code for a fix up. This is great! CI errors that are not reproducible locally have been a huge annoyance and very hard to debug. Thanks for making it easier! -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] PHP7 expectations (zero-cost assertions)
Hi! > PHP7's expectations seem like they started fixing those issues, although > eval()-like use is still an option and exception-throwing seems to not be > the default. eval mode is deprecated in 7.2, which means that nobody should use it anymore. It likely will not be removed until the mythical 8.0, a.k.a. "the version where things can be broken", but for all practical purposes it should not be used by anybody now and is as dead as anything can be in 7.x. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] PHP7 expectations (zero-cost assertions)
Hi! > But I worry about the perf implications of these lines of code. I don't > want these assertions to be used to track errors in production mode. > > PHP7 introduced expectations which permit to have zero-cost assert() [1] > Looking at the MW codebase we don't seem to use assert frequently (only 26 > files [2] ). According to https://phabricator.wikimedia.org/T172165 MW 1.31 will require PHP 7 or HHVM. However, I am not sure what the situation with zero-cost asserts is in HHVM, and therefore in WMF production. Since WMF production is running HHVM now (this may change in the future, but that looks like the status for now), we should not rely on something HHVM may not support. I would avoid assertions entirely in performance-critical code paths, i.e. where every microsecond counts. Example: the code paths generating RDF dumps had to be hand-tuned to the point that we rewrote the RDF generation library, and there's still room for improvement there - and that's because each code path is hit 40M times when the dump for 40M entities is generated. However, most of the code in MediaWiki is not that intensely used and would be just fine with an extra function call here and there. We can also note that using Wikimedia\Assert precludes using zero-cost asserts in the future, even if we migrate to PHP 7: assertions done this way cannot be disabled or turned into zero-cost assertions. Depending on what function these assertions serve in the code, this may be a good thing or a bad thing. I think in our case it is a good thing - most parts of the code are not so CPU-critical that an extra function call would change things, and we have tons of runtime errors anyway and a reasonable system for handling them when they happen. So using Wikimedia\Assert is likely the way to go for most of the MW code. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
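[Editorial sketch of the zero-cost assertion setup discussed in the message above. The ini directives are PHP 7's documented zend.assertions / assert.exception settings; the divide() function is a hypothetical example.]

```php
<?php
// php.ini, production:   zend.assertions = -1   ; assert() is not even
//                                               ; compiled in: truly zero cost
// php.ini, development:  zend.assertions = 1    ; asserts are executed
//                        assert.exception = 1   ; failures throw AssertionError

function divide(int $a, int $b): float {
    // With zend.assertions = -1 this line generates no code at all,
    // which is the "zero-cost" property PHP 7 expectations provide.
    assert($b !== 0, 'divisor must be non-zero');
    return $a / $b;
}
```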
Re: [Wikitech-l] Recursive Common Table expressions @ Wikimedia [was Fwd: [Wikimedia-l] What's making you happy this week? (Week of 18 February 2018)]
Hi! > None of these features are present on the minimum required versions of > Mediawiki, or the latest version available on WMF servers-- but I wonder if > people- Mediawiki hackers and Tools creators- would be interested on doing > those? It would be interesting to see how this could work for deepcategory searches - we now have a keyword for it (driven by SPARQL for now), and it would be worth seeing what happens if it is ported to SQL. If we get this on the Labs DB replicas, we could set up MediaWiki so that we could test how well it works on real data. Thanks for posting about it! -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
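[Editorial sketch of what a SQL-backed deepcategory lookup could look like, under these assumptions: a database supporting WITH RECURSIVE (e.g. MariaDB >= 10.2), the standard MediaWiki categorylinks/page schema (cl_from, cl_to; page_id, page_title, page_namespace 14 for categories), and a made-up seed category name.]

```sql
-- Walk the category tree under a hypothetical seed category "Physics",
-- collecting all subcategories transitively.
WITH RECURSIVE subcats (page_id, page_title) AS (
    -- Anchor: the seed category itself
    SELECT page_id, page_title
    FROM page
    WHERE page_namespace = 14 AND page_title = 'Physics'
  UNION
    -- Step: categories that are members of an already-found category
    SELECT p.page_id, p.page_title
    FROM subcats s
    JOIN categorylinks cl ON cl.cl_to = s.page_title
    JOIN page p ON p.page_id = cl.cl_from
    WHERE p.page_namespace = 14
)
SELECT page_title FROM subcats;
```

Using UNION (rather than UNION ALL) deduplicates rows, which also keeps the recursion from looping forever on category cycles.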
Re: [Wikitech-l] [Proposal] Change terminology used for wikitech/LDAP/Toolforge/Labs accounts
Hi! > I proposed in Phabricator a wile ago [0] that we adopt the term > "Wikimedia developer account" across our wikis and other documentation > for the LDAP-backed user accounts that are created using > wikitech.wikimedia.org. Sounds good. It's sometimes rather confusing which set of credentials is used where, and having good terminology would help that. "Wikimedia developer account" looks like a good option for this. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Upstream is redesigning dashboard in gerrit - Any feedback?
Hi! > Hi, upstream are currently in the process of redesigning polygerrit to look > more like [1]. I tried it out for a couple of days, and in general I have the following feedback (I shared a bit of it on IRC already, but this will be more permanent, I think): 1. The look is great and I like it. The following is mostly complaining about various details, so I want to say that up front so it doesn't sound like I hated it :) 2. Please be aware, however, that not everybody uses very wide screens with windows opened full-width. So a design that requires 1900px-wide windows to be useful is suboptimal. More points below on that. 3. The length of such things as usernames, branch names, commit topics, etc. can vary wildly, so displaying them in a table may blow the table up to very large dimensions unless countermeasures are taken. One countermeasure I would suggest is giving columns a fixed maximum width (configurable would be nice, but a sane default would be OK) with a title/tooltip on hover displaying the full data. Not sure how to solve this for mobile, where there's no hover, but I assume there are best practices on that already. 4. In the current PolyGerrit design, and it seems in the mockups too, the width of the single-patch display window is divided between the reviewers column, the commit message, and related patches. This leaves - especially in non-full-max-width windows - not a lot of space for the commit message to be displayed properly[1]. 5. The Reply button is red. Usually red means something dangerous or erroneous, or at least demanding full attention *now*. I think it should have a more neutral color. If any button should be red it's the +2 one, but even that one probably shouldn't be. That said, I really liked how the +2 button was highlighted in the old GUI, so I would like to keep it highlighted somehow. 6. On the related-patches display, long lines are abbreviated with ..., which is great. 
However, I'd like it to display the full text on hover - sometimes the interesting part is towards the end, and right now to get to it you need to click through. 7. On related patches, the status is now displayed in words (like: Merged). While this is clearer than the magic colored dots of the old UI, it also takes much more space. I wonder if some compromise can be made - something clearer than a colored dot but not taking 10 character spaces like the "(Merged)" string would. 8. There's a lot of vertical blank space if the list of reviewers is long and the commit message is short[2]. 9. Going back to the +2 button, it is kind of in the middle of the others; maybe it should be next to Reply? Or at least first in line - you certainly click it more often than cherry-pick or delete change? 10. I really liked how the "gerrit be nice to me" extension allowed me to paint -1'ed items in the dashboard in red and +1'ed ones in green. Any chance to have that (and also a color for Merge Conflict ones)? I tried to figure out how to do it as a Chrome extension, but so far Polymer has defied my efforts. 11. Speaking of Merge Conflict, all I have seen in that Status column - at least for pending patches - is either empty or Merge Conflict (there's more for merged patches, but how often do you look at those? Not too often). I wonder if that column - which now occupies prime real estate right in the middle of the screen - could be shortened somehow, or converted into pictograms, or anything like that? Or at least let me rearrange columns so I can put it in a less prominent place. 12. On https://gerrit-review.googlesource.com/ I see there's a WIP badge on changes - like https://gerrit-review.googlesource.com/c/gerrit/+/159670/. Can we get the same thing please? I know I can put [WIP] in the commit message, but this way is so much nicer :) 13. In the old UI, I could edit a file in the patch right in place. I don't see that in the new one. 
I haven't used it *that* much, but sometimes it's useful when you notice wrong whitespace or something and can fix it right there and then. 14. All the docs seem to be for the old UI. I wonder if there are docs for the PolyGerrit UI? [1] https://www.awesomescreenshot.com/image/3174140/0fac7a4a591a59b768f1fe2eadaf66d0 [2] https://www.awesomescreenshot.com/image/3174139/bf6445a910aa63b2f26433e4d9171d95 Thanks, -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] RevisionInsertComplete vs. RevisionRecordInserted
Hi! On 2/1/18 7:39 AM, Andrew Otto wrote: > This is the first I’ve heard of it! So, we don’t have a plan to change it, > but I suppose we should if RevisionInsertComplete is deprecated. I haven’t > looked at RevisionRecordInserted yet so I can’t answer questions about > schema changes, but I doubt it would change anything. I suspect it has to do with the MCR work, but I don't know the details. There might be a need to add some new MCR-related info to the revision, but I am not sure. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] RevisionInsertComplete vs. RevisionRecordInserted
Hi! I've noticed that the RevisionInsertComplete hook is now deprecated in favor of RevisionRecordInserted. However, EventBus still uses RevisionInsertComplete. Is this going to change soon? If so, will the underlying event/topic change too? I couldn't find anything in Phabricator about this - is there a plan to change it, or will the old hook still be used for now and the foreseeable future? Thanks, -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [Help Needed] MediaWiki-Vagrant: Debian Stretch testing and migration
Hi! > It would be great to have some more users testing it out to make sure > the roles that they need to work with MediaWiki-Vagrant day to day are > working there. Special thanks to Stas, Gilles, Hashar, and Željko for > the bug reports and patches that they have already supplied. Seems to work ok for me for all roles I've used, except for the issue in https://phabricator.wikimedia.org/T183306 which is of course still there. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Security question re password resets and Spectre
Hi! > So far so good. What I am wondering is whether that password reset trial is > actually even more dangerous now given Spectre / Meltdown? I think those require local code execution access? In which case, if somebody gained that on the MW servers, they could just change your password, I think. Spectre/Meltdown, from what I've read, are local privilege escalation attacks (local user -> root, or local user -> another local user), but I haven't heard anything about them crossing the server access barrier. > (I probably should set up 2FA right now. Have been too lazy so far) Might be a good idea anyway :) -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Deprecation + phan issue
Hi! I've noticed a certain problem in our workflow that I'd like to discuss. From time to time, we deprecate certain APIs in core or extensions. The idea of deprecation is to have a gradual transition - existing code keeps working, and we gradually switch to the new API over time, instead of removing the old API in one sweep and breaking all the existing code. This is the theory, and it is solid. We also have phan checks on the CI builds, which prevent us from using deprecated APIs. Right now, if phan detects deprecated API usage, it fails the build. Combining these produces a problem: if somebody deprecates an API that I use anywhere in my extension (and some extensions can be rather big and use a lot of different APIs), all patches to that extension immediately have their builds broken - including all those patches that have nothing to do with the changed API. Of course, the easy way out is just to add @suppress and phan shuts up - but that means the deprecated function goes completely off the radar, and probably nobody will ever remember to look at it again. I see several issues with this workflow: 1. Deprecation breaks unrelated patches, so I have no choice but to shut it up if I want to continue my work. 2. It trains me that the reaction to phan issues should be to add @suppress - which is the opposite of what we want to happen. 3. The process kind of violates the spirit of what deprecation is about - existing code doesn't keep working without change, at least as far as the build is concerned - and constantly @suppress-ing phan diminishes the value of having those checks in the first place. I am not sure what the best way to solve this would be, so I'd like to hear some thoughts on this. Do you also think it is a problem, or is it just me? What would be the best way to improve it? Thanks, -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
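[Editorial sketch of the @suppress pattern described in the message above. PhanDeprecatedFunction is phan's actual issue type for calls to @deprecated functions; the function names are hypothetical.]

```php
<?php
/**
 * @suppress PhanDeprecatedFunction
 * TODO: migrate to newApi(). Without a tracking task, this suppression
 * silently takes the deprecated call off the radar - which is exactly
 * the workflow problem discussed above.
 */
function doWork() {
    return oldDeprecatedApi();
}
```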
[Wikitech-l] Tracking internal uses of Wikidata Query Service
Hi! We are seeing more use of the Wikidata Query Service by Wikimedia projects. This is excellent news, but the somewhat worse news is that the maintainers of WDQS do not have a good idea of what these services are, what their needs are, and so on. So we have decided to start tracking internal uses of the Wikidata Query Service. To that end, if you run any functionality on Wikimedia sites (Wikipedias, Wikidata, etc. - anything on a wikimedia domain) that sends queries to the Wikidata Query Service, please go to: https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Usage and add your project there. This applies both if your project runs queries by itself in the background and if it runs queries as part of a user interaction scenario. We do not currently include Labs tools unless they are absolutely vital infrastructure (i.e. if it went down, would that substantially degrade the main site functionality or make some features unusable?). If you still feel we should know about a certain Labs tool, please leave a note on the talk page. What's in it for you? We want to know these things in order to better understand the scope of internal usage, and as preparation for T178492 (creating an internal WDQS setup) - with the goal of providing internal users a more robust and more flexible service. We also want to ensure we do not break anything important when we do maintenance, and to know who to talk to if some queries do not work as expected and we want to fix them. What do we want to know? - A general description of the functionality (i.e. what the service is for) - How to recognize the queries it runs - user agent? source host? specific query pattern? some other mark? Having some way to recognize them is recommended - What kind of queries it runs (no need to list every possible one, of course, but if there are typical cases it'd help to see them) - How often the queries run - whether they are periodic, or what the expected/statistical usage is if it's a user-driven tool 
- Where we can see the code behind it, and who maintains it - Feel free to add any other information you think would be useful for us to know. What was that page again? https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Usage Thanks in advance, -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] AdvancedSearch beta feature now on Mediawiki
Hi! > haha, awesome! > > thanks a lot :-) Confirming, looks great for me now. And congratulations to the team on the release of this excellent feature! -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] AdvancedSearch beta feature now on Mediawiki
Hi! > Hello, Birgit. Unfortunately, I can't open a phab ticket, because I'm on > mobile, and there is no way to upload a file to phabricator from mobile. > So, I'll answer you here. I'm on Lollipop, use internal browser, timeless > skin. I can't make a screenshot from this device, so I did it on another Likely there's a problem with timeless, see: https://phabricator.wikimedia.org/T181106 Vector works fine for me though. I guess that's why it's "beta" :) -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] need a URL encoding expert
Hi! > The main issue I'm not sure about here is the use of ; as a query string > initiator (rather than a query string parameter separator). This use of > semicolons is completely non-standard, AFAIK, but it looks like there are > some web servers that are actually using it this way. Comments at the pull > request itself would be most useful (rather than by email). Generally, servers are free to parse the local part of the URL as they like. After all, many servers using REST treat something like /user/2/name as essentially a query string, even though / is a path separator. Nothing prevents other servers from adopting a scheme of user;2;name instead, or any other way of parsing the local path. https://tools.ietf.org/html/rfc3986#section-3.4 clearly states that the query is delimited by "?". Which means that URL segments containing ";" are path components, as per the RFC: Aside from dot-segments in hierarchical paths, a path segment is considered opaque by the generic syntax. URI producing applications often use the reserved characters allowed in a segment to delimit scheme-specific or dereference-handler-specific subcomponents. For example, the semicolon (";") and equals ("=") reserved characters are often used to delimit parameters and parameter values applicable to that segment. The comma (",") reserved character is often used for similar purposes. For example, one URI producer might use a segment such as "name;v=1.1" to indicate a reference to version 1.1 of "name", whereas another might use a segment such as "name,1.1" to indicate the same. So a specific application can treat path components the same way as query components, but they are still path components. My reading of the RFC is also that ";" is a reserved character and as such should not be URL-encoded - indeed, the path BNF includes sub-delims without encoding, and sub-delims includes ";". However, I am not sure I understand the other part of the patch, where it plays with the query string. 
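[The RFC 3986 behavior described above can be demonstrated with Python's standard urllib.parse; the URLs are made-up examples.]

```python
from urllib.parse import urlsplit

# Per RFC 3986 section 3.4, the query component begins only at "?";
# ";" is an ordinary sub-delim that stays inside the path component.
parts = urlsplit("http://example.com/name;v=1.1?q=2")
print(parts.path)    # -> /name;v=1.1
print(parts.query)   # -> q=2

# With no "?" at all, a semicolon-delimited URL is pure path:
parts2 = urlsplit("http://example.com/user;2;name")
print(parts2.path)   # -> /user;2;name
print(parts2.query)  # -> empty string
```

(Note that urlsplit, unlike the older urlparse, does not give ";" any special treatment - which matches the "segments are opaque" reading of the RFC.)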
-- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] HHVM vs. Zend divergence
Hi! > It was a big contrast to my interactions with the PHP community, which > were so often negative. For example, Jani's toxic behaviour on the bug > tracker, closing bugs as "bogus" despite being serious and > reproducible, usually because he didn't understand them technically. > Even with other maintainers, I had to fight several times to keep > serious bugs open. I had no illusions that they would ever be fixed, I > just wanted them to be open for my reference and for the benefit of > anyone hitting the same issue. I filed bugs as "documentation issues", > requesting that undesired behaviour be documented in the manual, since > they were more likely to stay open that way. From your mention of Jani, I gather it was a long time ago :) Quite a lot has changed since then, even though the internals list is still not the most friendly place on the nets. But the processes got better, I think, and the level of maturity of both the discussions and the participants has increased. Also, thank you for the kind words, and I'll be glad to help where I can if we need something done in PHP core. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] HHVM vs. Zend divergence
Hi! > IMO the more interesting discussion to be had is how little we invest into > the technology our whole platform is based on. You'd think the largest > production user of PHP would pay at least one part-time PHP developer, or > try to represent itself in standards and roadmap discussions, but we do > not. Is that normal? I think this is a very good point. There are other ways to support PHP too: assisting in regular testing of upcoming versions. Helping write the docs. Contributing to RFC discussions (we have a large codebase, a heavily visited site, and a lot of experience dealing with it - surely there is a thing or two we could contribute). Triaging bugs (one of the most necessary, thankless, and under-appreciated jobs in an open-source project). Probably more... -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] HHVM vs. Zend divergence
Hi! > So to be super clear: I'm just pointing out that there used to be issues > here; sometimes the community's interests do not exactly align. Consider > me in the devil's advocate role again: I'd be interested to hear an > insider's opinion (Stas?) on how security issues are handled these days and > what the future outlook is, > https://www.cvedetails.com/vulnerability-list/vendor_id-74/product_id-128/PHP-PHP.html OK, so it's a big topic, but I can do a quick survey and we can go deeper if you want. So: Despite what you see there, there are not a lot of genuine security issues in PHP. Unfortunately, the CVE issuance process is such that it does not involve any consultation with the vendor about classifying the issue and assigning severity. You can just create a CVE ID for anything, and how they assign severity is kind of a mystery to me, but one thing is clear - the PHP core team is not consulted about it, at least not in any way I've ever noticed. It doesn't mean those are all wrong, but some probably are, and a lot are misclassified. Now, for the substance of the issues. Many of them are unserialize() issues. For these, without getting too far into the weeds, I can only say this: there seems to be no way to make unserialize() robust against untrusted data, and it has to do with how references, object construction, object destruction (notice the similarity with what Hack intends to drop? It is not a coincidence) and serialization support work in PHP. Too much internal structure is exposed to make it work with untrusted data now. Maybe if we redesigned the whole thing from scratch it _could_ be possible, but even then I kind of doubt it, at least not without sacrificing some feature support. For now, the best approach is to consider any unserialize() of untrusted data inherently insecure. It's just too low-level to be sure no corner case ever does anything strange. 
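[As an aside on the unserialize() point above, a sketch of the partial mitigation PHP 7 does offer - the documented "allowed_classes" option - with a made-up example payload; per the discussion, this narrows but does not eliminate the attack surface.]

```php
<?php
// Example serialized payload (a plain array, no objects).
$untrustedInput = 'a:1:{s:1:"k";s:1:"v";}';

// PHP 7: forbid object instantiation entirely; anything serialized as an
// object comes back as __PHP_Incomplete_Class instead of a live object,
// so no constructors/destructors of arbitrary classes run.
$data = unserialize($untrustedInput, ['allowed_classes' => false]);

// Safer still for untrusted input: avoid PHP serialization altogether and
// use a format with no object/reference semantics, such as JSON.
```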
Some of those are genuine security problems, which now mostly concentrate in several extensions that have either been under-served or have very wide exposed surfaces. Namely: wddx - an obscure format that suffers from issues similar to unserialize() and in addition lacks maintainership; phar - which wasn't really designed to deal with untrusted scripts but seems to be kind of moving in that direction; and gd - which mostly relies on libgd, so every issue there automatically extends into PHP. The latter two would probably benefit from good fuzz testing, but nobody has taken that on yet. I suspect that if HHVM uses the same or derived code - and it likely does; I don't think they reimplemented libgd from scratch? - many of those would also be present in HHVM, if HHVM supports these at all (no idea). There are also some long-standing debates about how randomness is handled (basically, there are many ways to get random numbers, and most of them are not suitable for security-related randomness) and some DoS issues in PHP hash tables - the latter are mostly resolved, but in a somewhat temporary way, so there's more work to be done. Some of these issues - like https://bugs.php.net/bug.php?id=74310 - are definitely not security issues at all. Yes, you can create bad code that hits some corner case and produces a segfault. That shouldn't happen, but this being complex C code which is 20 years old, it does. This has absolutely nothing to do with security, and whoever decided to issue a CVE for it and assign 7.5 severity (https://www.cvedetails.com/cve/CVE-2017-9119/) has done a very sloppy job. As I said, this is done with zero communication with the actual PHP team as far as I know, which is very sad, but this is the state of affairs. Some of those are, I am sorry to say, complete baloney, e.g. 
https://www.cvedetails.com/cve/CVE-2017-8923/ says: <> This is nonsense - unless you run with no memory limit at all (nobody sane does that) and specifically allow your code not only to accept infinite untrusted data but to feed it to certain functions arranged into a specific code pattern, no remote attacker can do it. There are many issues with similar claims; none of them are actual security issues. Some are also assigned to PHP despite being application issues, e.g. https://www.cvedetails.com/cve/CVE-2017-9067/. To add insult to injury, this is a 2017 CVE about a version that was EOLed in 2014. The data quality there seems to be very sad. I tried to figure out how to make it better a while ago, but pretty much gave up because I couldn't find anybody responsible for, or at least concerned about, this sad state of things. OK, this came out super-long and kind of ranty (sorry!), so I will stop for now, but if you have any questions about it, please feel free to ping me and I will be glad to discuss. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] HHVM vs. Zend divergence
Hi! > Incidentally, how much work has been done on incorporating HHVM's > improvements back into Zend? Depends on which ones you're talking about. Syntax ones may or may not find their way into PHP, but performance ones would probably be implemented completely differently from HHVM - i.e. the resulting performance may or may not be on par or better, but reusing most of the HHVM performance work in PHP would not be possible due to completely different engine internals. So pretty much all that can be taken from HHVM into PHP is "this syntax looks like a good idea, let's reimplement it". -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] HHVM vs. Zend divergence
Hi! > I agree the short timeline seems to push us toward reverting to Zend. But > it is worth having a meaningful discussion about the long-term outlook. > Which VM is likely to be better supported in 15 years' time? Which VM 15 years is a long time to predict. 15 years ago Facebook, Twitter, Reddit and Linkedin did not exist and Slashdot, Livejournal, etc. were all the rage. We don't even know whether Facebook as such will exist in 15 years or will have the budget to support its own language. > would we rather adopt and maintain indefinitely ourselves, if needed -- > since in a 15 yr timeframe it's entirely possible that (a) Facebook could > abandon Hack/HHVM, or (b) the PHP Zend team could implode. Maintaining While (b) could happen, the PHP project is not very dependent on Zend for its existence. Zend owns none of the infrastructure or processes, and while a lot of the performance work on PHP 7 was conducted by the Zend team (and they are still working on improvements AFAIK), there are plenty of community members who do not work for Zend and do not depend on Zend in any way. Of course, it is possible that the whole community would implode, but here we have many more stakeholders than in Hack's case, where the stakeholder is mostly a single - albeit large and currently very successful - company. > speaking, it's not really a choice between "lock-in" and "no lock in" -- we > have to choose to align our futures with either Zend Technologies Ltd or > Facebook. One of these is *much* better funded than the other. It is Again, I do not think this is the right statement to make. The control of Zend Tech as a company over the future of PHP is much weaker than Facebook's control over Hack (which is pretty much absolute). PHP is guided by the community, decisions are taken via community processes in which Zend has no special role, and the PHP project could reasonably survive without Zend, even if with fewer resources. 
Most PHP infrastructure - Composer, debugging, IDEs, profiling, code quality, frameworks, etc. - is completely independent of Zend (which also offers a number of tools, but is not the only provider). So I do not think it is an adequate comparison. I am not sure whether Hack has an open-source community outside Facebook (if anybody has pointers to that, please share - commit numbers certainly don't tell much) - but it is pretty clear to me that Facebook is in absolute control of this platform. This is not the case with Zend and PHP. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] HHVM vs. Zend divergence
Hi! > * Rather than "drifting away" from PHP, their top priority plans > include removing core language features like references and destructors. Wow. I can see why they're doing it (those are the sources of most complications and security issues in the language, references being especially weird and tricky). But dropping those would certainly mean very heavy incompatibility with PHP, at which point it'd be a completely separate language. Which probably excludes Max's #2 from consideration altogether. > Actually, I think a year is a pretty short time for ops to switch to > PHP 7. I think we need to decide on this pretty much immediately. Should it be on the TechCom agenda, and should we have some public discussion on IRC in RFC format for this soon? -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Categories in RDF/WDQS
Hi! I'd like to announce that the category tree of certain wikis is now available as an RDF dump and in Wikidata Query Service. More documentation is at: https://www.mediawiki.org/wiki/Wikidata_query_service/Categories which I will summarize shortly below. The dumps are located at https://dumps.wikimedia.org/other/categoriesrdf/. You can use these dumps any way you wish; the data format is described at the link above[1]. The same dump is loaded into the "categories" namespace in WDQS, which can be queried via https://query.wikidata.org/bigdata/namespace/categories/sparql?query=SPARQL. Sorry, no GUI support yet (that will probably happen later). See the example in the docs[2]. These datasets are not updated automatically yet, so they'll be up to date roughly as of the date of the latest dump. Hopefully soon this will be automated and then the datasets will be updated daily. The list of currently supported wikis is here: https://noc.wikimedia.org/conf/categories-rdf.dblist - these are basically all 1M+ wikis and a couple more that I added for various reasons. If you have a good candidate wiki to add, please tell me or write on the talk page for the document above. Please note this is only the first step for the project, so there might still be some rough edges. I am announcing it early since I think it would be useful for people to look at the dumps and the SPARQL endpoint, see if something is missing or does not work properly, and share ideas on how it can be used. We plan eventually to use it for search improvement[3] - this work is still in progress. As always, we welcome any comments and suggestions. [1] https://www.mediawiki.org/wiki/Wikidata_query_service/Categories#Data_format [2] https://www.mediawiki.org/wiki/Wikidata_query_service/Categories#Accessing_the_data [3] https://phabricator.wikimedia.org/T165982 Thanks, -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] HHVM vs. Zend divergence
Hi! > I can say that PHP 7 locally runs unit tests significantly faster than PHP > 5.6, although that's not really a representative workload for running a > website. PHP 7 is faster than 5.6, probably on almost any workload, in my experience (the degree of speedup varies, of course). As for the comparison with hhvm, I've heard various reports, but I think spending some time on seriously testing it (I mean creating a proper production setup and directing either captured/replayed or real traffic to it) is the best way to go. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] HHVM vs. Zend divergence
Hi! > Other tools we are using, such as Phabricator, will also be following HHVM > to Hack (presumably). I am not sure this is the case. Mozilla recently declared they want to use Phabricator[1], but I heard no mention of HHVM, which makes me think that one stays on PHP. Also, Phabricator is now independent from Facebook, AFAIK, since its developers have a separate company, Phacility. [1] https://wiki.mozilla.org/Phabricator -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] HHVM vs. Zend divergence
Hi! > 1) Continue requiring that MediaWiki uses a common set of HHVM and Zend > PHP. This, however, is a dead end and will make things progressively harder > as the implementations will diverge and various Composer libraries we use > will start requiring some Zend-specific features. This will probably ultimately happen, but given the PHP version stats, e.g. here: https://seld.be/notes/php-versions-stats-2017-1-edition I think we have at least several years before that starts becoming an issue. Realistically, if you write distributable PHP code now, targeting 7.1 gets you only 17% of the users, so you'd target either 7.0 (gets you about half) or, more likely, even 5.6. Extending this trend (I know, dangerous, but let's assume), if 7.2 is released somewhere around Dec 2017-Jan 2018, 7.3 would probably not happen before around 2019. If that version has features not supported in HHVM, we'd have to start worrying around 2021, when people would start releasing components targeting it. So we have about 3 years to find a solution - *if* 7.3 has features not supported by HHVM. Note that these statistics are for Composer users, which means they are probably skewed towards modern versions, since people using PHP 5.3 probably don't use Composer much in general. OTOH, since we do use Composer, that appears appropriate for our case. > 2) Declare our loyalty to HHVM. This will result in most of our current > users being unable to upgrade, eventually producing what amounts to a > WMF-only product and lots of installations with outdated MediaWiki having > security holes. At least we will be able to convert to Hack eventually. > This is a very clean-cut case of vendor lock-in though, and if Facebook > decides to switch their code base to something shinier, we'll be deep in > trouble. I don't think this is a good idea, for reasons that seem obvious to me (but I can elaborate if necessary). > 3) Revert WMF to Zend and forget about HHVM. 
This will result in > performance degradation, however it will not be that dramatic: when we > upgraded, we switched to HHVM from PHP 5.3 which was really outdated, while > 5.6 and 7 provided nice performance improvements. I think we should evaluate 7.1 or 7.2 (provided we don't have any runtime issues with them) and see what performance looks like ASAP (with opcache, of course). If there's some help needed, or there are specific issues that are blockers, I think the Zend team would be glad to talk to us. If needed, I could probably help with establishing the contacts. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
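[Editorial note] For reference, a minimal sketch of the opcache settings such an evaluation would enable - the values below are illustrative starting points, not a tuned production config:

```ini
; php.ini - enable the opcode cache when benchmarking PHP 7.x
zend_extension=opcache.so
opcache.enable=1
opcache.memory_consumption=256      ; MB of shared memory for cached opcodes
opcache.max_accelerated_files=20000 ; MediaWiki plus extensions have many files
opcache.validate_timestamps=1       ; recheck files for changes (dev-friendly)
```

Benchmarking without opcache would make PHP 7 look far slower than it is in practice, since every request would recompile every file.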
Re: [Wikitech-l] Can we drop revision hashes (rev_sha1)?
Hi! On 9/15/17 1:06 PM, Andrew Otto wrote: >> As a random idea - would it be possible to calculate the hashes > when data is transitioned from SQL to Hadoop storage? > > We take monthly snapshots of the entire history, so every month we’d > have to pull the content of every revision ever made :o Why? If you've already seen that revision in a previous snapshot, you'd already have its hash. Admittedly, I have no idea how the process works, so I am just talking from general knowledge and may be missing some things. Also, of course, you already have hashes for all revisions up to the day we decide to turn the hash off. Starting that day, hashes would have to be generated, but I see no reason to generate one more than once. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
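[Editorial note] The "hash each revision at most once" idea above can be sketched very simply. The function and cache shape here are hypothetical - the actual snapshot pipeline is not described in the thread - but they show the memoization being proposed:

```php
<?php
// Sketch: compute each revision's hash at most once across snapshots.
// $hashCache maps rev_id => sha1 and would be persisted between the
// monthly snapshots, so only new revisions need their content pulled.
function revisionHash(int $revId, callable $fetchContent, array &$hashCache): string {
    if (isset($hashCache[$revId])) {
        return $hashCache[$revId];     // seen in a previous snapshot
    }
    $hashCache[$revId] = sha1($fetchContent($revId)); // new revision only
    return $hashCache[$revId];
}
```

With this, each monthly snapshot only fetches content for revisions created since the last one, rather than "every revision ever made".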
Re: [Wikitech-l] Can we drop revision hashes (rev_sha1)?
Hi! > We should hear from Joseph, Dan, Marcel, and Aaron H on this I think, but > from the little I know: > > Most analytical computations (for things like reverts, as you say) don’t > have easy access to content, so computing SHAs on the fly is pretty hard. > MediaWiki history reconstruction relies on the SHA to figure out what > revisions revert other revisions, as there is no reliable way to know if > something is a revert other than by comparing SHAs. As a random idea - would it be possible to calculate the hashes when data is transitioned from SQL to Hadoop storage? I imagine that would slow down the transition, but I'm not sure whether it would be substantial or not. If we're using the hash just to compare revisions, we could also use a different hash (maybe a non-crypto hash?), which may be faster. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
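[Editorial note] To make the revert-detection use concrete: a revert is found when a new revision's hash equals the hash of an earlier revision on the same page. A minimal sketch, using plain hex sha1 over content for simplicity (MediaWiki's actual rev_sha1 column stores a base-36 encoded SHA-1, which this ignores):

```php
<?php
// Find revisions that restore the exact content of an earlier revision.
// $revisions: list of ['id' => int, 'sha1' => string] in chronological order.
function findReverts(array $revisions): array {
    $seen = [];     // sha1 => id of the first revision with that content
    $reverts = [];  // reverting rev id => id of the revision it restores
    foreach ($revisions as $rev) {
        if (isset($seen[$rev['sha1']])) {
            $reverts[$rev['id']] = $seen[$rev['sha1']];
        } else {
            $seen[$rev['sha1']] = $rev['id'];
        }
    }
    return $reverts;
}
```

Since only equality is ever tested, any collision-resistant hash works, which is why a faster non-crypto hash is a plausible substitute.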
Re: [Wikitech-l] Recommending Firefox to users using legacy browsers?
Hi! > After Firefox and Chromium, there's a bunch of open source web browsers > listed on [2], but a brief spot check showed many as being Linux only > (or outdated Mac builds). One that looked promising was Brave[3], though > it's a relatively new browser and I would need to do more research > regarding #3. I've been using Brave occasionally for a couple of months, and it seems to work pretty well. It has (some) adblocking in the default config, and some other privacy-enhancing settings, which are probably not very important for Wikimedia sites but may either break some other sites or make them bearable :) It's pretty young, so I don't think we can say much about its security record yet - IIRC it's based on Chromium, it's updated pretty frequently, and it's easy to use (though the UI might be a bit more spartan than others for now, and not many extensions are available - but for ex-IE users that may not be an issue). -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Search update: sister project snippets are now in production!
Hi! > But, no results for Wikidata, the site that covers more topics than all our > other sites? I think Wikidata search may not be ready for this yet. It's way more complicated than regular wiki search because it's a) multilingual and b) data, not text. We're working on it though :) -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] @author annotations in files in the mediawiki codebase
Hi! >> > It can sometimes tell you whom to ask for advice or reviews. (git >> log would too but it's more effort.) I feel @author is a bit misleading in this case - if code is refactored/amended, the original author, who wrote it possibly 10 years ago, may not be the best person to ask what's going on in it now. OTOH, the person who knows it best now may not be comfortable listing themselves as the author of the code after merely refactoring and amending it, rather than originally authoring it. Additionally, some @author clauses list only a name or nick, without any contact information. If the person is still active in the project under the same name, it may be easy to track them down, but if not, it's mostly hopeless. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] @author annotations in files in the mediawiki codebase
Hi! > MY understanding is that removing the @author @copyright tags in > MediaWiki code represent ownership of the original code placed under the > GPL. Subsequent modifications being derivative products. But there's no way to verify that the code is indeed an original creation of whoever is listed under @author, and not a derivative work of something else. > I am not a lawyer, but by dropping the copyright information, I highly > suspect that will be a breach of the license. AFAIK GPL itself does not protect attribution. It allows (optionally) to add clauses protecting attribution, but does not require it. I wonder though, given that Git has all the change history including authorship, what is the need to duplicate that information in the source code (and risk the two getting out of sync)? And if we are not considering Git logs to be part of the distribution, we're already violating this GPL clause: You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. Since you can commit the change (thus causing original work to be modified) without such notice, except for Git metadata. We obviously consider Git metadata to be enough in this case, why not in any others? -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] use of @inheritdoc
Hi! > What are people's opinions on using it? I think it's OK to use it in relevant cases (along with the already existing and used @see) and to recognize such uses as documented, but making it mandatory looks like increasing busywork for the doubtful benefit of making robots (e.g. syntax checkers) not bother us with warnings. Since I see the purpose of robots as creating _less_ busywork, this seems counter-productive. For the benefit of humans: if they're reading generated docs, doc generators should already be able to handle inheritance; if they're reading raw code, @inheritdoc provides the marginal benefit of a reminder to look into the parent method, but most people would probably do that anyway if the child method has no docs. I am not sure I find the argument "it ensures docs are not forgotten" that convincing - if no docs means "same docs as parent", then there's nothing to forget: there is always documentation inheritance by default (of course, if the parent is not documented, this is not true, but @inheritdoc does not fix that), so putting @inheritdoc is just saying "yes, use the default" - thus defeating the purpose of having a default! And if the method needs more docs than just the default (e.g. the child does something special), the habit of putting @inheritdoc everywhere does not help - on the contrary, since @inheritdoc, at least as defined by JS, is exclusive, adding some content means extra work, thus lowering the probability of it happening. Also, some code now uses params/return tags in child code but keeps the doc text only in parents (presumably so that it'd be easier for IDEs?). @inheritdoc as defined by Javascript means "ignore the rest of the tags" - but that's not what most IDEs would do. So there's a potential for different systems to read different tags (shouldn't make a difference, but might). Not sure how that would work? 
If combining @inheritdoc with other content works as Gergo described, it may be helpful, but I'm still not sure exactly what would happen. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
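[Editorial note] For illustration, this is the pattern under discussion - a child method whose docblock combines {@inheritDoc} with implementation-specific prose, the prose for the contract living on the parent. The class pair is hypothetical, chosen only to show the shape:

```php
<?php
interface Store {
    /**
     * Fetch a value by key.
     *
     * @param string $key Key to look up
     * @return string|null The stored value, or null if absent
     */
    public function get(string $key): ?string;
}

class ArrayStore implements Store {
    /** @var string[] */
    private $data = [];

    /**
     * {@inheritDoc}
     *
     * This implementation is in-memory only and never expires entries.
     */
    public function get(string $key): ?string {
        return $this->data[$key] ?? null;
    }

    public function set(string $key, string $value): void {
        $this->data[$key] = $value;
    }
}
```

Whether the extra sentence survives alongside the inherited text is exactly the tool-dependent behavior questioned above: doc generators and IDEs differ in how they merge it.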
Re: [Wikitech-l] "must be of type int, int given"
Hi! > Exception handler threw an object exception: TypeError: Argument 2 passed > to stream_set_blocking() must be an instance of int, bool given in > /vagrant/mediawiki/includes/libs/redis/RedisConnectionPool.php:233 This one is weird - the second arg of stream_set_blocking should be a boolean, e.g.: http://php.net/stream_set_blocking But looking into the hhvm source, I found this: https://github.com/facebook/hhvm/pull/7084 Looks like hhvm had a wrong definition of the function. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
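[Editorial note] For reference, the documented signature takes a bool for the second parameter, and plain PHP accepts exactly the call that HHVM rejected here. A minimal sketch using an in-memory stream:

```php
<?php
// Per php.net: stream_set_blocking(resource $stream, bool $enable): bool
// Passing a bool - as RedisConnectionPool does - is the correct usage;
// the "must be an instance of int" TypeError came from HHVM's builtin
// declaration, not from the calling code.
$fp = fopen('php://temp', 'r+');           // any stream resource will do
$result = stream_set_blocking($fp, false); // returns bool
fclose($fp);
```

Running the same line on Zend PHP produces no TypeError, which supports the conclusion that the definition on the HHVM side was wrong.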
Re: [Wikitech-l] "must be of type int, int given"
Hi! > So, if I start a fresh MediaWiki Vagrant installation, and then vagrant ssh > into the virtual machine, the fastest way to reproduce the issue is to > start hhvmsh and type > > function f():int { return 1; }; Suspicion: HHVM thinks "int" is a class name, not a primitive type name. Not sure why though... -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
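[Editorial note] For what it's worth, the snippet in question is valid PHP 7 on Zend - `int` is parsed as the primitive scalar return type, not a class hint. A minimal sketch:

```php
<?php
// "int" here is the scalar return type declaration introduced in PHP 7,
// not a class named "int" - which is what HHVM appears to assume above.
function f(): int {
    return 1;
}

// Returning something non-coercible to int (e.g. return "foo";)
// would raise a TypeError on Zend PHP 7, referring to "int" the type.
```

So the error text "must be of type int, int given" really does point at a type-name resolution quirk rather than at the calling code.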
Re: [Wikitech-l] +2 request for yurik in mediawiki and maps-dev
Hi! >>> that step entirely seems risky to me. Unless, as I said, having +2 really >>> isn't a big deal. +2 isn't a super-big deal, since all changes are public and you're not supposed to self-+2 in any case, and anything done by patch can also be undone by patch. As I understand it, it's more like "this person knows enough, and is known well enough, to get approval rights". That won't change if somebody leaves the WMF. If there are external reasons for +2 removal, that can happen without the WMF in the picture, and I imagine there are rules for this which can be applied. So I'd say if somebody had pre-WMF +2, they should not lose it just because they joined and then left the WMF, and this should be automatic by default. Non-default cases can always be handled on a per-case basis. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Arbitrary Wikidata querying
Hi! > Actually, specifically for list of presidents you don't need bot. Yeah, you are right - I was thinking of going through the query route, but if your list is contained in one property (like Q30/P6) then using Lua is just fine. It's not always the case (e.g. "list of all movies where Brad Pitt played"), but where it works it's definitely a good way to go. > 3. It is limited to simple lists (you can't have list of Republican > presidents - because it requires additional filters and you don't want to > create new property for it) Exactly. You probably could still do something in Lua, but that's pushing it already. > 4. Internationalization - What if yi Wikipedia wants to create list of > governors of some small country where there are no yi labels for the > presidents? The list would be partially in yi partially in en - is this > desired behavior? or they can show only presidents who have label in yi - > but this would give partial data - is this the desired behavior? [Probably > the correct solution is to do show the fallback labels in en, but add some > tracking category for pages requires label translation or [translate me] > links) That sounds like a good idea :) -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Arbitrary Wikidata querying
Hi! > Sure, but I'm not really worried about potential false positives. I'm > worried that we're building a giant write-only data store. Fortunately, we are not doing that. >> Unless you're talking about pulling a small set of values, in which case >> Lua/templates are probably the best venue. > > I'm not sure what small means here. We have about 46 U.S. Presidents, is > that small enough? Which Lua functions and templates could I use? No, a list of presidents is not small enough. Lua right now can fetch specific data from a specific item, which is OK if you know the item and what you're getting (e.g. infoboxes, etc.) but not good for lists of items, especially with complicated conditions. That use case currently needs external tools - like bots. > Wikidata began in October 2012. I thought it might take till 2014 or even > 2015 to get querying capability into a usable state, but we're now looking Please do not confuse your particular use case not being supported with querying not being usable at all. Querying is definitely usable and is being used by many people for many things. Generating lists directly from a wiki template is not supported yet, and we're working on it. I'm sorry that your use case is not supported and that you're feeling disappointed. But we do have query capability, and it can be and is being used for many other things. Of course, contributions - in any form: query development, code development, design, frontend, backend, data contributions, etc. - are always welcome. > to even contribute to it when it feels like putting data into a giant > system that you can't really get back out. I love Magnus and I have a ton Again, this is not correct - you can read data back out, and there are several ways you can use the query functionality for that right now. The way you want to do it is not supported - yet - but there are many other ways, which we are constantly improving. But we can't do everything at once. Please be patient, please contribute with what you can, and we'll get there. 
-- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] phan (new static analysis tool) is now voting on MediaWiki core
Hi! > There's documentation on mediawiki.org[3] about how it is currently > configured, and how to set it up locally. You'll need PHP 7 with the ast > extension to actually run phan. If that's not possible for your system, > you can rely on jenkins to run it for you. I wonder how hard it would be to add php7/phan to mediawiki-vagrant? -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Discussion Platform
Hi! > Depends on how you define easy. > http://bots.wmflabs.org/~wm-bot/logs/%23mediawiki/ is a recording of > everything in #mediawiki by date, oldest at the top, newest at the bottom. > I would consider that fairly easy. It's easy if your use case is "reading everything on a particular day". If your use case is "locating that part where John and Mary talked about FooBar", it's not easy at all. Raw logs are as usable as... well, raw logs. People do read raw logs from time to time, but usually they employ tools to make sense of them - e.g. kibana, etc. Of course, it's a bit of a strained analogy, but my point is that raw IRC logs are not a very good UI for many use cases. I don't have a good solution for this, as IRC is still excellent as a transient quick-discussion medium, but much less so as a long-term persistent one. OTOH, maybe that part should be done with wiki+Flow? -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Arbitrary Wikidata querying
Hi! > If I wanted to make a page on the English Wikipedia using wikitext called > "List of United States presidents" that dynamically embeds information > from <https://www.wikidata.org/wiki/Q23> and > <https://www.wikidata.org/wiki/Q11806> and other similar items, is this > currently possible? I consider this to be arbitrary Wikidata querying, but > if that's not the correct term, please let me know what to call it. So this is the kind of can of worms which I guess we'll eventually have to open, but very carefully. So I want to state my _current_ opinion on the matter - please note, it can change at any time due to changing circumstances, persuasion, experience, revelation, etc. 1. Technically, anything that can access a web service and speak JSON can talk to a SPARQL server. So, in theory, making some way to do this would not be very hard. But - please keep reading. 2. I am very apprehensive about having a direct link between any wiki pages and the SPARQL server without heavy caching and rate limiting in between. We don't have a super-strong setup there, and I'm afraid such a link would just knock our setup over, especially if people start putting queries into frequently-used templates. 3. We have a number of bot setups (Listeria etc.) which can auto-update lists from SPARQL periodically. This works reasonably well (excepting the occasional timeout on tricky queries, etc.) and does not require requesting the info too frequently. 4. If we want a more direct page-to-SPARQL-to-page interface, we need to think about storing/caching data - and not for 5 minutes like it's cached now, but for a much longer time, probably in storage other than varnish. Ideally, that storage would be more of a persistent store than a cache - i.e. it would always (or nearly always) be available, but periodically updated. Kind of like the bots mentioned above, but more generic. I don't have any more design for it beyond that, but that's the direction I think we should be looking in. 
> A more advanced form of this Wikidata querying would be dynamically > generating a list of presidents of the United States by finding every > Wikidata item where position held includes "President of the United > States". Is this currently possible on-wiki or via wikitext? No, and there are tricky parts there. Consider https://www.wikidata.org/wiki/Q735712. Yes, Lex Luthor held the office of President of the USA - in a fictional universe, of course. But the naive query - every Wikidata item where position held includes "President of the United States" - would return Lex Luthor as a president just as legitimate as Abraham Lincoln. In fact, there are 79 US presidents judging by "position held" alone. So clearly, there need to be some limits, and those limits would be on a case-by-case basis. > If either of these querying capabilities are possible, how do I do them? > I don't understand how to query Wikidata in a useful way and I find this > frustrating. Since 2012, we've been putting a lot of data into Wikidata, > but I want to programmatically extract some of this data and use it in my > Wikipedia editing. How do I do this? Right now the best way is to use one of the list-maintaining bots, I think. Unless you're talking about pulling a small set of values, in which case Lua/templates are probably the best venue. > If these querying capabilities are not currently possible, when might they > be? I understand that cache invalidation is difficult and that this will > need a sensible editing user interface, but I don't care about all of > that, I just want to be able to query data out of this large data store. We're working on it (mostly thinking right now, but correct design is 80% of the work, so...). Visualizations already have query capabilities (mainly because they have a strong caching model embedded, and because there are not too many of them and you need to create them, so we can watch the load carefully). 
Other pages can gain them - probably via some kind of Lua functionality - as soon as we figure out what's the right way to do it, hopefully somewhere within the next year (no promise, but hopefully). -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
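[Editorial note] For concreteness, the naive query and a filtered variant might look like this in SPARQL. The item/property IDs are written from memory - P39 "position held", P31 "instance of", Q11696 "President of the United States", Q95074 "fictional character" - so treat them as illustrative and verify against Wikidata before use:

```sparql
# Naive: every item whose "position held" includes President of the US.
# This is the query that returns Lex Luthor alongside Abraham Lincoln.
SELECT ?president WHERE {
  ?president wdt:P39 wd:Q11696 .
}

# One possible limit: exclude items that are instances of a fictional
# character. Other fictional classes would need similar handling, which
# is why such limits end up being case-by-case.
SELECT ?president WHERE {
  ?president wdt:P39 wd:Q11696 .
  FILTER NOT EXISTS { ?president wdt:P31 wd:Q95074 . }
}
```

The second query illustrates why "every item where position held includes X" is rarely the whole answer: each list needs its own notion of which statements count.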
Re: [Wikitech-l] Changes in colors of user interface
Hi! > <https://xkcd.com/1770/> seems pretty timely! Or this one: https://xkcd.com/1172/ :) -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Update on WMF account compromises
Hi! > By using an online testing tool, you are effectively breaking the very > first rule: > > DO NOT GIVE OUT YOUR PASSWORD. EVER. That's why I suggested having an internal bot that would use the same techniques intruders use to test passwords (without knowing them), instead of having people send their passwords to an unknown site and trust it not to do anything wrong with them :) -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Wikimedia Developer Summit 2017 discussion
Hi! > In what way do you think that MediaWiki is not designed to promote and > foster online collaboration? Depends on the purpose of the collaboration. For some things - like project maintenance, planning, tracking, etc. - it's far from the best. Deep discussion is also not without friction (phab is not ideal in that regard either, btw) - keeping track of several discussions becomes very hard, especially if a number of people participate. Flow may make matters better, but it's not enabled everywhere, and not on mediawiki.org. That said, for collaborative document editing MediaWiki is not bad - but it definitely could use some improvement. Things that specifically come to mind are: 1. Ability to comment directly on parts of the text. Right now, to discuss a paragraph you need to go to talk, and then each time go back and forth to keep in mind what we're talking about. High friction. 2. Ability to submit/approve changes. I know "be bold" works well in some cases, but in others I wouldn't presume to edit somebody's text without their approval. However, if I could submit an edit for their consideration, which they might approve - that might work better. 3. Better diff notifications (you mentioned it). The delta for MediaWiki to become a good task/project-management tool is very big IMHO, so I'm not sure it makes sense to go there instead of using existing solutions. >> frequently called for ArchCom-RFC authors to move the bulk of the >> prose of their RFCs onto mediawiki.org. However, Phabricator is >> really good tool for a couple of things: >> 1. Doling out short unique identifiers of tasks, events, etc > > Every page in MediaWiki is assigned a unique page_id. You can visit an It does. I'm not sure the UI allows you to discover this fact, though. OTOH, I agree that textual IDs are better for many things when we're talking about unique documents (a la RFCs). For stuff like bugs or fine-grained tasks, numeric IDs usually work better, though. 
> work on improving MediaWiki's search, while Phab doesn't support basic > features like stemming. I'm generally a fan of Phabricator as a bug > tracker - not as a collaborative document editing platform. I agree here. For collaborative document editing Phabricator is not the best solution, and IMHO not better than MediaWiki. This of course is a general observation, not specific to the decision to use Phab for the specific task of the CfP, which Quim seems to have decided already. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Recent changes, notifications & pageprops
Hi! > You can seek back on EventBus events, but not permanently (by default, only > up to 1 week). If you want to respond to changes in an event stream, you 1 week is not enough for this use case, but if it could be extended to, say, 1 month, that could be workable. The reason is that the starting point for a WDQS server install is the Wikidata dump, which is made weekly. Then the server catches up to the data that changed from the dump point until the current moment. However, there could be dump failures or other conditions which may make the most recent dump unusable. Loading the dump itself also takes time. So the delta between the current moment and the data in a freshly deployed WDQS server could be 2 weeks or even more. We need to be able to catch up to the changes since then. We probably will never need the full month, but it's a conservative limit we're using now for how far back we can ask for data. 2 weeks would probably work too, even if it could mean some scenarios become more complicated to handle. > should consume the full event stream realtime and react to the events as > they come in. A proper Stream Processing system (like Flink or Spark This is not possible for the WDQS Updater. Since a WDQS server is completely independent of Wikidata, it can be started and stopped at any time. There's no way to ensure that, at every moment something changes in Wikidata, all WDQS instances interested in that change are up and running. There needs to be an intermediary system that keeps the data. So far the recent changes API has served as this system, but since it does not know about secondary data, it's no longer enough. > this stream will be relatively small, and you don’t need fancy features > like time based windowing. You just need to update something based on an > event, right? Well, I need something that I can ask: "give me all events that happened since time point T", for T being, say, anywhere from a second ago to 2 weeks ago. 
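The "give me all events since time point T" access pattern above amounts to a timestamp-based seek on a retained event stream. A toy sketch of the semantics (assumption: a Kafka-style consumer API with offsets_for_times/seek, modeled here with a stub object rather than a real broker connection - the names are illustrative, not the actual EventBus setup):

```python
from collections import namedtuple

TopicPartition = namedtuple("TopicPartition", ["topic", "partition"])
OffsetAndTimestamp = namedtuple("OffsetAndTimestamp", ["offset", "timestamp"])

def seek_to_timestamp(consumer, topic, partition, ts_ms):
    """Position the consumer at the first event at or after ts_ms."""
    tp = TopicPartition(topic, partition)
    # A Kafka-style consumer maps a timestamp to the earliest offset whose
    # event timestamp is >= ts_ms; None means nothing that old is retained.
    entry = consumer.offsets_for_times({tp: ts_ms})[tp]
    if entry is None:
        raise LookupError("requested timestamp is outside the retention window")
    consumer.seek(tp, entry.offset)
    return entry.offset

class StubConsumer:
    """Minimal stand-in for a real consumer, just for illustration."""
    def __init__(self, index):
        self.index = index      # {TopicPartition: (offset, timestamp_ms)}
        self.position = None
    def offsets_for_times(self, timestamps):
        return {tp: (OffsetAndTimestamp(*self.index[tp]) if tp in self.index else None)
                for tp in timestamps}
    def seek(self, tp, offset):
        self.position = (tp, offset)
```

With a real broker the same two calls would go against the client library's consumer; the point is only that "catch up from time T" requires the stream to retain at least T-to-now worth of events, which is exactly the retention-window question above.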
> The change-propagation service that the Services team is building can help > you with this. It allows you to consume events, and specify matching rules > and actions to take based on those rules. > > https://www.mediawiki.org/wiki/Change_propagation I see no mention of the ability to consume past events. Is it possible? -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Recent changes, notifications & pageprops
Hi! > Could we emit a page/properties-change event to EventBus when page props > are updated? Similar to how we emit an event for revision visibility > changes: This, however, is still missing a part because, as I understand, EventBus is not seekable. I.e., if I have data up-to-date to timepoint T, and I am now at timepoint N, I can scan the recent changes list from T to N and know whether a certain item X has changed or not. However, since the recent changes list has no entries for page props, and events emitted on EventBus before N are lost to me, I have no idea if the page props for X changed between T and N. To know that, I need a permanent, seekable record of changes. Or at least some flag that says when it was last updated. Unless of course I'm missing the part where you can seek back on EventBus events - in which case please point me to the API that allows doing so. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Recent changes, notifications & pageprops
Hi! I'd like to raise the topic of handling change notifications and the up-to-date-ness of wiki page data in relation to page props. First, a little background about how I arrived at the issue. I am maintaining the Wikidata Query Service, which updates from Wikidata using the recent changes API and the RDF export format for Wikidata pages. Recently, we have started using certain page properties, such as link & statement counts. This is when I discovered the issue: the page properties are not updated when the page (Wikidata item) is edited, but are updated later, as I understand by a job. Now, this leads to a situation where, when I have a recent changes entry and I look at the RDF export page - which now contains data derived from page props - I cannot know if the page props data is up-to-date or not. Moreover, if the job - some unknown and undefined time later - updates the page props, I get no notification, since the modification is not reflected in recent changes. This makes using information derived from page props very hard - you never know if the data is stale or whether the data in page props matches the data in the page. The problem is described in more detail in https://phabricator.wikimedia.org/T145712 I'd like to find a solution for it, but I'm not sure how to proceed. The data specific to this case can be easily generated from the data already present in memory during the page update, but I assume there were some reasons why it was deferred. We could make some kind of notification when updating page props, though that would probably seriously increase the number of notifications and thus slow down updates. Also, in some cases the second notification may not be necessary, since the page props were updated before I processed the first one, but I have no way of knowing that now. Any advice on how to solve this issue? -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
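For concreteness, the page props in question can be read back via the action API (prop=pageprops). A small sketch of building the request and flattening the response - the HTTP call itself is omitted, and the wb-claims/wb-sitelinks property names are shown as an assumption about what Wikidata exposes:

```python
from urllib.parse import urlencode

def pageprops_url(api_base, title):
    """Build an action API request for a page's props (read-only query)."""
    return api_base + "?" + urlencode({
        "action": "query", "prop": "pageprops",
        "titles": title, "format": "json",
    })

def extract_pageprops(api_json):
    """Flatten the query result into {title: {prop: value}}."""
    pages = api_json.get("query", {}).get("pages", {})
    return {p["title"]: p.get("pageprops", {}) for p in pages.values()}
```

Note that this only reads the current values; it cannot tell you whether the deferred job has run yet, which is exactly the gap described above.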
Re: [Wikitech-l] What's the "correct" content model when rev_content_model is NULL?
Hi! > It seems there is disagreement about what the correct interpretation of NULL > in > the rev_content_model column is. Should NULL there mean > > (a) "the current page content model, as recorded in page_content_model" > > or should it mean > > (b) "the default for this title, no matter what page_content_model says"? As I understand, NULL is there as a space-saving measure. So I guess we want to ask ourselves if we want to go to so much trouble to save space... Abstractly, a) looks better than b) to me, since it avoids the scenario where the default changed and all pages relying on the default are now broken. OTOH, if the pages are updated together with the default, that must have caused page_content_model to update too, so in this case a) should work too. > There is also an in-between option, let's call it a/b: fall back to > page_content_model for the latest revision (that should *always* be right), > but > to ignore page_content_model for older revisions. That would cater to use case This may be even better, since the page record is supposed to match the latest revision, but not prior revisions. That still leaves prior revisions broken in case of a default change, but at least the current one isn't. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
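The hybrid "a/b" interpretation discussed above can be sketched as a simple fallback chain (illustrative logic only, not MediaWiki's actual resolution code; the function and parameter names are made up):

```python
def effective_content_model(rev_model, is_latest_rev, page_model, title_default):
    """Resolve a revision's content model under the hybrid 'a/b' reading:
    an explicit rev_content_model always wins; NULL on the latest revision
    falls back to page_content_model; NULL on an older revision falls back
    to the title's default model."""
    if rev_model is not None:
        return rev_model
    if is_latest_rev and page_model is not None:
        return page_model   # latest revision trusts page_content_model
    return title_default    # older revisions ignore page_content_model
```

This makes the trade-off visible: the latest revision always matches the page record, while older revisions are still exposed to a change of the title default.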
Re: [Wikitech-l] PHP namespaces
Hi! > According to PSR-4 <http://www.php-fig.org/psr/psr-4/> > >The fully qualified class name MUST have a top-level namespace name, >also known as a "vendor namespace". Does this imply all classes in our code base must start with \Mediawiki? If so, I don't think this is a very good idea. This is a good idea for libraries, especially ones that have good functional unity (i.e. largely do one thing) or are part of one big framework (e.g. Zend Framework or Symfony). For a large and mostly stand-alone app like MediaWiki, I wonder if there's much point in having a thousand classes all prefixed with Mediawiki, given that almost everything we deal with is MediaWiki, and what is not already has its own namespace anyway. If we have a library that is a) related to MediaWiki in a substantial way and b) reusable outside, then a Mediawiki prefix would be appropriate. But I'm not sure whether it makes much sense for extensions... -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Best practice for WIP patches to help code review office hours
Hi! > No, -2 is restricted to project owners and thus not an op- > tion for the vast majority of contributors. For that pur- > pose, I proposed a Gerrit label "WIP" > (cf. > http://permalink.gmane.org/gmane.science.linguistics.wikipedia.technical/84068). This looks like a nice solution. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Docs, use of, and admin privileges for wikimedia github project?
Hi! > repository. They can use javascript and connect to the github api > however, so it might be possible to build a useful browser interface > that filtered based on the '-' separated name components. You could at Exactly! I've (ab)used the github API for a bit, and found this (please correct me if the figures look weird - I ran the API output through several quick-n-dirty scripts, so mistakes are possible): - we have 1780 repos - 900 of them are extensions, so putting extensions in a separate group cuts it by half - 102 repos do not have "-" in their name, meaning those would be hardest to classify - 1016 have the "mediawiki" prefix - 243 are "operations" - 76 are "analytics" - 48 are "labs" - 20 are "pywikibot" - 18 are "thumbor" - 18 are "integration" - 15 are "apps" - 10 are "phabricator" The rest of the groups are smaller or don't immediately suggest a good unifying principle. Thus, even with a very naive approach we could categorize at least 1464, or over 80%, of the repos, leaving just about 300 non-obvious ones, of which probably 100-200 will end up in "other". Not ideal, but at least better than 1780 :) The list of names is at https://gist.github.com/smalyshev/dfa72c79d9f750058262b902f47c0130 if anybody wants to play with it and save time on extracting it :) -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
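The naive grouping above amounts to splitting each repo name on its first "-". A throwaway sketch of that counting (my own illustration, not the actual scripts behind the figures):

```python
from collections import Counter

def prefix_counts(repo_names):
    """Count repos by their leading '-'-separated component; names with
    no '-' at all land in a separate bucket, as in the figures above."""
    counts = Counter()
    for name in repo_names:
        counts["(no dash)" if "-" not in name else name.split("-", 1)[0]] += 1
    return counts
```

Run against the gist linked above, this is enough to reproduce the per-prefix breakdown and see which repos fall into the hard-to-classify bucket.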
Re: [Wikitech-l] Docs, use of, and admin privileges for wikimedia github project?
Hi! > On 2016-04-25 19:01, Chad wrote: >> Honestly, I'm not entirely convinced that "mirror everything" is all that >> useful. It mostly results in a ton of unused repos cluttering up lists. > > I, for one, appreciate it. GitHub's interface is unfortunately a lot > more convenient than any of the repository viewers we host ourselves. :( Same here. Especially if I need to browse code and/or communicate with somebody about it - especially somebody outside the MediaWiki community (which means probably not trained with tools like gerrit, but most probably having some familiarity with github, because who doesn't have it these days?) - I almost always use github. If we had something better, I'd use it - but right now neither gerrit, nor Gitblit, nor phabricator's code browser is superior to what github offers. Now, having something like wikimedia.github.io would be an excellent idea. If somebody did the design, loading up the repo list and displaying it with a nice structure - given that we actually have pretty structured names already to start from - should not be super-hard? -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Wikipedia in Pig Latin
Hi! > We just had April fools. Please don't. A single test variant is fine, but > we don't need more. I think it would be nice to have a variant such as Pig Latin. Right now most variants that we have are in languages most people don't understand, so it's kind of hard to test some stuff. Pig Latin has the following advantages: - (nearly) everybody knows what it is and can read it if they can read English - it is easy to see if text is in Pig Latin or English - it is not just random messing up of text but has specific rules - no additional requirements for keyboard, fonts, editors, etc. - it is funny to look at - and we all could use occasional comic relief :) I'm not sure about the production wikis, but at least having it in development/test/beta would be nice, I think - e.g. as a vagrant role or something like that, maybe? -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
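For illustration, the "specific rules" point above: a word-level converter fits in a few lines (this is one common rule set; a real variant implementation would also have to handle case, punctuation, hyphenation, markup, etc.):

```python
def pig_latin_word(word):
    """Convert one ASCII word to Pig Latin: words starting with a vowel
    get 'way' appended; otherwise the leading consonant cluster moves to
    the end, followed by 'ay'."""
    vowels = "aeiouAEIOU"
    if not word.isalpha():
        return word  # leave numbers, punctuation, etc. alone
    if word[0] in vowels:
        return word + "way"
    for i, ch in enumerate(word):
        if ch in vowels:
            return word[i:] + word[:i] + "ay"
    return word + "ay"  # no vowels at all
```

Because the transformation is deterministic and reversible-by-eye, it satisfies both testability points: you can tell at a glance whether a string went through the variant conversion.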
Re: [Wikitech-l] productivity of mediawiki developers
Hi! > I personally reserve -2 for "this is a fundamentally bad idea" or "this > requires > community consensus before being implemented". Anything that is fixable in the > code should get a -1 or 0. > > Btw, I personally prefer to get -1 reviews over 0 reviews, simply because it's > easier to spot them as "todo" on the gerrit dashboard. If gerrit would > highlight > "stuff with new comments" more prominently, I'd probably use 0 more often. I treat -1 as "this needs to be fixed before it can go in, but once it is fixed it's good". Agree on -2. I use 0 for just commenting on things where I do not feel qualified or entitled to review but still have something to say, like additional todo items or general discussion. So, most reviews should be +1/-1. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] bluejeans
Hi! > If BNJ isnt actually open source, here is an open source solution that > we could use and help fund as required (e.g. buying their commercial > offerings so that WMF Engineering/Ops doesnt need to support it) > > http://bigbluebutton.org/ I've checked their demo, and it uses Flash. That is very iffy from a security standpoint, may lead to various issues on platforms that don't support Flash or where support is sketchy, and is not a good idea long-term in general, since Flash is on its way out as a technology. While I'm all for supporting open source, both by using it and contributing to it, in this particular case it doesn't look like a viable solution to me. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Prefix search refactoring
Hi! In order to make prefix search better, and to bring all variants of prefix search under one roof, we did some refactoring in the search engine implementation, so that various prefix searches now use the same code path and all use the SearchEngine class. The changes are as follows: SearchEngine gets the following new API functions: * public function completionSearch( $search ) - implements prefix completion search, returns SearchSuggestionSet * public function completionSearchWithVariants( $search ) - implements prefix completion search including variants handling, returns SearchSuggestionSet * public function defaultPrefixSearch( $search ) - basic prefix search without fuzzy matching, etc., to be used in scenarios like special pages search. Returns Title[]. An implementation does not have to implement all three methods differently; they can all use the same code if needed. The default implementation still supports the PrefixSearchBackend hook, but we plan to deprecate it, and the CirrusSearch implementation does not use it anymore. Instead, there is a protected function, completionSearchBackend( $search ), which implementations (including CirrusSearch) should implement to provide search results. SearchEngine implementations can make use of services provided by the base SearchEngine, including: - namespace resolution and normalization. The PrefixSearchExtractNamespace hook is still supported for engines wishing to implement namespace lookup not featured in the standard implementation. - fetching titles for result sets (the implementing engine does not have to fetch titles from the DB for suggestions) - result reordering to ensure exact matches are on top - a basic prefix search implementation using the database - the Special: namespace search implementation == Deprecations == We plan to deprecate the PrefixSearchBackend hook and the classes TitlePrefixSearch and StringPrefixSearch. 
We will keep those classes around for the basic search fallback implementation and for old extensions, but no new code should use them; instead, it should use the SearchEngine APIs described above. MediaWiki core has already been fixed to do that. Extensions implementing search engines should also extend SearchEngine and override the APIs above; CirrusSearch is an example of how to do it. == Show me the code == The patches implementing the refactoring are linked from: https://phabricator.wikimedia.org/T121430 Pretty version of the same: https://www.mediawiki.org/wiki/User:Smalyshev_(WMF)/Suggester If you have questions on this, please contact the Discovery team: https://www.mediawiki.org/wiki/Wikimedia_Discovery#Communications -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Mass migration to new syntax - PRO or CON?
Hi! > Please give a quick PRO or CON response as a basis for discussion. My opinion: if it ain't broke, don't fix it. It's ok to use the new syntax in new code, but spending time on changing perfectly working code just to use the new array syntax looks like misplaced effort to me. There are new features for which I would be more supportive of an immediate change, like $this in closures - that makes code much more readable and less bug-prone. Even then, I'm ambivalent about whether we need to touch the code that much, but I see the point of cleaning it up. But with array syntax, it's just a different syntax, and I don't see a lot of reason to mess with existing working code just to use it. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Mass migration to new syntax - PRO or CON?
Hi! > PRO. These syntax changes were implemented in PHP at the cost of breaking > backward-compatibility, which tells you that people understood their value Wait, are we talking about the same thing? New array syntax does not break BC. Or you mean that if we use new array syntax, we'd break BC with older PHP versions? I'm not sure I understand your argument here. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Mass migration to new syntax - PRO or CON?
Hi! > The question as I understood it, is should we touch every piece of our > codebase in one big mega patch or update it gradually as and when we visit > bits of the codebase (I get the impression the latter isn't happening due > to a desire to have mega patches). Same here and just to be clear, I'm for the gradual approach. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Splitting out GUI repo for WDQS
Hi! OK, the GUI files moved to wikidata/query/gui, and patches now have to be submitted against it. I imagine it will also be easier to set up testing there, given that it's no longer dragging an unrelated Java module around. Please tell me if anything doesn't work. Thanks, -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Ifexists across wikis
Hi! > I don't think there is a way to get a database name from an interwiki > prefix. Not a good/easy way, AFAIK. I've looked into it recently and the way current code does it is with a lot of ad-hoc stuff, external configs, hard-coded configs and special cases. I think this ticket: https://phabricator.wikimedia.org/T113034 aims to improve it. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Fwd: Announcing the release of the Wikidata Query Service
Hi! > When will be this type of functionality be available on-wiki? I have a Well, since it has a REST endpoint (https://www.mediawiki.org/wiki/Wikidata_query_service/User_Manual#SPARQL_endpoint) that can be called from JS or PHP or anything in between, technically anytime beginning now :) but I guess it depends on the meaning of "available". It is still beta, so some changes and tweaks and stability challenges are possible, but otherwise it is ready for use. That said, I'm not sure how exactly wikis such as the English Wikipedia might use this engine - that's something we need to start figuring out, now that we have the engine. > vague memory that some type of query service similar to this one was/is > intended to be used to allow arbitrary access to Wikidata data from/on > Wikimedia wikis such as the English Wikipedia. Well, technically the query engine is not direct access to Wikidata data - it's a copy of the data, not the actual data - and also for some accesses, such as single property/label value retrieval, it may be more efficient to use the direct access APIs that Wikidata is enabling right now. But for more complex requests, I think this is the engine that would support them. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
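Calling the SPARQL endpoint from code is straightforward. A sketch of building the request and flattening the JSON results - the query.wikidata.org URL and the response shape follow the standard SPARQL protocol and JSON results format, and should be checked against the linked user manual:

```python
from urllib.parse import urlencode

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

def sparql_request(query):
    """Return (url, headers) for a GET against the SPARQL endpoint."""
    url = SPARQL_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
    return url, {"Accept": "application/sparql-results+json"}

def bindings(result_json):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [{var: cell["value"] for var, cell in row.items()}
            for row in result_json["results"]["bindings"]]
```

The actual HTTP fetch (urllib.request, curl, a gadget's XHR, etc.) is left out; the point is that any environment that can do an HTTP GET and parse JSON can consume the service.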
Re: [Wikitech-l] Tools for dealing with citations of withdrawn academic journal articles
Hi! This project sounds like a good idea, but I don't really understand how it would work as a tool. There's no API for retracted journal articles. It seems like the best way to handle it would be, when you find out about a retracted journal article, to just search Wikipedia for the title of the article. What would a tool for this look like, and how would it be more efficient than just searching? I think maybe DOI (https://en.wikipedia.org/wiki/Digital_object_identifier) might be useful there, as many article references are referred to by DOI (as an identifiable template parameter), and it may be a more precise indicator than just the name. Not sure which way is the best to search for it, though. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
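If CirrusSearch's insource: keyword is available (an assumption - it matches raw wikitext, which is where the DOI template parameter lives), the lookup could be a single search API call. A URL-building sketch with an illustrative DOI:

```python
from urllib.parse import urlencode

def doi_search_url(doi, api_base="https://en.wikipedia.org/w/api.php"):
    """Build a search API request that looks for a DOI in page source.
    Quoting the DOI keeps insource: from tokenizing it apart."""
    return api_base + "?" + urlencode({
        "action": "query", "list": "search",
        "srsearch": 'insource:"%s"' % doi,
        "format": "json",
    })
```

Searching by DOI this way sidesteps the ambiguity of title matching: two articles can share a title, but a DOI identifies exactly one publication.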
Re: [Wikitech-l] 2015-07-29 Scrum of Scrums notes
Hi! I saw that a talk done by Discovery was mentioned in these notes. If this was a reference to the talk that Moiz and I gave, we are fortunate that Andrew Lih recorded it, and made it publicly available: https://archive.org/details/videoeditserver-102 Thanks! I've added the link to the notes. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Min php version
Hi! What I'm answering is the proposal that removing support for PHP 5.3 will motivate the user to upgrade their PHP, when that isn't the case. It may not motivate them to upgrade their PHP if their hosting can not provide that, but it will motivate them to upgrade their hosting, if the hosting refuses to upgrade their PHP. Hosting is so commoditized now that I don't believe one can't find a dozen PHP hosters literally in seconds. And most hosters already support multiple PHP versions anyway. I recall this has been the conclusion reached on this list previously - that this will cause problems for MW out in the world, and gain it an unwarranted reputation for insecurity as un-upgradeable installations get pwned. Thus, if newer MW still supports older PHP, this results in less pwned MW. The balance is up to you, of course. I have a hard time buying this argument. If it were true, the strategy of doing version upgrades and phasing out old version support would not survive, or at least would be very rare among software vendors, while in fact most software platform vendors are doing exactly that - phasing out old versions and requiring upgrades to new versions, all the time, both in the open source and the proprietary world. Yet I don't remember any of those vendors gaining a reputation for a particularly insecure product because of such an upgrade strategy. I do not see why MW would be an exception. I think most people whose business is talking about security and evaluating which product is secure and which is not can distinguish the case of a product being flawed from the case of somebody running an ancient version of the software and never upgrading. Maybe I'm too optimistic, but I also think solving an education problem by never educating and staying on ancient versions, out of fear that uneducated FUD may hurt our reputation, does not sound like a winning strategy to me. 
-- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Min php version
Hi! > Hi php 5.5 is still probably years from being the minimum in mediawiki due to so many users using php 5.3 and 5.4. The users should really seriously consider upgrading. 5.3 has been EOL for a year now (which means not even security fixes for a year), and 5.4 is going EOL in 2 months. If any of these sites are public-facing (and due to the nature of wikis, many of them, to some measure, are), running out-of-support software may not be that good an idea. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] First impression with Differential
Hi! I don't agree, or I don't understand, Test Plan: is fantastic and I've wanted it in commit messages for years. Thinking about how to test is essential to writing good code, and a reminder to express your thought I agree completely. I am in no way against having a Test plan. What I am against - in this instance as in others - is replacing a person's judgement with a mechanism, and forcing people to create a fake test plan where one doesn't make sense or can not be provided. If someone writes Test Plan: whatever, they'll get -1s until they respect reviewers enough to write Test Plan: trivial untested fix in an area that lacks tests. Sometimes they would. Sometimes it's fixing "teh" in the README, and writing an essay about how this does not need testing is redundant. Everybody understands why it doesn't need testing - so filling in mandatory fields would be time spent not on doing something productive but on working around a system that was set up wrong. Admittedly, it would be a small amount of time, but these things add up. And, on a more principled note, it just should not happen if we can avoid it. We can solve it very easily - just not make that field mandatory. As you correctly explained, if a fix does need a test, it'll be -1-ed until it has the tests, and this system seems to be working so far. I think placing trust in the sound judgement of the developers is best. In the last week my instance crashed on two checkins that were +2'd but never run; both useful improvements to MediaWiki-Vagrant roles that looked completely legit. I appreciate people writing needed code and others That will always happen. There's no way to prevent it completely. Any setup that promises this will never happen again is just a delusion - all software has bugs, and sooner or later you'll encounter one of them. That could also happen with a Test plan - nothing guarantees a patch's Test plan won't say everything is fine, get +2'ed, and still break down. 
reviewing it promptly, but typing: Test Plan: I didn't test this patch at That I have no objections to. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] First impression with Differential
Hi! Test Plan is required. Sounds like a good practice to me. Worst case scenario, type I didn't test this patch at all. I think it's not a good idea, as it trains people to do the opposite of what it's intended to train them for. I.e., "mandatory, but put 'whatever' there" is IMHO worse than "non-mandatory, but supported by common consensus". If we're setting up a new system, we shouldn't set it up so that we constantly work around it. Neither should you, that is the point of code review. Then again, if there In theory, this is true. In practice, there are numerous occasions where people have to self-+2 - typos, forgotten files, CI glitches, rebases, etc. Well, ok, "have to" is a strong word here - all of it can be worked around by dragging in somebody and asking them "please +2 this" - but again, that would be working against the setup and also training people that the system sucks and they have to work around it to be effective. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] iconv/mb_convert expert needed to advise on a patch for T39665
Hi! Don't have a good solution, but some ideas:

1. There's http://php.net/manual/en/class.uconverter.php which uses the ICU converter. It can recognize tons of charsets/encodings (http://site.icu-project.org/charts/charset) and can filter out bad characters, though the way to achieve it may be a bit tricky. E.g.:

    <?php
    class MyConverter extends UConverter {
        public function toUCallback( $reason, $source, $codeUnits, &$error ) {
            $error = 0;
            return null;
        }
    }
    $c = new MyConverter( "UTF-8", "utf-8" );
    var_dump( $c->convert( "aa\xC3\xC3\xC3\xB8aa" ) );

(there might be a better way, it's just a quick-n-dirty example). Con: while it's supported by HHVM, it's PHP 5.5+. Can probably be backported quite easily.

2. recode - http://php.net/manual/en/ref.recode.php. This:

    var_dump( recode( "UTF-8..UTF-16,UTF-16..UTF-8", "aa\xC3\xC3\xC3\xB8aa" ) );

seems to work fine. Not sure what recode support is like in HHVM.

3. Patch libmbfl to add more aliases and missing encodings. Shouldn't be very hard, though I'm not sure what the policy is about patching PHP/HHVM here.

4. Implement ezy...@php.net's suggestion of working around the glibc mess by adopting code from http://code.woboq.org/userspace/glibc/iconv/iconv_prog.c.html. Again, that would require a custom patch for PHP; not sure about HHVM. -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] For the Sake of Simplicity
Hi! This leads to an interesting marketing possibility, and one that I have seen in action only a few times: the idea that a subsequent release of a product might be smaller or have fewer features than the previous version, and that this property should be considered a selling point. Perhaps the market is not mature enough to accept it yet, but it remains a promising and classic ideal — less is more. Apple is doing it from time to time, and is not shy about it. I'm not sure I personally am a big fan, but it works for many people. Another interesting take on the same topic from ESR: http://esr.ibiblio.org/?p=6737 -- Stas Malyshev smalys...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Thumbnail image hinting in articles via metadata tags
Hi!

> Would it be feasible to include an og:image tag on pages for which we
> have a reasonable guess as to the thumbnail? Open Graph[3] is supported
> by what seems anecdotally to me to be a wide range of services, so good
> hints there would improve thumbnails for links on not just reddit, but
> Facebook, Twitter, various chat clients, I think several Wordpress
> plugins, etc. https://phabricator.wikimedia.org/T8

I wonder if this can somehow be connected to Wikidata's image attribute (https://www.wikidata.org/wiki/Property:P18).

-- Stas Malyshev smalys...@wikimedia.org
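For illustration, an og:image hint is just a `<meta>` element in the page head. A sketch of emitting one follows; the image URL is a made-up placeholder, and actually resolving the item's P18 (image) statement to a thumbnail URL is left out:

```php
<?php
// Sketch: print an Open Graph image hint for a page.
// $imageUrl is a hypothetical placeholder, e.g. a thumbnail of the
// file named by the Wikidata item's P18 (image) statement.
$imageUrl = 'https://upload.wikimedia.org/wikipedia/commons/thumb/...';

// og:image is a <meta> tag with a "property" attribute per the
// Open Graph protocol; escape the URL for the HTML attribute context.
printf(
    "<meta property=\"og:image\" content=\"%s\"/>\n",
    htmlspecialchars( $imageUrl, ENT_QUOTES )
);
```

Consumers such as Facebook and Twitter read this tag when generating link previews, which is why a good hint here improves thumbnails across many services at once.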
Re: [Wikitech-l] GPL upgrading to version 3
Hi!

> And is also infeasible. For a web service. GPL is effectively weak
> copyleft already; I think that's quite weak enough.

> (As I noted, there is no actual evidence that permissive licenses secure more

This is very plausible, as the decision to contribute is rarely driven by the license as a primary factor. You don't say "here's a random GPL-licensed project; I don't know anything about its domain, language, goals, community, status or needs, but I feel compelled to contribute because it's GPL!" Or at least, most people won't say that. As long as the license is not completely unacceptable, I would assume other factors dominate such a decision. However, I know of cases where I personally had to write code or otherwise work around GPL libraries because of license incompatibility with other open-source projects. That, of course, can also be counted as "more contributions", but I don't think that's what you meant :)

> contributions than copyleft, and some evidence the other way; despite

Out of curiosity, what evidence do you mean?

> fans of permissive licenses repeating the claims ad nauseam over the
> last fifteen years, they're notably short on examples.)

You must already know examples of successful projects under permissive licenses. So you are probably seeking examples showing that a permissive license solicits _more_ contributions than if the same project were under the GPL. Such an example would require the rather rare occurrence of a project changing its license at a mature stage, with contributions measured before and after the change; otherwise we'd be comparing apples to oranges. My personal opinion is, as I described above, that the license doesn't matter too much provided it's not unacceptably restrictive. Thus, for me, looking for such examples would be a waste of time :)

-- Stas Malyshev smalys...@wikimedia.org
Re: [Wikitech-l] GPL upgrading to version 3
Hi!

> This entire conversation is a bit disappointing, mainly because I am a
> supporter of the free software movement, and like to believe that users
> should have a right to see the source code of software they use.
> Obviously not everybody feels this way and not everybody is going to
> support the free software movement, but I can assure you I personally
> have no plans on contributing to any WMF project that is Apache
> licensed, but at the very least MediaWiki core is still GPLv2, even if
> it makes things a bit more difficult.

You seem to be equating access to source code with the GPL, which IMHO is a very narrow view of the world. The open source world is much wider than the GPL (even though nobody can deny that GPL projects are a substantial part of it), and there are many successful, widely acclaimed and widely used software projects which are open source and not GPL. Of course, the choice of where to contribute and on which conditions is entirely yours, but I *personally* would view such a stance as somewhat counterproductive, if your goal is to contribute to the world's repository of high-quality software that is accessible to everyone.

-- Stas Malyshev smalys...@wikimedia.org