ArielGlenn added a comment.
I'd like to wait for the first run. I'll retitle the task then too :-)
TASK DETAIL https://phabricator.wikimedia.org/T178047
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: ArielGlenn
Cc: brion, gerritbot, hoo, ArielGlenn, Lahi, Gq86,
hoo added a comment.
@ArielGlenn Do we want to close this, yet? Or wait for the first new dumps?
gerritbot added a comment.
Change 416409 merged by jenkins-bot:
[mediawiki/extensions/ActiveAbstract@master] don't try to abstract things that aren't text or wikitext
https://gerrit.wikimedia.org/r/416409
ArielGlenn added a comment.
Email sent to xmldatadumps-l and wikitech-l.
hoo added a comment.
In T178047#4097208, @ArielGlenn wrote:
Well I don't mind a waiting period, let's agree on... one week? It will probably take longer than that for it to get merged and rolled out anyways. But we need an eta before I send the email :-)
This is probably somewhere in between a
ArielGlenn added a comment.
Well I don't mind a waiting period, let's agree on... one week? It will probably take longer than that for it to get merged and rolled out anyways. But we need an eta before I send the email :-)
hoo added a comment.
In T178047#4091341, @ArielGlenn wrote:
Looking at this list I think we are good to go.
Definitely. I still think this should be announced, but given the very limited scope we might even get away without a waiting period before applying the change?
ArielGlenn added a comment.
On ms1001 in the public dumps dir I did this:
list=*wik*
for dirname in $list; do echo "doing $dirname"; zcat "${dirname}/20180320/${dirname}-20180320-stub-articles.xml.gz" |grep -A16 '0' | grep '' | grep -v wikitext | grep -v wikibase-item | grep -v
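The loop above is cut off in this copy and its grep patterns lost their XML tags, so here is a minimal Python sketch of the same check, assuming the usual MediaWiki export layout where each `<page>` has an `<ns>` element and its revision a `<model>` element. It counts namespace-0 content models other than wikitext and wikibase-item. The function names and the sample fragment are made up for illustration.

```python
import gzip
import io
import xml.etree.ElementTree as ET
from collections import Counter
from typing import BinaryIO, Iterable, Union

# Models the shell scan excluded with `grep -v`.
EXPECTED_MODELS = frozenset({"wikitext", "wikibase-item"})


def _local(tag: str) -> str:
    """Drop the '{namespace-uri}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def unexpected_models(stream: BinaryIO, namespace: str = "0",
                      expected: Iterable[str] = EXPECTED_MODELS) -> Counter:
    """Count <model> values of pages in `namespace` that are not in `expected`."""
    expected = frozenset(expected)
    counts: Counter = Counter()
    page_ns = None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = _local(elem.tag)
        if name == "ns":
            page_ns = (elem.text or "").strip()
        elif name == "model":
            model = (elem.text or "").strip()
            if page_ns == namespace and model not in expected:
                counts[model] += 1
        elif name == "page":
            page_ns = None
            elem.clear()  # free finished pages so memory stays flat on big dumps
    return counts


def scan_stub_dump(source: Union[str, BinaryIO]) -> Counter:
    """`source` is a path to (or open binary file of) a *-stub-articles.xml.gz file."""
    with gzip.open(source, "rb") as fh:
        return unexpected_models(fh)


# Tiny hand-written stub fragment for trying the scanner out.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
<page><title>A</title><ns>0</ns><id>1</id>
<revision><id>10</id><model>wikitext</model><format>text/x-wiki</format></revision></page>
<page><title>B</title><ns>0</ns><id>2</id>
<revision><id>11</id><model>json</model><format>application/json</format></revision></page>
<page><title>C</title><ns>4</ns><id>3</id>
<revision><id>12</id><model>css</model><format>text/css</format></revision></page>
</mediawiki>"""

if __name__ == "__main__":
    print(scan_stub_dump(io.BytesIO(gzip.compress(SAMPLE))))
```

On a real host you would call `scan_stub_dump(path)` once per wiki directory, mirroring the shell loop.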
brion added a comment.
In T178047#4073991, @ArielGlenn wrote:
In T178047#4073899, @brion wrote:
Not sure offhand about the schema; Yahoo's old documentation seems to have vanished from the net. (Probably on the wayback machine but I can't find a URL reference)
We don't have a schema in our
ArielGlenn added a comment.
In T178047#4073899, @brion wrote:
Not sure offhand about the schema; Yahoo's old documentation seems to have vanished from the net. (Probably on the wayback machine but I can't find a URL reference)
We don't have a schema in our repos anywhere that must be updated
brion added a comment.
Not sure offhand about the schema; Yahoo's old documentation seems to have vanished from the net. (Probably on the wayback machine but I can't find a URL reference)
Ideally, I think we'd want a way for the content handler to provide a text extract that can be used here.
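One way to read that suggestion, as a Python sketch: each content model's handler optionally returns a plain-text extract, and the abstract writer emits an empty abstract when none is offered. The names here (`TextExtractor`, `EXTRACTORS`, `abstract_for`) are hypothetical and are not MediaWiki's actual ContentHandler API; the wikitext extractor is deliberately crude.

```python
from typing import Dict, Optional


class TextExtractor:
    """Hypothetical per-content-model hook: return a plain-text extract or None."""

    def text_extract(self, raw: str, max_len: int = 200) -> Optional[str]:
        # Default for models with no meaningful text (e.g. structured JSON).
        return None


class WikitextExtractor(TextExtractor):
    def text_extract(self, raw: str, max_len: int = 200) -> Optional[str]:
        # Crude: first non-blank line not starting with heading, template or link markup.
        for line in raw.splitlines():
            line = line.strip()
            if line and not line.startswith(("{{", "[[", "=")):
                return line[:max_len]
        return None


class PlainTextExtractor(TextExtractor):
    def text_extract(self, raw: str, max_len: int = 200) -> Optional[str]:
        text = raw.strip()[:max_len]
        return text or None


# Keyed by content model ID; unknown models fall back to the no-extract default.
EXTRACTORS: Dict[str, TextExtractor] = {
    "wikitext": WikitextExtractor(),
    "text": PlainTextExtractor(),
}
_DEFAULT = TextExtractor()


def abstract_for(model: str, raw: str) -> str:
    """Abstract text for one page; '' when the model offers no extract."""
    extract = EXTRACTORS.get(model, _DEFAULT).text_extract(raw)
    return extract if extract is not None else ""
```

The point of the design is that a new content model opts in by supplying its own extractor, instead of the abstract code hard-coding a list of models.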
ArielGlenn added a comment.
Does anyone know where the schema for these xml files lives? I've grepped around in mw core and in the abstract extension repos and found nothing.
ArielGlenn added a comment.
Actually, is this any different than having 'deleted="deleted"' as the attribute when a revision, contributor or comment is no longer available? AFAIK that's not a standard attribute or anything, it's just in our schema. Which reminds me, the change about needs to go
ArielGlenn added a comment.
Well, on wikidatawiki in beta, the new code generates a whole lot of empty abstract tags, as we expect; on other wikis it produces the usual output. So that looks good.
Now trying to find out about standard xml libraries.
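On the standard-XML-libraries question: a non-validating parser such as Python's `xml.etree.ElementTree` simply exposes an extra attribute in `.attrib` and returns `None` as the text of an empty element, so an unfamiliar attribute should not by itself break such consumers (a consumer validating against a schema is a different story). A small check under that assumption; the feed fragment is illustrative, and the `not-applicable` attribute name is an assumption, not necessarily what the extension emits.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Illustrative abstract-feed fragment; the attribute on the second
# <abstract> is an assumed name used only for this example.
FEED_SAMPLE = """<feed>
<doc>
<title>Wikipedia: Example</title>
<url>https://en.wikipedia.org/wiki/Example</url>
<abstract>Example is a thing.</abstract>
</doc>
<doc>
<title>Wikidata: Q42</title>
<url>https://www.wikidata.org/wiki/Q42</url>
<abstract not-applicable="" />
</doc>
</feed>"""


def read_abstracts(xml_text: str) -> List[Tuple[str, str, Dict[str, str]]]:
    """Return (title, abstract text, abstract attributes) for every <doc>."""
    root = ET.fromstring(xml_text)
    rows = []
    for doc in root.iter("doc"):
        abstract = doc.find("abstract")
        # An empty element has .text == None; normalise that to "".
        text = (abstract.text or "") if abstract is not None else ""
        attrs = dict(abstract.attrib) if abstract is not None else {}
        rows.append((doc.findtext("title", default=""), text, attrs))
    return rows


if __name__ == "__main__":
    for row in read_abstracts(FEED_SAMPLE):
        print(row)
```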
ArielGlenn added a comment.
I've updated it according to your second suggestion (untested though). I prefer to have empty abstract tags in there rather than skip them completely. The file ought to compress down to something pretty tiny at least!
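A quick way to sanity-check the "compresses down to something tiny" expectation: build synthetic Wikidata-style records with empty `<abstract />` elements and gzip them. The record layout and Q-ids are synthetic, not taken from a real dump, so the resulting ratio is only indicative.

```python
import gzip


def synthetic_feed(n: int) -> bytes:
    """Build n <doc> records with empty abstracts and sequential Q-ids (synthetic data)."""
    parts = ["<feed>\n"]
    for i in range(1, n + 1):
        parts.append(
            f"<doc>\n<title>Wikidata: Q{i}</title>\n"
            f"<url>https://www.wikidata.org/wiki/Q{i}</url>\n"
            "<abstract />\n<links>\n</links>\n</doc>\n"
        )
    parts.append("</feed>\n")
    return "".join(parts).encode("utf-8")


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by raw size, using gzip's default level."""
    return len(gzip.compress(data)) / len(data)


if __name__ == "__main__":
    data = synthetic_feed(100_000)
    print(f"raw={len(data)} bytes, ratio={compression_ratio(data):.3f}")
```

Because each record differs from the previous one only in a few digits, DEFLATE can encode most of every record as back-references, which is why near-empty records cost so little after compression.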
hoo added a comment.
In T178047#4023267, @ArielGlenn wrote:
I'm tempted to just turn off abstracts for Wikidata altogether, since every item in there is a Qxxx with junk for the abstract.
If this is just NS0 (or content namespaces… which are all Wikibase entity namespaces), this definitely
ArielGlenn added a comment.
I'm tempted to just turn off abstracts for Wikidata altogether, since every item in there is a Qxxx with junk for the abstract. But your approach is better, in case similar content creeps into other projects. What do you think about
gerritbot added a comment.
Change 416409 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[mediawiki/extensions/ActiveAbstract@master] don't try to abstract things that aren't text or wikitext
https://gerrit.wikimedia.org/r/416409
hoo added a comment.
Relevant code: https://github.com/wikimedia/mediawiki-extensions-ActiveAbstract/blob/master/AbstractFilter.php#L131
I'm not sure how Wikidata abstracts could be meaningful… I can make a (rather bold) suggestion to just drop an empty string in case we're dealing with non
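The empty-string suggestion amounts to a guard in front of the extractor. A minimal Python sketch of that guard, assuming the content-model IDs `wikitext` and `text` are the ones treated as abstractable (as the title of change 416409 suggests); `extract` stands in for whatever the real filter in AbstractFilter.php does and is hypothetical here.

```python
from typing import Callable

# Content models that get a real abstract; everything else
# (e.g. wikibase-item JSON on Wikidata) gets an empty one.
TEXTUAL_MODELS = frozenset({"wikitext", "text"})


def abstract_or_empty(model: str, raw: str, extract: Callable[[str], str]) -> str:
    """Call the (hypothetical) `extract` only for textual models; otherwise return ''."""
    if model not in TEXTUAL_MODELS:
        return ""
    return extract(raw)


def first_line(raw: str) -> str:
    """Toy extractor for the demo: first non-blank line of the text."""
    for line in raw.splitlines():
        if line.strip():
            return line.strip()
    return ""


if __name__ == "__main__":
    print(repr(abstract_or_empty("wikitext", "Foo is a bar.\nMore.", first_line)))
    print(repr(abstract_or_empty("wikibase-item", '{"id": "Q42"}', first_line)))
```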