[Wikidata-bugs] [Maniphest] [Commented On] T207030: wikidata rdf dumps cron job complaining for lexemes phase

2018-10-29 Thread ArielGlenn
ArielGlenn added a comment.

In T207030#4703693, @Smalyshev wrote:
Ah, ok, didn't see your comment - yes, we probably need to reduce or cancel small file check for lexemes, or eliminate empty shards. I am not sure how easy it is to do the latter - I am on vacation this week so I'd start with the former and go back to the latter after I'm back.


Note that eliminating empty shards won't fix the small size issue. Shard 0, on which the check failed, had content in one of the files, just not enough to make the cutoff.

TASK DETAIL
https://phabricator.wikimedia.org/T207030

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: ArielGlenn
Cc: gerritbot, Aklapper, Smalyshev, ArielGlenn, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Lunewa, Th3d3v1ls, Ramalepe, Liugev6, QZanden, LawExplorer, Lewizho99, Maathavan, gnosygnu, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T207030: wikidata rdf dumps cron job complaining for lexemes phase

2018-10-29 Thread ArielGlenn
ArielGlenn added a comment.
Job has not started yet so the change should have made it out in time for this week's run.


[Wikidata-bugs] [Maniphest] [Commented On] T207030: wikidata rdf dumps cron job complaining for lexemes phase

2018-10-29 Thread gerritbot
gerritbot added a comment.
Change 470447 merged by ArielGlenn:
[operations/puppet@production] Reduce small file size for lexemes

https://gerrit.wikimedia.org/r/470447


[Wikidata-bugs] [Maniphest] [Commented On] T207030: wikidata rdf dumps cron job complaining for lexemes phase

2018-10-29 Thread gerritbot
gerritbot added a comment.
Change 470447 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[operations/puppet@production] Reduce small file size for lexemes

https://gerrit.wikimedia.org/r/470447


[Wikidata-bugs] [Maniphest] [Commented On] T207030: wikidata rdf dumps cron job complaining for lexemes phase

2018-10-29 Thread Smalyshev
Smalyshev added a comment.
Ah, ok, didn't see your comment - yes, we probably need to reduce or cancel small file check for lexemes, or eliminate empty shards. I am not sure how easy it is to do the latter - I am on vacation this week so I'd start with the former and go back to the latter after I'm back.


[Wikidata-bugs] [Maniphest] [Commented On] T207030: wikidata rdf dumps cron job complaining for lexemes phase

2018-10-29 Thread Smalyshev
Smalyshev added a comment.
Small batches are normal - these are empty or semi-empty shards, I guess. I wonder, though, why the job did not proceed to create the full dump. Maybe the small file check is not correct?


[Wikidata-bugs] [Maniphest] [Commented On] T207030: wikidata rdf dumps cron job complaining for lexemes phase

2018-10-29 Thread ArielGlenn
ArielGlenn added a comment.
root@snapshot1008:~# more /var/log/wikidatadump/dumpwikidatattl-wikidata-20181028-lexemes-BETA-main.log
File size of  is only 518223. Aborting.

The file size cutoff is 2000/8 = 250. So that's why no files wind up in the output directory.
Also, the error message was written expecting one temp file name (probably before there were shards):

echo "File size of $tempFile is only $fileSize. Aborting." >> $mainLogFile
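
One way the check could name the offending file is sketched below. This is a minimal sketch, not the script from operations/puppet: the function name check_shard_size, its arguments, and the extra "(cutoff ...)" text in the message are all hypothetical. It divides a total minimum size by the shard count, as in the cutoff arithmetic above, and takes the file name as an explicit argument so the message cannot come out with an empty name.

```shell
#!/bin/bash
# Hypothetical helper, not the production dump script.
# Usage: check_shard_size FILE MIN_TOTAL_BYTES SHARD_COUNT
check_shard_size() {
    local file="$1" min_total="$2" shards="$3"
    # Per-shard cutoff: the total minimum split evenly across shards.
    local cutoff=$(( min_total / shards ))
    # wc -c gives the byte count; the arithmetic expansion strips any padding.
    local size=$(( $(wc -c < "$file") ))
    if [ "$size" -lt "$cutoff" ]; then
        # Name the file explicitly so the message never reads "File size of  is only ...".
        echo "File size of $file is only $size (cutoff $cutoff). Aborting."
        return 1
    fi
    return 0
}
```

It would be called once per shard, for example as check_shard_size "$shardFile" "$minSize" "$shards" >> "$mainLogFile", where $shardFile, $minSize and $shards are assumed variable names.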


[Wikidata-bugs] [Maniphest] [Commented On] T207030: wikidata rdf dumps cron job complaining for lexemes phase

2018-10-29 Thread ArielGlenn
ArielGlenn added a comment.
Nope. Something else is wrong. I see no cronspam, no lexeme job running now on the snapshot host, this week's json job has started, but the 'latest' file is Oct 14. There is a 20181028 directory but it is empty.
There are a bunch of temp files left in /mnt/dumpsdata/xmldatadumps/temp/ with names like wikidatattl-lexemes.0-batch0.gz. Most are very tiny, 565 bytes. The ones with content are (listed from dumpsdata1001):

-rw-r--r-- 1 dumpsgen dumpsgen   9063 Oct 28 00:45 /data/xmldatadumps/temp/wikidatattl-lexemes.0-batch16.gz
-rw-r--r-- 1 dumpsgen dumpsgen 500120 Oct 28 00:46 /data/xmldatadumps/temp/wikidatattl-lexemes.0-batch17.gz
-rw-r--r-- 1 dumpsgen dumpsgen  16897 Oct 28 00:45 /data/xmldatadumps/temp/wikidatattl-lexemes.1-batch16.gz
-rw-r--r-- 1 dumpsgen dumpsgen 499111 Oct 28 00:46 /data/xmldatadumps/temp/wikidatattl-lexemes.1-batch17.gz
-rw-r--r-- 1 dumpsgen dumpsgen  10023 Oct 28 00:45 /data/xmldatadumps/temp/wikidatattl-lexemes.2-batch16.gz
-rw-r--r-- 1 dumpsgen dumpsgen 492772 Oct 28 00:46 /data/xmldatadumps/temp/wikidatattl-lexemes.2-batch17.gz
-rw-r--r-- 1 dumpsgen dumpsgen  12043 Oct 28 00:45 /data/xmldatadumps/temp/wikidatattl-lexemes.3-batch16.gz
-rw-r--r-- 1 dumpsgen dumpsgen 496777 Oct 28 00:46 /data/xmldatadumps/temp/wikidatattl-lexemes.3-batch17.gz
-rw-r--r-- 1 dumpsgen dumpsgen  11426 Oct 28 00:45 /data/xmldatadumps/temp/wikidatattl-lexemes.4-batch16.gz
-rw-r--r-- 1 dumpsgen dumpsgen 486041 Oct 28 00:46 /data/xmldatadumps/temp/wikidatattl-lexemes.4-batch17.gz
-rw-r--r-- 1 dumpsgen dumpsgen  18385 Oct 28 00:45 /data/xmldatadumps/temp/wikidatattl-lexemes.5-batch16.gz
-rw-r--r-- 1 dumpsgen dumpsgen 479445 Oct 28 00:46 /data/xmldatadumps/temp/wikidatattl-lexemes.5-batch17.gz
-rw-r--r-- 1 dumpsgen dumpsgen   9860 Oct 28 00:45 /data/xmldatadumps/temp/wikidatattl-lexemes.6-batch16.gz
-rw-r--r-- 1 dumpsgen dumpsgen 479984 Oct 28 00:46 /data/xmldatadumps/temp/wikidatattl-lexemes.6-batch17.gz
-rw-r--r-- 1 dumpsgen dumpsgen  15190 Oct 28 00:45 /data/xmldatadumps/temp/wikidatattl-lexemes.7-batch16.gz
-rw-r--r-- 1 dumpsgen dumpsgen 475630 Oct 28 00:46 /data/xmldatadumps/temp/wikidatattl-lexemes.7-batch17.gz

I am making a copy of all these files so they do not get removed on the next run; stashed in /data/temp/ariel/lexemes (/mnt/dumpsdata/temp/... if you are on a snapshot host).
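
To reproduce a listing like the one above, or to total up what one shard produced, something along these lines may help. It is a sketch under assumptions: both function names are hypothetical, the 565-byte figure comes from the tiny files mentioned above and is treated here as the size of an empty batch, and the file name pattern is copied from the listing.

```shell
#!/bin/bash
# Hypothetical helpers for inspecting leftover temp files.

# List batch files larger than an "empty" batch (565 bytes by default).
list_nonempty_batches() {
    local dir="$1" empty_size="${2:-565}"
    find "$dir" -maxdepth 1 -type f \
        -name 'wikidatattl-lexemes.*-batch*.gz' \
        -size +"${empty_size}c" | sort
}

# Sum the sizes in bytes of all batch files belonging to one shard.
shard_total_bytes() {
    local dir="$1" shard="$2" total=0 f
    for f in "$dir"/wikidatattl-lexemes."$shard"-batch*.gz; do
        # Skip the unexpanded pattern when the shard has no files.
        [ -f "$f" ] || continue
        total=$(( total + $(wc -c < "$f") ))
    done
    echo "$total"
}
```

If shard 0 consisted of 16 batches of 565 bytes each plus the two larger files listed above (an assumption, since the tiny files are not itemized), its total would be 16 * 565 + 9063 + 500120 = 518223 bytes, which is the size quoted from the main log elsewhere in this thread.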


[Wikidata-bugs] [Maniphest] [Commented On] T207030: wikidata rdf dumps cron job complaining for lexemes phase

2018-10-15 Thread ArielGlenn
ArielGlenn added a comment.
This is now deployed on snapshot1008 (where cron jobs run). We'll know next Monday if this took care of the problem; let's leave the task open til then.


[Wikidata-bugs] [Maniphest] [Commented On] T207030: wikidata rdf dumps cron job complaining for lexemes phase

2018-10-15 Thread gerritbot
gerritbot added a comment.
Change 467415 merged by ArielGlenn:
[operations/puppet@production] Fix lexeme error msgs

https://gerrit.wikimedia.org/r/467415


[Wikidata-bugs] [Maniphest] [Commented On] T207030: wikidata rdf dumps cron job complaining for lexemes phase

2018-10-15 Thread Smalyshev
Smalyshev added a comment.
I think I found the bug. From the looks of it, it shouldn't have affected the dump.


[Wikidata-bugs] [Maniphest] [Commented On] T207030: wikidata rdf dumps cron job complaining for lexemes phase

2018-10-15 Thread gerritbot
gerritbot added a comment.
Change 467415 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[operations/puppet@production] Fix lexeme error msgs

https://gerrit.wikimedia.org/r/467415