[Xmldatadumps-l] Re: "Experimental" Status of Enterprise HTML Dumps

2023-05-11 Thread Evan Lloyd New-Schmidt
> you might be able to get a question about it answered on the corresponding discussion page.

Thanks Ariel, I'll ask over there. It's good to know there are no plans to change the HTML backup schedule.

- Evan

[Xmldatadumps-l] Re: "Experimental" Status of Enterprise HTML Dumps

2023-05-11 Thread Evan Lloyd New-Schmidt
> From my experience working with the Wiktionary HTML dumps I can say that the data quality is quite poor: there are stale and missing entries (https://phabricator.wikimedia.org/T305407).

Thank you Jan, that is very good to know. I'll follow that issue for updates.

- Evan

[Xmldatadumps-l] Re: "Experimental" Status of Enterprise HTML Dumps

2023-05-10 Thread Ariel Glenn WMF
Hello Evan,

The Enterprise HTML dumps should be publicly available around the 22nd and the 3rd of each month, though there can be delays. We don't expect that to change any time soon. As to their content or the namespaces, I can't answer that; someone from Wikimedia Enterprise will have to disc
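For a project that re-processes the dumps on each release, the schedule Ariel gives can drive a simple polling job. The sketch below is a minimal illustration, assuming the nominal publication days are exactly the 3rd and the 22nd; real releases can slip, so treat the result as the earliest date worth checking, not a guarantee. `next_nominal_dump_date` is a hypothetical helper, not part of any Wikimedia tooling.

```python
from datetime import date, timedelta

# Nominal publication days from the schedule described above (ASSUMPTION:
# exact days; actual releases "can be delayed", so poll from this date on).
PUBLICATION_DAYS = (3, 22)


def next_nominal_dump_date(today: date) -> date:
    """Return the first nominal publication date on or after `today`."""
    for day in PUBLICATION_DAYS:
        if today.day <= day:
            return today.replace(day=day)
    # Past the 22nd: day 28 + 4 days always lands in the following month,
    # so this rolls over correctly even from December into January.
    first_of_next_month = (today.replace(day=28) + timedelta(days=4)).replace(day=1)
    return first_of_next_month.replace(day=PUBLICATION_DAYS[0])
```

A scheduler could call this once per run and only start downloading when the current date has reached the returned value.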

[Xmldatadumps-l] Re: "Experimental" Status of Enterprise HTML Dumps

2023-05-08 Thread Jan Berkel
On Fri, 5 May 2023, at 22:53, Evan Lloyd New-Schmidt wrote:
> Hi, I'm starting a project that will involve repeated processing of HTML
> wikipedia articles.
>
> Using the enterprise dumps seems like it would be much simpler than
> converting the XML dumps, but I don't know what the "experimental"
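Evan's question weighs reading the Enterprise HTML dumps directly against converting the XML dumps. A minimal sketch of the direct route, assuming each dump is a gzipped tar archive of newline-delimited JSON files in which every record carries the article title in a `name` field and the rendered HTML under `article_body.html`. These field names are assumptions to verify against a real dump, and `iter_html_articles` is a hypothetical helper, not part of any Wikimedia tooling.

```python
import json
import tarfile
from typing import BinaryIO, Iterator


def iter_html_articles(fileobj: BinaryIO) -> Iterator[tuple[str, str]]:
    """Yield (title, html) pairs from a gzipped tar of NDJSON files.

    ASSUMPTION: each non-blank line is a JSON object with a "name" field and
    an "article_body" object holding an "html" field.
    """
    # Stream mode "r|gz" reads members strictly in order without seeking,
    # so the archive is processed in a single forward pass.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories and other non-regular entries
            stream = archive.extractfile(member)
            if stream is None:
                continue
            for raw_line in stream:
                if not raw_line.strip():
                    continue  # tolerate blank lines between records
                record = json.loads(raw_line)  # json.loads accepts bytes
                body = record.get("article_body") or {}
                yield record.get("name", ""), body.get("html", "")
```

Stream mode suits a large file you only need to pass over once; the trade-off is that you cannot jump back to an earlier member, so any per-member state has to be handled inside the loop.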