Re: [Wikitech-l] mwdumper ERROR Duplicate entry
Dawson wrote:
> Hello,
>
> I have used Special:Export at en.wikipedia to export Diabetes_mellitus and ticked the "Include templates" box (I'm only really after the templates). The resulting XML file is 40.1 MB, so I decided to go with mwdumper.jar rather than Special:Import. I'm working on a fresh build of MediaWiki on my local system. When running the command:
>
>   java -jar mwdumper.jar --format=sql:1.5 Wikipedia-20090113203939.xml | mysql -u root -p wiki
>
> it returns the following error:
>
>   1 pages (0.102/sec), 1,000 revs (102.062/sec)
>   ERROR 1062 (23000) at line 99: Duplicate entry '45970' for key 1

This happens when the XML dump contains the same page twice (or was it even the same revision?). That shouldn't happen, and if it does, mwdumper shouldn't crash and burn. I don't know a good way around this, really, sorry.

The question is: *why* does the dump include the same page twice? Is that legal in terms of the dump format? If yes, why can't mwdumper cope with it?

-- daniel

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
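Daniel's explanation can be tested before anything reaches MySQL. The following is a minimal Python sketch, assuming a standard Special:Export file: it streams the XML with ElementTree's iterparse and lists every page id and revision id that occurs more than once. Only an <id> directly under <page> or <revision> is counted, because contributor ids repeat legitimately. The function name find_duplicate_ids is hypothetical; nothing like it ships with mwdumper.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def find_duplicate_ids(source):
    """Return (page_dupes, revision_dupes) for a Special:Export XML file.

    `source` is a filename or binary file object. Only an <id> that is a
    direct child of <page> or <revision> is counted, so contributor ids,
    which legitimately repeat, are ignored.
    """
    page_ids, rev_ids = Counter(), Counter()
    open_tags = []  # local names of the elements currently open
    for event, elem in ET.iterparse(source, events=("start", "end")):
        name = local_name(elem.tag)
        if event == "start":
            open_tags.append(name)
            continue
        open_tags.pop()  # this element is now closed
        parent = open_tags[-1] if open_tags else None
        if name == "id" and parent == "page":
            page_ids[(elem.text or "").strip()] += 1
        elif name == "id" and parent == "revision":
            rev_ids[(elem.text or "").strip()] += 1
        elif name == "page":
            elem.clear()  # drop finished pages to keep memory low
    page_dupes = sorted(i for i, n in page_ids.items() if n > 1)
    rev_dupes = sorted(i for i, n in rev_ids.items() if n > 1)
    return page_dupes, rev_dupes
```

Calling find_duplicate_ids("Wikipedia-20090113203939.xml") returns two lists. If both are empty, the file itself holds no repeated page or revision id, and the duplicate key more likely collides with a row already present in the target database (MySQL error 1062 only says that the key value already exists in the table).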
Re: [Wikitech-l] mwdumper ERROR Duplicate entry
I figured I would go into the XML and manually remove the offending duplicate page/revision, but couldn't find it. I have gone through the XML file from top to bottom and found no template information, even though "Include templates" was ticked. I know it's a lot to ask, but could you take a quick look, Daniel? http://dawson.md/Wikipedia-20090113203939.xml.zip (XML, 1.9 MB)

Basically, I'm working on a wiki project that stores information about diseases, and I just want to use Wikipedia's Template:Infobox_Disease. I tried to download it manually along with all the associated templates and transcluded template files, but this was just too complicated and would have taken forever. Someone on the list suggested I use Special:Export and tick the "Include templates" box. This is where I'm now up to. All suggestions/help welcome.

Thank you,
Dawson

On 15 Jan 2009, at 12:22, Daniel Kinzler wrote:
> This happens when the XML dump contains the same page twice (or was it even the same revision?). That shouldn't happen, and if it does, mwdumper shouldn't crash and burn.
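The manual cleanup Dawson had in mind can also be scripted. This is a minimal sketch, assuming the export fits comfortably in memory and relying on MediaWiki revision ids being unique across a wiki: it keeps the first <revision> carrying each id, removes later copies, and writes the result to a new file. The function name drop_duplicate_revisions and the output filename in the usage note are hypothetical.

```python
import xml.etree.ElementTree as ET


def drop_duplicate_revisions(in_path, out_path):
    """Copy a Special:Export file, keeping only the first <revision> per id.

    Returns the revision ids that were dropped, in document order.
    """
    tree = ET.parse(in_path)
    root = tree.getroot()
    # Export files declare a default namespace (e.g. .../xml/export-0.3/);
    # read it from the root tag instead of hard-coding a version.
    ns = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""

    def q(name):
        return f"{{{ns}}}{name}" if ns else name

    if ns:
        # Write the namespace back as the default one rather than "ns0:".
        # Note: this registration is global to the ElementTree module.
        ET.register_namespace("", ns)

    seen, dropped = set(), []
    for page in list(root.iter(q("page"))):  # list(): we mutate while looping
        for rev in page.findall(q("revision")):
            # findtext on a direct child: the revision's own <id>,
            # not the contributor's nested <id>.
            rev_id = (rev.findtext(q("id")) or "").strip()
            if rev_id in seen:
                page.remove(rev)
                dropped.append(rev_id)
            else:
                seen.add(rev_id)
    tree.write(out_path, encoding="utf-8", xml_declaration=True)
    return dropped
```

Usage would be something like drop_duplicate_revisions("Wikipedia-20090113203939.xml", "deduped.xml"), followed by the same mwdumper command on deduped.xml. ElementTree rewrites the file (quote style, namespace declarations), which an XML-aware importer should not care about.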
Re: [Wikitech-l] mwdumper ERROR Duplicate entry
Hello Roan,

I did try this, but it occurs only once:

  <revision>
    <id>45970</id>
    <timestamp>2002-03-17T04:46:17Z</timestamp>
    <contributor>
      <username>Redmist</username>
      <id>307</id>
    </contributor>
    <minor/>
    <comment>*</comment>
    <text xml:space="preserve">See [[Diabetes]].</text>
  </revision>

Feel free to check out http://dawson.md/Wikipedia-20090113203939.xml.zip (XML, 1.9 MB) and see my last reply.

Thanks,
Dawson

On 15 Jan 2009, at 12:49, Roan Kattouw wrote:
> Dawson wrote:
>> I figured I would go into the XML and manually remove the offending duplicate page/revision, but couldn't find it.
>
> How about searching for 45970, which is the duplicate ID mwdumper complained about?
>
> Roan Kattouw (Catrope)
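Roan's suggestion works, with one caveat: a plain text search for 45970 cannot tell a page id from a revision id or a contributor id, and since MediaWiki numbers pages and revisions with separate counters, the same number can legitimately appear as both. A minimal Python sketch, assuming a standard Special:Export file, that reports where each matching <id> sits in the tree (the function name where_is_id is hypothetical):

```python
import xml.etree.ElementTree as ET


def where_is_id(source, wanted):
    """Yield the path (e.g. 'page/revision/id') of every <id> whose text is `wanted`.

    `source` is a filename or binary file object holding a Special:Export file.
    """
    open_tags = []  # local names of the currently open elements
    for event, elem in ET.iterparse(source, events=("start", "end")):
        name = elem.tag.rsplit("}", 1)[-1]  # drop the export namespace
        if event == "start":
            open_tags.append(name)
            continue
        if name == "id" and (elem.text or "").strip() == wanted:
            yield "/".join(open_tags[1:])  # path below the <mediawiki> root
        open_tags.pop()
```

Separately, the mysql client's "at line 99" refers to line 99 of the SQL it was fed, so redirecting mwdumper's output to a file instead of piping it (java -jar mwdumper.jar --format=sql:1.5 Wikipedia-20090113203939.xml > dump.sql) would show which table's INSERT hit the duplicate.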
Re: [Wikitech-l] mwdumper ERROR Duplicate entry
Solution: with the file now only 280 KB, I can use Special:Import instead of mwdumper.jar, and it works as expected:

  * All revisions were previously imported.
  Import finished!

So this was a problem with mwdumper *shrug*, oh well.

Thanks for all your help,
Dawson

On 15 Jan 2009, at 12:22, Daniel Kinzler wrote:
> If yes, why can't mwdumper cope with it?
Re: [Wikitech-l] Template Special:Export/Import
Dawson wrote:
> Hello,
>
> I have done a Special:Export of the latest revision of http://en.wikipedia.org/w/index.php?title=Diabetes_mellitus, including templates, and copied:
>
> {{Infobox Disease
>  | Name           = TestSMW
>  | Image          =
>  | Caption        =
>  | DiseasesDB     =
>  | ICD10          = {{ICD10|Group|Major|minor|LinkGroup|LinkMajor}}
>  | ICD9           = 00
>  | ICDO           =
>  | OMIM           =
>  | MedlinePlus    =
>  | eMedicineSubj  =
>  | eMedicineTopic =
>  | MeshID         =
> }}
>
> into my test page http://wiki.medicalstudentblog.co.uk/index.php/TestSMW. However, as you can see, it comes out all garbled. Can anyone advise? I should now have all the templates from the export/import; perhaps I'm missing some other extension(s)?

You're missing the ParserFunctions extension.

Roan Kattouw (Catrope)
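One way to confirm Roan's diagnosis is to look for parser-function calls of the form {{#if: ...}} in the exported template wikitext. Without ParserFunctions installed, MediaWiki does not recognise them and they typically show up as literal text, which fits the garbled infobox. A minimal Python sketch; the function name is hypothetical, and the set of names is an assumption based on the functions ParserFunctions is commonly documented to provide, which may not match every version of the extension:

```python
import re

# Functions supplied by the ParserFunctions extension rather than core MediaWiki
# (assumed list; check the extension's documentation for your version).
PARSER_FUNCTIONS = frozenset({
    "expr", "if", "ifeq", "iferror", "ifexpr", "ifexist",
    "rel2abs", "switch", "time", "timel", "titleparts",
})

# A parser-function call looks like {{#name: ...}}.
_CALL = re.compile(r"\{\{\s*#([A-Za-z0-9]+)\s*:")


def parser_functions_used(wikitext):
    """Return the sorted ParserFunctions names called anywhere in `wikitext`."""
    names = {m.group(1).lower() for m in _CALL.finditer(wikitext)}
    return sorted(names & PARSER_FUNCTIONS)
```

Run it on the <text> of Template:Infobox_Disease (and the templates it pulls in) from the export; a non-empty result means the extension is needed for those templates to render.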
Re: [Wikitech-l] Is the page status in the database?
Hello, thanks for your reply.

At 17.18 15/01/2009 +0100, you wrote:
>On Thu, Jan 15, 2009 at 5:00 PM, Eugenio Tacchini euge...@favoriti.it wrote:
>> Hello everybody,
>> I'm looking, for academic research purposes, for the status of Wikipedia pages. By status I mean:
>> - stub
>
>revision.rev_len

This is the length of the revision, isn't it? Not a stub flag.

>> - normal
>
>pretty obvious - everything above stub level
>
>> - good article
>> - featured
>
>templatelinks (maybe, not sure!!)

OK, I will have a look at that table.
Thanks.

Eugenio
Re: [Wikitech-l] Is the page status in the database?
On Thu, Jan 15, 2009 at 11:54 AM, Eugenio Tacchini euge...@favoriti.it wrote:
> Thanks for your reply. I don't need a general measure; I need the status of each individual page. As far as I can see, the only solution is probably to look at the corresponding templates, maybe via the table Marco suggested.

Yes, the way to do this is to check templatelinks. This is how the software detects disambiguation pages, and the same approach could be used for stubs and so on.
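Here is a sketch of the templatelinks approach, assuming the column names MediaWiki used at the time of this thread (page.page_id, page.page_namespace, and templatelinks.tl_from, tl_namespace, tl_title) and assuming that articles are marked by templates named Featured_article, Good_article, or something ending in -stub; those names are illustrative guesses to be checked against the wiki being studied. It runs against a toy SQLite database so that it is self-contained; the query itself should carry over to MySQL with the ? replaced by 10.

```python
import sqlite3

TEMPLATE_NS = 10  # namespace number of Template: pages

# Hypothetical marker templates; check the real names on the wiki you study.
# Titles are stored with underscores, as in MediaWiki's page/templatelinks tables.
STATUS_QUERY = """
SELECT p.page_id,
       CASE
         WHEN SUM(tl.tl_title = 'Featured_article') > 0 THEN 'featured'
         WHEN SUM(tl.tl_title = 'Good_article') > 0 THEN 'good'
         WHEN SUM(tl.tl_title LIKE '%-stub') > 0 THEN 'stub'
         ELSE 'normal'
       END AS status
FROM page AS p
LEFT JOIN templatelinks AS tl
       ON tl.tl_from = p.page_id AND tl.tl_namespace = ?
WHERE p.page_namespace = 0
GROUP BY p.page_id
ORDER BY p.page_id
"""


def page_statuses(conn):
    """Return {page_id: status} for every article (namespace 0)."""
    return dict(conn.execute(STATUS_QUERY, (TEMPLATE_NS,)).fetchall())


def demo_db():
    """Toy in-memory database with only the columns the query needs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (page_id INTEGER PRIMARY KEY,
                           page_namespace INTEGER, page_title TEXT);
        CREATE TABLE templatelinks (tl_from INTEGER, tl_namespace INTEGER,
                                    tl_title TEXT);
    """)
    conn.executemany("INSERT INTO page VALUES (?, ?, ?)",
                     [(1, 0, "Alpha"), (2, 0, "Beta"), (3, 0, "Gamma"), (4, 0, "Delta")])
    conn.executemany("INSERT INTO templatelinks VALUES (?, ?, ?)",
                     [(1, 10, "Featured_article"), (1, 10, "Infobox_Disease"),
                      (2, 10, "Good_article"), (3, 10, "Medicine-stub")])
    return conn


print(page_statuses(demo_db()))
# {1: 'featured', 2: 'good', 3: 'stub', 4: 'normal'}
```

One caveat: templatelinks records every template used while rendering a page, including templates pulled in indirectly by other templates, so a marker template nested inside, say, an infobox would also count.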