Re: [Wikitech-l] mwdumper ERROR Duplicate entry

2009-01-15 Thread Daniel Kinzler
Dawson wrote:
 Hello,
 
 I have used Special:Export at en.wikipedia to export
 Diabetes_mellitus with the "Include templates" box ticked (I'm only
 really after the templates).
 
 The resulting XML file is 40.1 MB, so I decided to go with mwdumper.jar
 rather than Special:Import.
 
 I'm working on a fresh build of MediaWiki on my local system. When
 running the command:
 
 java -jar mwdumper.jar --format=sql:1.5 Wikipedia-20090113203939.xml |  
 mysql -u root -p wiki
 
 It is returning the following error:
 
 1 pages (0.102/sec), 1,000 revs (102.062/sec)
 ERROR 1062 (23000) at line 99: Duplicate entry '45970' for key 1

This happens when the XML dump contains the same page twice (or perhaps the same
revision). That shouldn't happen, and if it does, mwdumper shouldn't crash and
burn.

I don't know a good way around this, really, sorry. The question is: *why* does
the dump include the same page twice? Is that legal in terms of the dump format?
If so, why can't mwdumper cope with it?
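Daniel's explanation can be tested before anyone edits a 40 MB file by hand. The Python sketch below is not part of mwdumper, and the name find_duplicate_ids is made up for illustration. It streams the export with xml.etree.ElementTree.iterparse and reports any page ID or revision ID that appears more than once. It assumes the usual Special:Export layout, in which an <id> directly inside <page> is the page ID, one directly inside <revision> is the revision ID, and one inside <contributor> is a user ID (ignored here, since users legitimately repeat).

```python
import sys
import xml.etree.ElementTree as ET
from collections import Counter


def find_duplicate_ids(source):
    """Return (duplicate page IDs, duplicate revision IDs) in a MediaWiki XML export.

    `source` is a filename or a binary file object. Only <id> elements whose
    direct parent is <page> or <revision> are counted; contributor IDs are
    skipped because one user can author many revisions.
    """
    page_ids, rev_ids = Counter(), Counter()
    open_tags = []  # local names of the elements currently open
    for event, elem in ET.iterparse(source, events=("start", "end")):
        name = elem.tag.rsplit("}", 1)[-1]  # drop the export namespace URI
        if event == "start":
            open_tags.append(name)
            continue
        open_tags.pop()
        if name == "id" and open_tags and elem.text:
            parent = open_tags[-1]
            if parent == "page":
                page_ids[int(elem.text)] += 1
            elif parent == "revision":
                rev_ids[int(elem.text)] += 1
        elif name == "page":
            elem.clear()  # release the page's revisions to keep memory flat
    dup_pages = sorted(i for i, n in page_ids.items() if n > 1)
    dup_revs = sorted(i for i, n in rev_ids.items() if n > 1)
    return dup_pages, dup_revs


if __name__ == "__main__":
    # Hypothetical usage: python find_dups.py Wikipedia-20090113203939.xml
    for path in sys.argv[1:]:
        pages, revs = find_duplicate_ids(path)
        print(f"{path}: duplicate page ids {pages}, duplicate revision ids {revs}")
```

An empty result would point away from the dump itself: MySQL raises error 1062 whenever an INSERT repeats an existing primary-key value, so rows already present in the target database can trigger it as well.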

-- daniel

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] mwdumper ERROR Duplicate entry

2009-01-15 Thread Dawson
I figured I would go into the XML and manually remove the offending  
duplicate page/revision, but couldn't find it.

I have gone through the XML file from top to bottom and found no template
information, even though "Include templates" was ticked.
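One way to confirm whether an export contains any template pages at all is to list the page titles it holds. In this Python sketch, list_titles is a hypothetical helper rather than a MediaWiki tool, and the default "Template:" prefix assumes an English-language wiki; other wikis use a localised namespace name.

```python
import sys
import xml.etree.ElementTree as ET


def list_titles(source, prefix="Template:"):
    """Yield every <title> in a Special:Export file that starts with `prefix`."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = elem.tag.rsplit("}", 1)[-1]  # drop the export namespace URI
        if name == "title" and elem.text and elem.text.startswith(prefix):
            yield elem.text
        elif name == "page":
            elem.clear()  # free the page's revisions once it has been read


if __name__ == "__main__":
    for path in sys.argv[1:]:
        titles = list(list_titles(path))
        print(f"{path}: {len(titles)} template page(s)")
        for title in titles:
            print("  " + title)
```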

I know it's a lot to ask, but could you take a quick look, Daniel?
http://dawson.md/Wikipedia-20090113203939.xml.zip (XML, 1.9 MB)

Basically, I'm working on a wiki project that stores information about
diseases, and I just want to use Wikipedia's Template:Infobox_Disease.
I tried to download it and all its associated and transcluded templates
manually, but this was just too complicated and would have taken forever.
Someone on the list suggested I use Special:Export and tick the
"Include templates" box. This is where I'm now up to.

All suggestions/help welcomed.

Thank you, Dawson

On 15 Jan 2009, at 12:22, Daniel Kinzler wrote:



Re: [Wikitech-l] mwdumper ERROR Duplicate entry

2009-01-15 Thread Dawson
Hello Roan,

I did try this but it only occurs once:

 <revision>
   <id>45970</id>
   <timestamp>2002-03-17T04:46:17Z</timestamp>
   <contributor>
     <username>Redmist</username>
     <id>307</id>
   </contributor>
   <minor/>
   <comment>*</comment>
   <text xml:space="preserve">See [[Diabetes]].</text>
 </revision>

Feel free to check out
http://dawson.md/Wikipedia-20090113203939.xml.zip (XML, 1.9 MB)
and see my last reply.

Thanks, Dawson

On 15 Jan 2009, at 12:49, Roan Kattouw wrote:

 Dawson wrote:
 I figured I would go into the XML and manually remove the offending
 duplicate page/revision, but couldn't find it.

 I have gone through the XML file from top to bottom and found no template
 information, even though "Include templates" was ticked.
 How about searching for 45970, which is the duplicate ID mwdumper
 complained about?

 Roan Kattouw (Catrope)



Re: [Wikitech-l] mwdumper ERROR Duplicate entry

2009-01-15 Thread Dawson
Solution:

With the file now only 280 KB, I can use Special:Import instead of
mwdumper.jar, and it works as expected:

* All revisions were previously imported.

Import finished! 

So this was a problem with mwdumper *shrug*, oh well.

Thanks for all your help, Dawson

On 15 Jan 2009, at 12:22, Daniel Kinzler wrote:



Re: [Wikitech-l] Template Special:Export/Import

2009-01-15 Thread Roan Kattouw
Dawson wrote:
 Hello,

 I used Special:Export to export the latest revision of
 http://en.wikipedia.org/w/index.php?title=Diabetes_mellitus
 including templates, and copied:

 {{Infobox Disease
   | Name   = TestSMW
   | Image  =
   | Caption=
   | DiseasesDB =
   | ICD10  = {{ICD10|Group|Major|minor|LinkGroup|LinkMajor}}
   | ICD9   = 00
   | ICDO   =
   | OMIM   =
   | MedlinePlus=
   | eMedicineSubj  =
   | eMedicineTopic =
   | MeshID =
 }}

 into my test page http://wiki.medicalstudentblog.co.uk/index.php/TestSMW.
 However, as you can see, it comes out all garbled. Can anyone advise?
 I should now have all the templates from the export/import; perhaps I'm
 missing some other extension(s)?
You're missing the ParserFunctions extension.

Roan Kattouw (Catrope)



Re: [Wikitech-l] Is the page status in the database?

2009-01-15 Thread Eugenio Tacchini
Hello, thanks for your reply.

At 17.18 15/01/2009 +0100, you wrote:
 On Thu, Jan 15, 2009 at 5:00 PM, Eugenio Tacchini euge...@favoriti.it wrote:
  Hello everybody,
  I'm looking, for academic research purposes, for the status of
  wikipedia pages. For status I mean:
  - stub
 revision.rev_len

This is the length of the revision, isn't it? Not a stub flag.

  - normal
 pretty obvious - everything above stub level
  - good article
  - featured
 templatelinks (maybe, not sure!!)

OK, I will have a look at that table.

Thanks.

Eugenio




Re: [Wikitech-l] Is the page status in the database?

2009-01-15 Thread Aryeh Gregor
On Thu, Jan 15, 2009 at 11:54 AM, Eugenio Tacchini euge...@favoriti.it wrote:
 Thanks for your reply.

 I don't need a general measure; I need the status of each individual
 page. As far as I can see, the only solution is probably to look at
 the corresponding templates, maybe via the table Marco suggested.

Yes, the way you want to do this is checking templatelinks.  This is
how disambiguations are checked in the software, and it could be used
for stubs and so on too.
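A rough sketch of that approach in Python, using sqlite3 and a toy copy of the two tables instead of a real MediaWiki database. It assumes the column names page_id, page_namespace, page_title and tl_from, tl_namespace, tl_title, and that featured and good articles transclude templates titled Featured_article and Good_article. That mapping, the "title ends in stub" rule for stub templates, and the helper names are illustrative assumptions, not how MediaWiki itself classifies pages. Articles with no matching template are labelled normal, following the "everything above stub level" definition quoted earlier in the thread.

```python
import sqlite3

TEMPLATE_NS = 10  # MediaWiki's Template: namespace
ARTICLE_NS = 0    # main (article) namespace

# Hypothetical mapping from tl_title (underscores, no "Template:" prefix)
# to a status label; check the real template names on the target wiki.
STATUS_TEMPLATES = {
    "Featured_article": "featured",
    "Good_article": "good",
}


def classify(tl_title, status_templates=STATUS_TEMPLATES):
    """Map one transcluded template title to a status label, or None."""
    if tl_title in status_templates:
        return status_templates[tl_title]
    if tl_title.lower().endswith("stub"):  # crude heuristic for stub templates
        return "stub"
    return None


def page_statuses(conn, status_templates=STATUS_TEMPLATES):
    """Return {page_title: set of labels} for every article, via templatelinks."""
    rows = conn.execute(
        """
        SELECT p.page_title, tl.tl_title
        FROM page AS p
        LEFT JOIN templatelinks AS tl
               ON tl.tl_from = p.page_id AND tl.tl_namespace = ?
        WHERE p.page_namespace = ?
        """,
        (TEMPLATE_NS, ARTICLE_NS),
    )
    labels = {}
    for title, tmpl in rows:
        found = labels.setdefault(title, set())
        label = classify(tmpl, status_templates) if tmpl is not None else None
        if label:
            found.add(label)
    # Articles with no matching template count as "normal".
    return {title: found or {"normal"} for title, found in labels.items()}


if __name__ == "__main__":
    # Toy tables with only the columns used above; the rows are made up.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (page_id INTEGER PRIMARY KEY,
                           page_namespace INTEGER, page_title TEXT);
        CREATE TABLE templatelinks (tl_from INTEGER,
                                    tl_namespace INTEGER, tl_title TEXT);
        INSERT INTO page VALUES (1, 0, 'Article_A'), (2, 0, 'Article_B'),
                                (3, 0, 'Article_C');
        INSERT INTO templatelinks VALUES (1, 10, 'Featured_article'),
                                         (2, 10, 'Medicine-stub');
        """
    )
    for title, found in sorted(page_statuses(conn).items()):
        print(title, sorted(found))
```

The same SELECT should carry over to MySQL, though drivers such as MySQLdb use %s rather than ? as the parameter placeholder.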
