On 3/10/2011 3:46 AM, David Gerard wrote:
> I estimate the program will take 71 days to finish all 3.1 million
> article titles. Is there any way our university IP address could be
> given permission, or we could send an official email from our
> department head to a Wikipedia server administrator, confirming that
> the program I run from this particular IP address is not an attack?
> Then the administrator could allow us to make faster requests, e.g.
> one every 0.5 s, so I can finish my experiment within 35 days.
> Expecting your positive reply.
> Regards,
> Ramesh
>
I can say, positively, that you'll get the job done faster by
downloading the dump file and working on it directly. I've got
scripts that can download and extract stuff from the XML dump in an hour
or so. I still have some processes that use the API, but I'm
increasingly using the dumps because they're faster and easier to work
with.
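To give a sense of how simple this is: a minimal sketch of streaming page titles out of a decompressed pages-articles XML dump with Python's stdlib, without loading the file into memory. The namespace URI and the tiny in-memory sample are illustrative; check them against the schema your dump actually declares.

```python
# Sketch: stream <title> elements out of a MediaWiki XML export
# incrementally, so a multi-GB dump never sits in memory at once.
import io
import xml.etree.ElementTree as ET

def iter_titles(xml_stream):
    """Yield each page title from a MediaWiki export stream."""
    for event, elem in ET.iterparse(xml_stream, events=("end",)):
        # Tags arrive namespace-qualified, e.g. '{http://...}title'
        if elem.tag.endswith("}title") or elem.tag == "title":
            yield elem.text
            elem.clear()  # release the element as we go

# Tiny stand-in for a real dump file, for illustration only
sample = io.BytesIO(
    b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
    b'<page><title>Alan Turing</title></page>'
    b'<page><title>Ada Lovelace</title></page>'
    b'</mediawiki>'
)
print(list(iter_titles(sample)))  # -> ['Alan Turing', 'Ada Lovelace']
```

In a real run you'd pass a file object opened over the (decompressed) dump instead of the BytesIO sample; the same loop works for revision text by matching the text element.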
Note that many facts about Wikipedia topics have already been
extracted by DBpedia and Freebase. These are complementary, and if
you're interested in getting results, you should use both. DBpedia has
some things that aren't in Freebase, such as Wikipedia's link graph and
redirects, but Freebase has a type system with 2x better recall for
many of the prevalent types.
You might find that DBpedia + Freebase has the information you
need. And if it doesn't, you'll still find it's a useful 'guidance
control' system for anything you're doing with Wikipedia data.
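For example, DBpedia ships its redirects (and the link graph) as N-Triples files, which you can scan line by line. A hedged sketch below; the predicate URI is the one DBpedia has used for redirects, but verify it against the dataset you download, and the sample triple is made up for illustration.

```python
# Sketch: pull (source, target) redirect pairs out of a DBpedia
# N-Triples redirects file. Each line is '<s> <p> <o> .'
import re

TRIPLE = re.compile(r"<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s+\.")
REDIRECT = "http://dbpedia.org/ontology/wikiPageRedirects"  # check your dump

def redirect_pairs(lines):
    """Yield (redirect_page, target_page) URI pairs."""
    for line in lines:
        m = TRIPLE.match(line)
        if m and m.group(2) == REDIRECT:
            yield m.group(1), m.group(3)

# Illustrative line in the dump's format, not real downloaded data
sample = [
    "<http://dbpedia.org/resource/UK> "
    "<http://dbpedia.org/ontology/wikiPageRedirects> "
    "<http://dbpedia.org/resource/United_Kingdom> .",
]
print(list(redirect_pairs(sample)))
```

The same pattern works for the page-links file by swapping in its predicate URI, which is how you'd reconstruct the link graph locally.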
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l