On 3/10/2011 3:46 AM, David Gerard wrote:
> I estimate the program will take 71 days to finish all 3.1 million
> article titles. Is there any way our university IP address could be
> given permission, or we could send an official email from our
> department head to a Wikipedia server administrator, confirming that
> the program I run from this particular IP address is not an attack?
> Then the administrator could allow us to make faster requests, e.g.
> one every 0.5 s, so I can finish my experiment within 35 days.
> Expecting your positive reply.
> Regards,
> Ramesh
>
I can say, positively, that you'll get the job done faster by
downloading the dump file and working on it directly. I've got
scripts that can download and extract stuff from the XML dump in an hour
or so. I still have some processes that use the API, but I'm
increasingly using the dumps because they're faster and easier to work
with.
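To give a sense of how simple this is: a minimal sketch of streaming page titles out of a decompressed pages-articles XML dump with Python's stdlib, without loading the file into memory. The namespace URI and the tiny in-memory sample are illustrative; check them against the schema your dump actually declares.

```python
# Sketch: stream <title> elements out of a MediaWiki XML export
# incrementally, so a multi-GB dump never sits in memory at once.
import io
import xml.etree.ElementTree as ET

def iter_titles(xml_stream):
    """Yield each page title from a MediaWiki export stream."""
    for event, elem in ET.iterparse(xml_stream, events=("end",)):
        # Tags arrive namespace-qualified, e.g. '{http://...}title'
        if elem.tag.endswith("}title") or elem.tag == "title":
            yield elem.text
            elem.clear()  # release the element as we go

# Tiny stand-in for a real dump file, for illustration only
sample = io.BytesIO(
    b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
    b'<page><title>Alan Turing</title></page>'
    b'<page><title>Ada Lovelace</title></page>'
    b'</mediawiki>'
)
print(list(iter_titles(sample)))  # -> ['Alan Turing', 'Ada Lovelace']
```

In a real run you'd pass a file object opened over the (decompressed) dump instead of the BytesIO sample; the same loop works for revision text by matching the text element.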
Note that many facts about Wikipedia topics have already been
extracted by DBpedia and Freebase. These are complementary, and if
you're interested in getting results, you should use both. DBpedia has
some things that aren't in Freebase, such as Wikipedia's link graph and
redirects, but Freebase has a type system with 2x better recall for
many of the prevalent types.
You might find that DBpedia + Freebase has the information you
need. And if it doesn't, you'll still find it's a useful 'guidance
control' system for anything you're doing with Wikipedia data.
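For example, DBpedia ships its redirects (and the link graph) as N-Triples files, which you can scan line by line. A hedged sketch below; the predicate URI is the one DBpedia has used for redirects, but verify it against the dataset you download, and the sample triple is made up for illustration.

```python
# Sketch: pull (source, target) redirect pairs out of a DBpedia
# N-Triples redirects file. Each line is '<s> <p> <o> .'
import re

TRIPLE = re.compile(r"<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s+\.")
REDIRECT = "http://dbpedia.org/ontology/wikiPageRedirects"  # check your dump

def redirect_pairs(lines):
    """Yield (redirect_page, target_page) URI pairs."""
    for line in lines:
        m = TRIPLE.match(line)
        if m and m.group(2) == REDIRECT:
            yield m.group(1), m.group(3)

# Illustrative line in the dump's format, not real downloaded data
sample = [
    "<http://dbpedia.org/resource/UK> "
    "<http://dbpedia.org/ontology/wikiPageRedirects> "
    "<http://dbpedia.org/resource/United_Kingdom> .",
]
print(list(redirect_pairs(sample)))
```

The same pattern works for the page-links file by swapping in its predicate URI, which is how you'd reconstruct the link graph locally.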
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l