You have several options:
1- Use a regex, e.g.:

    import re, codecs
    import wikipedia  # the framework module these calls assume
    site = wikipedia.getSite()
    f = codecs.open("file.txt", "r", "utf-8")
    R = re.compile(r"\{\{(.+?)\}\}")  # or any other regex
    for name in R.findall(f.read()):
        page = wikipedia.Page(site, name)
        # do whatever you like with the page

2- Use readlines:

    import codecs
    import wikipedia  # the framework module these calls assume
    site = wikipedia.getSite()
    f = codecs.open("file.txt", "r", "utf-8")
    for line in f.readlines():
        line = line.replace("\n", "").replace("\r", "")
        name = line.split(":")[0]  # or any other way you like to get the title
        page = wikipedia.Page(site, name)
        # do whatever you like with the page

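The two extraction steps above can be tried without a wiki connection. This is only a sketch on in-memory strings: the helper names titles_from_templates and split_entry are made up for illustration, and the sample text is invented.

```python
import re

# Same pattern as in option 1, written as a raw string.
TEMPLATE_RE = re.compile(r"\{\{(.+?)\}\}")


def titles_from_templates(text):
    """Return the contents of every {{...}} in text (hypothetical helper)."""
    return TEMPLATE_RE.findall(text)


def split_entry(line):
    """Split a 'word:definition' line into (word, definition) (hypothetical helper).

    partition() cuts at the first colon only, so any later colons
    stay inside the definition.
    """
    word, _, definition = line.rstrip("\r\n").partition(":")
    return word.strip(), definition.strip()


print(titles_from_templates("{{hundo}} kaj {{kato}}"))   # ['hundo', 'kato']
print(split_entry("hundo:chien: animal domestique\n"))   # ('hundo', 'chien: animal domestique')
```

Cutting at the first colon only matters for the "word:definition" format, since a definition may itself contain a colon.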
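As for not loading the whole file: f.read() and f.readlines() both load
everything at once, but iterating over the file object itself
("for line in f:") reads one line at a time. Otherwise you can simply read
the file, save what you need to other variables or files, and close it
(e.g. f.close()).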
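A minimal sketch of line-by-line reading, assuming the "word:definition" format from the question. iter_entries is a hypothetical name; it is plain Python and not part of the bot framework, so I have not tried plugging it into a GeneratorFactory. Each word it yields could be passed to wikipedia.Page(site, word) as in the examples above.

```python
import io


def iter_entries(path, encoding="utf-8"):
    """Yield (word, definition) pairs one line at a time (hypothetical helper)."""
    # Iterating over the file object reads it incrementally,
    # unlike f.read() or f.readlines().
    with io.open(path, "r", encoding=encoding) as f:
        for line in f:
            line = line.strip()
            if not line or ":" not in line:
                continue  # skip blank or malformed lines
            word, definition = line.split(":", 1)
            yield word.strip(), definition.strip()
```

You would use it as "for word, definition in iter_entries("file.txt"):". Only the current line is held in memory, and the with block closes the file once the generator is exhausted.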
Best
On Sun, Dec 1, 2013 at 1:26 PM, Mathieu Stumpf <
[email protected]> wrote:
> Hello,
>
> I want to add Esperanto words to fr.wiktionary using as input a file
> where each line has the format "word:the fine definition". So I copied
> basic.py and started hacking it to achieve my goal.
>
> Now, it seems like the -file argument expects a file where each line is
> formatted as "[[Article name]]". Of course I can just create a second
> input file and read both in parallel, so I feed the genFactory with the
> former and use the second to build the wiktionary entry. But maybe you
> could give me a hint on how I can write a generator that can feed a
> pagegenerators.GeneratorFactory() without creating a "mirror file" and
> without loading the whole file into main memory.
>
> Kind regards,
> Mathieu
>
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
--
Amir