Hi,

On Sat, Dec 12, 2009 at 4:35 PM, David Gerard <[email protected]> wrote:

>
> 2009/12/11 Behrang Saeedzadeh <[email protected]>:
> > Hi,
> >
> > I have downloaded enwiki-latest-all-titles-in-ns0.gz and I want to extract
> > main titles and store them in another file. For example, some titles have
> > meta information (e.g. disambiguation etc.) and I want these to be removed.
> > Can I remove all the text between parentheses from the titles to achieve
> > this?
> >
>
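You have to parse the titles yourself. The file is just a list of titles, one
per line, with no separate field for the disambiguation part.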
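One way to do that parsing, as a rough sketch rather than a definitive recipe: strip only a single trailing "_(...)" qualifier instead of everything between parentheses, because parentheses can also belong to the real title (e.g. "(What's_the_Story)_Morning_Glory?"). The function names below (strip_qualifier, main_titles, extract) are made up for illustration, and the sketch assumes the dump is UTF-8 with one underscore-separated title per line.

```python
import gzip
import re

# Matches one trailing "_(...)" qualifier, e.g. "Mercury_(planet)".
# Assumption: the disambiguation part is always the last thing in the title.
TRAILING_QUALIFIER = re.compile(r"_\([^()]*\)$")


def strip_qualifier(title: str) -> str:
    """Return the title without a trailing parenthetical qualifier."""
    stripped = TRAILING_QUALIFIER.sub("", title)
    # Keep the original title if stripping would leave an empty string.
    return stripped or title


def main_titles(lines):
    """Yield each distinct main title once, in order of first appearance."""
    seen = set()
    for line in lines:
        title = strip_qualifier(line.rstrip("\n"))
        if title and title not in seen:
            seen.add(title)
            yield title


def extract(src_path: str, dst_path: str) -> None:
    """Read a gzipped title list and write the main titles to a text file."""
    with gzip.open(src_path, "rt", encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for title in main_titles(src):
            dst.write(title + "\n")
```

Calling extract("enwiki-latest-all-titles-in-ns0.gz", "main-titles.txt") would write the result to a hypothetical output file. This is a heuristic: a title whose real name ends in a parenthetical would be shortened as well, so check the output before relying on it.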


> > Also some titles start with the "!" character. and some are enclosed between
> > two or three of them such as !Adiso_Amigos!. What is the purpose of "!" in
> > such cases?

It's part of the topic's name (in the case of
<http://en.wikipedia.org/wiki/%C2%A1Adios_Amigos!>, the album's name). The
leading character is not a plain "!" but the inverted exclamation mark (¡),
which Spanish uses to open an exclamation.

> > Also why some titles are enclosed between two double quotes such
> > as "400_Years_of_Telescope"?
>
Same case: the double quotes are part of the topic's name (e.g.
<http://en.wikipedia.org/wiki/%22Weird_Al%22_Yankovic>).
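To see that these characters really are part of the title, you can decode the percent-encoded URLs above. A small sketch using Python's standard urllib.parse.unquote; the helper name title_from_url is made up for illustration, and it assumes the URL contains "/wiki/" followed by the UTF-8 percent-encoded title.

```python
from urllib.parse import unquote


def title_from_url(url: str) -> str:
    """Return the decoded article title from a /wiki/ URL."""
    # Take everything after "/wiki/" and undo the percent-encoding.
    # unquote() decodes the escaped bytes as UTF-8 by default.
    encoded_title = url.split("/wiki/", 1)[1]
    return unquote(encoded_title)


print(title_from_url("http://en.wikipedia.org/wiki/%C2%A1Adios_Amigos!"))
print(title_from_url("http://en.wikipedia.org/wiki/%22Weird_Al%22_Yankovic"))
```

The first call prints the title with a leading ¡ (encoded as %C2%A1), the second with literal double quotes (encoded as %22).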

Marco

PS: Next time, please copy and paste your examples exactly, so people have a
chance to see what you mean. Both of the examples you supplied had to be
corrected; the second one was missing a "the":
<http://en.wikipedia.org/wiki/"400_Years_of_the_Telescope">


-- 
VMSoft GbR
Nabburger Str. 15
81737 München
Geschäftsführer: Marco Schuster, Volker Hemmert
http://vmsoft-gbr.de
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
