Dear Ben Sidi Ahmed,

DBpedia might already have all the data you need, extracted and available in over 80 languages:
http://wiki.dbpedia.org/Downloads37
The first 2 sentences of each article, in a structured format:
http://downloads.dbpedia.org/3.7/en/short_abstracts_en.nt.bz2

The longer abstract of each article:
http://downloads.dbpedia.org/3.7/en/long_abstracts_en.nt.bz2

If you only want them for single articles, you can also query the DBpedia API.

The first 2 sentences for London (all languages):
http://dbpedia.org/snorql/?query=SELECT+*+WHERE+{%0D%0A%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FLondon%3E+rdfs%3Acomment+%3Fshort_abstract+.%0D%0A}

The first 2 sentences for London (English only):
http://dbpedia.org/snorql/?query=SELECT+*+WHERE+{%0D%0A%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FLondon%3E+rdfs%3Acomment+%3Fshort_abstract+.%0D%0AFILTER+%28lang%28%3Fshort_abstract%29%3D%22en%22%29%0D%0A}

All English short abstracts that contain the keyword "London":
http://dbpedia.org/snorql/?query=SELECT+*+WHERE+{%0D%0A%3Fs+rdfs%3Acomment+%3Fshort_abstract+.%0D%0AFILTER+%28lang%28%3Fshort_abstract%29%3D%22en%22%29%0D%0AFILTER+%28bif%3Acontains%28%3Fshort_abstract%2C+%22London%22%29+%29%0D%0A}

You can also run these queries against a live database that is synchronized with Wikipedia every 5 minutes:
http://live.dbpedia.org/

Hope that helps,
Sebastian

On 11/27/2011 06:02 PM, Khalida BEN SIDI AHMED wrote:
> Hello!
> I don't know if the subject of this question belongs to the scope of this
> group. Anyway, I will be pleased if I find an answer to my question.
> I'm writing some Java code in order to realize NLP tasks upon texts using
> Wikipedia. What can I do in order to extract the first paragraph of a
> Wikipedia article? Thanks a lot.
>
> Truly yours
> Ben Sidi Ahmed
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>

-- 
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
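
Since the original question mentions Java, here is a minimal sketch of how the English-only query for London could be built as a request URL. The endpoint http://dbpedia.org/sparql, the class name AbstractQuery, and both method names are assumptions for illustration and do not appear in the message above. The sketch only builds the URL; it does not send the request or parse the result.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of any DBpedia library.
class AbstractQuery {
    // Assumed SPARQL endpoint; the message above links the SNORQL web form instead.
    static final String ENDPOINT = "http://dbpedia.org/sparql";

    // Builds the SPARQL text: the rdfs:comment (short abstract) of one
    // resource, restricted to a single language tag.
    static String buildQuery(String resource, String lang) {
        return "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
             + "SELECT ?short_abstract WHERE {\n"
             + "  <http://dbpedia.org/resource/" + resource + "> rdfs:comment ?short_abstract .\n"
             + "  FILTER (lang(?short_abstract) = \"" + lang + "\")\n"
             + "}";
    }

    // URL-encodes the query and appends it to the endpoint as the
    // "query" parameter.
    static String buildUrl(String resource, String lang) {
        try {
            return ENDPOINT + "?query=" + URLEncoder.encode(buildQuery(resource, lang), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("London", "en"));
    }
}
```

The resource name is inserted into the query as-is, so article titles with spaces or special characters would need to be converted to their DBpedia resource form first.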
