Out of interest, my regex was pmc\s*\=\s*(.*?)[\|\}]
and then also pmid\s*\=\s*(.*?)[\|\}] with ignorecase flag set on.

Make a great day,
Max Klein ‽ http://notconfusing.com/

On Wed, Oct 22, 2014 at 12:48 PM, Aaron Halfaker <[email protected]> wrote:

> Hey folks,
>
> Somehow I missed this thread, but I've already addressed this request on
> the Village Pump[1]. See:
>
> http://datasets.wikimedia.org/public-datasets/enwiki/etc/pmids.articles.20141008.tsv
>
> I extracted PMIDs with the following regex: /\bpmid *= *[0-9]+\b/i
>
> It includes page_id, page_namespace, page_title, rev_id (most recent),
> pmid in TAB separated values.
>
> Let me know if you have questions or if you think the regex matching
> strategy is insufficient. It's pretty quick to take another pass.
>
> 1.
> https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Extracting_PMIDs
>
> On Wed, Oct 22, 2014 at 1:27 PM, Maximilian Klein <[email protected]>
> wrote:
>
>> Jake,
>> I have a script that does this already for DOIs. It was a one-line change to
>> make. These files should answer what you were looking for.
>>
>> https://raw.githubusercontent.com/notconfusing/listiness/pmc/pmc_list.txt
>> https://raw.githubusercontent.com/notconfusing/listiness/pmc/pmid_list.txt
>>
>> In the future you can tell them to use halfak's
>> https://pythonhosted.org/mediawiki-utilities/
>> This is the code I used to get those lists.
>> https://github.com/notconfusing/listiness/commit/e140ce9202b9c1098dec40ca1da3ff135fd8c520
>>
>> Make a great day,
>> Max Klein ‽ http://notconfusing.com/
>>
>> On Mon, Oct 20, 2014 at 9:20 PM, Andrew G. West <[email protected]>
>> wrote:
>>
>>> Jake,
>>>
>>> Yes, it's a rather straightforward parse based on the citation format
>>> which Jeremy described. Doc James and I already have this coded up for a
>>> soon-to-be-published [[WP:MED]] readership/editorship paper.
>>>
>>> Searching for PMIDs in the entirety of the Wikipedia article base would
>>> be a bit time-consuming -- but if one needs to pull down only articles in
>>> WikiProject Medicine, for example, I am also able to help on that front.
>>>
>>> Perhaps we'll take this offline, but if anyone else is interested in the
>>> dirty details, feel free to contact one of us off-list. -AW
>>>
>>> --
>>> Andrew G. West, PhD
>>> http://www.andrew-g-west.com
>>>
>>> On 10/20/2014 11:57 PM, Jake Orlowitz wrote:
>>>
>>>> Hi folks,
>>>>
>>>> Relaying a question from a Stanford medical researcher:
>>>>
>>>> "Do you know if it is possible to extract PubMed ID (PMID) or PMCIDs
>>>> from Wiki references? Furthermore, could you dump those IDs out into a
>>>> list for analysis?"
>>>>
>>>> Best,
>>>> Jake Orlowitz (Ocaasi)
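
In case anyone wants to compare the patterns mentioned in this thread, here is a minimal Python sketch. The three regexes are the ones quoted above. Everything else is an assumption for illustration: the sample wikitext, the `extract_ids` helper name, and the capture group I added around the digits in the stricter PMID pattern so that it returns just the number. It is not the code from the linked commit.

```python
import re

# The two looser patterns from this thread: a non-greedy capture of
# everything up to the next "|" or "}", with the ignorecase flag on.
PMC_RE = re.compile(r"pmc\s*\=\s*(.*?)[\|\}]", re.IGNORECASE)
PMID_RE = re.compile(r"pmid\s*\=\s*(.*?)[\|\}]", re.IGNORECASE)

# The stricter digits-only pattern from this thread. The capture group
# around [0-9]+ is an addition for this sketch, not part of the original.
PMID_DIGITS_RE = re.compile(r"\bpmid *= *([0-9]+)\b", re.IGNORECASE)

# Hypothetical wikitext, made up for illustration.
SAMPLE = (
    "{{cite journal |title=Example |pmid = 12345678 |pmc=PMC7654321}} "
    "{{cite journal |title=Other |PMID=23456789}}"
)


def extract_ids(wikitext):
    """Return (pmcs, pmids_loose, pmids_strict) found in the wikitext."""
    # The loose patterns can capture trailing whitespace, so strip it.
    pmcs = [value.strip() for value in PMC_RE.findall(wikitext)]
    pmids_loose = [value.strip() for value in PMID_RE.findall(wikitext)]
    pmids_strict = PMID_DIGITS_RE.findall(wikitext)
    return pmcs, pmids_loose, pmids_strict


if __name__ == "__main__":
    pmcs, pmids_loose, pmids_strict = extract_ids(SAMPLE)
    print("PMC values:", pmcs)
    print("PMIDs (loose pattern):", pmids_loose)
    print("PMIDs (digits-only pattern):", pmids_strict)
```

On clean citations the two PMID patterns should agree. They differ on non-numeric values: the loose pattern captures whatever sits between the "=" and the next "|" or "}", while the digits-only pattern skips anything that does not start with a digit.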
_______________________________________________
Wiki-research-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
