Aaron and Max!

Thanks so much for your work on this. I have sent over the methods and data to Stanford and they're excited about incorporating this into a full study. You are awesome.

Jake (Ocaasi)

On 10/22/14, [email protected] <[email protected]> wrote:
> Send Wiki-research-l mailing list submissions to
> [email protected]
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> or, via email, send a message with subject or body 'help' to
> [email protected]
>
> You can reach the person managing the list at
> [email protected]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Wiki-research-l digest..."
>
>
> Today's Topics:
>
> 1. Re: Extracting PMIDs (Maximilian Klein)
> 2. Re: Extracting PMIDs (Aaron Halfaker)
> 3. Re: Extracting PMIDs (Maximilian Klein)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 22 Oct 2014 11:27:09 -0700
> From: Maximilian Klein <[email protected]>
> To: Research into Wikimedia content and communities
> <[email protected]>, Jake Orlowitz
> <[email protected]>
> Subject: Re: [Wiki-research-l] Extracting PMIDs
> Message-ID:
> <cakbmofi_rdgq+vh2mtwwpbobahrwlpw5dxbs9y+xmuga2vp...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Jake,
> I have a script that does this already for DOIs. It was a one-line change to
> make. These files should answer what you were looking for.
>
> https://raw.githubusercontent.com/notconfusing/listiness/pmc/pmc_list.txt
> https://raw.githubusercontent.com/notconfusing/listiness/pmc/pmid_list.txt
>
> In the future you can tell them to use halfak's
> https://pythonhosted.org/mediawiki-utilities/
> This is the code I used to get those lists.
> https://github.com/notconfusing/listiness/commit/e140ce9202b9c1098dec40ca1da3ff135fd8c520
>
> Make a great day,
> Max Klein ‽ http://notconfusing.com/
>
> On Mon, Oct 20, 2014 at 9:20 PM, Andrew G. West <[email protected]>
> wrote:
>
>> Jake,
>>
>> Yes, it's a rather straightforward parse based on the citation format which
>> Jeremy described. Doc James and I already have this coded up for a soon to
>> be published [[WP:MED]] readership/editorship paper.
>>
>> Searching for PMIDs in the entirety of the Wikipedia article base would
>> be a bit time consuming -- but if one needs to pull down only articles in
>> WikiProject Medicine, for example, I am also able to help on that front.
>>
>> Perhaps we'll take this offline, but if anyone else is interested in the
>> dirty details, feel free to contact one of us off-list. -AW
>>
>> --
>> Andrew G. West, PhD
>> http://www.andrew-g-west.com
>>
>> On 10/20/2014 11:57 PM, Jake Orlowitz wrote:
>>
>>> Hi folks,
>>>
>>> Relaying a question from a Stanford medical researcher:
>>>
>>> "Do you know if it is possible to extract PubMed ID (PMID) or PMCIDs
>>> from Wiki references? Furthermore, could you dump those IDs out into a
>>> list for analysis?"
>>>
>>> Best,
>>> Jake Orlowitz (Ocaasi)
>>>
>>> _______________________________________________
>>> Wiki-research-l mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
> ------------------------------
>
> Message: 2
> Date: Wed, 22 Oct 2014 14:48:32 -0500
> From: Aaron Halfaker <[email protected]>
> To: Research into Wikimedia content and communities
> <[email protected]>
> Subject: Re: [Wiki-research-l] Extracting PMIDs
> Message-ID:
> <canqe2t-1pgbumoenwcbfdvvzthtv1cmykb-5bhedssijlc2...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hey folks,
>
> Somehow I missed this thread, but I've already addressed this request on
> the Village Pump[1]. See:
>
> http://datasets.wikimedia.org/public-datasets/enwiki/etc/pmids.articles.20141008.tsv
>
> I extracted PMIDs with the following regex: /\bpmid *= *[0-9]+\b/i
>
> The file includes page_id, page_namespace, page_title, rev_id (most recent),
> and pmid as TAB-separated values.
>
> Let me know if you have questions or if you think the regex matching
> strategy is insufficient. It's pretty quick to take another pass.
>
> 1.
> https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Extracting_PMIDs
>
> ------------------------------
>
> Message: 3
> Date: Wed, 22 Oct 2014 15:06:39 -0700
> From: Maximilian Klein <[email protected]>
> To: Research into Wikimedia content and communities
> <[email protected]>
> Subject: Re: [Wiki-research-l] Extracting PMIDs
> Message-ID:
> <CAKbmofjaf86Sz=63nu_vhpgoeqt_gx4d_2hfrzmftbjh8ra...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Out of interest, my regex was
>
> pmc\s*\=\s*(.*?)[\|\}]
>
> and then also
>
> pmid\s*\=\s*(.*?)[\|\}]
>
> with the ignorecase flag set.
>
> Make a great day,
> Max Klein ‽ http://notconfusing.com/
>
> ------------------------------
>
> End of Wiki-research-l Digest, Vol 110, Issue 12
> ************************************************
>

--
Jake Orlowitz
Wikipedia: Ocaasi <http://enwp.org/User:Ocaasi>
Facebook: Jake Orlowitz <http://www.facebook.com/jorlowitz>
Twitter: JakeOrlowitz <https://twitter.com/JakeOrlowitz>
LinkedIn: Jake Orlowitz <http://www.linkedin.com/profile/view?id=197604531>
Email: [email protected]
Skype: jorlowitz
Cell: (484) 684-2104
Home: (484) 380-3940
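
For anyone who wants to try the two regex strategies from the thread side by side, here is a minimal Python sketch. The patterns are the ones Aaron and Max posted; the capture group around the digits in Aaron's pattern, the function names, and the sample wikitext are assumptions made for illustration, not what either of them actually ran.

```python
import re

# Aaron's pattern from the thread. The capture group around the digits is
# an addition for this sketch, so that findall() returns only the ID.
PMID_STRICT = re.compile(r"\bpmid *= *([0-9]+)\b", re.IGNORECASE)

# Max's patterns from the thread: lazily capture everything up to the
# next '|' or '}' after the parameter name.
PMID_LOOSE = re.compile(r"pmid\s*\=\s*(.*?)[\|\}]", re.IGNORECASE)
PMC_LOOSE = re.compile(r"pmc\s*\=\s*(.*?)[\|\}]", re.IGNORECASE)


def extract_strict(wikitext):
    """Return the digit runs that follow 'pmid =' (hypothetical helper)."""
    return PMID_STRICT.findall(wikitext)


def extract_loose(pattern, wikitext):
    """Return captured parameter values, trimmed, skipping empty ones
    (hypothetical helper)."""
    values = (match.strip() for match in pattern.findall(wikitext))
    return [value for value in values if value]


# Made-up wikitext for illustration only.
SAMPLE = (
    "{{cite journal |title=Example |pmid=12345678 |pmc = 1234567}} "
    "{{cite journal | PMID = 23456789 }}"
)

if __name__ == "__main__":
    print(extract_strict(SAMPLE))
    print(extract_loose(PMID_LOOSE, SAMPLE))
    print(extract_loose(PMC_LOOSE, SAMPLE))
```

The strict pattern only accepts digits, so it skips empty or malformed parameters. The loose pattern returns whatever sits between the equals sign and the next delimiter, which is why the sketch trims whitespace and drops empty values afterwards.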
