Re: [Wiki-research-l] Wiki-research-l Digest, Vol 110, Issue 12

Jake Orlowitz Wed, 22 Oct 2014 15:11:10 -0700

Aaron and Max! Thanks so much for your work on this.  I have sent over
the methods and data to Stanford and they're excited about
incorporating this into a full study.  You are awesome.  Jake (Ocaasi)


On 10/22/14, [email protected]
<[email protected]> wrote:
> Send Wiki-research-l mailing list submissions to
>       [email protected]
>
> To subscribe or unsubscribe via the World Wide Web, visit
>       https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> or, via email, send a message with subject or body 'help' to
>       [email protected]
>
> You can reach the person managing the list at
>       [email protected]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Wiki-research-l digest..."
>
>
> Today's Topics:
>
>    1. Re: Extracting PMIDs (Maximilian Klein)
>    2. Re: Extracting PMIDs (Aaron Halfaker)
>    3. Re: Extracting PMIDs (Maximilian Klein)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 22 Oct 2014 11:27:09 -0700
> From: Maximilian Klein <[email protected]>
> To: Research into Wikimedia content and communities
>       <[email protected]>, Jake Orlowitz
>       <[email protected]>
> Subject: Re: [Wiki-research-l] Extracting PMIDs
> Message-ID:
>       <cakbmofi_rdgq+vh2mtwwpbobahrwlpw5dxbs9y+xmuga2vp...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Jake,
> I have a script that already does this for DOIs; it was a one-line change
> to make. These files should answer what you were looking for.
>
> https://raw.githubusercontent.com/notconfusing/listiness/pmc/pmc_list.txt
> https://raw.githubusercontent.com/notconfusing/listiness/pmc/pmid_list.txt
>
> In the future, you can point them to halfak's mediawiki-utilities:
> https://pythonhosted.org/mediawiki-utilities/
> This is the code I used to get those lists:
> https://github.com/notconfusing/listiness/commit/e140ce9202b9c1098dec40ca1da3ff135fd8c520
>
> Make a great day,
> Max Klein ‽ http://notconfusing.com/
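Max suggests halfak's mediawiki-utilities for this kind of job. Its API is not reproduced here; as a dependency-free stand-in, a minimal sketch could stream a MediaWiki XML export with the standard library's `xml.etree.ElementTree.iterparse` and apply Max's two patterns (combined into one alternation) to each page. It assumes a current-revisions-only dump such as pages-articles, so each page carries a single revision; the helper name `iter_page_ids` and the sample page are illustrative, not taken from the thread.

```python
import io
import re
import xml.etree.ElementTree as ET

# Max's pmid and pmc patterns from this thread, combined into one
# alternation and compiled with IGNORECASE.
ID_RE = re.compile(r"(pmid|pmc)\s*=\s*(.*?)[|}]", re.IGNORECASE)


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_page_ids(source):
    """Yield (page_id, ns, title, rev_id, kind, value) per pmid/pmc match.

    Hypothetical helper. `source` is a filename or binary file object
    holding a MediaWiki XML export. Assumes one revision per page; with a
    full-history dump only the last revision seen would be scanned.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        fields = {"text": ""}
        for child in elem:
            name = local_name(child.tag)
            if name in ("id", "ns", "title"):
                fields[name] = child.text
            elif name == "revision":
                for rev_child in child:
                    rev_name = local_name(rev_child.tag)
                    if rev_name == "id":
                        fields["rev_id"] = rev_child.text
                    elif rev_name == "text":
                        fields["text"] = rev_child.text or ""
        for kind, value in ID_RE.findall(fields["text"]):
            yield (fields.get("id"), fields.get("ns"), fields.get("title"),
                   fields.get("rev_id"), kind.lower(), value.strip())
        elem.clear()  # drop the finished <page>'s children and text


if __name__ == "__main__":
    sample = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title><ns>0</ns><id>1</id>
    <revision><id>100</id>
      <text>{{cite journal |title=X |pmid=12345 |pmc=PMC678}}</text>
    </revision>
  </page>
</mediawiki>"""
    for row in iter_page_ids(io.BytesIO(sample)):
        print(row)
```

Streaming page by page keeps memory use modest, which matters for the "entire article base" case Andrew mentions below.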
>
> On Mon, Oct 20, 2014 at 9:20 PM, Andrew G. West <[email protected]>
> wrote:
>
>> Jake,
>>
>> Yes, it's a rather straightforward parse based on the citation format which
>> Jeremy described. Doc James and I already have this coded up for a
>> soon-to-be-published [[WP:MED]] readership/editorship paper.
>>
>> Searching for PMIDs across the entire Wikipedia article base would be
>> rather time-consuming, but if one needs to pull down only articles in
>> WikiProject Medicine, for example, I am also able to help on that front.
>>
>> Perhaps we'll take this offline, but if anyone else is interested in the
>> dirty details, feel free to contact one of us off-list. -AW
>>
>> --
>> Andrew G. West, PhD
>> http://www.andrew-g-west.com
>>
>>
>>
>> On 10/20/2014 11:57 PM, Jake Orlowitz wrote:
>>
>>> Hi folks,
>>>
>>> Relaying a question from a Stanford medical researcher:
>>>
>>> "Do you know if it is possible to extract PubMed ID (PMID) or PMCIDs
>>> from Wiki references?  Furthermore, could you dump those IDs out into a
>>> list for analysis?"
>>>
>>> Best,
>>> Jake Orlowitz (Ocaasi)
>>>
>>>
>>> _______________________________________________
>>> Wiki-research-l mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>>>
>>>
>>
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL:
> <https://lists.wikimedia.org/pipermail/wiki-research-l/attachments/20141022/af0ad9fa/attachment-0001.html>
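Andrew describes the extraction as a straightforward parse of the citation-template format. A rough sketch of that idea, assuming flat `{{cite ...}}` templates with no nested templates inside them, splits each template on `|` and reads the `pmid` and `pmc` named parameters. This is a simplification for illustration, not Andrew's or Doc James's code; the function names are hypothetical, and a full wikitext parser such as mwparserfromhell would cope better with nesting.

```python
import re

# A {{cite ...}} template containing no nested braces. An outer citation
# that nests another template (e.g. {{!}}) does not match and is skipped.
CITE_RE = re.compile(r"\{\{\s*cite[^{}]*\}\}", re.IGNORECASE)


def citation_params(wikitext):
    """Yield a dict of named parameters for each flat {{cite ...}} template."""
    for match in CITE_RE.finditer(wikitext):
        body = match.group(0)[2:-2]       # drop the enclosing {{ and }}
        _name, *parts = body.split("|")   # first piece is the template name
        params = {}
        for part in parts:
            key, sep, value = part.partition("=")
            if sep:                        # positional parameters have no '='
                params[key.strip().lower()] = value.strip()
        yield params


def extract_ids(wikitext):
    """Return (pmids, pmcids) from citation templates, in page order."""
    pmids, pmcids = [], []
    for params in citation_params(wikitext):
        if params.get("pmid"):
            pmids.append(params["pmid"])
        if params.get("pmc"):
            pmcids.append(params["pmc"])
    return pmids, pmcids


if __name__ == "__main__":
    text = ("Claim.<ref>{{Cite journal |title=Trial |PMID = 123456 "
            "|pmc=3141592}}</ref> Other.<ref>{{cite web |url=http://example.org}}</ref>")
    print(extract_ids(text))
```

Splitting on `|` also cuts piped wikilinks such as `[[Aspirin|aspirin]]` inside a title; here that only truncates the affected value, but it is another reason to prefer a real parser for production use.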
>
> ------------------------------
>
> Message: 2
> Date: Wed, 22 Oct 2014 14:48:32 -0500
> From: Aaron Halfaker <[email protected]>
> To: Research into Wikimedia content and communities
>       <[email protected]>
> Subject: Re: [Wiki-research-l] Extracting PMIDs
> Message-ID:
>       <canqe2t-1pgbumoenwcbfdvvzthtv1cmykb-5bhedssijlc2...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hey folks,
>
> Somehow I missed this thread, but I've already addressed this request on
> the Village Pump[1]. See:
>
> http://datasets.wikimedia.org/public-datasets/enwiki/etc/pmids.articles.20141008.tsv
>
>
> I extracted PMIDs with the following regex: /\bpmid *= *[0-9]+\b/i
>
> It includes page_id, page_namespace, page_title, rev_id (most recent), and
> pmid as tab-separated values.
>
> Let me know if you have questions or if you think the regex matching
> strategy is insufficient.  It's pretty quick to take another pass.
>
> 1.
> https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Extracting_PMIDs
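A file like the one Aaron describes can be approximated with his regex. In this sketch, a capture group is added around the digits (what the pattern matches is unchanged) so the PMID itself can be pulled out, and rows are written with the five columns he lists. The input shape (an iterable of `(page_id, page_namespace, page_title, rev_id, wikitext)` tuples), the function name `write_pmid_tsv`, and the header row are assumptions for illustration.

```python
import csv
import io
import re

# Aaron's pattern /\bpmid *= *[0-9]+\b/i, with a capture group added
# around the digits so findall() returns just the number.
PMID_RE = re.compile(r"\bpmid *= *([0-9]+)\b", re.IGNORECASE)

# Column order as listed in Aaron's message; writing them as a header
# row is an assumption of this sketch.
COLUMNS = ["page_id", "page_namespace", "page_title", "rev_id", "pmid"]


def write_pmid_tsv(pages, out):
    """Write one tab-separated row per PMID occurrence to the text stream `out`.

    `pages` is assumed to yield (page_id, page_namespace, page_title,
    rev_id, wikitext) tuples for the most recent revision of each page.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for page_id, namespace, title, rev_id, text in pages:
        for pmid in PMID_RE.findall(text):
            writer.writerow([page_id, namespace, title, rev_id, pmid])


if __name__ == "__main__":
    buf = io.StringIO()
    write_pmid_tsv([(1, 0, "Example", 100, "{{cite journal |pmid=12345}}")], buf)
    print(buf.getvalue(), end="")
```

A page that cites several PMIDs produces several rows, one per match.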
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 22 Oct 2014 15:06:39 -0700
> From: Maximilian Klein <[email protected]>
> To: Research into Wikimedia content and communities
>       <[email protected]>
> Subject: Re: [Wiki-research-l] Extracting PMIDs
> Message-ID:
>       <CAKbmofjaf86Sz=63nu_vhpgoeqt_gx4d_2hfrzmftbjh8ra...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Out of interest, my regex was
>
> pmc\s*\=\s*(.*?)[\|\}]
>
> and then also
>
> pmid\s*\=\s*(.*?)[\|\}]
>
>
> with the ignorecase flag turned on.
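Max's two patterns can be tried side by side in Python with `re.IGNORECASE`. Each captures lazily up to the next `|` or `}`, so the captured value is whatever text precedes that delimiter. The `extract` helper and the sample wikitext are illustrative assumptions.

```python
import re

# Max's two patterns, verbatim, compiled with the ignorecase flag he mentions.
PMC_RE = re.compile(r"pmc\s*\=\s*(.*?)[\|\}]", re.IGNORECASE)
PMID_RE = re.compile(r"pmid\s*\=\s*(.*?)[\|\}]", re.IGNORECASE)


def extract(wikitext):
    """Return (pmids, pmcs) as lists of captured strings in page order.

    Each capture is the text between '=' and the next '|' or '}', with
    surrounding whitespace stripped; values are not checked to be numeric.
    """
    pmids = [value.strip() for value in PMID_RE.findall(wikitext)]
    pmcs = [value.strip() for value in PMC_RE.findall(wikitext)]
    return pmids, pmcs


if __name__ == "__main__":
    one_line = "{{cite journal |title=X |pmid = 12345 |pmc=PMC678}}"
    multi_line = "{{cite journal\n| title = X\n| pmid = 12345\n}}"
    print(extract(one_line))    # both IDs captured
    print(extract(multi_line))  # pmid ends its line, so nothing is captured
```

Because `.` does not match a newline, a parameter that ends its line, as in the multi-line template above, is not captured; adding `\n` to the terminating character class is one possible adjustment. Aaron's pattern, which needs no terminator, does match that layout.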
>
> Make a great day,
> Max Klein ‽ http://notconfusing.com/
>
>
> ------------------------------
>
>
>
> End of Wiki-research-l Digest, Vol 110, Issue 12
> ************************************************
>


-- 
Jake Orlowitz
  Wikipedia: Ocaasi <http://enwp.org/User:Ocaasi>
  Facebook: Jake Orlowitz <http://www.facebook.com/jorlowitz>
  Twitter: JakeOrlowitz <https://twitter.com/JakeOrlowitz>
  LinkedIn: Jake Orlowitz<http://www.linkedin.com/profile/view?id=197604531>
  Email: [email protected]
  Skype: jorlowitz
  Cell: (484) 684-2104
  Home: (484) 380-3940

