Re: [Wiki-research-l] Extracting PMIDs

Aaron Halfaker Wed, 22 Oct 2014 15:08:45 -0700

Ahh. What are PMCs?

On Wed, Oct 22, 2014 at 5:06 PM, Maximilian Klein <[email protected]> wrote:


> Out of interest, my regex was
>
> pmc\s*\=\s*(.*?)[\|\}]
>
> and then also
>
> pmid\s*\=\s*(.*?)[\|\}]
>
>
> with the ignorecase flag set.
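
A minimal Python sketch of applying the two patterns quoted above with `re.IGNORECASE`. The helper name `extract_ids` and the sample citation are hypothetical. Because the lazy group `(.*?)` runs up to the next `|` or `}`, it also captures any spaces before that delimiter, so the sketch strips each value and skips empty parameters.

```python
import re

# The two patterns from the message above, compiled case-insensitively.
# The lazy group (.*?) stops at the first "|" or "}" after the "=".
PMC_RE = re.compile(r"pmc\s*\=\s*(.*?)[\|\}]", re.IGNORECASE)
PMID_RE = re.compile(r"pmid\s*\=\s*(.*?)[\|\}]", re.IGNORECASE)


def extract_ids(wikitext):
    """Return (pmcs, pmids) found in a chunk of wikitext.

    Captured values are stripped, because the lazy group also picks up any
    spaces before the closing delimiter; empty parameters are skipped.
    """
    pmcs = [v.strip() for v in PMC_RE.findall(wikitext) if v.strip()]
    pmids = [v.strip() for v in PMID_RE.findall(wikitext) if v.strip()]
    return pmcs, pmids


# Hypothetical citation used only for illustration.
sample = "{{cite journal |title=Example |PMID = 12345678 |pmc=3456789}}"
print(extract_ids(sample))  # → (['3456789'], ['12345678'])
```
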
>
> Make a great day,
> Max Klein ‽ http://notconfusing.com/
>
> On Wed, Oct 22, 2014 at 12:48 PM, Aaron Halfaker <[email protected]> wrote:
>
>> Hey folks,
>>
>> Somehow I missed this thread, but I've already addressed this request on
>> the Village Pump[1]. See:
>>
>> http://datasets.wikimedia.org/public-datasets/enwiki/etc/pmids.articles.20141008.tsv
>>
>>
>> I extracted PMIDs with the following regex: /\bpmid *= *[0-9]+\b/i
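
The same regex translated to Python, as a sketch. The capture group around `[0-9]+` is an addition for illustration (the original pattern matches the whole `pmid = N` span), and `find_pmids` is a hypothetical helper name.

```python
import re

# /\bpmid *= *[0-9]+\b/i in Python syntax. The capture group around
# [0-9]+ is added here so findall() returns just the number.
PMID_RE = re.compile(r"\bpmid *= *([0-9]+)\b", re.IGNORECASE)


def find_pmids(wikitext):
    """Return every numeric PMID value, in order of appearance."""
    return PMID_RE.findall(wikitext)


# Hypothetical wikitext used only for illustration.
text = "{{cite journal |pmid=12345678}} {{cite journal | PMID = 87654321 }}"
print(find_pmids(text))  # → ['12345678', '87654321']
```

Because ` *` only allows literal spaces, a citation written with a tab or line break around `=` would not match; `\s*` would be the looser alternative.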
>>
>> The file includes page_id, page_namespace, page_title, rev_id (most
>> recent), and pmid as tab-separated values.
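
A sketch for loading a file with the five columns listed above, assuming (this is not stated in the message) that each row is plain tab-separated values in exactly that order. Whether the file has a header row is left as a parameter, and the `pmid` field is kept as a string since its exact form is not specified. `PmidRow` and `read_pmid_rows` are hypothetical names.

```python
import csv
from collections import namedtuple

# Column order as described in the message; the header row is an assumption.
PmidRow = namedtuple("PmidRow", "page_id page_namespace page_title rev_id pmid")


def read_pmid_rows(lines, has_header=False):
    """Parse tab-separated lines into PmidRow tuples with integer IDs."""
    # QUOTE_NONE: page titles may contain '"' characters, which should be
    # kept literally rather than treated as CSV quoting.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)
    rows = []
    for fields in reader:
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        page_id, ns, title, rev_id, pmid = fields
        rows.append(PmidRow(int(page_id), int(ns), title, int(rev_id), pmid))
    return rows
```

When reading from disk, open the file with `newline=""` as the `csv` module documentation recommends, and pass the file object as `lines`.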
>>
>> Let me know if you have questions or if you think the regex matching
>> strategy is insufficient.  It's pretty quick to take another pass.
>>
>> [1] https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Extracting_PMIDs
>>
>> On Wed, Oct 22, 2014 at 1:27 PM, Maximilian Klein <[email protected]> wrote:
>>
>>> Jake,
>>> I have a script that already does this for DOIs; it was a one-line
>>> change to make. These files should answer what you were looking for.
>>>
>>> https://raw.githubusercontent.com/notconfusing/listiness/pmc/pmc_list.txt
>>>
>>> https://raw.githubusercontent.com/notconfusing/listiness/pmc/pmid_list.txt
>>>
>>> In the future you can tell them to use halfak's mediawiki-utilities:
>>> https://pythonhosted.org/mediawiki-utilities/
>>> This is the code I used to get those lists:
>>> https://github.com/notconfusing/listiness/commit/e140ce9202b9c1098dec40ca1da3ff135fd8c520
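
mediawiki-utilities has its own dump-processing tools; as a stand-in that needs only the Python standard library, here is a sketch that streams a MediaWiki XML export with `xml.etree.ElementTree.iterparse` and applies the `\bpmid *= *[0-9]+\b` regex (with an added capture group) to each page's text. This is a swapped-in technique, not the code linked above, and `iter_page_pmids` is a hypothetical name.

```python
import io
import re
import xml.etree.ElementTree as ET

PMID_RE = re.compile(r"\bpmid *= *([0-9]+)\b", re.IGNORECASE)


def _local(tag):
    """Strip the '{namespace}' prefix that export XML puts on every tag."""
    return tag.rsplit("}", 1)[-1]


def iter_page_pmids(xml_file):
    """Yield (title, [pmids]) for each <page> in a MediaWiki XML export.

    Uses the text of the last <revision> seen in each page, which is the
    latest one because dumps list revisions in chronological order.
    """
    title, text = None, ""
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        name = _local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, PMID_RE.findall(text)
            title, text = None, ""
            elem.clear()  # drop the finished page's children to save memory


# Tiny hypothetical export used only for illustration.
demo = io.BytesIO(
    b"<mediawiki><page><title>Example</title><revision>"
    b"<text>{{cite journal |pmid=12345}}</text></revision></page></mediawiki>"
)
print(list(iter_page_pmids(demo)))  # → [('Example', ['12345'])]
```

Clearing each finished `<page>` keeps memory use roughly flat. Dumps are usually distributed compressed, so in practice the file object might come from `bz2.open(path, "rb")`.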
>>>
>>> Make a great day,
>>> Max Klein ‽ http://notconfusing.com/
>>>
>>> On Mon, Oct 20, 2014 at 9:20 PM, Andrew G. West <[email protected]> wrote:
>>>
>>>> Jake,
>>>>
>>>> Yes, it's a rather straightforward parse based on the citation format
>>>> that Jeremy described. Doc James and I already have this coded up for a
>>>> soon-to-be-published [[WP:MED]] readership/editorship paper.
>>>>
>>>> Searching for PMIDs across the entire Wikipedia article base would be
>>>> somewhat time-consuming, but if one needs to pull down only articles in
>>>> WikiProject Medicine, for example, I can also help on that front.
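
Restricting extracted results to one WikiProject could look like the following sketch, assuming the list of project article titles has already been obtained separately (how to get it is not covered here). Both helpers are hypothetical. The one detail worth handling is that database columns such as `page_title` store underscores, while dump XML and most user-facing lists use spaces.

```python
def normalize_title(title):
    """Treat 'Heart_attack' and 'Heart attack' as the same title."""
    return title.replace("_", " ").strip()


def restrict_to_titles(records, wanted_titles):
    """Keep only (title, pmids) records whose title is in wanted_titles."""
    wanted = {normalize_title(t) for t in wanted_titles}
    return [(t, p) for t, p in records if normalize_title(t) in wanted]
```
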
>>>>
>>>> Perhaps we'll take this offline, but if anyone else is interested in
>>>> the dirty details, feel free to contact one of us off-list. -AW
>>>>
>>>> --
>>>> Andrew G. West, PhD
>>>> http://www.andrew-g-west.com
>>>>
>>>>
>>>>
>>>> On 10/20/2014 11:57 PM, Jake Orlowitz wrote:
>>>>
>>>>> Hi folks,
>>>>>
>>>>> Relaying a question from a Stanford medical researcher:
>>>>>
>>>>> "Do you know if it is possible to extract PubMed ID (PMID) or PMCIDs
>>>>> from Wiki references?  Furthermore, could you dump those IDs out into a
>>>>> list for analysis?"
>>>>>
>>>>> Best,
>>>>> Jake Orlowitz (Ocaasi)
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Wiki-research-l mailing list
>>>>> [email protected]
>>>>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>

