Re: [Wiki-research-l] Extracting PMIDs

Maximilian Klein Wed, 22 Oct 2014 15:07:12 -0700

Out of interest, my regex was

pmc\s*\=\s*(.*?)[\|\}]


and then also

pmid\s*\=\s*(.*?)[\|\}]


with the ignorecase flag set.
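A minimal Python sketch of how the two patterns above can be applied, assuming the wikitext is already in a string. The helper name `extract_ids` is hypothetical. The lazy group stops at the first `|` or `}`, so it keeps any spaces before that delimiter. The sketch therefore strips each capture and drops empty values.

```python
import re

# The two patterns above, compiled with the IGNORECASE flag.
# (.*?) lazily captures everything after "=" up to the first "|" or "}".
PMC_RE = re.compile(r"pmc\s*\=\s*(.*?)[\|\}]", re.IGNORECASE)
PMID_RE = re.compile(r"pmid\s*\=\s*(.*?)[\|\}]", re.IGNORECASE)


def extract_ids(wikitext):
    """Return (pmcs, pmids) captured from a chunk of wikitext (hypothetical helper)."""
    def clean(matches):
        # Spaces before the closing delimiter end up in the capture; strip
        # them and drop empty parameters such as "|pmid= |".
        return [m.strip() for m in matches if m.strip()]
    return clean(PMC_RE.findall(wikitext)), clean(PMID_RE.findall(wikitext))
```

One property of the patterns as written: without `re.DOTALL`, `.` does not match a newline. A parameter that ends at a line break, as in vertically formatted citations (`| pmid = 42` followed by a newline), is therefore not captured.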

Make a great day,
Max Klein ‽ http://notconfusing.com/

On Wed, Oct 22, 2014 at 12:48 PM, Aaron Halfaker <[email protected]>
wrote:

> Hey folks,
>
> Somehow I missed this thread, but I've already addressed this request on
> the Village Pump[1]. See:
>
> http://datasets.wikimedia.org/public-datasets/enwiki/etc/pmids.articles.20141008.tsv
>
>
> I extracted PMIDs with the following regex: /\bpmid *= *[0-9]+\b/i
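A Python rendering of the pattern above, given as a sketch. The capture group around the digits is an addition so that `findall()` returns only the number. `find_pmids` is a hypothetical name.

```python
import re

# /\bpmid *= *[0-9]+\b/i in Python syntax; the ( ) around the digits is an
# addition so that findall() returns just the number, not "pmid = 123".
AARON_PMID_RE = re.compile(r"\bpmid *= *([0-9]+)\b", re.IGNORECASE)


def find_pmids(wikitext):
    """Return every PMID this pattern finds in the wikitext (hypothetical helper)."""
    return AARON_PMID_RE.findall(wikitext)
```

Only literal spaces are allowed around `=`. A value written as `|pmid=<TAB>4242` is skipped, whereas a `\s*` version such as Max's would match it.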
>
> It includes page_id, page_namespace, page_title, rev_id (most recent) and
> pmid as tab-separated values.
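A minimal sketch for loading a file in the layout described above. It assumes the five columns appear in the order listed and that a header row, if there is one, starts with a non-numeric field. The pmid column is kept as a string because the exact format of that field isn't stated. `PmidRow` and `read_pmid_rows` are hypothetical names.

```python
import csv
from typing import Iterable, Iterator, NamedTuple


class PmidRow(NamedTuple):
    page_id: int
    page_namespace: int
    page_title: str
    rev_id: int
    pmid: str  # kept as text; the exact field format is not specified


def read_pmid_rows(lines: Iterable[str]) -> Iterator[PmidRow]:
    """Parse tab-separated lines into PmidRow records."""
    # QUOTE_NONE: treat quote characters in titles as ordinary text.
    for fields in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # skip blank/short lines and a header row, if any
        page_id, namespace, title, rev_id, pmid = fields[:5]
        yield PmidRow(int(page_id), int(namespace), title, int(rev_id), pmid)
```

With the downloaded file open via `open(path, newline="", encoding="utf-8")`, `read_pmid_rows` yields one record per data line. `{row.pmid for row in rows}` then gives the distinct PMIDs. The UTF-8 encoding is an assumption.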
>
> Let me know if you have questions or if you think the regex matching
> strategy is insufficient.  It's pretty quick to take another pass.
>
> 1.
> https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Extracting_PMIDs
>
> On Wed, Oct 22, 2014 at 1:27 PM, Maximilian Klein <[email protected]>
> wrote:
>
>> Jake,
>> I have a script that already does this for DOIs; it was a one-line change
>> to make. These files should answer what you were looking for.
>>
>> https://raw.githubusercontent.com/notconfusing/listiness/pmc/pmc_list.txt
>> https://raw.githubusercontent.com/notconfusing/listiness/pmc/pmid_list.txt
>>
>> In the future you can tell them to use halfak's mediawiki-utilities:
>> https://pythonhosted.org/mediawiki-utilities/
>> This is the code I used to get those lists:
>> https://github.com/notconfusing/listiness/commit/e140ce9202b9c1098dec40ca1da3ff135fd8c520
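The mediawiki-utilities API is not reproduced here. As a stand-in, this sketch streams a MediaWiki XML dump with the standard library's `xml.etree.ElementTree.iterparse` and applies a PMID pattern to each revision's text. The simplified pattern `pmid\s*=\s*(\d+)` is an assumption for illustration and is not either regex from this thread verbatim. For a compressed `.xml.bz2` dump, pass `bz2.open(path, "rb")` as the source.

```python
import io
import re
import xml.etree.ElementTree as ET

# Simplified PMID pattern (an assumption for this sketch): digits only.
PMID_RE = re.compile(r"pmid\s*=\s*(\d+)", re.IGNORECASE)


def _local(tag):
    """Drop the XML namespace, e.g. '{http://www.mediawiki.org/...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def pmids_from_dump(source):
    """Yield (page_title, pmid) pairs from a MediaWiki XML dump.

    `source` is a path or a binary file object. Every <text> element is
    scanned, so a full-history dump yields matches from every revision.
    """
    title = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = _local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            for pmid in PMID_RE.findall(elem.text or ""):
                yield title, pmid
        elif name == "page":
            elem.clear()  # release the finished page's children to save memory


if __name__ == "__main__":
    sample = (b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
              b"<page><title>Example</title><revision>"
              b"<text>{{cite journal |pmid=123 }}</text>"
              b"</revision></page></mediawiki>")
    for page_title, pmid in pmids_from_dump(io.BytesIO(sample)):
        print(page_title, pmid)
```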
>>
>> Make a great day,
>> Max Klein ‽ http://notconfusing.com/
>>
>> On Mon, Oct 20, 2014 at 9:20 PM, Andrew G. West <[email protected]>
>> wrote:
>>
>>> Jake,
>>>
>>> Yes, it's a rather straightforward parse based on the citation format
>>> which Jeremy described. Doc James and I already have this coded up for a
>>> soon-to-be-published [[WP:MED]] readership/editorship paper.
>>>
>>> Searching for PMIDs in the entirety of the Wikipedia article base would
>>> be a bit time-consuming -- but if one needs to pull down only articles in
>>> WikiProject Medicine, for example, I am also able to help on that front.
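A hedged sketch of the subset idea. One common way to approximate a WikiProject's articles is to list the talk pages that transclude the project's banner template, using the MediaWiki API's `list=embeddedin` module. The banner name "Template:WikiProject Medicine", the User-Agent string and both helper names are assumptions. Wikimedia asks API clients to send a descriptive User-Agent with contact details, so replace the placeholder before use.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"
BANNER = "Template:WikiProject Medicine"  # assumed banner template name


def tagged_talk_pages(banner=BANNER):
    """Yield titles of talk pages (namespace 1) that transclude `banner`."""
    params = {
        "action": "query",
        "list": "embeddedin",
        "eititle": banner,
        "einamespace": 1,
        "eilimit": "max",
        "format": "json",
    }
    while True:
        req = Request(API_URL + "?" + urlencode(params),
                      headers={"User-Agent": "pmid-subset-sketch/0.1 (placeholder contact)"})
        with urlopen(req) as resp:
            data = json.load(resp)
        for page in data["query"]["embeddedin"]:
            yield page["title"]
        if "continue" not in data:
            break
        params.update(data["continue"])  # follow API continuation


def talk_to_article(talk_title):
    """'Talk:Acute myeloid leukemia' -> 'Acute_myeloid_leukemia' (page_title form)."""
    if talk_title.startswith("Talk:"):
        talk_title = talk_title[len("Talk:"):]
    return talk_title.replace(" ", "_")
```

The resulting set of article titles, in the underscore form used by MediaWiki's `page_title` column, can then filter any per-page list of PMIDs down to the project's articles.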
>>>
>>> Perhaps we'll take this offline, but if anyone else is interested in the
>>> dirty details, feel free to contact one of us off-list. -AW
>>>
>>> --
>>> Andrew G. West, PhD
>>> http://www.andrew-g-west.com
>>>
>>>
>>>
>>> On 10/20/2014 11:57 PM, Jake Orlowitz wrote:
>>>
>>>> Hi folks,
>>>>
>>>> Relaying a question from a Stanford medical researcher:
>>>>
>>>> "Do you know if it is possible to extract PubMed ID (PMID) or PMCIDs
>>>> from Wiki references?  Furthermore, could you dump those IDs out into a
>>>> list for analysis?"
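For the "dump those IDs out into a list" part of the question, here is a small sketch that normalizes and de-duplicates raw captured values before writing them out. It assumes PMCID values may appear with or without the "PMC" prefix and that non-numeric values should be dropped. The function names are hypothetical.

```python
import re

_DIGITS = re.compile(r"[0-9]+")


def normalize_pmid(raw):
    """Return the PMID as an ASCII digit string, or None if it is not numeric."""
    value = raw.strip()
    return value if _DIGITS.fullmatch(value) else None


def normalize_pmcid(raw):
    """Return 'PMC' + digits, accepting values with or without the prefix."""
    value = raw.strip().upper()
    if value.startswith("PMC"):
        value = value[3:]
    return "PMC" + value if _DIGITS.fullmatch(value) else None


def unique_pmids(raw_values):
    """Distinct valid PMIDs, sorted numerically."""
    return sorted({v for v in map(normalize_pmid, raw_values) if v}, key=int)


def unique_pmcids(raw_values):
    """Distinct valid PMCIDs, sorted by their numeric part."""
    return sorted({v for v in map(normalize_pmcid, raw_values) if v},
                  key=lambda v: int(v[3:]))
```

Writing `"\n".join(unique_pmids(values))` to a text file gives one ID per line for analysis.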
>>>>
>>>> Best,
>>>> Jake Orlowitz (Ocaasi)
>>>>
>>>>
>>>> _______________________________________________
>>>> Wiki-research-l mailing list
>>>> [email protected]
>>>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>>>>
>>>>
>>>