Hi Greg,

Thank you for your e-mail. I see that your code is very ad-hoc, like mine. In fact, it's quite similar. It doesn't directly solve all my problems, but you gave me two good ideas:

1. To look first in the metadata.
2. To search with the PII as well (DOIs don't work when they include parentheses, as in Elsevier journals, and it looks easy to convert a DOI into a PII).
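
The two ideas above can be sketched in code. What follows is only a rough Python port of the two Perl regexes quoted further down in this thread, not Greg's actual script: `find_doi`, `first_doi` and the sample strings are hypothetical names and made-up data, and the PubMed lookup (`efetch` in the Perl) is left out. It tries each source in order (file name, metadata, text) and stops at the first hit; the last call shows how a DOI containing parentheses gets cut short by the character class.

```python
import re

# Python ports of the two quoted Perl regexes (Perl /im -> re.I | re.M).
LOOSE_DOI = re.compile(r"doi[: ]+([0-9.]+[ /][A-Z0-9.\-_]+)", re.I | re.M)
STRICT_DOI = re.compile(r"doi[: ]+(10\.[0-9]{4})[ /0]([A-Z0-9.\-_]+)", re.I | re.M)


def find_doi(text):
    """Return a DOI candidate found in text, or None if nothing matches."""
    m = LOOSE_DOI.search(text)
    if m:
        # Same as the Perl s/\s+/\//: replace the first whitespace run with a slash.
        return re.sub(r"\s+", "/", m.group(1), count=1)
    m = STRICT_DOI.search(text)
    if m:
        # Stricter prefix, looser 'hinge' (space, slash or zero); rejoin with a slash.
        return m.group(1) + "/" + m.group(2)
    return None


def first_doi(sources):
    """Try each (label, text) pair in order and return the first (label, doi) hit."""
    for label, text in sources:
        doi = find_doi(text or "")  # tolerate missing metadata (None)
        if doi:
            return label, doi
    return None


# Made-up inputs, cheapest source first.
sources = [
    ("filename", "smith2008.pdf"),
    ("metadata", "doi:10.1234/abc.567"),
    ("text", "full text from pdftotext would go here"),
]
print(first_doi(sources))

# Parentheses are not in the character class, so the match stops early.
print(find_doi("doi:10.1016/S0000-0000(00)00000-0"))
```

Ordering the sources from cheapest to most expensive means the text extraction only matters when the file name and metadata yield nothing. The truncated result in the last call is the case where a search by PII might be the better fallback.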

Thanks again!

Miguel

On 9 Dec 2008 at 00:15, Gregory Jefferis wrote:

> Hi Miguel,
>
> For DOI parsing I'm afraid what I've put together is really a dirty great
> hack, but it works for me. You can take a look at the Perl here:
>
> http://pastie.org/334440
>
> The core DOI regexes are:
>
> if($page=~/doi[: ]+([0-9.]+[ \/][A-Z0-9.\-_]+)/im){
>     $doicand=$1;
>     $doicand=~s/\s+/\//;
>     $pmid=efetch($doicand);
> } elsif($page=~/doi[: ]+(10\.[0-9]{4})[ \/0]([A-Z0-9.\-_]+)/im){
>     # Be more restrictive about initial part but less about
>     # actual DOI string - offer 3 alternatives for 'hinge'
>     # including standard slash
>     $doicand=$1."/".$2;
>     $pmid=efetch($doicand);
> }
>
> However to speed things up, the script first looks at the file name, the PDF
> metadata and eventually the text (using pdftotext) looking for DOIs to
> identify enough information to pull up the record from PubMed. These days it
> works fine with all my journals (I'm a neuroscientist).
>
> Best,
>
> Greg.
>
> PS Let me know if you would like the full app (just a wrapper script to
> handle drag and drop of PDFs + pdftotext pinched from inside Bibdesk).
>
> ------------------------------------------------------------------------------
> SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las Vegas, Nevada.
> The future of the web can't happen without you. Join us at MIX09 to help
> pave the way to the Next Web now. Learn more and register at
> http://ad.doubleclick.net/clk;208669438;13503038;i?http://2009.visitmix.com/
> _______________________________________________
> Bibdesk-users mailing list
> Bibdesk-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bibdesk-users
>
> --
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.

--
Miguel Ortiz Lombardía
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!! NEW ADDRESS !!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Architecture et Fonction des Macromolécules Biologiques
UMR6098, CNRS, Université Aix-Marseille I & II
Case 932
163 Avenue de Luminy
13288 Marseille cedex 9
France
Tel: +33(0) 491 82 55 93
Fax: +33(0) 491 26 67 20
e-mail: [EMAIL PROTECTED]
Web: http://www.pangea.org/mol/spip.php?rubrique2