Re: [CODE4LIB] Language codes

2016-06-01 Thread Andrew Cunningham
It is better to refer to BCP-47 instead (https://tools.ietf.org/html/bcp47). An RFC can be updated, but when it is, the update receives a new number. For language tagging, the relevant information is split across two RFCs (RFC 5646 and RFC 4647). BCP-47 is a permanent IETF identifier that always references the latest versions of those two RFCs.
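To make the structure of a BCP-47 tag concrete, here is a minimal Python sketch that splits a tag into its language, script, region and variant subtags. It assumes a simplified subset of the RFC 5646 grammar: extlang, extension, private-use and grandfathered tags are rejected rather than parsed, and subtags are not checked against the IANA Language Subtag Registry. The function name `parse_langtag` is hypothetical.

```python
import re

# Simplified subset of the RFC 5646 langtag grammar (assumption: no extlang,
# extensions, private use or grandfathered tags).
_LANGTAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)$"
)


def parse_langtag(tag: str) -> dict:
    """Split a simple BCP-47 tag into language, script, region and variants."""
    m = _LANGTAG.match(tag)
    if m is None:
        raise ValueError(f"not handled by this simplified parser: {tag!r}")
    script = m.group("script")
    region = m.group("region")
    return {
        # Conventional casing: lowercase language, title-case script,
        # uppercase region. Tags themselves are case-insensitive.
        "language": m.group("language").lower(),
        "script": script.title() if script else None,
        "region": region.upper() if region else None,
        "variants": [v.lower() for v in m.group("variants").split("-") if v],
    }
```

For example, `parse_langtag("zh-Hant-TW")` yields language `zh`, script `Hant` and region `TW`, while `parse_langtag("de-CH-1901")` picks up `1901` as a variant.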

Re: [CODE4LIB] Language codes

2016-06-01 Thread Andrew Cunningham
On 2 Jun 2016 9:40 am, "Andrew Cunningham" <lang.supp...@gmail.com> wrote:
> Ultimately it depends on what a library is working on; if you are cataloguing, then all you have is ISO-639-3/B.

Oops, meant to type ISO-639-2/B.

Andrew

Re: [CODE4LIB] Language codes

2016-06-01 Thread Andrew Cunningham
Outside the library sector, the most common approach to language tagging and matching isn't ISO-639-2 or ISO-639-3 but BCP-47. Quite a number of ISO-639-2 language codes represent what ISO-639-3 refers to as macrolanguages. For instance, 'kar' in ISO-639-2 resolves to 20 language codes in
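Matching is where BCP-47 differs most from a flat ISO 639 code list. The sketch below is a simplified take on the "Lookup" scheme from RFC 4647: each requested tag is truncated from the right until it matches an available tag, also dropping a trailing single-letter subtag (such as the `x` of a private-use sequence) along the way. It ignores the `*` wildcard and does not map macrolanguages to their individual languages; `lookup` is a hypothetical helper name.

```python
def lookup(priority_list, available, default=None):
    """Return the best available tag for the first matching language range.

    Simplified RFC 4647 Lookup: no wildcard support, case-insensitive compare.
    """
    # Map lowercased tags back to their original spelling for the result.
    available_lower = {tag.lower(): tag for tag in available}
    for language_range in priority_list:
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in available_lower:
                return available_lower[candidate]
            subtags.pop()  # drop the rightmost subtag
            # Never leave a dangling singleton such as "x" or "u" at the end.
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default
```

So `lookup(["zh-Hant-TW"], ["zh-Hant", "en"])` falls back from `zh-Hant-TW` to `zh-Hant`, and a request list with no match returns the supplied default.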

[CODE4LIB] Fwd: [camms-ccaam] Common encoding errors

2016-02-22 Thread Andrew Cunningham
an for African Studies and Catalog Librarian, Sterling Memorial Library, Yale University, charles.ri...@yale.edu, (203)432-7566 or (203)432-9301 -- Andrew Cunningham lang.supp...@gmail.com

Re: [CODE4LIB] Best way to handle non-US keyboard chars in URLs?

2016-02-21 Thread Andrew Cunningham
ing the resource. It seems to me your primary concern is for users who cannot read the resource in any event. Andrew

[CODE4LIB]

2016-02-08 Thread Andrew Cunningham
Thanks, I will look into them.

On 9 February 2016 at 03:56, Han, Yan - (yhan) <y...@email.arizona.edu> wrote:
> Yes. Use iText or PDFBox.
> These are common PDF libraries.
>
> On 2/6/16, 2:24 PM, "Code for Libraries on behalf of Andrew Cunningham

[CODE4LIB]

2016-02-08 Thread Andrew Cunningham
have known it existed without your posting.

> On Mon, Feb 8, 2016 at 11:56 AM, Han, Yan - (yhan) <y...@email.arizona.edu> wrote:
>> Yes. Use iText or PDFBox.
>> These are common PDF libraries.

[CODE4LIB]

2016-02-06 Thread Andrew Cunningham
based on the ActualText content rather than the visible text layers in the PDF? Andrew
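For context, ActualText is an entry attached to marked content in a PDF content stream, supplying the text that should replace what the glyphs appear to say (a ligature glyph, for example). A rough Python sketch of pulling those values out, assuming the content stream has already been decompressed and each value is written inline as a hex string (`/ActualText <FEFF...>`). Literal `(...)` strings and values stored in a `/Properties` resource dictionary are not handled, and the function names are hypothetical.

```python
import re

# /ActualText followed by a hex string, e.g. /ActualText <FEFF00660066>
# Assumption: decompressed content stream, hex-string form only.
_ACTUALTEXT_HEX = re.compile(rb"/ActualText\s*<([0-9A-Fa-f\s]*)>")


def decode_pdf_text_string(raw: bytes) -> str:
    """Decode a PDF text string: UTF-16BE when it starts with the FE FF BOM."""
    if raw.startswith(b"\xfe\xff"):
        return raw[2:].decode("utf-16-be")
    # Latin-1 is only a rough stand-in for PDFDocEncoding (assumption).
    return raw.decode("latin-1")


def actual_text_values(content_stream: bytes) -> list:
    """Return the ActualText replacement strings found in a content stream."""
    values = []
    for match in _ACTUALTEXT_HEX.finditer(content_stream):
        hex_digits = re.sub(rb"\s", b"", match.group(1))
        if len(hex_digits) % 2:
            hex_digits += b"0"  # PDF rule: a missing final hex digit counts as 0
        raw = bytes.fromhex(hex_digits.decode("ascii"))
        values.append(decode_pdf_text_string(raw))
    return values
```

Given `b"/Span <</ActualText <FEFF00660066>>> BDC [<0012>] TJ EMC"`, the sketch returns `["ff"]`, even though the glyph drawn is a single ligature glyph ID.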

Re: [CODE4LIB] Library community web standards (was: LibGuides v2 - Templates and Nav)

2014-09-30 Thread Andrew Cunningham
Hi Brad, an interesting idea, but with many potential failure points. I have been in the position of spending considerable time developing best-practice materials on web internationalisation for our state government, without any prospect of being able to roll them out within our own library. Whether we

Re: [CODE4LIB] Natural language programming

2014-07-01 Thread Andrew Cunningham
Since you may be looking at Drupal integration down the path, I would look at using Python and the NLTK, and developing a web service that could then be used by Drupal. On 01/07/2014 11:13 PM, Katie konrad.ka...@gmail.com wrote: Hello, Has anyone here experience in the world of natural language
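A minimal sketch of the "web service in front of Python" idea, using only the standard library so it runs anywhere. The tokenizer is a naive whitespace split standing in for NLTK (an NLTK call such as `nltk.word_tokenize` would slot into `analyse`), which also means it will not segment scripts written without spaces. The handler name, the JSON shape and the port are all hypothetical choices, not anything Drupal requires.

```python
import json
from collections import Counter
from http.server import BaseHTTPRequestHandler, HTTPServer


def analyse(text: str) -> dict:
    # Placeholder tokenizer: case-folded whitespace split (assumption).
    # Swap in an NLTK tokenizer here for real use.
    tokens = text.casefold().split()
    return {"tokens": tokens, "counts": dict(Counter(tokens))}


class AnalysisHandler(BaseHTTPRequestHandler):
    """Accepts POSTed UTF-8 text and answers with a JSON analysis."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        text = self.rfile.read(length).decode("utf-8")
        payload = json.dumps(analyse(text), ensure_ascii=False).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    # Blocks until interrupted; a Drupal module would POST text to this address.
    HTTPServer((host, port), AnalysisHandler).serve_forever()
```

Keeping the analysis in a plain function separate from the HTTP handler means the NLP side can be tested and swapped without touching the Drupal-facing interface.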

Re: [CODE4LIB] Cataloguing Telugu

2014-04-07 Thread Andrew Cunningham
Stuart, I had a quick look at the proposal. I'm not sure cataloguing is an appropriate term, nor are they citations. I suspect that a simple database, a web interface, a simple search interface and Telugu collation should suffice; no specific tools would be needed. We are talking about a fairly common
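As a sketch of how small that setup can be: SQLite (via Python's `sqlite3`) lets you register a custom collation and use it in `ORDER BY`. The collation below only compares NFC-normalised code points, which is a placeholder and not a real Telugu collation; a production system would plug in a CLDR/UCA-based collator (for example from ICU) at the marked spot. The table, column and function names are hypothetical.

```python
import sqlite3
import unicodedata


def telugu_key(s: str) -> str:
    # Placeholder key: NFC-normalised code points. Replace with a proper
    # CLDR/UCA collation key for Telugu in real use (assumption).
    return unicodedata.normalize("NFC", s)


def telugu_collation(a: str, b: str) -> int:
    # SQLite expects negative, zero or positive, like a classic cmp().
    ka, kb = telugu_key(a), telugu_key(b)
    return (ka > kb) - (ka < kb)


conn = sqlite3.connect(":memory:")
conn.create_collation("telugu", telugu_collation)
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO entries (title) VALUES (?)",
    [("తెలుగు",), ("అమ్మ",), ("కథ",)],
)


def search(term: str) -> list:
    """Substring search, results ordered with the registered collation."""
    rows = conn.execute(
        "SELECT title FROM entries WHERE title LIKE ? ORDER BY title COLLATE telugu",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]
```

`search("")` returns every title in collated order and `search("కథ")` returns just that entry; the web interface would simply call `search` with the user's query.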

Re: [CODE4LIB] pdf2txt

2013-10-11 Thread Andrew Cunningham
You may want to consider how best to handle PDF files where the text would contain ligatures and glyph ids rather than the underlying characters. A. On 12/10/2013 4:58 AM, Eric Lease Morgan emor...@nd.edu wrote: On Oct 11, 2013, at 1:49 PM, Matthew Sherman matt.r.sher...@gmail.com wrote:
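The two cases behave quite differently. Ligature characters that survive as real Unicode code points (U+FB01 for "fi", U+FB00 for "ff") can be expanded with NFKC normalisation; glyph IDs cannot be recovered this way at all. A heuristic sketch, with hypothetical function names, that expands the former and flags text containing Private Use Area characters, one possible symptom of the latter:

```python
import unicodedata


def expand_ligatures(text: str) -> str:
    # NFKC maps compatibility ligatures such as U+FB01 and U+FB00 to plain
    # letter sequences. It also folds other compatibility characters
    # (full-width forms, superscripts), so apply it deliberately.
    return unicodedata.normalize("NFKC", text)


def has_private_use(text: str) -> bool:
    # Heuristic only: extracted text in the Private Use Area (category "Co")
    # often means the PDF exposed glyph IDs rather than characters.
    return any(unicodedata.category(ch) == "Co" for ch in text)
```

Text for which `has_private_use` returns True is a candidate for the OCR route rather than text-layer extraction.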

Re: [CODE4LIB] pdf2txt

2013-10-11 Thread Andrew Cunningham
Hi Mark, I suspect the tool will only be able to handle select languages, and I very much doubt you could develop a tool that handles non-LCG (Latin, Cyrillic, Greek) text. For a fully internationalised tool, you would have to ignore all text layers in a PDF and run all PDFs through OCR to generate text. Then you'd need to
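If a blanket OCR pass is too expensive, one compromise is to inspect the extracted text first and only trust it when it is entirely LCG. A rough sketch using Unicode character names as a script test; this is a heuristic (a few Latin-ish letters such as U+00AA have names that do not start with LATIN), and `is_lcg_text` is a hypothetical name:

```python
import unicodedata

LCG_PREFIXES = ("LATIN", "CYRILLIC", "GREEK")


def is_lcg_text(text: str) -> bool:
    """True if every alphabetic character is Latin, Cyrillic or Greek.

    Heuristic based on Unicode character names; text with no letters
    counts as LCG.
    """
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if not name.startswith(LCG_PREFIXES):
                return False
    return True
```

PDFs whose extracted text fails the check would then be sent to OCR instead of relying on the text layer.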

Re: [CODE4LIB] pdf2txt

2013-10-11 Thread Andrew Cunningham
Perl has its own encoding model: strings could be Unicode or a legacy encoding. Unicode is indicated by the presence of a flag on a string (Perl's internal UTF8 flag), and this is decided on a string-by-string basis. If it is a legacy encoding, then it could be any legacy encoding. If your data is truly multilingual,

Re: [CODE4LIB] Python and Ruby

2013-07-29 Thread Andrew Cunningham
Both Ruby and Python have their strengths and weaknesses, and as others have mentioned, it will come down to need and the existing projects you want to leverage. We use both Python and Ruby internally. Know your tools and their strengths and weaknesses. My personal interest is more and more

Re: [CODE4LIB] Python and Ruby

2013-07-29 Thread Andrew Cunningham
White space is potentially an illusion: it isn't necessarily there, especially when the whitespace is not a character... ;) On 30/07/2013 8:02 AM, Michael J. Giarlo leftw...@alumni.rutgers.edu wrote: And you would think Python developers would know how to... ( •_•) ( •_•)⌐■-■ (⌐■_■) read

Re: [CODE4LIB] tiff2pdf, then back to pdf?

2013-04-26 Thread Andrew Cunningham
Although I do find the persistent myth of PDF/A as an archival format amusing. Under very specific circumstances it can be, but it's rare for those circumstances to be deliberately met. And for many languages it is impossible ever to use PDF for archival purposes; it is the nature of PDF. On

Re: [CODE4LIB] From Chinese characters to convert Pinyin and Traditional and Simplified Chinese and Hangul

2013-04-18 Thread Andrew Cunningham
-- Andrew Cunningham Project Manager, Research and Development (Social and Digital Inclusion) Public Libraries and Community Engagement State Library of Victoria 328 Swanston Street Melbourne VIC 3000 Australia Ph: +61-3-8664-7430 Mobile: 0459 806 589 Email: acunning...@slv.vic.gov.au

Re: [CODE4LIB] one tool and/or resource that you recommend to newbie coders in a library?

2012-11-01 Thread Andrew Cunningham
University http://medlib.fiu.edu http://medlib.fiu.edu/m (Mobile)

Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21

2012-04-20 Thread Andrew Cunningham
must work in Unicode, irrespective of the data format that the underlying system expects the data to be. If the underlying system expects MARC8 then the save as process should be able to translate the data into MARC8 on output. -Robert Haschart -- Andrew Cunningham Senior Project Manager
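One concrete piece of that "save as" translation: Unicode stores a combining diacritic after its base letter, while MARC-8 stores it before. A minimal sketch for Latin text only, assuming the three byte values shown are taken from the MARC-8 Extended Latin (ANSEL) table (verify against the official code tables); a real converter needs the full tables, escape sequences for other scripts, and proper error handling. `to_marc8_latin` is a hypothetical name.

```python
import unicodedata

# Partial Extended Latin (ANSEL) mapping for three combining marks (assumption).
_ANSEL_COMBINING = {
    "\u0300": 0xE1,  # combining grave accent
    "\u0301": 0xE2,  # combining acute accent
    "\u0308": 0xE8,  # combining diaeresis (umlaut)
}


def to_marc8_latin(text: str) -> bytes:
    """Convert ASCII letters plus a few diacritics from Unicode to MARC-8."""
    decomposed = unicodedata.normalize("NFD", text)  # split é into e + U+0301
    out = bytearray()
    i = 0
    while i < len(decomposed):
        base = decomposed[i]
        i += 1
        marks = []
        while i < len(decomposed) and unicodedata.combining(decomposed[i]):
            marks.append(decomposed[i])
            i += 1
        # MARC-8 puts combining diacritics *before* the base letter.
        for mark in marks:
            if mark not in _ANSEL_COMBINING:
                raise ValueError(f"no mapping in this sketch for U+{ord(mark):04X}")
            out.append(_ANSEL_COMBINING[mark])
        if ord(base) > 0x7F:
            raise ValueError(f"base character U+{ord(base):04X} not handled")
        out.append(ord(base))
    return bytes(out)
```

With these assumptions, `to_marc8_latin("café")` produces `b"caf\xe2e"`: the acute byte comes first, then the `e`.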

Re: [CODE4LIB] Unicode font for PDF generation?

2012-03-18 Thread Andrew Cunningham
with overlapping codepoints in the merged fonts. Thanks, Mark -- Andrew Cunningham Senior Project Manager, Research and Development Vicnet State Library of Victoria Australia andr...@vicnet.net.au lang.supp...@gmail.com

Re: [CODE4LIB] Unicode font for PDF generation?

2012-03-18 Thread Andrew Cunningham
Does anyone have experience with font authoring and merging different fonts? It looks as though FontForge can merge fonts, but it's not clear how to deal with overlapping codepoints in the merged fonts. Thanks, Mark
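A different technique that sidesteps merging altogether is to keep the fonts separate and pick a font per code point from a priority list when generating the PDF, so overlaps are resolved by order rather than by editing glyph tables. A minimal sketch; the coverage sets are assumed inputs (with fontTools, `TTFont(path).getBestCmap()` returns a code point to glyph mapping whose keys could fill them), and the font names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def assign_fonts(
    text: str, coverage: Dict[str, Set[int]], priority: List[str]
) -> List[Tuple[Optional[str], str]]:
    """Split text into (font, run) pairs using a priority-ordered font list."""
    runs: List[Tuple[Optional[str], str]] = []
    for ch in text:
        # The first font in priority order that covers the code point wins,
        # which is how overlapping code points are resolved.
        font = next((name for name in priority if ord(ch) in coverage[name]), None)
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + ch)  # extend the current run
        else:
            runs.append((font, ch))  # new run; None means no font covers it
    return runs
```

Note that this assigns fonts per code point; for complex scripts a real implementation should assign per grapheme cluster so a combining mark never ends up in a different font from its base.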

Re: [CODE4LIB] Unicode font for PDF generation?

2012-03-18 Thread Andrew Cunningham
For additional CJKV fonts look at: http://en.wikipedia.org/wiki/List_of_CJK_fonts

Re: [CODE4LIB] Plea for help from Horowhenua Library Trust to Koha Community

2011-11-23 Thread Andrew Cunningham
who are involved with libraries at some level, isn't it? I'm wondering if cultural property rights can be used to overturn a trademark. Not only is koha a Maori word, it is a cultural concept.

Re: [CODE4LIB] Plea for help from Horowhenua Library Trust to Koha Community

2011-11-23 Thread Andrew Cunningham
I'd be inclined to have a quiet chat with Maori political activists and see what their feelings are on non-New Zealand companies applying for trademark status on Maori words in New Zealand.

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-27 Thread Andrew Cunningham
I suspect that MARCXML isn't going anywhere fast, which is perhaps a shame. The key difference between MARCXML and MARC is that MARCXML inherits XML's internationalisation features, an area in which MARC is very poor. Andrew
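One of those inherited features in action: an XML document declares its own encoding, and any character can travel as a numeric character reference even over a 7-bit channel. A small sketch with Python's ElementTree, building only a 245 $a inside a MARCXML `record` (a real record also needs a leader and control fields); the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC_NS)


def title_record(title: str, encoding: str = "utf-8") -> bytes:
    """Serialise a minimal record holding only a 245 $a title."""
    record = ET.Element(f"{{{MARC_NS}}}record")
    # Attributes go in a dict: "tag" would clash with SubElement's own argument.
    field = ET.SubElement(record, f"{{{MARC_NS}}}datafield",
                          {"tag": "245", "ind1": "0", "ind2": "0"})
    subfield = ET.SubElement(field, f"{{{MARC_NS}}}subfield", {"code": "a"})
    subfield.text = title
    # Characters the target encoding cannot hold become &#NNNN; references.
    return ET.tostring(record, encoding=encoding, xml_declaration=True)


def read_title(xml_bytes: bytes) -> str:
    root = ET.fromstring(xml_bytes)
    path = f"{{{MARC_NS}}}datafield[@tag='245']/{{{MARC_NS}}}subfield[@code='a']"
    return root.find(path).text
```

Serialising a Telugu title with `encoding="us-ascii"` yields pure ASCII bytes, yet `read_title` recovers the original string, because the parser resolves the character references.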

Re: [CODE4LIB] character-sets for dummies?

2009-12-16 Thread Andrew Cunningham
New Zealand Electronic Text Centre http://researcharchive.vuw.ac.nz/ Institutional Repository -- Andrew Cunningham Vicnet Research and Development Coordinator State Library of Victoria Australia andr...@vicnet.net.au lang.supp...@gmail.com