It is better to refer to BCP-47 instead.
https://tools.ietf.org/html/bcp47
An RFC can be updated; when it is, it receives a new number. For language
tagging, the relevant information is split across two RFCs. BCP-47 is a
permanent IETF identifier referencing the latest versions of the two RFCs
On 2 Jun 2016 9:40 am, "Andrew Cunningham" <lang.supp...@gmail.com> wrote:
>
>
> Ultimately it is what a library is working on, if you are cataloguing
> then all you have is ISO-639-3/B
>
Oops, meant to input ISO-639-2/B
Andrew
Outside the library sector, the most common approach to language tagging
and matching isn't ISO-639-2 or ISO-639-3, but rather BCP-47.
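As a rough illustration of what BCP-47 matching looks like in practice, here
is a minimal Python sketch of "basic filtering", one of the matching schemes
described in RFC 4647 (the matching half of BCP-47). The function name and the
sample tags are my own assumptions, not taken from any particular library.

```python
def basic_filter(language_range: str, tag: str) -> bool:
    """Basic filtering in the style of RFC 4647: a range matches a tag when
    the two are equal (ignoring case) or when the range is a prefix of the
    tag that ends at a '-' subtag boundary. The range '*' matches any tag."""
    rng = language_range.lower()
    candidate = tag.lower()
    if rng == "*":
        return True
    return candidate == rng or candidate.startswith(rng + "-")


# Hypothetical sample tags, chosen only for illustration.
tags = ["en", "en-AU", "de-CH", "zh-Hant-TW"]
matches = [t for t in tags if basic_filter("zh-Hant", t)]
print(matches)
```

Note that the prefix test stops at subtag boundaries, so the range "en" matches
"en-AU" but would not match a three-letter code that merely starts with "en".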
Quite a number of ISO-639-2 language tags represent what ISO-639-3 refers
to as macrolanguages. For instance, 'kar' in ISO-639-2 resolves to 20
language codes in
an for African Studies and Catalog Librarian*
*Sterling Memorial Library*
*Yale University*
*charles.ri...@yale.edu*
*(203)432-7566 or (203)432-9301*
--
Andrew Cunningham
lang.supp...@gmail.com
ing
the resource.
It seems to me your primary concern is for users who cannot read the
resource in any event
Andrew
--
Andrew Cunningham
lang.supp...@gmail.com
Thanks, I will look into them.
On 9 February 2016 at 03:56, Han, Yan - (yhan) <y...@email.arizona.edu>
wrote:
> Yes. Use iText or PDFBox
>
> These are common PDF libraries.
>
>
>
>
>
> On 2/6/16, 2:24 PM, "Code for Libraries on behalf of Andrew Cunningham"
have known it existed without your posting.
>
>
> On Mon, Feb 8, 2016 at 11:56 AM, Han, Yan - (yhan) <y...@email.arizona.edu
> >
> wrote:
>
> > Yes. Use iText or PDFBox
> >
> > These are common PDF libraries.
> >
> >
> >
> >
based
on the ActualText content rather than the visible text layers in the PDF?
Andrew
--
Andrew Cunningham
lang.supp...@gmail.com
Hi Brad,
An interesting idea, but many potential failure points.
I have been in the position of spending considerable time developing best
practice materials on web internationalisation for our state government,
without any prospect of being able to roll it out within our own library.
Whether we
Since you may be looking at Drupal integration down the path, I would look
at using Python and the NLTK, and develop a web service that could then be
used by Drupal
On 01/07/2014 11:13 PM, Katie konrad.ka...@gmail.com wrote:
Hello,
Has anyone here experience in the world of natural language
Stuart, had a quick look at the proposal, not sure cataloguing is an
appropriate term, nor are they citations.
I suspect that a simple database, web interface, simple search interface
and Telugu collation should suffice. No specific tools would be needed. We
are talking about a fairly common
You may want to consider how best to handle PDF files where the text would
contain ligatures and glyph ids rather than the underlying characters.
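For the ligature half of that problem, a minimal Python sketch, assuming the
extraction step hands back Unicode ligature code points such as U+FB01 (the
sample string is made up). Compatibility normalisation (NFKC) folds these back
to the underlying letters. It does nothing for bare glyph ids with no Unicode
mapping; those need the font's own mapping data or OCR.

```python
import unicodedata

# Hypothetical extracted text: U+FB01 (fi ligature) and U+FB03 (ffi ligature).
extracted = "\ufb01nal o\ufb03ce"

# NFKC applies compatibility decomposition, which replaces the ligature
# code points with the plain letter sequences they stand for.
normalised = unicodedata.normalize("NFKC", extracted)
print(normalised)
```

NFKC also folds other compatibility characters (full-width forms, superscript
digits and so on), so it is better suited to building a search index than to
rewriting the stored text.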
A.
On 12/10/2013 4:58 AM, Eric Lease Morgan emor...@nd.edu wrote:
On Oct 11, 2013, at 1:49 PM, Matthew Sherman matt.r.sher...@gmail.com
wrote:
Hi Mark,
I suspect the tool will only be able to handle select languages, and it is very
doubtful you could develop a tool to handle non-LCG text.
For a fully internationalised tool, you would have to ignore all text
layers in a PDF and run all PDFs through OCR to generate text.
Then you'd need to
Perl has its own encoding model: strings could be Unicode or a legacy
encoding. Unicode is indicated by the presence of a flag on a
string, but it's decided on a string-by-string basis.
If it is a legacy encoding, then it could be any legacy encoding.
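The same caution applies outside Perl. As an analogy only (this is Python, not
Perl's flag model, and the sample string and encodings are my assumptions), the
sketch below shows why the legacy encoding has to be known rather than guessed:
decoding with the wrong single-byte encoding raises no error, it just yields
different text.

```python
# Bytes as they might arrive from a legacy system (Latin-1 in this example).
raw = "M\u00fcller".encode("latin-1")

# Decode once, at the boundary, using the encoding the data is known to be in.
text = raw.decode("latin-1")

# Guessing a different single-byte encoding succeeds silently but is wrong.
wrong = raw.decode("cp1251")

# Once the text is Unicode, it can be re-encoded for whatever comes next.
utf8 = text.encode("utf-8")

print(text == wrong)
```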
If your data is truly multilingual,
Both Ruby and Python have their strengths and weaknesses, and as others
have mentioned, it will come down to need and existing projects you want to
leverage.
We use both Python and Ruby internally.
Know your tools and their strengths and weaknesses.
My personal interest is more and more
Whitespace is potentially an illusion; it isn't necessarily there,
esp. when the whitespace is not a character ...
;)
On 30/07/2013 8:02 AM, Michael J. Giarlo leftw...@alumni.rutgers.edu
wrote:
And you would think Python developers would know how to...
( •_•)
( •_•)⌐■-■
(⌐■_■)
read
Although I do find the persistent myth of PDF/A as an archival format
amusing.
Under very specific circumstances it can be, but it's rare for those
circumstances to be deliberately met.
And for many languages it is impossible to use PDF for archival purposes
ever.
It is the nature of PDF.
On
--
Andrew Cunningham
Project Manager, Research and Development
(Social and Digital Inclusion)
Public Libraries and Community Engagement
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000
Australia
Ph: +61-3-8664-7430
Mobile: 0459 806 589
Email: acunning...@slv.vic.gov.au
University
http://medlib.fiu.edu
http://medlib.fiu.edu/m (Mobile)
--
Andrew Cunningham
Project Manager, Research and Development
Social and Digital Inclusion Unit
Public Libraries and Community Engagement
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000
Australia
Ph: +61-3-8664
must work
in Unicode, irrespective of
the data format that the underlying system expects the data to be. If
the underlying system expects
MARC8 then the save as process should be able to translate the data
into MARC8 on output.
-Robert Haschart
--
Andrew Cunningham
Senior Project Manager
with overlapping codepoints in the merged fonts.
Thanks,
Mark
--
Andrew Cunningham
Senior Project Manager, Research and Development
Vicnet
State Library of Victoria
Australia
andr...@vicnet.net.au
lang.supp...@gmail.com
.
Does anyone have experience with font authoring and merging different
fonts?
It looks as though FontForge can merge fonts, but it's not clear how to
deal with overlapping codepoints in the merged fonts.
Thanks,
Mark
--
Andrew Cunningham
Senior Project Manager, Research and Development
Vicnet
For additional CJKV fonts look at:
http://en.wikipedia.org/wiki/List_of_CJK_fonts
--
Andrew Cunningham
Senior Project Manager, Research and Development
Vicnet
State Library of Victoria
Australia
andr...@vicnet.net.au
lang.supp...@gmail.com
who
are involved with libraries at some level, isn't it?
I'm wondering if cultural property rights can be used to overturn a
trademark. Not only is koha a Maori word, it is a cultural concept.
--
Andrew Cunningham
Senior Project Manager, Research and Development
Vicnet
State Library
I'd be inclined to have a quiet chat with Maori political activists
and see what their feelings are on non-New Zealand companies applying
for trademark status on Maori words in New Zealand.
--
Andrew Cunningham
Senior Project Manager, Research and Development
Vicnet
State Library of Victoria
I'd suspect that MARCXML isn't going anywhere fast, a shame perhaps.
The key difference between MARCXML and MARC is that MARCXML inherits
XML's internationalisation features.
It is an aspect at which MARC is very poor.
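A small Python sketch of what that inheritance buys you, assuming a made-up
one-field record: the encoding is declared in the XML itself, and any standard
XML parser hands back Unicode text with no MARC-specific character conversion.
The record content and variable names are hypothetical; the namespace is the
MARC21 slim namespace used by MARCXML.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# A short string in Telugu script, used as a hypothetical title.
sample_title = "\u0c24\u0c46\u0c32\u0c41\u0c17\u0c41"

# Build a minimal record and serialise it as UTF-8 bytes, as it would
# appear in a file; the XML declaration states the encoding.
record = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<record xmlns="http://www.loc.gov/MARC21/slim">'
    '<datafield tag="245" ind1="0" ind2="0">'
    f'<subfield code="a">{sample_title}</subfield>'
    '</datafield>'
    '</record>'
).encode("utf-8")

# The parser reads the declared encoding and returns Unicode strings.
root = ET.fromstring(record)
title = root.find("marc:datafield[@tag='245']/marc:subfield[@code='a']", NS).text
print(title == sample_title)
```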
Andrew
--
Andrew Cunningham
Senior Project Manager, Research
/ New Zealand Electronic Text Centre
http://researcharchive.vuw.ac.nz/ Institutional Repository
--
Andrew Cunningham
Vicnet Research and Development Coordinator
State Library of Victoria
Australia
andr...@vicnet.net.au
lang.supp...@gmail.com