[CODE4LIB] From Chinese characters to convert Pinyin and Traditional and Simplified Chinese and Hangul
Hi, I'm Wataru ONO, a librarian at Hitotsubashi University Library in Japan. This tool converts Chinese characters to Pinyin, Traditional and Simplified Chinese, and Hangul: https://googledrive.com/host/0B_vZSxPrv8xmVnZwSkk0ZmU2Zmc/han2pin.html You can also convert between Simplified and Traditional Chinese and Japanese characters. It is written in pure JavaScript. If you are interested in this tool, please feel free to use and download it. Best regards
Re: [CODE4LIB] From Chinese characters to convert Pinyin and Traditional and Simplified Chinese and Hangul
Hi Wataru, very interesting script, although I'd be inclined to suggest an enhancement. It would be useful to add language tagging to the input field and to each of the conversions. As it stands, the page will not use appropriate fonts for each language; web browsers need appropriate language tagging to apply appropriate font fallback behaviours. Andrew

-- Andrew Cunningham Project Manager, Research and Development (Social and Digital Inclusion) Public Libraries and Community Engagement State Library of Victoria 328 Swanston Street Melbourne VIC 3000 Australia Ph: +61-3-8664-7430 Mobile: 0459 806 589 Email: acunning...@slv.vic.gov.au lang.supp...@gmail.com http://www.openroad.net.au/ http://www.mylanguage.gov.au/ http://www.slv.vic.gov.au/
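To illustrate the language-tagging suggestion, here is a minimal sketch that wraps each conversion result in a span with a `lang` attribute so a browser can pick a suitable font. The function name `tag_output` and the mapping from output kind to BCP 47 tag are assumptions made for illustration; they are not taken from the han2pin page.

```python
from html import escape

# BCP 47 language tags for each kind of output.
# This mapping is an illustrative assumption, not part of the han2pin tool.
LANG_TAGS = {
    "simplified": "zh-Hans",
    "traditional": "zh-Hant",
    "japanese": "ja",
    "hangul": "ko",
    "pinyin": "zh-Latn-pinyin",
}


def tag_output(kind: str, text: str) -> str:
    """Wrap converted text in a span that carries the matching lang attribute."""
    return f'<span lang="{LANG_TAGS[kind]}">{escape(text)}</span>'
```

The same attribute could be set on the input field once the source language is known.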
[CODE4LIB] OGG vs. WEBM
Hey brain,

We are developing a video-training / instructional video library institute that liberally rips off khanacademy, except that we want to host our videos ourselves rather than stream them through a third party. This is a simultaneous effort with the instruction and reference librarians to wean video production off Flash but still have features like, ah, tables of contents, TimeJump, testing, and so on.

There is a pretty large group who make tutorials and stuff for the library / university, so I'm trying to keep the HTML5 video process as simple as possible for them by automating a lot of it, but they still have to convert their videos to OGG, export SRT files for captions, and enter a timestamp anywhere they want a table-of-contents entry (like, Section Title: 1m23s). So I was already feeling guilty about laying it on when one of the project stakeholders, who is writing up format-conversion tutorials, asked whether we should add a third format, WebM, before we really get started.

What do you think? I feel like between MP4 and Ogg I'm hitting all the browsers. I can see the benefit of serving WebM and Ogg to keep everything open, but they use tools like Camtasia and Captivate, which pump out MP4 natively. Is either Ogg or WebM on its way out? Should I just say, uh, yeah, might as well throw WebM in there?

I appreciate your insight : ),

Michael Schofield(@nova.edu) | Web Services Librarian | (954) 262-4536 Alvin Sherman Library, Research, and Information Technology Center www.ns4lib.com
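As one way to automate the table-of-contents step, the sketch below parses entries written like "Section Title: 1m23s" into a title and an offset in seconds. The function name `parse_toc_entry` and the optional hours field are assumptions for illustration; the message only shows the minutes-and-seconds form.

```python
import re

# Matches "Title: 1h2m3s", where each of the h/m/s parts is optional.
_ENTRY = re.compile(
    r"^(?P<title>.+):\s*(?:(?P<h>\d+)h)?(?:(?P<m>\d+)m)?(?:(?P<s>\d+)s)?\s*$"
)


def parse_toc_entry(line: str) -> tuple[str, int]:
    """Return (title, offset in seconds) for one table-of-contents entry."""
    match = _ENTRY.match(line.strip())
    if not match or not any(match.group(k) for k in "hms"):
        raise ValueError(f"not a table-of-contents entry: {line!r}")
    seconds = (
        int(match.group("h") or 0) * 3600
        + int(match.group("m") or 0) * 60
        + int(match.group("s") or 0)
    )
    return match.group("title").strip(), seconds
```

The offset in seconds can then be handed to whichever player API performs the seek.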
Re: [CODE4LIB] OGG vs. WEBM
Hi, Michael,

I'd recommend using MP4 and WebM to get the greatest amount of browser coverage at the best compression levels at the same quality without converting to three formats. Make sure that the MP4 you're using is Web optimized (puts the metadata at the beginning of the file). For older browsers that do not support HTML5 video, or that do not support MP4 or WebM, you will also want a fallback to Flash that reuses the MP4. Take a look at the players here that have Flash fallback: http://praegnanz.de/html5video/

My favorite player right now is MediaElement.js [1] because of the unified look and API between the HTML5 and Flash players.

I'd also recommend converting your SRT files to WebVTT [2], which is similar and is the actively developed standard for subtitles, captions, audio descriptions, and timed data. Look for a polyfill that will utilize the track element. If you want a table of contents, you can look for support for chapters in a polyfill or embed them on the page, similar to what I've done with transcripts [3][4].

Hope that helps. Let me know if you have questions.

Jason

[1] http://mediaelementjs.com/
[2] http://dev.w3.org/html5/webvtt/
[3] http://jronallo.github.io/blog/using-the-webvtt-ruby-gem-to-display-subtitles-on-the-page/
[4] http://d.lib.ncsu.edu/student-leaders/videos/fire-in-my-gut-carroll
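The SRT-to-WebVTT conversion recommended above is small enough to script. This is a minimal sketch, assuming well-formed SRT input: it adds the `WEBVTT` header and changes the comma before the milliseconds to a period on timing lines. The function name `srt_to_webvtt` is made up for the example, and SRT-specific styling is not handled.

```python
import re

# SRT writes 00:01:23,000; WebVTT writes 00:01:23.000.
_TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_webvtt(srt_text: str) -> str:
    """Convert SRT caption text to a minimal WebVTT document."""
    # Drop a leading byte-order mark and normalise line endings.
    text = srt_text.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    out = ["WEBVTT", ""]
    for line in text.strip("\n").split("\n"):
        if "-->" in line:  # only timing lines are rewritten
            line = _TIMESTAMP.sub(r"\1.\2", line)
        out.append(line)
    return "\n".join(out) + "\n"
```

The numeric counters from the SRT file are left in place, since WebVTT accepts them as cue identifiers.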
Re: [CODE4LIB] : Persian Romanization table
Hi Yan,

The business of going from original-script Persian to transliteration is much trickier than what we did, which was to go from Romanized Urdu BACK to original-script Urdu. Unfortunately I haven't tried going the other way, but it seems like it would require an ALA-Romanized Persian dictionary to make it work. Names might be easier, since there's a lot of Persian in original script in the LC authority files, and since names are often repeated you could get a lot of use out of a modest-sized dataset.

I don't know any rules of Persian orthography, but if there were any (like "i" before "e" except after "c" …) it would THEORETICALLY be possible to leverage those. Joel Hahn did a nice macro for Hebrew for OCLC (which has similar vocalization issues), but my Hebrew cataloger tells me that the vowels still have to be tweaked. Since I know even less about Hebrew than I do about Persian, I don't know if there's any part of his methodology you could repurpose for Persian. Sorry I can't be of more help with this issue.

JJ

-Original Message-
From: Han, Yan [mailto:h...@u.library.arizona.edu]
Sent: Wednesday, April 17, 2013 8:14 PM
To: Jacobs, Jane W; Code for Libraries (CODE4LIB@LISTSERV.ND.EDU); lit...@ala.org
Cc: Seyede Pouye Khoshkhoosani
Subject: RE: : Persian Romanization table

Hello, All and Jane,

First, I would like to thank Jane Jacobs at Queens Library for providing me with the Urdu Romanization table. As we are working on creating Persian/Pashto transliteration software, my Persian language expert has the following question:

"Following our conversation about transliterating Persian to Roman letters, I faced a big problem: since the short vowels are not written on or under the letters in Persian, how can a machine read a Persian word? For example, we have the word "پدر"; to the machine this word is PDR, because it cannot read the vowels. There is no rule for the short vowels in the Persian language, so the machine cannot tell whether the first syllable is "pi", "pa" or "po". Is there any way to overcome this obstacle?"

It seems to me that we are missing a critical piece of information here (something like a dictionary). Without it, there is no way to get a good transliteration from the computer. We will have to have a Persian speaker check and correct the computer's transliteration. Any suggestions?

Thanks,
Yan

-Original Message-
From: Jacobs, Jane W [mailto:jane.w.jac...@queenslibrary.org]
Sent: Wednesday, January 23, 2013 6:28 AM
To: Han, Yan
Subject: RE: : Persian Romanization table

Hi Yan,

As per my message to the listserv, here are the config files for Urdu. If you do a Persian config file, I'd love to get it and, if possible, add it to the MARC::Detrans site. Let me know if you want to follow this road.

JJ

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Han, Yan
Sent: Tuesday, January 22, 2013 5:31 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] : Persian Romanization table

Hello, All,

I have a project dealing with Persian materials. I have already used the Google Translate API to translate. Now I am looking for an API to transliterate/Romanize (NOT translate) Persian to English (not English to Persian). In other words, I have Persian in, and English out. There is a Romanization table (Persian romanization table - Library of Congress: http://www.loc.gov/catdir/cpso/romanization/persian.pdf). For example, کتاب should output as Kitāb.

My finding is that existing tools only do the opposite:
1. Google Transliterate: you enter English, output Persian (Input Bookmark , output ??? , Input ??? , output ??? )
2. OCLC language: the same as Google Transliterate.
3. http://mylanguages.org/persian_romanization.php : works, but no API.

Does anyone know whether such an API exists?

Thanks much,
Yan
Re: [CODE4LIB] From Chinese characters to convert Pinyin and Traditional and Simplified Chinese and Hangul
Hi Andrew,

Thank you for your nice suggestions. I'd like to enhance the tool so that an appropriate font can be specified, for example by using the Google Fonts API where it supports Japanese.

Wataru

-- 小野 亘 (Ono, Wataru) E-Mail: ono.wataru.p...@gmail.com (work: ono.wat...@dm.hit-u.ac.jp) Hitotsubashi University Library (Academic and Library Affairs Department), Academic Information Division, Serials Information Tel: 042-580-8242 Fax: 042-580-8232
Re: [CODE4LIB] From Chinese characters to convert Pinyin and Traditional and Simplified Chinese and Hangul
Hi Wataru,

This is really neat. Since you have pinyin for the Chinese, have you considered providing romanization for the Japanese and Korean as well? This link from the Hong Kong Innovative Users Group might also be of interest: http://hkiug.ln.edu.hk/unicode/hkiug_tsvcc_table-UnicodeVersion-1.0.html It contains some unused/classical characters that might not fit into one language.

Judy
Re: [CODE4LIB] OGG vs. WEBM
Thanks Jason,

You're a life saver, and I've seen you present a couple of times, so automatic trust :). Here are my hurried, post-lunch/pre-meeting responses:

I couldn't find any documentation [but HTML5 video is newish territory for me] about the prominence of WebVTT, and I am worried that it is a flavor-of-the-[week/er month]. The tools the staff use handle the MP4 and SRT output, but I'm not opposed to writing something that automates the conversion if it means we won't have to revisit file formats for a couple of years.

I am definitely using MediaElement.js for all the same reasons. It's great.

I don't know much about metadata at the beginning of the MP4.

Best,
Michael
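On the question of MP4 metadata placement: a "Web optimized" MP4 has its `moov` box (the index a player needs before playback can start) ahead of the `mdat` box that holds the media data. The sketch below is one way to check that by walking the top-level boxes of a file. The function names are made up for the example, and it reads only box headers, assuming the standard layout of a 4-byte big-endian size followed by a 4-byte type.

```python
import struct


def top_level_boxes(stream):
    """Yield the type of each top-level MP4 box, in file order."""
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return
        size, box_type = struct.unpack(">I4s", header)
        header_len = 8
        if size == 1:  # a 64-bit size follows the type field
            extended = stream.read(8)
            if len(extended) < 8:
                return
            (size,) = struct.unpack(">Q", extended)
            header_len = 16
        if size != 0 and size < header_len:
            return  # malformed box; stop rather than loop forever
        yield box_type.decode("latin-1")
        if size == 0:  # box runs to the end of the file
            return
        stream.seek(size - header_len, 1)  # skip the payload


def moov_before_mdat(stream) -> bool:
    """True if the moov box appears before the first mdat box."""
    for box_type in top_level_boxes(stream):
        if box_type == "moov":
            return True
        if box_type == "mdat":
            return False
    return False
```

Usage would be along the lines of `moov_before_mdat(open(path, "rb"))`. If the check fails, ffmpeg's `-movflags +faststart` option is one way to rewrite the file with the index at the front.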
[CODE4LIB] Job: Documentation Consultant RFP: Linked Archival Metadata: A Guidebook at Tufts University
_Proposals are due by 4pm on May 6, 2013_

**Overview**

The Linked Archival Metadata planning project (LiAM), led by the Digital Collections and Archives at Tufts University and funded by the Institute of Museum and Library Services (IMLS), is requesting proposals for the creation of _Linked Archival Metadata: A Guidebook_. The LiAM planning project anticipates contracting with the individual or firm who, in the LiAM planning project's sole opinion, is best qualified to create the Guidebook.

The Guidebook will provide archivists with an overview of the current linked data landscape, define basic concepts, identify practical strategies for adoption, and emphasize the tangible payoffs for archives implementing linked data. It will focus on clarifying why archives and archival users can benefit from linked data and will identify a graduated approach to applying linked data methods to archival description. The Guidebook will serve as a roadmap for efforts in adopting linked data as well as implementing and developing new linked data tools and projects.

The selected consultant will be responsible for drafting and editing the Guidebook, and, in consultation with LiAM project staff, coordinating the process of creating the Guidebook. Coordination activities will include working with experts designated by project staff and members of the broader archival and linked data communities to gather input as well as working with project staff to ensure that the Guidebook is completed on time. Therefore, it is important that the consultant allows sufficient time for public comment on drafts of the Guidebook. The consultant responsibilities and deliverables listed below provide more detailed information on this project.

The LiAM planning project intends this contract to commence on or about May 20, 2013. The final deliverables must be submitted to the LiAM planning project staff by January 31, 2014.
**Draft Scope of Work** (to be finalized with selected consultant)

**Public Involvement Plan**

The consultant will develop a plan for public input and involvement in the Guidebook creation, including outreach to archivist and linked data mailing lists and possible sessions at annual conferences. While members of the LiAM planning project will be responsible for sending any actual e-mail communication on mailing lists and facilitating sessions at conferences, the consultant will create a schedule for public involvement which will allow for meaningful feedback from the community and time to integrate that feedback into the final product.

**Guidebook**

A brief outline and structure of the Guidebook is included in the [Prospectus for Linked Archival Metadata: A Guidebook](http://sites.tufts.edu/liam/deliverables/prospectus-for-linked-archival-metadata-a-guidebook/). The consultant will create the Guidebook in accordance with that prospectus, modified only in consultation with the LiAM planning project staff.

The LiAM planning project staff will provide the consultant with subject expertise and referrals to places where there is more information; the consultant is not expected to do all of the research on his or her own. However, where appropriate, the consultant is expected to follow up on leads to see if there is more information than that provided by the LiAM planning project. That being said, the consultant's primary role is to create a guidebook which is useful to all of the intended audiences.
As described in the Prospectus, there are three intended audiences for the Guidebook:

* Archivists new to linked data
* Archivists familiar with linked data
* Technologists working in archives or with archivists

While these are the primary audiences for the Guidebook, we will also include an executive summary with the core objectives, anticipated outcomes, and implications that will provide administrators or other senior leaders with the information that they will need in order to understand the benefits and potential costs of this path.

**Application Instructions**

Application materials will be reviewed by the members of the LiAM planning project. Proposals should include a Statement of Qualifications outlining pertinent experience along with current references and a brief writing sample. Any questions should be submitted via e-mail to Anne Sauer, Director of the Tufts University Digital Collections and Archives, at anne.sa...@tufts.edu.

The LiAM planning project reserves the right under applicable law to reject or waive procedural irregularities, to reject any and all proposals and to terminate the selection process at any time if, at its sole discretion, it determines such action would be in the best interests of the LiAM planning project. The LiAM planning project reserves the right to make a selection directly from the proposal, or to require an interview of the top applicants. The LiAM planning project reserves the right to negotiate the final
[CODE4LIB] Job: User Experience Designer at Peabody Essex Museum
The Peabody Essex Museum is seeking an extremely creative and strategic thinker to be part of our award-winning Integrated Media Department. Come create the transformative museum experiences that PEM is known for and help define PEM's future as we move forward with our $650 million Campaign to advance the museum's mission. The Campaign includes $200 million for a 175,000-square-foot expansion including $100 million to support creative new installations of the collection and several infrastructure improvements to existing facilities. Our User Experience Designer will develop engaging and innovative interactives that help shape the visitor experience and establish PEM as a world-class 21st century museum.

Reporting to the Director of Integrated Media, the User Experience Designer is responsible for the design and production of all digital interactive experiences. Digital platforms include websites, in-gallery mobile experiences, digital signage and wayfinding, interactive kiosks, immersive media environments, and some digital branding initiatives. The User Experience Designer collaborates with staff across the museum to conceptualize and design interactive media for museum exhibitions and the reimagining of the installation of the museum's permanent collection.

This is a dream opportunity to work in a mission-driven and highly creative environment implementing new and innovative technologies (web, mobile, in-gallery UX) that enhance the experience of museum visitors. We are looking for a person with a forward-thinking approach to responsive Web design, as well as interest in emerging Web technologies, user-experience and social networking trends.

The position requires:

• At least four years of experience in a fast-paced production environment
• B.A. or B.F.A. degree in Human Computer Interaction, Digital Media, Design or a related artistic field, or a relevant combination of education and experience
• Strong portfolio that includes both user experience and user-centered design
• A basic understanding of HTML5, PHP, MySQL and a proven track record of working with developers to effectively realize their designs

Interested candidates should send their resumes with cover letters and salary requirements to Human Resources, Peabody Essex Museum, East India Square, Salem, MA 01970-3783, or apply by email to j...@pem.org. For more information about PEM, check out our employment page: http://www.pem.org/about/_employment/

Brought to you by code4lib jobs: http://jobs.code4lib.org/job/7483/
[CODE4LIB] Job: Digital Archivist at Princeton University
The Princeton University Library is one of the world's leading research libraries, serving a diverse community of 5,200 undergraduates, 2,600 graduate students, 853 faculty members, and many visiting scholars. Its holdings include more than 7 million printed volumes, 5 million manuscripts, 2 million non-print items, and extensive collections of digital text, data, and images. The Library employs a dedicated and knowledgeable staff of more than 300 professional and support staff working in a large central library, 9 specialized branches, and 3 storage facilities.

The Digital Archivist will work at the Princeton University Library's Seeley G. Mudd Manuscript Library, a unit of the Department of Rare Books and Special Collections. This library houses the Princeton University Archives (current holdings of approximately 15,000 cubic feet) as well as a major collection of 20th-century public policy papers (current holdings of approximately 20,000 cubic feet).

Major Responsibilities

Reporting to the Assistant University Archivist for Technical Services, the Digital Archivist is dedicated to the processing, description, and preservation of University records in both digital and analog form. The Digital Archivist's primary focus will be to participate in the continued development and evolution of an electronic records program at the Mudd Manuscript Library. This work will include developing, implementing, and executing processes enabling effective acquisition, appraisal, ingest, description, preservation, access to and security of born-digital and hybrid archival collections acquired by the University Archives. The archivist will be expected to remain current with emerging standards and professional best practices and be able to manage complex projects. The Digital Archivist will be integrated into the functions of the Library and the Mudd Library Technical Services Unit.
The position will participate in the archival processing, accessioning, and reference programs of the Mudd Manuscript Library and contribute to work relating to development and evaluation of infrastructure for digital archives, access systems and tools, digitization, and related technical issues. This position also works with a variety of stakeholders, including archivists and librarians, developers, IT staff, and donors, and will supervise the work of student assistants.

Nominations and Applications: Review of applications will begin June 1 and will continue until the position is filled. Nominations and applications (cover letter, resume and the names, titles, addresses and phone numbers of three references) will be accepted only from the Jobs at Princeton website: http://www.princeton.edu/jobs.

Essential Qualifications:

* Master's degree from an ALA-accredited program with a concentration in archives management, or equivalent combination of education and experience.
* Demonstrated knowledge of archives and records management theory and practice, including experience processing archival records.
* Comprehensive knowledge of electronic records management principles and practices and digital preservation theory and practice.
* Knowledge of strategies, such as computer forensics, and technology developed or adopted by the archival community for managing born-digital archival and manuscript material.
* Knowledge of relevant standards for archival description including DACS, EAD, and EAC-CPF, and familiarity with other metadata standards such as METS and PREMIS.
* Excellent supervisory and organizational skills and ability to plan, coordinate, and implement complex projects.
* Ability to work both independently and collaboratively with a variety of staff in a rapidly changing environment.
Preferred Qualifications:

* Two to three years of relevant professional experience.
* Experience implementing policies, standards, and procedures for stewardship of digital material in an archival or special collections setting.
* Experience with FTK, floppy drive controllers (e.g. Catweasel, Kryoflux), write blockers, Sleuth Kit, fiwalk, and emulators.
* Experience with XSLT, XQuery and/or scripting languages (e.g. Ruby, Python).
* Experience working in an active university records program.

Education Required: Other (see essential qualifications)

Brought to you by code4lib jobs: http://jobs.code4lib.org/job/7489/
Re: [CODE4LIB] : Persian Romanization table
As explained in the last paragraph of http://www.loc.gov/catdir/cpso/romanization/persian.pdf :

In romanizing Persian, the Library of Congress has found it necessary to consult dictionaries as an appendage to the romanization tables, primarily for the purpose of supplying vowels. For Persian, the principal dictionary consulted is: M. Muʼīn. Farhang-i Fārsī-i mutavassit.

That is, any algorithm for romanizing Persian would need not only to map from Persian letters to roman ones but also to look up the word in a digital form of this dictionary in order to know what vowels to insert. The digital dictionary doesn't actually need to be transliterated; that is, instead of doing this:

original => transliterated without vowels => transliterated with vowels (romanized)

you can instead do this:

original => Persian letters with vowels => transliterated with vowels (romanized)

which would allow your dictionary to use the original form as the input.

As Jane indicates, Persian and Hebrew both often omit vowels in the original, yet they are always supplied in romanization. Since dictionary lookups are not always perfect (especially with proper names), a human will likely have to tweak the vowels.

The transliteration table also discusses when to capitalize words in the romanized form: something else that will be quite difficult to code.

In short, you will probably need to have a Persian-speaking librarian review the transliterated output of your code to correct errors.

--Kevin
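The dictionary-based approach discussed in this thread (look up the vocalized form of a word, then map each character to its roman equivalent) can be sketched as follows. Everything here is a toy assumption for illustration: the two-entry dictionary, the partial letter table, and the function name `romanize`. A real implementation would need a full vocalized dictionary and the complete ALA-LC table, and would still need human review.

```python
# Partial letter table (illustrative subset only, not the full ALA-LC table).
LETTERS = {
    "\u067E": "p",  # pe
    "\u062F": "d",  # dal
    "\u0631": "r",  # re
    "\u06A9": "k",  # kaf
    "\u062A": "t",  # te
    "\u0628": "b",  # be
    "\u0627": "ā",  # alef, treated here only as a medial long vowel
    "\u064E": "a",  # short-vowel mark (fatha)
    "\u0650": "i",  # short-vowel mark (kasra)
    "\u064F": "u",  # short-vowel mark (damma)
}

# Toy dictionary: unvocalized spelling -> spelling with short-vowel marks.
VOCALIZED = {
    "\u067E\u062F\u0631": "\u067E\u0650\u062F\u064E\u0631",        # PDR
    "\u06A9\u062A\u0627\u0628": "\u06A9\u0650\u062A\u0627\u0628",  # KTAB
}


def romanize(word: str) -> tuple[str, bool]:
    """Return (romanized text, needs_review).

    Words missing from the dictionary come back as a consonant skeleton
    and are flagged so that a Persian speaker can supply the vowels.
    """
    known = word in VOCALIZED
    source = VOCALIZED.get(word, word)
    roman = "".join(LETTERS.get(ch, "?") for ch in source)
    return roman, not known
```

Keying the dictionary on the original-script form follows the second pipeline described above, so no vowel-less transliteration has to be stored.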