Re: [CODE4LIB] Oral history app and server
Yes. Or else it's a machine-learning problem on the far side, with speakers organized by, I dunno, geography. Regardless, the models will need training. Al Matthews, AUC Robert W. Woodruff Library 404.978.2057 o 404.769.2617 c - Reply message - From: "Gary McGath" To: "CODE4LIB@LISTSERV.ND.EDU" Subject: [CODE4LIB] Oral history app and server Date: Wed, Oct 3, 2012 5:06 pm > A possible alternative is for a designated person to train the software and re-speak it into the speech recognition software. - ** The contents of this email and any attachments are confidential. They are intended for the named recipient(s) only. If you have received this email in error please notify the system manager or the sender immediately and do not disclose the contents to anyone or make copies. ** IronMail scanned this email for viruses, vandals and malicious content. ** **
Re: [CODE4LIB] Oral history app and server
Continuing on this part: My friend says that using any existing speech recognition software won't work at all well for transcribing interviews with a variety of people. All such software needs to be "trained" to the speaker's voice. A possible alternative is for a designated person to train the software and re-speak it into the speech recognition software. On 10/3/12 6:22 AM, Gary McGath wrote: > On 10/2/12 8:44 AM, Paul Orkiszewski wrote: >> - Processes the audio through speech recognition either in real time or >> post-interview, and populates the dbase record with rendered text (at >> whatever level of accuracy) > > You could do this piece with Dragon; see this post for some discussion: > > http://www.nuance.com/dragon/transcription-solutions/index.htm > > A friend of mine is an expert in this area and might be able to answer > some questions. -- Gary McGath, Professional Software Developer http://www.garymcgath.com
Re: [CODE4LIB] Oral history app and server
"Record ; send ; speech-to-text ; share and improve" -- that's pretty much the algorithm. Or musically - Vamp til ready ||: fire aim ready :|| Paul On 10/3/12 4:01 PM, Al Matthews wrote: > But doesn't one simplify all this by keeping recording off the cloud and building out the separate components? Record ; send ; speech-to-text ; share and improve . I do like this, Paul, the idea. -- *Paul Orkiszewski* Coordinator of Library Technology Services / Associate Professor University Library Appalachian State University 218 College Street P.O. Box 32026 Boone, NC 28608-2026 E-mail: orkiszews...@appstate.edu Phone: 828 262 6588 Fax: 828 262 2797
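The "Record ; send ; speech-to-text ; share and improve" sequence can be sketched as four separate functions, in line with Al's point about building out the separate components. This is a minimal Python illustration, not anyone's actual implementation: `Recording`, `record`, `send`, `speech_to_text`, `improve`, and `fake_recognizer` are hypothetical names, and the recognizer is a stub standing in for a real engine such as Dragon or CMU Sphinx.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Recording:
    """One recorded answer: a pointer to its audio file plus every transcript version."""
    question_id: str
    audio_path: str
    transcripts: List[str] = field(default_factory=list)  # oldest version first

    @property
    def current_text(self) -> str:
        """The newest transcript, or an empty string if none exists yet."""
        return self.transcripts[-1] if self.transcripts else ""


def record(question_id: str, audio_path: str) -> Recording:
    """Record: capture happens elsewhere; only a pointer to the file is kept."""
    return Recording(question_id, audio_path)


def send(rec: Recording, archive: List[Recording]) -> Recording:
    """Send: hand the recording to an archive (an in-memory list here)."""
    archive.append(rec)
    return rec


def speech_to_text(rec: Recording, recognizer: Callable[[str], str]) -> Recording:
    """Speech-to-text: store the recognizer's first-pass output."""
    rec.transcripts.append(recognizer(rec.audio_path))
    return rec


def improve(rec: Recording, corrected: str) -> Recording:
    """Share and improve: append a human correction instead of overwriting."""
    rec.transcripts.append(corrected)
    return rec


def fake_recognizer(audio_path: str) -> str:
    """Hypothetical stub standing in for a real speech recognition engine."""
    return "helo this is a test"


archive: List[Recording] = []
rec = speech_to_text(send(record("q1", "audio/q1.wav"), archive), fake_recognizer)
improve(rec, "Hello, this is a test.")
print(rec.current_text)      # Hello, this is a test.
print(len(rec.transcripts))  # 2
```

Keeping every transcript version in a list means a crowd-sourced correction never overwrites the machine output, so the two can be compared later, and the recognizer can be swapped or rerun without touching the recording step.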
Re: [CODE4LIB] Oral history app and server
Hi all. Thanks Jason for the excellent links. > Chrome seems to be out front with this last I looked. After somehow spending an hour reading all this, it seems like audio doesn't work yet, right? Except on Chromium "canary" on Mac. Which is something. Mozilla's big into this as well: http://mozillapopcorn.org/ https://wiki.mozilla.org/Audio_Data_API . The latter remains Firefox-specific and Mozilla marks it as deprecated. Still, it exists. Android has a speech API http://android-developers.blogspot.com/2010/03/speech-input-api-for-android.html, and it seems to implement Media Capture. As a fine alternative, and more general, http://cmusphinx.sourceforge.net/wiki/gstreamer seems like a sane post-processing example. Dear to me, that last. But doesn't one simplify all this by keeping recording off the cloud and building out the separate components? Record ; send ; speech-to-text ; share and improve . I do like this, Paul, the idea. Al Matthews, Software Dev, Atlanta University Center From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Jason Ronallo [jrona...@gmail.com] Sent: Wednesday, October 03, 2012 2:00 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Oral history app and server
Re: [CODE4LIB] Oral history app and server
Hi Paul, Thanks for your response! I like the idea that this could be a standalone way to capture first-person accounts as well as a way to launch more in-depth/traditional oral history interviews. Some of your requirements remind me of the National Library of Medicine's video player: "NLM Video Search accurately and quickly searches digital videos with embedded transcripts. In addition to offering a full-text search of a film’s transcript, the tool graphically displays where a search word or phrase occurs within the timeline of a film. ... NLM Video Search is based on a combination of open-source and inexpensive commercial multimedia tools enhanced with speech recognition technology." More here: http://www.hhs.gov/open/initiatives/hhsinnovates/round3/dustmonitor.html Best, Robin
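The timeline feature Robin quotes (showing where a search word or phrase occurs within a film) comes down to searching timestamped transcript segments and returning their start times. A minimal Python sketch, assuming transcripts are stored as hypothetical `(start_seconds, text)` pairs; it is not how NLM Video Search is implemented, and the sample transcript is made up.

```python
import re
from typing import List, Tuple

Segment = Tuple[float, str]  # (start time in seconds, transcript text)


def find_hits(segments: List[Segment], phrase: str) -> List[float]:
    """Return start times of segments containing `phrase` as whole words (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    return [start for start, text in segments if pattern.search(text)]


def format_time(seconds: float) -> str:
    """Render seconds as m:ss, e.g. for placing a marker on a timeline."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


transcript = [
    (0.0, "Welcome to the interview."),
    (12.5, "I grew up in the mountains near Boone."),
    (47.0, "Boone was a small town back then."),
    (90.0, "We moved away later."),
]

hits = find_hits(transcript, "boone")
print([format_time(t) for t in hits])  # ['0:12', '0:47']
```

Matching on word boundaries keeps a query like "boon" from lighting up every "Boone" segment; a real system would add stemming and ranking on top.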
Re: [CODE4LIB] Oral history app and server
Very cool. Audio should be easier than video. Thanks Jason! -- Paul On 10/3/12 2:00 PM, Jason Ronallo wrote: > Especially getUserMedia which allows for video capture within the browser from a user's webcam: http://www.html5rocks.com/en/tutorials/getusermedia/intro/
Re: [CODE4LIB] Oral history app and server
Paul, You may want to look at WebRTC: http://www.webrtc.org/ Especially getUserMedia which allows for video capture within the browser from a user's webcam: http://www.html5rocks.com/en/tutorials/getusermedia/intro/ This is bleeding-edge stuff and probably not ready for a real project, but it may be that something like this enables the kind of project you're wanting to do. Chrome seems to be out front with this last I looked. Jason On Tue, Oct 2, 2012 at 8:44 AM, Paul Orkiszewski wrote: > Hi 4libers, > > Does anyone know of something - a kiosk, an iPad app, a web application - > that: > > - Initiates an oral history interview by getting demographic info and > permission to use and stream for scholarly purposes.
Re: [CODE4LIB] Oral history app and server
Hi Robin, Thanks so much for your comments. I was thinking of a completely automated process. I'm thinking of it as oral history because, at least in the initial use of the program, we'd use a set list of questions for all respondents. I realise it probably won't be as good/useful as the product of a trained interviewer, and the system could accommodate machine and human mediation. That could be a part of the metadata so you could analyze how people respond to human vs computer questioning. Another possibility would be to use one set of questions for the computer interview, then invite participants to schedule a person-to-person interview. Kind of like recruiting people into a cult. I guess the main thing I'm trying to do is leverage technology to get oral histories available in an admittedly less-than-perfect form as quickly as possible so it can be improved via crowd sourcing. The interview's the easy part, but there's often a lag until it becomes usable. If people are committed and know what they're doing, the loop closes with a searchable archive of transcribed interviews. This is for people and organizations who are kind of committed and don't really know what they're doing. Thanks again for your thoughts and the links! Paul On 10/2/12 3:39 PM, Robin Dean wrote: Hi Paul, Just to clarify what you mean by "automated"--are you looking for a process that completely removes the need for an interviewer, and only involves people recording their answers to a questionnaire alone with a machine? That seems to be the model the "Outhouse" project was experimenting with. Even then, this article says that in one of the Outhouse initiatives, "around half" of the participants preferred to do face-to-face interviews rather than be recorded alone in a booth: http://camra.culturemap.org.au/central-darling/outhouse-research I think it's a good idea to digitally capture more first-person stories, but I have trouble thinking of them as "oral histories" without a human interviewer. 
If you're interested, here are a couple more projects that are looking at how to increase the number of digital oral histories that are captured, preserved, and usefully made accessible. Colorado Voice Preserve (they are currently looking at the infrastructure needed for a statewide oral history initiative, including technical requirements): http://www.voicepreserve.org IMLS "Oral History in the Digital Age" site: http://ohda.matrix.msu.edu/ Best, Robin Dean Director, Alliance Digital Repository Colorado Alliance of Research Libraries http://adrresources.coalliance.org/
Re: [CODE4LIB] Using dbpedia to generate EAC-CPF collections
Wow. That's pretty spiff! I'd love to see your Roman Empire SNAC, can you send me the info? Michele -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ethan Gruber Sent: Wednesday, October 03, 2012 11:04 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] Using dbpedia to generate EAC-CPF collections
[CODE4LIB] Using dbpedia to generate EAC-CPF collections
Hi all, In the last few weeks, I have undertaken a project to create EAC-CPF stubs using dbpedia and VIAF data for the Roman emperors and their relations. There's a lot of great information available through dbpedia, and since it's available in RDF, I put together a PHP script that can start at one point in dbpedia (e.g., http://dbpedia.org/resource/Augustus) and traverse its relations to create a network of stubs, using the links to parents, children, spouses, influences, successors, and predecessors provided in the RDF. Left unchecked, the script would crawl forward through the Byzantine period and spread laterally (chronologically speaking) to generate a network of the ruling hierarchy of the West up to the modern period. It also goes backwards to the successors of Alexander the Great. For all I know, it goes back through all of the Egyptian dynasties to Narmer ca. 3000 BC, but I haven't let the script go that far. The script is fairly generalizable and can begin at any dbpedia resource. It's available at https://github.com/ewg118/xEAC/blob/master/misc/dbpedia-to-eac.php I should also note that this is a work in progress. To execute the script, you'll need to create a "temp" folder in the same directory where you download/execute it (for writing EAC records). At a glance, here's what it does:
-Creates nameEntries for all of the names available in various languages in dbpedia
-If a VIAF ID is available in the RDF, pulls some alternate record IDs from VIAF, as well as birth and death dates
-Can pull in subjects, occupations, and related resources on the web
-Generates corporate/personal/family relations from the parents/children/spouses/influences/successors/predecessors/dynasties linked in dbpedia. These relations are added to an array that is processed continually until, presumably, it reaches the end of time.
-Lets you specify an "end" record to attempt to break this chain, but I cannot guarantee that it'll work. Anastasius (emperor of Rome ca. 500 AD) does successfully terminate the Augustus chain.
-Imports birth and death places (and associated birth and death dates, if available)
I think that these stubs are a good starting point for handing off the management of EAC content to subject specialists who can add chronological and geographical context. I wrote a bit more about this script and the process applied to xEAC, an XForms-based engine for creating, editing, managing, and publishing EAC-CPF collections, at http://eaditor.blogspot.com/2012/10/using-dbpedia-to-jumpstart-eac-cpf.html There's a prototype collection of the Roman Empire; if anyone is interested in taking a look at it, drop me a line off the list. Ethan
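The traversal Ethan describes (start at one resource, queue up related resources, and stop at an optional "end" record) is essentially a breadth-first graph walk. The sketch below is a Python illustration of that idea, not a port of dbpedia-to-eac.php: `RELATIONS` is a small hypothetical in-memory map standing in for the dbpedia RDF links, no network calls are made, and the `end` and `max_records` parameters are assumptions about how such a chain could be bounded.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical relation map standing in for the RDF links
# (parents, children, spouses, successors, predecessors, ...).
RELATIONS: Dict[str, List[str]] = {
    "Augustus": ["Julius_Caesar", "Livia", "Tiberius"],
    "Julius_Caesar": ["Augustus"],
    "Livia": ["Augustus", "Tiberius"],
    "Tiberius": ["Augustus", "Caligula"],
    "Caligula": ["Tiberius", "Claudius"],
    "Claudius": ["Caligula", "Nero"],
    "Nero": ["Claudius"],
}


def crawl(start: str, end: Optional[str] = None, max_records: int = 1000) -> List[str]:
    """Breadth-first walk of RELATIONS from `start`.

    Each resource is visited once. The `end` resource is recorded but its
    relations are not followed, and `max_records` caps the walk in case
    no end is ever reached.
    """
    queue = deque([start])
    seen = {start}
    order: List[str] = []
    while queue and len(order) < max_records:
        resource = queue.popleft()
        order.append(resource)  # a real script would write one EAC-CPF stub here
        if resource == end:
            continue  # do not expand past the designated end record
        for related in RELATIONS.get(resource, []):
            if related not in seen:
                seen.add(related)
                queue.append(related)
    return order


print(crawl("Augustus", end="Caligula"))
# ['Augustus', 'Julius_Caesar', 'Livia', 'Tiberius', 'Caligula']
```

The `seen` set keeps the walk from looping between relatives that link to each other (Augustus and Tiberius above), and the `max_records` cap guarantees termination even when no end record is reached.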
Re: [CODE4LIB] Oral history app and server
On 10/2/12 8:44 AM, Paul Orkiszewski wrote: > Hi 4libers, > > Does anyone know of something - a kiosk, an iPad app, a web application > - that: I don't know of anything like it out there, but let's look at what it might take. I've done some software work in connection with Harvard's Iranian Oral History Project. > - Initiates an oral history interview by getting demographic info and > permission to use and stream for scholarly purposes. I'm not sure what you're saying here. It sounds as if you're talking about automated correspondence with the sources. That would be a huge project in itself, so I assume you've got something more narrowly focused in mind. > - Goes through a standard set of questions (in our case stuff about the > Appalachian State experience) There are two pieces to this: recording the responses and storing the relevant metadata. The recording probably shouldn't be tied to a specific device or application, since field work can involve a lot of different conditions. The researcher in the field would want something to enter the metadata (who, what, when, where); this would be a straightforward piece. > - Stores the metadata, permissions release, and pointers to the audio > files created for each question in a dbase record You don't say what the scope of the work is; from the way you're putting the questions, I'm assuming it's a small-scale project with one researcher doing the interviews and putting the information together. Even so, it's probably best to have the field work be a separate application from assembling the information in the database. If nothing else, once you're at this point there's more standard software that can be used.
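One way to keep field recording separate from database assembly is to store only pointers to the audio files alongside the who/what/when/where metadata and the permission release. A minimal sketch using Python's built-in `sqlite3`; the table and column names are hypothetical, and the sample row is made up.

```python
import sqlite3

SCHEMA = """
CREATE TABLE interview (
    id            INTEGER PRIMARY KEY,
    narrator      TEXT NOT NULL,     -- who
    topic         TEXT,              -- what
    recorded_on   TEXT,              -- when (ISO 8601 date)
    location      TEXT,              -- where
    release_file  TEXT               -- pointer to the signed permission release
);
CREATE TABLE answer (
    interview_id  INTEGER NOT NULL REFERENCES interview(id),
    question_no   INTEGER NOT NULL,
    audio_path    TEXT NOT NULL,     -- pointer to the audio file for this question
    transcript    TEXT,              -- filled in later by speech recognition
    PRIMARY KEY (interview_id, question_no)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

cur = conn.execute(
    "INSERT INTO interview (narrator, topic, recorded_on, location, release_file) "
    "VALUES (?, ?, ?, ?, ?)",
    ("Jane Doe", "Campus life", "2012-10-02", "Boone, NC", "releases/0001.pdf"),
)
interview_id = cur.lastrowid
conn.executemany(
    "INSERT INTO answer (interview_id, question_no, audio_path) VALUES (?, ?, ?)",
    [(interview_id, 1, "audio/0001-q1.mp3"), (interview_id, 2, "audio/0001-q2.mp3")],
)

rows = conn.execute(
    "SELECT question_no, audio_path FROM answer WHERE interview_id = ? ORDER BY question_no",
    (interview_id,),
).fetchall()
print(rows)  # [(1, 'audio/0001-q1.mp3'), (2, 'audio/0001-q2.mp3')]
```

The `transcript` column is nullable, so speech recognition can fill it in post-interview without touching the recording step.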
> - Processes the audio through speech recognition either in real time or > post-interview, and populates the dbase record with rendered text (at > whatever level of accuracy) You could do this piece with Dragon; see this post for some discussion: http://www.nuance.com/dragon/transcription-solutions/index.htm A friend of mine is an expert in this area and might be able to answer some questions. > - Provide a search interface, where the metadata, demographic info > (within reasonable privacy limits), and the transcript (however garbled) > is searchable. I'd suggest basing something on Apache Lucene. > - Crowd source the improvement of the transcriptions over time This needs to be better specified. One solution is to put the text onto a wiki. If you're talking about integrating it into the application that does all the rest, it could get messy. > - Package the interface as an app, and set up a machine image on Amazon > EC2, such that when someone uses the image and points a browser to it, > it goes through a set up routine so that smaller schools and historical > societies can set up their own sites in the cloud. I haven't tried > streaming on a free tier EC2 server, but you get 30 GB of storage, so > you could get a fair number of hours of audio (depending on the > settings) before you have to start paying. This, I assume, is why you're talking about treating the whole thing as a single application. Putting it all together would be a huge chunk of work. Dragon's software isn't free, and I don't know of anything for free that does decent speech transcription, so that would be a stumbling block to making it available to other institutions. > > ? > > Anyone interested in trying it with me if there's nothing already out > there? I'm leaning toward iPad, so we'd need iOS, server admin, dbase, > and media expertise. I have newbie-but-getting-better skill in the last > 3. Zero skill in iOS.
I'm available for freelance work and it sounds very interesting, but you've just outlined a huge project that would be a significant burden even for the LoC's resources. That's not to say it can't be useful as a blue-sky starting point for something more reasonable. If you have funding, let's talk off-list. If you just want to continue blue-skying the idea for a while, I'm glad to continue on-list (and I promise not to bill you for that :). -- Gary McGath, Professional Software Developer develo...@mcgath.com
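Paul's estimate that the 30 GB on a free-tier EC2 server would hold "a fair number of hours of audio (depending on the settings)" is easy to make concrete: hours = (GB × 8 × 10^9 bits) / (kbps × 1000 × 3600). A minimal Python sketch, assuming constant-bitrate audio, decimal units (1 GB = 10^9 bytes), and no file-system or container overhead; the bitrates are example settings, not recommendations.

```python
def hours_of_audio(storage_gb: float, bitrate_kbps: float) -> float:
    """Hours of constant-bitrate audio that fit in `storage_gb`.

    Uses decimal units (1 GB = 10**9 bytes, 1 kbps = 1000 bits/s) and
    ignores file-system and container overhead.
    """
    storage_bits = storage_gb * 10**9 * 8
    seconds = storage_bits / (bitrate_kbps * 1000)
    return seconds / 3600


# 64 and 128 kbps are example compressed (e.g. MP3) settings; 1411.2 kbps is
# uncompressed 16-bit, 44.1 kHz stereo PCM (CD-quality WAV).
for kbps in (64, 128, 1411.2):
    print(kbps, round(hours_of_audio(30, kbps), 1))
# 64 1041.7
# 128 520.8
# 1411.2 47.2
```

The spread (roughly a thousand hours compressed versus under fifty uncompressed) is why the encoding settings matter more than the storage quota itself.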
[CODE4LIB] Job: Education and Curatorial Traineeships at The Royal Commission on the Ancient and Historical Monuments of Scotland
'Skills for the Future' Fixed Term Appointment for 12 Months. Salary £15,070 per annum. 10 posts: Education Skills (4 places available) or Curatorial Skills (6 places available). Applications are invited for one of ten traineeships at the Royal Commission on the Ancient and Historical Monuments of Scotland (RCAHMS) in either Education Skills (4 places available) or in Curatorial Skills (6 places available). The traineeships are being offered by RCAHMS as part of 'Skills for the Future', a national programme to create new opportunities in work-based training for the heritage sector funded by the Heritage Lottery Fund. You will be based in the RCAHMS office in Edinburgh, where you'll have the chance to develop practical skills by working with experienced RCAHMS staff on a range of interesting projects and work programmes. You will also have the opportunity to work with other regional and national heritage bodies on a placement basis. Towards the end of the 12-month programme, you will also complete a 10-credit undergraduate-level distance learning module in your chosen skill area with the University of Dundee. By the end of the year, you'll have built up a practical portfolio of completed work which you can show to prospective employers as well as a distance-learning qualification that can be offered to universities as evidence of academic ability. You'll need to show us that you've achieved Credit Standard Grade or Intermediate 2, or equivalent, in English and Mathematics, in order to apply. No other qualifications or specific experience in the heritage sector are required, but you must be able to demonstrate that you are interested in the sector, that you can communicate effectively in speaking and writing, and that you can work both independently and as part of a team. You'll also need to be willing to engage with a wide range of different audiences, be familiar with using standard computer packages and know how to use the Web for communication and research.
The traineeships are also open to applicants who have previously completed higher academic qualifications, except those who have already completed a postgraduate qualification in either Museum or Archive Studies. RCAHMS is a registered Scottish charity and an equal opportunities employer. Applications for these traineeships are open to everyone who can meet the conditions set out above, and we especially welcome applications from those without existing degree qualifications, and those with disabilities. The Curatorial trainees will undertake a placement within our National Collection of Aerial Photography, much of which originated from military sources, and applications for these traineeships will be especially welcomed from those with a military background. It is a condition of appointment that Education trainees are cleared to work with children by joining the Protection of Vulnerable Groups Scheme, administered by Disclosure Scotland. RCAHMS will cover the cost involved in this. For further details of the training plan and an application form, go to the Jobs section, email person...@rcahms.gov.uk or write to: Personnel, RCAHMS, John Sinclair House, 16 Bernard Terrace, Edinburgh, EH8 9NX. Tel: 0131 662 1456; Fax: 0131 662 1477. The closing date for the return of completed application forms is 12.00 noon on Friday, 19th October 2012. Assessment centres will be held the week beginning 19th November 2012. Brought to you by code4lib jobs: http://jobs.code4lib.org/job/3683/