Re: Speech-To-Text?

2016-04-23 Thread deloptes
Kent West wrote:

> I have an Android phone recording of a meeting that I'd like to
> transcribe to text, and was hoping there was an automagic way to convert
> most of it.
> 
> My google-fu seems to be failing me, or else that info simply isn't out
> there.
> 

https://en.wikipedia.org/wiki/Speech_recognition_software_for_Linux
http://www.voxforge.org/

> 
> Anyone out there who can help me accomplish my goal, using pocketsphinx
> or any other suitable tool?
> 

I spent some years at the beginning of the century doing research in
the area and playing with Linux tools.
My conclusion back then (10+ years ago) was that there was no usable STT or TTS
system. I later did research and a Master's thesis on dialogue systems.
Things may have changed now, but I doubt the quality of the Linux software
anyway.
Basically there were/are just two providers of speech recognition engines
(IBM and Philips), resulting from joint research efforts in the '90s. They cost
millions to build. I tested one and it worked very well 10 years ago. It cost
about US$150. But there are also cheaper versions with smaller, domain-specific
vocabularies.
If you want to play around, you may try those Linux tools; if you want
professional results, where you do not have to correct 60% of the text, go
buy or borrow one.

regards

Speech-To-Text?

2016-04-22 Thread Kent West
I have an Android phone recording of a meeting that I'd like to 
transcribe to text, and was hoping there was an automagic way to convert 
most of it.


My google-fu seems to be failing me, or else that info simply isn't out 
there.


The closest I've found is the use of a tool called pocketsphinx (which 
seems to be a toolset for developers to be used in applications, not a 
finished product suitable for use by an end-user).


I saved my Android audio file to Google Drive, and downloaded it from 
there to my main Debian box (for unknown reasons, I couldn't use GMail 
to mail it as an attachment, or attach it to an FB Message).


Based on what I was able to figure out, I then converted my .m4a
file to a .wav file:


ffmpeg -i ../Downloads/Dana\ of\ City.m4a -ar 16000 -ac 1 DanaOfCity.wav

I verified that worked with

aplay DanaOfCity.wav
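aplay shows that the file plays, but not that ffmpeg's -ar 16000 -ac 1 actually produced 16 kHz mono audio. One quick check without extra tools is to read the WAV header directly. This is a minimal sketch, assuming the canonical RIFF layout in which the "fmt " chunk immediately follows "RIFF....WAVE" (channel count at byte 22, sample rate at byte 24) and a little-endian machine, since od decodes multi-byte integers in host byte order; wav_info is a made-up helper name.

```shell
#!/bin/sh
# wav_info: print channel count and sample rate from a WAV header.
# Hypothetical helper. Assumes the canonical layout where the "fmt "
# chunk directly follows "RIFF....WAVE" (channels at byte 22, sample
# rate at byte 24) and a little-endian host, because od decodes
# multi-byte integers in host byte order.
wav_info() {
    f=$1
    # Bytes 0-3 must read "RIFF" and bytes 8-11 must read "WAVE".
    magic=$(dd if="$f" bs=1 count=4 2>/dev/null)
    form=$(dd if="$f" bs=1 skip=8 count=4 2>/dev/null)
    if [ "$magic" != "RIFF" ] || [ "$form" != "WAVE" ]; then
        echo "not a RIFF/WAVE file: $f" >&2
        return 1
    fi
    # Unsigned 16-bit channel count at offset 22,
    # unsigned 32-bit sample rate at offset 24.
    channels=$(od -An -t u2 -j 22 -N 2 "$f" | tr -d ' ')
    rate=$(od -An -t u4 -j 24 -N 4 "$f" | tr -d ' ')
    echo "channels=$channels rate=$rate"
}
```

If the conversion did what was intended, wav_info DanaOfCity.wav should report channels=1 rate=16000.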

Then I worked out what I hoped were the basics of the command (mostly
from an example at
https://askubuntu.com/questions/161515/speech-recognition-app-to-convert-mp3-to-text,
answer 8, footnote 2):


pocketsphinx_continuous -infile ~/Downloads/DanaOfCity.wav \
    -hmm en_US/hub4wsj_sc_8k -lm en_US/hub4.5000.DMP 2> pocketsphinx.log


which I understand to mean:
- my input file is my data file "DanaOfCity.wav"
- the "-hmm" parameter (which pocketsphinx resolves relative to
/usr/share/pocketsphinx/model/hmm) is to be the "en_US/hub4wsj_sc_8k"
directory, which contains the "mdef" file (whatever that is); pocketsphinx
had trouble finding it until I added the "hub..." part
- the "-lm" (is that a digit one or a letter ell? ah, it's an ell)
parameter works like the "-hmm" parameter above, except that it is
resolved relative to /usr/share/pocketsphinx/model/lm
- the 2> pocketsphinx.log routes the app's standard error (its INFO and
configuration messages) to a file named "pocketsphinx.log" in the current
directory; the recognized text itself still goes to standard output
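Because 2> captures only standard error, which is where pocketsphinx_continuous writes its INFO and configuration lines, the recognized text goes to standard output and would otherwise just scroll past on the terminal. Below is a minimal sketch of a wrapper that keeps the two streams in separate files and first checks that the model files shown in the log actually exist. The function transcribe and the MODEL_BASE and PS_CMD variables are hypothetical; the only pocketsphinx options used are -infile, -hmm and -lm from the command above, passed as full paths like the ones the log prints.

```shell
#!/bin/sh
# transcribe: hypothetical wrapper around pocketsphinx_continuous that
# keeps the transcript (stdout) and the diagnostic log (stderr) apart.
# Model locations mirror the full paths printed in pocketsphinx.log.
# MODEL_BASE and PS_CMD are made-up variables that can be overridden.
MODEL_BASE=${MODEL_BASE:-/usr/share/pocketsphinx/model}
PS_CMD=${PS_CMD:-pocketsphinx_continuous}

transcribe() {
    wav=$1
    hmm_dir=$MODEL_BASE/hmm/$2      # acoustic model directory (-hmm)
    lm_file=$MODEL_BASE/lm/$3       # language model file (-lm)
    out=${4:-transcript.txt}
    log=${5:-pocketsphinx.log}

    # Fail early if any file the recognizer will need is missing.
    for f in "$wav" "$hmm_dir/mdef" "$lm_file"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            return 1
        fi
    done

    # stdout (recognized text) -> $out, stderr (INFO lines) -> $log
    "$PS_CMD" -infile "$wav" -hmm "$hmm_dir" -lm "$lm_file" \
        > "$out" 2> "$log"
}
```

Usage would be transcribe DanaOfCity.wav en_US/hub4wsj_sc_8k en_US/hub4.5000.DMP, after which transcript.txt holds whatever text was recognized and pocketsphinx.log holds the diagnostics. Setting PS_CMD to a stand-in script lets the wrapper be tried out before the real recognizer is involved.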


When I run this, the process fails with the following output in my
pocketsphinx.log:


westk@westek:~/bub$ cat pocketsphinx.log
INFO: pocketsphinx.c(145): Parsed model-specific feature parameters from /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/feat.params

Current configuration:
[NAME]              [DEFLT]     [VALUE]
-agc                none        none
-agcthresh          2.0         2.00e+00
-allphone
-allphone_ci        no          no
-alpha              0.97        9.70e-01
-ascale             20.0        2.00e+01
-aw                 1           1
-backtrace          no          no
-beam               1e-48       1.00e-48
-bestpath           yes         yes
-bestpathlw         9.5         9.50e+00
-ceplen             13          13
-cmn                current     current
-cmninit            8.0         56,-3,1
-compallsen         no          no
-debug                          0
-dict
-dictcase           no          no
-dither             no          no
-doublebw           no          no
-ds                 1           1
-fdict                          /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/noisedict
-feat               1s_c_d_dd   1s_c_d_dd
-featparams                     /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/feat.params
-fillprob           1e-8        1.00e-08
-frate              100         100
-fsg
-fsgusealtpron      yes         yes
-fsgusefiller       yes         yes
-fwdflat            yes         yes
-fwdflatbeam        1e-64       1.00e-64
-fwdflatefwid       4           4
-fwdflatlw          8.5         8.50e+00
-fwdflatsfwin       25          25
-fwdflatwbeam       7e-29       7.00e-29
-fwdtree            yes         yes
-hmm                            /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k
-input_endian       little      little
-jsgf
-keyphrase
-kws
-kws_delay          10          10
-kws_plp            1e-1        1.00e-01
-kws_threshold      1           1.00e+00
-latsize            5000        5000
-lda
-ldadim             0           0
-lifter             0           0
-lm                             /usr/share/pocketsphinx/model/lm/en_US/hub4.5000.DMP
-lmctl
-lmname
-logbase            1.0001      1.000100e+00
-logfn
-logspec            no          no
-lowerf             133.4       1.00e+00
-lpbeam             1e-40       1.00e-40
-lponlybeam         7e-29       7.00e-29
-lw                 6.5         6.50e+00
-maxhmmpf           3           3
-maxwpf             -1          -1
-mdef                           /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/mdef
-mean                           /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/means
-mfclogdir
-min_endfr          0           0
-mixw
-mixwfloor          0.001       1.00e-07
-mllr
-mmap               yes         yes
-ncep               13          13
-nfft               512         512
-nfilt              40          20
-nwpen              1.0         1.00e+00
-pbeam              1e-48       1.00e-48
-pip                1.0         1.00e+00
-pl_beam            1e-10       1.00e-10
-pl_pbeam           1e-10       1.00e-10
-pl_pip             1.0         1.00e+00
-pl_weight          3.0         3.00e+00
-pl_window          5           5
-rawlogdir
-remove_dc          no          yes
-remove_noise       yes         yes
-remove_silence     yes         yes
-round_filters      yes         no
-samprate           16000       1.60e+04
-seed