I have an Android phone recording of a meeting that I'd like to
transcribe to text, and was hoping there was an automagic way to convert
most of it.
My google-fu seems to be failing me, or else that info simply isn't out
there.
The closest I've found is a tool called pocketsphinx, which seems to be a
toolkit for developers to build into applications rather than a finished
product for end users.
I saved my Android audio file to Google Drive and downloaded it from
there to my main Debian box (for unknown reasons, I couldn't use Gmail
to mail it as an attachment, or attach it to an FB Message).
Based on what I was able to figure out, I then converted my .m4a (MP4
audio) file to a 16 kHz mono .wav file:
ffmpeg -i ../Downloads/Dana\ of\ City.m4a -ar 16000 -ac 1 DanaOfCity.wav
I verified that worked with
aplay DanaOfCity.wav
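Besides listening with aplay, you can check the header of the converted file to confirm that ffmpeg produced what -ar 16000 -ac 1 asked for (the log below shows pocketsphinx running with -samprate 16000). Here is a minimal sketch that needs only GNU od from coreutils. It assumes a plain PCM WAV whose "fmt " chunk comes right after the RIFF/WAVE header, which is what ffmpeg normally writes. The names wav_info and read_le are hypothetical helpers, not part of any tool.

```shell
# read_le FILE OFFSET SIZE: print the little-endian unsigned integer of
# SIZE bytes (2 or 4) found at byte OFFSET of FILE. Needs GNU od (--endian).
read_le() {
  od -An -t "u$3" --endian=little -j "$2" -N "$3" "$1" | tr -d ' \n'
}

# wav_info FILE: print channel count, sample rate and bits per sample.
# Assumes the "fmt " chunk starts at byte 12, so its fields sit at fixed
# offsets: channels at 22, sample rate at 24, bits per sample at 34.
wav_info() {
  if [ "$(head -c 4 "$1")" != "RIFF" ]; then
    echo "$1: not a RIFF/WAV file" >&2
    return 1
  fi
  echo "channels=$(read_le "$1" 22 2) rate=$(read_le "$1" 24 4) bits=$(read_le "$1" 34 2)"
}

# Usage: for the ffmpeg output above, "channels=1 rate=16000 bits=16" is expected
# wav_info DanaOfCity.wav
```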
Then I worked out what I hoped were the basics of the command, mostly
from an example at
https://askubuntu.com/questions/161515/speech-recognition-app-to-convert-mp3-to-text
(answer 8, footnote 2):
pocketsphinx_continuous -infile ~/Downloads/DanaOfCity.wav \
  -hmm en_US/hub4wsj_sc_8k -lm en_US/hub4.5000.DMP 2> pocketsphinx.log
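Because pocketsphinx couldn't load "mdef" until the -hmm directory was right, a quick pre-flight check of that directory can save a failed run. This is a minimal sketch. check_model_dir is a hypothetical helper. The file list (mdef, means, feat.params, noisedict) only covers the files the configuration dump in the log shows being read from hub4wsj_sc_8k, so it is not a complete list of what an acoustic model needs.

```shell
# check_model_dir DIR: report any expected acoustic-model files missing
# from DIR; returns non-zero if at least one is missing.
# File names come from the pocketsphinx.log dump; the list is not exhaustive.
check_model_dir() {
  dir=$1
  missing=0
  for name in mdef means feat.params noisedict; do
    if [ ! -f "$dir/$name" ]; then
      echo "missing: $dir/$name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, with the directory the log shows -hmm resolving to:
# check_model_dir /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k
```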
which I understand to mean:
- my input file is my data file "DanaOfCity.wav"
- the "-hmm" parameter (the acoustic model; judging by the log below,
pocketsphinx looks for it under /usr/share/pocketsphinx/model/hmm) names
the "en_US/hub4wsj_sc_8k" directory, which contains the "mdef" model file
(whatever that is); pocketsphinx couldn't find it until I added the "hub..." part
- the "-lm" parameter (is that a digit one or a letter ell? ah, it's an
ell, as in "language model") works like "-hmm" above, except that the log
shows it being looked up under /usr/share/pocketsphinx/model/lm
- the 2> pocketsphinx.log routes the app's standard error, where it writes
its log messages, to a file named "pocketsphinx.log" in the current
directory (the recognized text itself should still go to standard output)
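On the last point: 2> redirects standard error and > redirects standard output. As far as I know, pocketsphinx_continuous writes its INFO and configuration messages (like the log below) to standard error and the recognized text to standard output. To keep both, redirect them to separate files. This sketch uses a hypothetical stand-in, fake_recognizer, so you can try the redirection without pocketsphinx installed.

```shell
# Hypothetical stand-in for a program that, like pocketsphinx_continuous,
# prints diagnostics on stderr (fd 2) and results on stdout (fd 1).
fake_recognizer() {
  echo "INFO: loading acoustic model" >&2
  echo "hello from the meeting"
}

# > sends stdout to the transcript file; 2> sends stderr to the log file.
fake_recognizer > transcript.txt 2> recognizer.log

# The real command would follow the same pattern (paths as in the post):
# pocketsphinx_continuous -infile ~/Downloads/DanaOfCity.wav \
#   -hmm en_US/hub4wsj_sc_8k -lm en_US/hub4.5000.DMP \
#   > DanaOfCity.txt 2> pocketsphinx.log
```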
When I run this, the process fails, and pocketsphinx.log contains this:
westk@westek:~/bub$ cat pocketsphinx.log
INFO: pocketsphinx.c(145): Parsed model-specific feature parameters from /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/feat.params
Current configuration:
[NAME]            [DEFLT]    [VALUE]
-agc              none       none
-agcthresh        2.0        2.00e+00
-allphone
-allphone_ci      no         no
-alpha            0.97       9.70e-01
-ascale           20.0       2.00e+01
-aw               1          1
-backtrace        no         no
-beam             1e-48      1.00e-48
-bestpath         yes        yes
-bestpathlw       9.5        9.50e+00
-ceplen           13         13
-cmn              current    current
-cmninit          8.0        56,-3,1
-compallsen       no         no
-debug                       0
-dict
-dictcase         no         no
-dither           no         no
-doublebw         no         no
-ds               1          1
-fdict                       /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/noisedict
-feat             1s_c_d_dd  1s_c_d_dd
-featparams                  /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/feat.params
-fillprob         1e-8       1.00e-08
-frate            100        100
-fsg
-fsgusealtpron    yes        yes
-fsgusefiller     yes        yes
-fwdflat          yes        yes
-fwdflatbeam      1e-64      1.00e-64
-fwdflatefwid     4          4
-fwdflatlw        8.5        8.50e+00
-fwdflatsfwin     25         25
-fwdflatwbeam     7e-29      7.00e-29
-fwdtree          yes        yes
-hmm                         /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k
-input_endian     little     little
-jsgf
-keyphrase
-kws
-kws_delay        10         10
-kws_plp          1e-1       1.00e-01
-kws_threshold    1          1.00e+00
-latsize          5000       5000
-lda
-ldadim           0          0
-lifter           0          0
-lm                          /usr/share/pocketsphinx/model/lm/en_US/hub4.5000.DMP
-lmctl
-lmname
-logbase          1.0001     1.000100e+00
-logfn
-logspec          no         no
-lowerf           133.4      1.00e+00
-lpbeam           1e-40      1.00e-40
-lponlybeam       7e-29      7.00e-29
-lw               6.5        6.50e+00
-maxhmmpf         3          3
-maxwpf           -1         -1
-mdef                        /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/mdef
-mean                        /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/means
-mfclogdir
-min_endfr        0          0
-mixw
-mixwfloor        0.001      1.00e-07
-mllr
-mmap             yes        yes
-ncep             13         13
-nfft             512        512
-nfilt            40         20
-nwpen            1.0        1.00e+00
-pbeam            1e-48      1.00e-48
-pip              1.0        1.00e+00
-pl_beam          1e-10      1.00e-10
-pl_pbeam         1e-10      1.00e-10
-pl_pip           1.0        1.00e+00
-pl_weight        3.0        3.00e+00
-pl_window        5          5
-rawlogdir
-remove_dc        no         yes
-remove_noise     yes        yes
-remove_silence   yes        yes
-round_filters    yes        no
-samprate         16000      1.60e+04
-seed