[sphinx-users] Could not integrate custom en-in AM and custom LM, ERROR: "dict.c", Phone 'AE' is mising in the acoustic model; word 'absolutely' ignored

murugan . r Fri, 21 Sep 2018 00:13:36 -0700

### Problem
*Sir, I could not integrate a custom en-in acoustic model (the default Indian English acoustic model, adapted) with a custom language model.*
I downloaded an acoustic model from this link: https://sourceforge.net/projects/cmusphinx/files/
I followed the instructions at this link: https://cmusphinx.github.io/wiki/tutorialam/


*sphinx_fe -argfile en_in/feat.params -samprate 16000 -c audio.fileids -di . -do . -ei wav -eo mfc -mswav yes*

*pocketsphinx_mdef_convert -text en_in/mdef en_in/mdef.txt*

*cp -a /usr/local/libexec/sphinxtrain/bw .*
*cp -a /usr/local/libexec/sphinxtrain/mk_s2sendump .*
*cp -a /usr/local/libexec/sphinxtrain/map_adapt .*
*cp -a /usr/local/libexec/sphinxtrain/mllr_solve .*

*./bw \*
* -hmmdir en_in \*
* -moddeffn en_in/mdef.txt \*
* -ts2cbfn .cont. \*
* -feat 1s_c_d_dd \*
* -cmn current \*
* -agc none \*
* -dictfn en_in.dic \*
* -ctlfn audio.fileids \*
* -lsnfn audio.transcription \*
* -accumdir .*


*./mllr_solve \*
*    -meanfn en_in/means \*
*    -varfn en_in/variances \*
*    -outmllrfn mllr_matrix -accumdir .*

*cp -a en_in en_in_own*

*./map_adapt \*
*    -moddeffn en_in/mdef.txt \*
*    -ts2cbfn .cont. \*
*    -meanfn en_in/means \*
*    -varfn en_in/variances \*
*    -mixwfn en_in/mixture_weights \*
*    -tmatfn en_in/transition_matrices \*
*    -accumdir . \*
*    -mapmeanfn en_in_own/means \*
*    -mapvarfn en_in_own/variances \*
*    -mapmixwfn en_in_own/mixture_weights \*
*    -maptmatfn en_in_own/transition_matrices*

*./mk_s2sendump \*
*    -pocketsphinx yes \*
*    -moddeffn en_in_own/mdef.txt \*
*    -mixwfn en_in_own/mixture_weights \*
*    -sendumpfn en_in_own/sendump*

*pocketsphinx_continuous -hmm en_in_own -lm en-us.lm.bin -dict en_in.dic -infile 38.wav > 4.txt*

This works, but it does not recognize certain words that are specific to the banking sector. So I built my own language model using the language model build tool (Building a simple language model using a web service).

***own language model: lm.dict & lm.bin:***
transcript file: own_vocab.txt
http://www.speech.cs.cmu.edu/tools/product/1537337608_14460/

*sphinx_lm_convert -i own.lm -o own.lm.bin*
*sphinx_lm_convert -i own.lm.bin -ifmt bin -o own.lm -ofmt arpa*

*pocketsphinx_continuous -inmic yes -lm own.lm.bin -dict own.dic*

Sir, this works fine and recognizes those particular words. But I have one point of confusion:

>* Which default acoustic model does the command "pocketsphinx_continuous -inmic yes -lm own.lm.bin -dict own.dic" use?*

But when I *integrate these two, the AM and the LM*, and run

*pocketsphinx_continuous -hmm en_in_own -lm own.lm.bin -dict own.dic -infile 1.wav > result_own.txt*

it does not return any words, and it shows an error: *phones used in the dictionary are not present in the AM*.

INFO: dict.c(333): Reading main dictionary: lm_model_resources/other/own.dic
ERROR: "dict.c", line 195: Line 5: Phone 'EH' is mising in the acoustic model; word 's' ignored
ERROR: "dict.c", line 195: Line 6: Phone 'EH' is mising in the acoustic model; word 's' ignored
ERROR: "dict.c", line 195: Line 7: Phone 'EY' is mising in the acoustic model; word 'a' ignored
ERROR: "dict.c", line 195: Line 8: Phone 'EY' is mising in the acoustic model; word 'able' ignored
ERROR: "dict.c", line 195: Line 9: Phone 'AH' is mising in the acoustic model; word 'about' ignored
ERROR: "dict.c", line 195: Line 10: Phone 'AE' is mising in the acoustic model; word 'absolutely' ignored

I think I have identified the issue, though. The *phones* used in own.dic (EH, EY, AH, AE) are all present *in the en_in acoustic model (Indian English)*, but there they are written in lowercase (en_in/mdef file).
*Other acoustic models, such as en-us, hub4_cd_continuous_8gau_1s_c_d_dd and wsj_all_cd30.mllt_cd_cont_4000, have their mdef phones in capital letters.*

Columns definitions
#base lft  rt p attrib tmat      ... state id's ...
  SIL   -   - - filler    0      0      1      2 N
  UNK   -   - -    n/a    1      3      4      5 N
   aa   -   - -    n/a    2      6      7      8 N
   ae   -   - -    n/a    3      9     10     11 N
   ah   -   - -    n/a    4     12     13     14 N
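
To see exactly which phone spellings the model defines, the text-format mdef can be scanned for its context-independent rows. The following is a minimal sketch, assuming those rows are the ones with `-` in the `lft`, `rt` and `p` columns, as in the excerpt above. The function name `ci_phones` is hypothetical and not part of any Sphinx tool.

```python
def ci_phones(mdef_text):
    """Collect context-independent phone names from a text-format mdef."""
    phones = set()
    for line in mdef_text.splitlines():
        fields = line.split()
        # Skip blank lines, comment lines, and rows too short to be a phone row.
        if len(fields) < 4 or fields[0].startswith("#"):
            continue
        base, left, right, position = fields[:4]
        # Assumption: context-independent rows carry "-" for lft, rt and p.
        if left == "-" and right == "-" and position == "-":
            phones.add(base)
    return phones


# The excerpt from this post, used as sample input.
sample = """\
#base lft  rt p attrib tmat      ... state id's ...
  SIL   -   - - filler    0      0      1      2 N
  UNK   -   - -    n/a    1      3      4      5 N
   aa   -   - -    n/a    2      6      7      8 N
   ae   -   - -    n/a    3      9     10     11 N
   ah   -   - -    n/a    4     12     13     14 N
"""

print(sorted(ci_phones(sample)))  # ['SIL', 'UNK', 'aa', 'ae', 'ah']
```

On the real file the same function could be fed `open("en_in/mdef.txt").read()`. The result makes it easy to confirm that `ae` is defined while `AE` is not.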

I tried converting the phones in own.dic to lowercase, but the change was not reflected when using the AM and LM together.

Basically, the LM tool produces words and phones in this structure, and that does not match the acoustic model. The two are not in sync.
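
One way to attempt the lowercase conversion is sketched below. It rewrites only the phone columns of each dictionary line to whatever spelling the model uses, and leaves the word itself untouched so it still matches the language model. This assumes the only mismatch is letter case, which may not hold for every phone. `match_phone_case` and the small `model_phones` set are illustrative names and values, not part of PocketSphinx.

```python
def match_phone_case(dic_text, model_phones):
    """Rewrite dictionary phones to the model's spelling, ignoring case."""
    by_folded = {p.lower(): p for p in model_phones}
    out_lines = []
    for line in dic_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        word, phones = fields[0], fields[1:]
        # Phones with no case-insensitive match in the model are kept as-is,
        # so they remain visible as genuine mismatches.
        fixed = [by_folded.get(p.lower(), p) for p in phones]
        out_lines.append(" ".join([word] + fixed))
    return "".join(l + "\n" for l in out_lines)


# Phones taken from the mdef excerpt in this post only (not the full model).
model_phones = {"SIL", "UNK", "aa", "ae", "ah"}
dic = "a AH\nabsolutely AE B S AH L UW T L IY\n"
print(match_phone_case(dic, model_phones), end="")
```

With the partial phone set above, only `AH` and `AE` are rewritten. The other phones stay uppercase because the excerpt does not list them.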

I also tried another way to create own.lm.bin and own.dic.

**Building the LM another way:**
*text2wfreq < own_vocab.txt | wfreq2vocab > own_vocab.tmp.vocab*

*text2idngram -vocab own_vocab.tmp.vocab -idngram own_vocab.idngram < own_vocab.txt*

*idngram2lm -vocab_type 0 -idngram own_vocab.idngram -vocab own_vocab.tmp.vocab -arpa own.lm*

*sphinx_lm_convert -i own.lm -o own.lm.bin*

**Building own.dic another way:**
I followed these links: https://cmusphinx.github.io/wiki/tutorialdict/ and https://github.com/cmusphinx/g2p-seq2seq

*g2p-seq2seq --decode own_vocab.tmp.vocab --model_dir g2p-seq2seq/g2p-seq2seq-model-6.2-cmudict-nostress --output own.dic*
*pocketsphinx_continuous -lm own.lm.bin -dict own.dic -infile 10.wav > 10.txt*

This works fine and recognizes those particular words, but my confusion is:

> Which acoustic model is used when running the command "pocketsphinx_continuous -lm own.lm.bin -dict own.dic -infile 10.wav > 10.txt"?

But when I *integrate these two, the AM and the LM,* and run
*pocketsphinx_continuous -lm own.lm.bin -dict own.dic -infile 10.wav > 10.txt -hmm en_in_own*

it returns the same error again and does not display any text. The error log is:

INFO: dict.c(333): Reading main dictionary: lm_model_resources/other/own.dic
ERROR: "dict.c", line 195: Line 5: Phone 'EH' is mising in the acoustic model; word 's' ignored
ERROR: "dict.c", line 195: Line 6: Phone 'EH' is mising in the acoustic model; word 's' ignored
ERROR: "dict.c", line 195: Line 7: Phone 'EY' is mising in the acoustic model; word 'a' ignored
ERROR: "dict.c", line 195: Line 8: Phone 'EY' is mising in the acoustic model; word 'able' ignored
ERROR: "dict.c", line 195: Line 9: Phone 'AH' is mising in the acoustic model; word 'about' ignored
ERROR: "dict.c", line 195: Line 10: Phone 'AE' is mising in the acoustic model; word 'absolutely' ignored

Dictionary (word-phone) format produced by the LM tool:
A AH
A(2) EY
ABLE EY B AH L
ABOUT AH B AW T
ABSOLUTELY AE B S AH L UW T L IY

Dictionary (word-phone) format produced by g2p-seq2seq:
s EH S
s EH S
a EY
able EY B AH L
about AH B AW T
absolutely AE B S AH L UW T L IY
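
Both listings use the same layout: a word followed by its phones. They differ in word case, and in how repeated words appear: `A(2)` marks an alternate pronunciation in the first listing, while `s EH S` is simply repeated in the second. A small sketch for loading either form into one structure is given below, assuming a `(N)` suffix always marks an alternate and that exact duplicate pronunciations can be dropped. `load_pronunciations` is a hypothetical helper.

```python
import re

# Matches a trailing alternate marker such as "(2)" on a dictionary word.
ALT_SUFFIX = re.compile(r"\(\d+\)$")


def load_pronunciations(dic_text):
    """Map each word to its list of distinct pronunciations (tuples of phones)."""
    entries = {}
    for line in dic_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank lines and words without phones
        word = ALT_SUFFIX.sub("", fields[0])
        pron = tuple(fields[1:])
        prons = entries.setdefault(word, [])
        if pron not in prons:  # drop exact duplicates such as "s EH S" twice
            prons.append(pron)
    return entries


web_tool = "A AH\nA(2) EY\nABLE EY B AH L\n"
g2p = "s EH S\ns EH S\na EY\n"
print(load_pronunciations(web_tool))
print(load_pronunciations(g2p))
```

The words keep their original case here, since the dictionary words need to match the words used in the language model.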

en_in_own mdef phones structure:
   ia   f  aa s    n/a   20   2023   2038   2063 N
   ia   f  ae e    n/a   20   2023   2038   2063 N
   ia   f  ae s    n/a   20   2023   2038   2063 N
   ia   f  ah e    n/a   20   2023   2038   2063 N
   ia   f  ah s    n/a   20   2023   2038   2063 N
   ia   f  ao e    n/a   20   2023   2038   2063 N
   ia   f  ao s    n/a   20   2023   2038   2063 N
   ia   f  aw e    n/a   20   2023   2038   2063 N

Is the lowercase really the issue or not? I was not able to determine that.

Sir, how can I fix this issue?
I could not integrate the custom en-in AM with the custom LM. ERROR: "dict.c", Phone 'AE' is mising in the acoustic model; word 'absolutely' ignored

- OS: Linux with version 16.04
- Python3:
- Sphinx version: PocketSphinx 5prealpha
