Re: [PLUG] Current state of Linux voice recognition

2017-06-30 Thread Richard Owlett
On 06/29/2017 02:24 PM, VY wrote:
>> Do you recall the sample bit depth and rate - [so I could compare to
>> hardware I'll be looking at].
>
> This has been more than a decade so my recollection may be off.
>
> I was not doing anything that requires very clear perception.  It
> was a home toy project, albeit with practical home-automation use.
> I think it was something like 12-bit or 24-bit.  Somehow, that number
> came to mind when I saw your question.
> But again, I could be off.
>
> -v

If the information was available it would have been a useful data point.
Thank you.




___
PLUG mailing list
PLUG@lists.pdxlinux.org
http://lists.pdxlinux.org/mailman/listinfo/plug


Re: [PLUG] Current state of Linux voice recognition

2017-06-29 Thread VY
> Do you recall the sample bit depth and rate - [so I could compare to
> hardware I'll be looking at].

This has been more than a decade so my recollection may be off.

I was not doing anything that requires very clear perception.  It was a home
toy project, albeit with practical home-automation use.
I think it was something like 12-bit or 24-bit.  Somehow, that number came
to mind when I saw your question.
But again, I could be off.
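
For comparing hardware, the practical difference between the bit depths recalled above can be estimated with the textbook formula for an ideal quantizer driven by a full-scale sine wave (roughly 6.02 dB per bit plus 1.76 dB). This is only a sketch of the theoretical ceiling; real capture cards fall short of it, and the function name is made up for illustration.

```python
import math

def ideal_dynamic_range_db(bits: int) -> float:
    """Theoretical SNR of an ideal N-bit quantizer for a full-scale sine wave."""
    # 20*log10(2**N) is about 6.02 dB per bit; 10*log10(1.5) is about 1.76 dB.
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

if __name__ == "__main__":
    for bits in (12, 16, 24):
        print(f"{bits}-bit: about {ideal_dynamic_range_db(bits):.1f} dB")
```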

-v





Re: [PLUG] Current state of Linux voice recognition

2017-06-29 Thread Richard Owlett
On 06/29/2017 11:17 AM, VY wrote:
> About 11 years ago, I had a PC and using older version of this software:
>
>   https://cmusphinx.github.io/

That page speaks of Pocketsphinx; an apparently current version is in
the Debian repository. Synaptic reports "This package contains end-user
speech recognition tools", so it should be a good start.

The site has valuable background information.

>
> I had to write some scripts but I have had some good success.  I did
> have a hardware audio/video capture card but not sure if that is
> making any difference or not.

Do you recall the sample bit depth and rate [so I could compare to the
hardware I'll be looking at]?

> I would think it is much better by now.  I no longer have such a
> system in place.

Thank you.
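
For anyone who still has old recordings, the bit depth and sample rate asked about above can be read straight from a WAV header. The following is a minimal sketch using only Python's standard `wave` module; the function names are hypothetical, and the in-memory silent file exists only so the example runs without a real recording.

```python
import io
import wave

def describe_wav(source):
    """Return (bits_per_sample, sample_rate_hz, channels) for a WAV file.

    `source` may be a path string or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return wav.getsampwidth() * 8, wav.getframerate(), wav.getnchannels()

def make_silent_wav(bits=16, rate=16000, channels=1, seconds=0.1):
    """Build a short all-zero WAV in memory (a stand-in for a real recording)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(bits // 8)
        wav.setframerate(rate)
        frame_count = int(rate * seconds)
        wav.writeframes(b"\x00" * (frame_count * channels * (bits // 8)))
    buf.seek(0)
    return buf

if __name__ == "__main__":
    bits, rate, channels = describe_wav(make_silent_wav())
    print(f"{bits}-bit, {rate} Hz, {channels} channel(s)")
```

To inspect a real file, pass its path instead, e.g. `describe_wav("sample.wav")` (the file name is a placeholder).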






Re: [PLUG] Current state of Linux voice recognition

2017-06-29 Thread Richard Owlett
On 06/29/2017 02:15 AM, King Beowulf wrote:
> On 06/28/2017 06:52 AM, Richard Owlett wrote:
>> Up until about ten years ago, while still using Windows, I was
>> following voice recognition. At that time the only option was a
>> commercial product which cost too much and wasn't a good match
>> for my desires at that time.
>>
>> Time has passed and I'm retired. What I'm looking for would be
>> a large vocabulary, single speaker, continuous speech system.
>> The application would be straight text note taking - I'm a slow
>> and lousy typist.
>>
>> I'm already investigating good microphones with good A/D resolution
>> and preferably high sample rate [I've ideas on pre-processing I
>> would like to experiment with].
>>
>> Can anyone recommend some survey articles &/or competent current
>> reviews?
>> TIA
>>
>>
>
> There are a few speech recognition engines that are F/OSS.  A brief
> summary is here:
> https://en.wikipedia.org/wiki/Speech_recognition_software_for_Linux
> https://en.wikipedia.org/wiki/List_of_speech_recognition_software
> This might still be useful:
> http://tldp.org/HOWTO/Speech-Recognition-HOWTO/software.html

I had the first Wikipedia article. The software listing of the TLDP 
reference may be dated but the whole article appears to be good guidance 
for more research.

>
> Some leverage Google's speech API - and everything you say gets
> uploaded to Google.

Anything involving Google is a non-starter!

> There are several engines and frontends for GTK and KDE
> (QT).  Quality can be a bit rough, but that depends on your accent and
> how the software engine was/is  trained.
>
> It's been a while since I played with any of this stuff. The Google API
> was pretty good, but tended to lag a bit - perhaps better now.
>
> -Ed

Thank you.









Re: [PLUG] Current state of Linux voice recognition

2017-06-29 Thread King Beowulf
On 06/28/2017 06:52 AM, Richard Owlett wrote:
> Up until about ten years ago, while still using Windows, I was following
> voice recognition. At that time the only option was a commercial product
> which cost too much and wasn't a good match for my desires at that time.
> 
> Time has passed and I'm retired. What I'm looking for would be a large 
> vocabulary, single speaker, continuous speech system. The application 
> would be straight text note taking - I'm a slow and lousy typist.
> 
> I'm already investigating good microphones with good A/D resolution and 
> preferably high sample rate [I've ideas on pre-processing I would like 
> to experiment with].
> 
> Can anyone recommend some survey articles &/or competent current reviews?
> TIA
> 
> 

There are a few speech recognition engines that are F/OSS.  A brief
summary is here:
https://en.wikipedia.org/wiki/Speech_recognition_software_for_Linux
https://en.wikipedia.org/wiki/List_of_speech_recognition_software
This might still be useful:
http://tldp.org/HOWTO/Speech-Recognition-HOWTO/software.html

Some leverage Google's speech API - and everything you say gets uploaded
to Google.  There are several engines and frontends for GTK and KDE
(Qt).  Quality can be a bit rough, but that depends on your accent and
how the software engine was/is trained.

It's been a while since I played with any of this stuff. The Google API
was pretty good, but tended to lag a bit - perhaps better now.

-Ed


Re: [PLUG] Current state of Linux voice recognition

2017-06-28 Thread John Jason Jordan
On Wed, 28 Jun 2017 16:39:16 -0700 (PDT)
Rich Shepard said:

>On Wed, 28 Jun 2017, John Jason Jordan wrote:
>
>> Human languages all have a unique phonetic inventory, that is, all
>> the individual sounds of the language. English has 41 - 43 phonemes
>> (depending on your dialect), ...
>
>   Does this include Brooklyn and New Joisey?

Oh yes!

Most Portlanders have 41 phonemes, but people in Brooklyn have 42,
because 'cot' and 'caught' are pronounced with different vowels in
Brooklyn, but in Portland those two vowels have merged into one. This
is called the 'low back merger,' or sometimes just the 'cot-caught
merger.'

As for New Jersey, they also have 42, for the same reason as in
Brooklyn, although their most salient difference is the substitution of
the diphthong [ǝi] for syllabic r [ɹ̩], so where I say Jersey as
[ʤɹ̩zi] they pronounce it as [ʤǝ͡izi]. Note that their diphthong is
uh-ee, not oh-ee.


Re: [PLUG] Current state of Linux voice recognition

2017-06-28 Thread Dick Steffens
On 06/28/2017 04:39 PM, Rich Shepard wrote:
> On Wed, 28 Jun 2017, John Jason Jordan wrote:
>
>> Human languages all have a unique phonetic inventory, that is, all the
>> individual sounds of the language. English has 41 - 43 phonemes (depending
>> on your dialect), ...
> Does this include Brooklyn and New Joisey?

Or suthern Ohia?

-- 
Regards,

Dick Steffens



Re: [PLUG] Current state of Linux voice recognition

2017-06-28 Thread Rich Shepard
On Wed, 28 Jun 2017, John Jason Jordan wrote:

> Human languages all have a unique phonetic inventory, that is, all the
> individual sounds of the language. English has 41 - 43 phonemes (depending
> on your dialect), ...

   Does this include Brooklyn and New Joisey?

Rich


Re: [PLUG] Current state of Linux voice recognition

2017-06-28 Thread John Jason Jordan
On Wed, 28 Jun 2017 10:48:47 -0500
Richard Owlett said:

>On 06/28/2017 09:54 AM, Larry Brigman wrote:
>> Human voice frequency range tops out at 8 kHz.  Normal speech is
>> around 2-3 kHz.

> That's the theory that's been around "forever".
>I'm in possession of a factoid that prompts me to do some research 
>needing high resolution at high sample rates.

There is a lot to say about acoustic phonetics, and I'm not sure where
to start.

It is correct that most human speech tops out at about 3 kHz, but the
lower limit mentioned above is too high. In fact, an adult male with a
large head and vocal tract can produce speech sounds as low as 60 Hz
(e.g., a basso profundo). Most males start at about 85 Hz. For
vowels, sounds above 3 kHz are actually just harmonics, in decreasing
volume the higher the harmonic. The harmonics normally contribute
little to perception of vowels. Some consonants, however, use much
higher frequencies, notably the stridents, of which English has an
embarras de richesse.

Human languages all have a unique phonetic inventory, that is, all the
individual sounds of the language. English has 41 - 43 phonemes
(depending on your dialect), and many of them have two or more
allophones. Of these 10 - 11 (again, depending on dialect) are vowels,
plus there are a handful of diphthongs. Each sound has a specific set
of frequencies, plus there are other issues that hearers pick up, e.g.,
length of the sound.

Now I address the issue of frequency, starting with the vowels. When
you utter a vowel you actually produce three frequencies (called
formants) simultaneously. The lower two are the critical ones, and the
upper one could be considered as a kind of checksum. The formants for
the vowel [i] (as in 'beet') average around 280, 2250, and 2900 Hz,
whereas for the vowel [ɪ] (as in 'bit') the formants are around 400,
1900 and 2550 Hz. 

Now here is the crucial point: It is the distance between the two
lower formants that makes our brains think 'oh, I just heard an [i],' or
'I just heard an [ɪ].' Why is this important? Because every human has a
different 'fundamental frequency,' determined mostly by the size of the
vocal tract. Just as your 6th grade science teacher demonstrated by
pinging the sides of glasses filled with different levels of water, the
larger the volume of air the lower the frequency that will be produced.
Men tend to have larger vocal tracts than women, so males tend to have a
lower fundamental frequency than women. If our perception of vowels were
determined just by the absolute frequencies we wouldn't be able to
understand anything. But the system works because the distance between
the lower two formants is identical whether the speaker has a high or a
low fundamental frequency. The numbers I gave above for [i] are
actually an average; for a man they might be 120, 2090, and 2730 Hz, whereas
for a woman they might be 380, 2350, and 2990 Hz. Note that for both speakers
the difference between the lower two formants is still 1970 Hz for [i].
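
The arithmetic behind that claim can be checked directly. This is only a sketch using the formant figures quoted in this message; the dictionary and function names are made up for illustration and do not come from any speech toolkit.

```python
# Formant values in Hz as quoted in this message (lowest, middle, upper).
FORMANTS_I = {
    "average": (280, 2250, 2900),
    "male example": (120, 2090, 2730),
    "female example": (380, 2350, 2990),
}
FORMANTS_IH_AVERAGE = (400, 1900, 2550)  # the vowel in 'bit'

def lower_formant_spacing(formants):
    """Distance in Hz between the two lower formants."""
    lowest, middle, _upper = formants
    return middle - lowest

if __name__ == "__main__":
    for label, formants in FORMANTS_I.items():
        print(f"[i] {label}: {lower_formant_spacing(formants)} Hz")
    print(f"[ɪ] average: {lower_formant_spacing(FORMANTS_IH_AVERAGE)} Hz")
```

With these figures the spacing is the same for all three [i] entries and differs for [ɪ], which is the cue described above.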

'Speaker normalization' is a term used by phoneticians to describe an
amazing feature of the human brain - the instantaneous unconscious
ability to perceive the fundamental frequency of a speaker the moment
they open their mouth and utter the first couple of sounds, even if you
have never heard the speaker before. 

Now I turn to consonants. Consonants also have formants, but the upper
formants are the most important, and they can be much higher, even
higher than 3 kHz. For example, the upper formant for [s] (as in 'hiss')
ranges from around 4900 to 6000 Hz, depending on the speaker's
fundamental frequency and the vowel(s) that precede or follow it - which
leads me to problems with telephony.

A long time ago when telephone systems were first being developed the
telephone companies decided, for purely economic reasons, to limit the
bandwidth that their equipment could perceive and reproduce to 300 -
3400 Hz. (Those figures are present-day standards; in the beginning
they weren't even that generous.) Equipment that could do a wider range
would have increased expense massively. Unfortunately, this produces
the famous expressions 's as in Sam' or 'f as in Frank,' because the
equipment doesn't go high enough to reproduce the upper formant of [s],
making it impossible to distinguish it from [f] on a telephone.
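
To make the telephone problem concrete, here is a small sketch that checks which of the frequencies quoted in this message fall inside the 300 - 3400 Hz band. It assumes an idealized brick-wall filter, which real telephone equipment is not, and the names are invented for the example.

```python
TELEPHONE_BAND_HZ = (300, 3400)  # present-day band quoted above

def in_band(freq_hz, band=TELEPHONE_BAND_HZ):
    """True if freq_hz lies inside the pass band (idealized brick-wall filter)."""
    low, high = band
    return low <= freq_hz <= high

if __name__ == "__main__":
    quoted = {
        "[i] lowest formant (average)": 280,
        "[i] middle formant (average)": 2250,
        "[i] upper formant (average)": 2900,
        "[s] upper formant, low end": 4900,
        "[s] upper formant, high end": 6000,
    }
    for label, freq in quoted.items():
        status = "passes" if in_band(freq) else "is cut"
        print(f"{label} at {freq} Hz {status}")
```

Under this simplification the whole quoted range for the upper formant of [s] lies above the band, consistent with the 's as in Sam' problem.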

Having said all of that, there is a lot more to human speech
recognition than having equipment capable of adequate bandwidth. Our
human brains juggle so much input so rapidly that we have to use
shortcuts. Let me give you just one example: If you hear an article (a,
an, the) your brain knows that it always introduces a noun phrase so the
next word absolutely must be a noun, a nominal modifier, or an
intensifier. If you speak a language then every word in your lexicon is
flagged as to which categories it can be used for. This means that as
you try to decipher the next word that you are hearing you can discard a
vast amount of your lexicon as 

Re: [PLUG] Current state of Linux voice recognition

2017-06-28 Thread Larry Brigman
Actually, it is fact. Amateur radio operators and radio-to-digital voice
systems use a 2 kHz wide bandwidth filter.
The IRLP project uses 8 kHz sampling.



Re: [PLUG] Current state of Linux voice recognition

2017-06-28 Thread wes
Speech recognition is built in to any smartphone. May not
serve your high-frequency research purposes, but it'll help you take notes
without issue.

-wes



Re: [PLUG] Current state of Linux voice recognition

2017-06-28 Thread Richard Owlett
On 06/28/2017 09:54 AM, Larry Brigman wrote:
> Not really answering your question but providing important facts
> about the domain.
> Human voice frequency range tops out at 8 kHz.  Normal speech is around
> 2-3 kHz.  Any soundcard that samples at 48 kHz will work just fine.

That's the theory that's been around "forever".
I'm in possession of a factoid that prompts me to do some research 
needing high resolution at high sample rates.

For more than 60 years I've had significant high frequency hearing loss 
in one ear due to repeated ear infections in childhood. Forty years ago 
I consulted a major medical center in Boston at the prompting of an RN
friend. The formal diagnosis was "nerve deafness" based on comparing my 
tests with headphones and a "bone conduction" setup. I personally 
suspect there are also complications attributable to scar tissue on the 
eardrum.

I date from the era when all males attending "land grant institutions" 
were required to take two years of ROTC. I planned to take Advanced 
ROTC. I barely passed the physical due to my hearing loss. I have 
*personal* knowledge that 'high frequency' components are important as I 
must use my "good ear" when expecting to understand female speakers.

If I recall my terminology [and spelling ;] correctly, consider 
"sibilants" and "fricatives" [possibly also "full stops"].




Re: [PLUG] Current state of Linux voice recognition

2017-06-28 Thread Alex Bedard
Surprisingly, the state of speech recognition on Linux, at least when it
comes to speech-to-text / dictation solutions, was actually better 10
years ago than it is today. At least when it comes to affordable
consumer/end-user solutions.

The predecessor of Dragon NaturallySpeaking actually had a Linux
version which worked pretty well. However, after IBM sold that business 
unit to Nuance/Dragon, they stopped offering the Linux version of their 
product.

Apparently, it is possible to use NaturallySpeaking through Wine,
although I haven't tested it... Running things through Wine can
be pretty hit or miss.

There are some open source voice recognition engines for Linux, but 
generally speaking they are terrible for dictation/speech to text.

Also, in regard to sound cards, either in Windows or Linux, I would
recommend not using your soundcard with a 3.5mm jack microphone. Good
USB microphones generally have a DSP chip that produces a clearer sound
and I've had better success with dictating through these than with
analog/3.5mm microphones. Some of these can be found for as low as ~$30.

Alex



Re: [PLUG] Current state of Linux voice recognition

2017-06-28 Thread Larry Brigman
Not really answering your question but providing important facts about the
domain.
Human voice frequency range tops out at 8 kHz.  Normal speech is around
2-3 kHz.  Any soundcard that samples at 48 kHz will work just fine.
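
The reasoning behind the sample-rate claim above is the sampling theorem: a signal can only be captured faithfully if its highest frequency is below half the sample rate (the Nyquist frequency). A minimal sketch, with function names invented for illustration:

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency representable at a given sample rate."""
    return sample_rate_hz / 2

def can_capture(max_signal_hz, sample_rate_hz):
    """True if the signal's top frequency is strictly below the Nyquist frequency."""
    return max_signal_hz < nyquist_hz(sample_rate_hz)

if __name__ == "__main__":
    voice_top_hz = 8000  # upper limit of the voice range quoted above
    for rate in (8000, 16000, 48000):
        verdict = "enough" if can_capture(voice_top_hz, rate) else "not enough"
        print(f"{rate} Hz sampling (Nyquist {nyquist_hz(rate):.0f} Hz): {verdict}")
```

By this measure a 48 kHz card has ample headroom for an 8 kHz ceiling; whether higher rates help recognition is a separate question that this check does not answer.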



[PLUG] Current state of Linux voice recognition

2017-06-28 Thread Richard Owlett
Up until about ten years ago, while still using Windows, I was following
voice recognition. At that time the only option was a commercial product
which cost too much and wasn't a good match for my desires at that time.

Time has passed and I'm retired. What I'm looking for would be a large 
vocabulary, single speaker, continuous speech system. The application 
would be straight text note taking - I'm a slow and lousy typist.

I'm already investigating good microphones with good A/D resolution and 
preferably high sample rate [I've ideas on pre-processing I would like 
to experiment with].

Can anyone recommend some survey articles &/or competent current reviews?
TIA

