Re: speech - text on FR?

2008-06-16 Thread Stroller

On 16 Jun 2008, at 01:34, Dan Staley wrote:

 I actually just interfaced with the Sphinx project at one of the
 research positions I hold.  It is actually a very well written  
 interface
 (for the most part...there were a few things poorly documented and/or
 implemented)

Apparently the Openmoko GSoC contributor has also found this:
http://lists.openmoko.org/pipermail/community/2008-June/018752.html

He's following the list, so I'm sure he'll be along shortly.  
Hopefully you'll be able to give him some pointers.

Stroller.




Re: speech - text on FR?

2008-06-16 Thread Gilles Casse
On Mon, 16 Jun 2008 at 06:00, Brandon Kruse wrote:

 They also have a sphinx mobile-type of library, which seems to be very
 lightweight, and might be worth looking into.


This benchmark (August 2007) compares PocketSphinx, Sphinx 2 and Sphinx 3 (on
an AMD Athlon at 1670 MHz with 512 MB of RAM).

http://raphaelnunes.wordpress.com/2007/08/08/benchmark-of-sphinx2-sphinx3-pocketsphinx/

Best regards,
Gilles




Re: speech - text on FR?

2008-06-16 Thread saurabh gupta
Hello Ajit,

On Mon, Jun 16, 2008 at 4:37 AM, Ajit Natarajan [EMAIL PROTECTED] wrote:

 Hello,

 I know nothing about speech recognition, so if the following won't work,
 please let me know (gently :) ).

 I understand that there is a project called Sphinx in CMU which attempts
 speech recognition.  It seems pretty complex.  I couldn't get it to work
 on my Linux desktop.  I'm not sure if it would work on an FR since it
 may need a lot of CPU horsepower and memory.


Indeed, the Sphinx packages are very well written, but they were built with
desktop processors in mind. They assume plenty of storage, move a lot of data
around, and rely heavily on floating-point calculations and algorithms. For
FreeRunner-like devices, the code has to be properly adapted and modified. In
fact, that is the very aim of the GSoC Speech Recognition Project: to prepare
a speech recognition engine that can run on a 256-400 MHz processor without
floating-point hardware.
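
To make that concrete, the usual way to cope with a missing FPU is to move the
inner loops to fixed-point arithmetic. A minimal sketch of the idea in Java for
readability (Q15 format; the names and scaling are illustrative only, not taken
from Sphinx or from the GSoC code):

// Illustrative Q15 fixed-point helpers: values are stored as 16-bit integers
// scaled by 2^15, so multiplications and dot products need only integer ops.
public final class Q15 {
    static final int ONE = 1 << 15;                    // 1.0 in Q15

    // Convert a double to Q15 (typically done offline when preparing model tables).
    static short fromDouble(double x) {
        return (short) Math.round(x * ONE);
    }

    // Dot product of two Q15 vectors; accumulate in 64 bits, return Q15.
    static int dot(short[] a, short[] b) {
        long acc = 0;
        for (int i = 0; i < a.length; i++) {
            acc += (long) a[i] * b[i];                 // Q15 * Q15 -> Q30
        }
        return (int) (acc >> 15);                      // back to Q15
    }

    public static void main(String[] args) {
        short[] v = { fromDouble(0.5), fromDouble(0.25) };
        short[] w = { fromDouble(0.5), fromDouble(0.5) };
        System.out.println(dot(v, w) / (double) ONE);  // prints ~0.375
    }
}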



 I see a speech project on the OM projects page.  To me, it seems like
 the project is attempting command recognition, e.g., voice dialing.
 However, it would be great if the FR can function as a rudimentary
 dictation machine, i.e., allow the user to speak and convert to text.


Yes, once the speech recognition engine is ready, a lot of applications can be
built on it. The basic aim of the recognizer will be to identify the spoken
word by scoring it against the HMM models of the words in the stored
dictionary and picking the one with the maximum probability. Once a word has
been detected, any API corresponding to that word can be called.
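
As a sketch of that dispatch step (purely illustrative; the WordModel
interface and its scoring method below stand in for whatever the engine will
expose, they are not an existing API):

import java.util.Map;

// Illustrative only: score the observed feature frames against every word's
// HMM, pick the most likely word, and run the action registered for it.
final class CommandDispatcher {

    // Hypothetical stand-in for the engine's per-word HMM scoring.
    interface WordModel {
        double logLikelihood(float[][] frames);
    }

    private final Map<String, WordModel> models;   // word -> trained HMM
    private final Map<String, Runnable> actions;   // word -> API call to make

    CommandDispatcher(Map<String, WordModel> models, Map<String, Runnable> actions) {
        this.models = models;
        this.actions = actions;
    }

    void recognizeAndRun(float[][] frames) {
        String bestWord = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, WordModel> entry : models.entrySet()) {
            double score = entry.getValue().logLikelihood(frames);
            if (score > bestScore) {
                bestScore = score;
                bestWord = entry.getKey();
            }
        }
        Runnable action = (bestWord != null) ? actions.get(bestWord) : null;
        if (action != null) {
            action.run();                          // e.g. "dial" -> open the dialler
        }
    }
}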


 Perhaps the following may work.

 1. Ask the user to speak some standard words.  Record the speech and
establish the mapping from the words to the corresponding speech.
It may even be good to maintain separate databases for different
purposes, e.g., one for UNIX command lines, one for emails, and a
third for technical documents.

 2. The speech recognizer then functions similar to a keyboard in that it
converts speech to text which it then enters into the application
that has focus.

 3. The user must speak word by word.  The speech recognizer finds the
closest match for the speech by checking against the recordings made
in step 1 (and step 4).  The user may need to set the database from
which the match must be made.

 4. If there is no close match, or if the user is unhappy with the
selection made in step 3, the user can type in the correct word.  A
new record can be added to the appropriate database.

 The process may be frustrating for the user at first, but over time, the
 speech recognition should become better and better.

 The separate databases may be needed, for example, because the word
 period should usually translate to the symbol `.' except when writing
 about time periods when it should translate to the word `period'.

 I do not know what the storage requirements would be to maintain this
 database.  I do not know if the closest match algorithm in step 3 is
 even possible.  But if we could get a good dictation engine, that would
 be a killer app, in my opinion.  No more typing!  No more carpal tunnel
 injuries.  No more having to worry about small on screen keyboards that
 challenge finger typing.


It would certainly be a great application. But at the moment I am not very
sure about the capability of the FreeRunner and the applications it can
handle. Maybe in the future more and more improvements can be added to the
current applications. :)



 Thanks.

 Ajit





-- 
Saurabh Gupta
Electronics and Communication Engg.
NSIT,New Delhi
http://saurabh1403.wordpress.com


Re: speech - text on FR?

2008-06-16 Thread saurabh gupta
On Mon, Jun 16, 2008 at 6:04 AM, Dan Staley [EMAIL PROTECTED] wrote:

 I actually just interfaced with the Sphinx project at one of the
 research positions I hold.  It is a very well written interface (for the
 most part... there were a few things poorly documented and/or implemented).
 Anyway, I found the Java version of the project (Sphinx 4,
 http://cmusphinx.sourceforge.net/sphinx4/) to be pretty easy to build and
 interface with.


It's great, Dan, that you got the Sphinx packages working for you. I tried
them but got some errors. These days, however, I am concentrating on
understanding some of their libraries and trying to write my own optimized
code. I will definitely ping you if I need any help.



 The benefit of using the HMMs, models, and methods that Sphinx implements
 is that anyone should be able to specify, in their program, a grammar
 (similar to a simplified regex) that they want recognized, and the
 recognizer should then be speaker-independent... meaning anyone can speak
 the phrase into the phone and get the desired output.  Speech training
 wouldn't be required.  I found that once you set it up correctly, the
 Sphinx engine is very powerful and usually identifies the spoken words no
 matter who says them (we found it even seemed to work decently well with a
 variety of different accents).


This is good, and in fact I will also try to implement this in my model. I
will obtain the HMM models of words by training them on speech from different
speakers. I have covered this in my design document.

Thanks in advance...





-- 
Saurabh Gupta
Electronics and Communication Engg.
NSIT,New Delhi


speech - text on FR?

2008-06-15 Thread Ajit Natarajan
Hello,

I know nothing about speech recognition, so if the following won't work, 
please let me know (gently :) ).

I understand that there is a project called Sphinx at CMU which attempts 
speech recognition.  It seems pretty complex.  I couldn't get it to work 
on my Linux desktop.  I'm not sure if it would work on an FR since it 
may need a lot of CPU horsepower and memory.

I see a speech project on the OM projects page.  To me, it seems like 
the project is attempting command recognition, e.g., voice dialing. 
However, it would be great if the FR can function as a rudimentary 
dictation machine, i.e., allow the user to speak and convert to text.

Perhaps the following may work.

1. Ask the user to speak some standard words.  Record the speech and
establish the mapping from the words to the corresponding speech.
It may even be good to maintain separate databases for different
purposes, e.g., one for UNIX command lines, one for emails, and a
third for technical documents.

2. The speech recognizer then functions similar to a keyboard in that it
converts speech to text which it then enters into the application
that has focus.

3. The user must speak word by word.  The speech recognizer finds the
closest match for the speech by checking against the recordings made
in step 1 (and step 4).  The user may need to set the database from
which the match must be made.  (A rough sketch of one possible matching
approach follows this list.)

4. If there is no close match, or if the user is unhappy with the
selection made in step 3, the user can type in the correct word.  A
new record can be added to the appropriate database.
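
One classical, speaker-dependent way to do the closest match in step 3 is
dynamic time warping against the templates recorded in step 1. A rough sketch,
assuming the per-frame feature extraction (e.g. MFCC-like vectors) happens
elsewhere:

import java.util.Arrays;
import java.util.Map;

// Sketch of step 3: compare an utterance against each recorded template with
// dynamic time warping and return the stored word whose recording is closest.
// Feature extraction into frames (double[frame][coefficient]) is assumed.
final class TemplateMatcher {

    // Euclidean distance between two feature frames.
    private static double frameDist(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Classic DTW alignment cost between two frame sequences.
    static double dtw(double[][] x, double[][] y) {
        int n = x.length, m = y.length;
        double[][] cost = new double[n + 1][m + 1];
        for (double[] row : cost) Arrays.fill(row, Double.POSITIVE_INFINITY);
        cost[0][0] = 0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double d = frameDist(x[i - 1], y[j - 1]);
                cost[i][j] = d + Math.min(cost[i - 1][j - 1],
                                          Math.min(cost[i - 1][j], cost[i][j - 1]));
            }
        }
        return cost[n][m];
    }

    // Return the closest word, or null if nothing is close enough.
    static String closestWord(double[][] utterance,
                              Map<String, double[][]> templates,
                              double threshold) {
        String best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[][]> e : templates.entrySet()) {
            double c = dtw(utterance, e.getValue());
            if (c < bestCost) {
                bestCost = c;
                best = e.getKey();
            }
        }
        return (bestCost <= threshold) ? best : null;
    }
}

The threshold gives step 4 its hook: if nothing is close enough, fall back to
typing and store the new recording as another template.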

The process may be frustrating for the user at first, but over time, the 
speech recognition should become better and better.

The separate databases may be needed, for example, because the word 
`period' should usually translate to the symbol `.', except when writing 
about time periods, when it should translate to the word `period'.

I do not know what the storage requirements would be to maintain this 
database.  I do not know if the closest match algorithm in step 3 is 
even possible.  But if we could get a good dictation engine, that would 
be a killer app, in my opinion.  No more typing!  No more carpal tunnel 
injuries.  No more having to worry about small on screen keyboards that 
challenge finger typing.

Thanks.

Ajit



Re: speech - text on FR?

2008-06-15 Thread Dan Staley
I actually just interfaced with the Sphinx project at one of the
research positions I hold.  It is a very well written interface (for the
most part... there were a few things poorly documented and/or implemented).
Anyway, I found the Java version of the project (Sphinx 4,
http://cmusphinx.sourceforge.net/sphinx4/) to be pretty easy to build and
interface with.

The benefit of using the HMMs, models, and methods that Sphinx implements
is that anyone should be able to specify, in their program, a grammar
(similar to a simplified regex) that they want recognized, and the
recognizer should then be speaker-independent... meaning anyone can speak
the phrase into the phone and get the desired output.  Speech training
wouldn't be required.  I found that once you set it up correctly, the
Sphinx engine is very powerful and usually identifies the spoken words no
matter who says them (we found it even seemed to work decently well with a
variety of different accents).
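
For anyone curious, the Sphinx 4 demos follow roughly this pattern. The config
file name and the grammar are made up for illustration, and the class and
method names are from the Sphinx 4 HelloWorld demo as I remember it, so check
them against the current docs:

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

// Grammar-driven recognition in the style of the Sphinx 4 HelloWorld demo.
// commands.config.xml is assumed to wire up the frontend, acoustic model and
// a JSGF grammar such as:
//   #JSGF V1.0;
//   grammar commands;
//   public <command> = (call | text) (home | office | voicemail);
public class CommandDemo {
    public static void main(String[] args) throws Exception {
        ConfigurationManager cm = new ConfigurationManager(
                CommandDemo.class.getResource("commands.config.xml"));

        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.err.println("Cannot open the microphone");
            return;
        }

        while (true) {
            Result result = recognizer.recognize();
            if (result != null) {
                // Best hypothesis with filler words (silence etc.) removed.
                System.out.println("You said: " + result.getBestFinalResultNoFiller());
            }
        }
    }
}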

-Dan Staley



Re: speech - text on FR?

2008-06-15 Thread Mikko Rauhala
On Sun, 2008-06-15 at 16:07 -0700, Ajit Natarajan wrote:
 I see a speech project on the OM projects page.  To me, it seems like 
 the project is attempting command recognition, e.g., voice dialing. 

Feasible, especially if the user trains the command words in advance.
(I didn't check whether it does that; it is doable to some extent without
training too, but the difficulty rises markedly...)

 However, it would be great if the FR can function as a rudimentary 
 dictation machine, i.e., allow the user to speak and convert to text.

A pipe dream. Save your dictations as audio and postprocess them
elsewhere.

-- 
Mikko Rauhala   - [EMAIL PROTECTED] - URL:http://www.iki.fi/mjr/
Transhumanist   - WTA member - URL:http://www.transhumanism.org/
Singularitarian - SIAI supporter - URL:http://www.singinst.org/






Re: speech - text on FR?

2008-06-15 Thread Brandon Kruse

Dan Staley wrote:
| I actually just interfaced with the Sphinx project at one of the
| research positions I hold.  It is a very well written interface (for the
| most part... there were a few things poorly documented and/or implemented).
| Anyway, I found the Java version of the project (Sphinx 4,
| http://cmusphinx.sourceforge.net/sphinx4/) to be pretty easy to build and
| interface with.
|
| The benefit of using the HMMs, models, and methods that Sphinx implements
| is that anyone should be able to specify, in their program, a grammar
| (similar to a simplified regex) that they want recognized, and the
| recognizer should then be speaker-independent... meaning anyone can speak
| the phrase into the phone and get the desired output.  Speech training
| wouldn't be required.  I found that once you set it up correctly, the
| Sphinx engine is very powerful and usually identifies the spoken words no
| matter who says them (we found it even seemed to work decently well with a
| variety of different accents).
|
| -Dan Staley
|

As with other speech-to-text engines (as someone else already mentioned), it
works best when the engine knows that the speaker could only have said
something from a list of pre-defined commands, and not any word in general.

It is also very good at deciding between two words, e.g. yes or no, which is
more useful than you would think if you design your user interface the right
way.
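
To illustrate the yes/no case: restrict the grammar to two words and the
application code becomes trivial. The Engine interface below is a hypothetical
placeholder for whichever recognizer ends up on the phone, and the JSGF
snippet in the comment is just an example:

// With the vocabulary restricted to two words, e.g. via a JSGF grammar like
//   #JSGF V1.0;
//   grammar confirm;
//   public <answer> = yes | no;
// the application logic reduces to a simple string check on the hypothesis.
final class Confirmation {

    // Hypothetical stand-in for the recognizer actually used on the device.
    interface Engine {
        String recognizeOnce();   // expected to return "yes" or "no"
    }

    static boolean confirm(Engine engine, Runnable onYes) {
        boolean yes = "yes".equalsIgnoreCase(engine.recognizeOnce());
        if (yes) {
            onYes.run();          // e.g. actually place the call
        }
        return yes;
    }
}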

They also have a mobile-oriented Sphinx library, which seems to be very
lightweight and might be worth looking into.

One thing I thought of: when someone tells you a number over the phone, the
phone could recognize the number and add it to the address book.

Lots of cool stuff you could do :)

-brandon
