I would like to experiment with a very simple speech-to-text application
that takes an emailed voicemail (a .WAV file attached to an email) and
re-emails the converted text. This should be possible with SAPI 5.4. I am
aware of the problems in making the recognition sufficiently accurate, but
that is a future consideration. What I need help with is porting Win32 C++
code to C#.

Finding downloads and information for SAPI on MSDN was a little difficult
(development seems to have stopped in 2010, and SAPI is now wrapped in the
Windows SDK), but much of it is subsumed into .NET v3.5 onwards.
Unfortunately, my C++ experience is very limited and there don't seem to be
any complete C# examples to guide me.

What suits me as a simple starting point is this C++ sample, "Using WAV File
Input with SR Engines", written for SAPI 5.3 (Win32, not .NET):
<http://msdn.microsoft.com/en-us/library/ms717071%28VS.85%29.aspx>

Would someone be kind enough to massage it into a rudimentary VS2010 C#
project for me? This might require locating the relevant members within
System.Speech.Recognition and mapping the constants from the C++ header
files onto their .NET equivalents (I think).

I have attached a text file of the C++ code from the link above. The MSDN
help for the .NET assembly is here:
<http://msdn.microsoft.com/en-us/library/system.speech.recognition(v=vs.90).aspx>

Ian Thomas
Victoria Park, Western Australia

 

// Using WAV File Input with SR Engines
// http://msdn.microsoft.com/en-us/library/ms717071(VS.85).aspx
//
// Sample wav audio file input source code
//    COM/C++ Developers 
//    C-style is very similar to C++ and COM

{
   HRESULT hr = S_OK;
   CComPtr<ISpStream> cpInputStream;
   CComPtr<ISpRecognizer> cpRecognizer;
   CComPtr<ISpRecoContext> cpRecoContext;
   CComPtr<ISpRecoGrammar> cpRecoGrammar;

   // Create basic SAPI stream object
   // NOTE: The helper SpBindToFile can be used to perform the following
   // operations
   hr = cpInputStream.CoCreateInstance(CLSID_SpStream);
   // Check hr
   CSpStreamFormat sInputFormat;
   // generate WaveFormatEx structure, assuming the wav format is 22kHz,
   // 16-bit, Stereo
   hr = sInputFormat.AssignFormat(SPSF_22kHz16BitStereo);
   // Check hr

   // setup stream object with wav file MY_WAVE_AUDIO_FILENAME
   //   for read-only access, since it will only be accessed by the SR engine
   hr = cpInputStream->BindToFile(MY_WAVE_AUDIO_FILENAME,
      SPFM_OPEN_READONLY,
      sInputFormat.FormatId(),
      sInputFormat.WaveFormatExPtr(),
      SPFEI_ALL_EVENTS);

   // Check hr

   // Create in-process speech recognition engine
   hr = cpRecognizer.CoCreateInstance(CLSID_SpInprocRecognizer);
   // Check hr

   // connect wav input to recognizer
   // SAPI will negotiate mismatched engine/input audio formats using system
   // audio codecs, so second parameter is not important - use default of TRUE
   hr = cpRecognizer->SetInput(cpInputStream, TRUE);
   // Check hr

   // Create recognition context to receive events
   hr = cpRecognizer->CreateRecoContext(&cpRecoContext);
   // Check hr

   // Create grammar, and load dictation
   // ignore grammar ID for simplicity's sake
   // NOTE: Voice command apps would load CFG here
   hr = cpRecoContext->CreateGrammar(NULL, &cpRecoGrammar);
   // Check hr
   hr = cpRecoGrammar->LoadDictation(NULL,SPLO_STATIC); 
   // Check hr

   // check for recognitions and end of stream event
   hr = cpRecoContext->SetInterest(
      SPFEI(SPEI_RECOGNITION) | SPFEI(SPEI_SR_END_STREAM),
      SPFEI(SPEI_RECOGNITION) | SPFEI(SPEI_SR_END_STREAM));
   // Check hr

   // use Win32 events for command-line style application
   hr = cpRecoContext->SetNotifyWin32Event();
   // Check hr

   // activate dictation, and begin recognition
   hr = cpRecoGrammar->SetDictationState(SPRS_ACTIVE);
   // Check hr

   // while events occur, continue processing
   // timeout should be greater than the audio stream length, or a reasonable
   // amount of time expected to pass before no more recognitions are expected
   // in an audio stream
   BOOL fEndStreamReached = FALSE;
   while (!fEndStreamReached &&
          S_OK == cpRecoContext->WaitForNotifyEvent(MY_REASONABLE_TIMEOUT))
   {
      CSpEvent spEvent;
      // pull all queued events from the reco context's event queue

      while (!fEndStreamReached && S_OK == spEvent.GetFrom(cpRecoContext))
      {
         // Check event type
         switch (spEvent.eEventId)
         {
            // speech recognition engine recognized some audio
            case SPEI_RECOGNITION:
               // TODO: log/report recognized text
               break;

            // end of the wav file was reached by the speech recognition engine
            case SPEI_SR_END_STREAM:
               fEndStreamReached = TRUE;
               break;
         }

         // clear any event data/object references
         spEvent.Clear();
      } // END event pulling loop - break on empty event queue OR end stream
   } // END event polling loop - break on event timeout OR end stream

   // deactivate dictation
   hr = cpRecoGrammar->SetDictationState(SPRS_INACTIVE);
   // Check hr

   // unload dictation topic
   hr = cpRecoGrammar->UnloadDictation();
   // Check hr

   // close the input stream, since we're done with it
   // NOTE: smart pointer will call SpStream's destructor, and consequently
   // ::Close, but code may want to check for errors on ::Close operation
   hr = cpInputStream->Close();
   // Check hr
}
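
The sample above assumes the wav file is 22kHz, 16-bit, stereo
(SPSF_22kHz16BitStereo). A real voicemail attachment may not match that, so
it could be worth reading the file's header before choosing a format. What
follows is only a minimal sketch in portable C++ (no SAPI calls), assuming a
standard RIFF/WAVE file with a "fmt " chunk. The names WavFormat,
parseWavFormat, is22kHz16BitStereo and readFileBytes are hypothetical
helpers invented for this illustration and are not part of SAPI.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Fields of the WAV "fmt " chunk relevant to the format assumption.
// (Hypothetical helper type, not a SAPI structure.)
struct WavFormat {
    std::uint16_t formatTag = 0;      // 1 means uncompressed PCM
    std::uint16_t channels = 0;       // 1 = mono, 2 = stereo
    std::uint32_t samplesPerSec = 0;  // e.g. 22050
    std::uint16_t bitsPerSample = 0;  // e.g. 16
};

// WAV files store integers in little-endian byte order.
static std::uint16_t readU16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

static std::uint32_t readU32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Walk the RIFF chunks until the "fmt " chunk is found, then copy its fields.
// Returns false if the data is not a RIFF/WAVE file or has no usable "fmt ".
bool parseWavFormat(const std::vector<unsigned char>& data, WavFormat& out) {
    if (data.size() < 12) return false;
    if (std::memcmp(data.data(), "RIFF", 4) != 0) return false;
    if (std::memcmp(data.data() + 8, "WAVE", 4) != 0) return false;

    std::size_t pos = 12;  // first chunk follows the 12-byte RIFF header
    while (pos + 8 <= data.size()) {
        const unsigned char* chunk = data.data() + pos;
        const std::uint32_t size = readU32(chunk + 4);
        if (std::memcmp(chunk, "fmt ", 4) == 0) {
            if (size < 16 || pos + 8 + 16 > data.size()) return false;
            const unsigned char* f = chunk + 8;
            out.formatTag = readU16(f);
            out.channels = readU16(f + 2);
            out.samplesPerSec = readU32(f + 4);
            out.bitsPerSample = readU16(f + 14);
            return true;
        }
        // Chunks are padded to an even number of bytes.
        pos += 8 + static_cast<std::size_t>(size) + (size & 1u);
    }
    return false;
}

// True when the header matches what SPSF_22kHz16BitStereo assumes.
bool is22kHz16BitStereo(const WavFormat& f) {
    return f.formatTag == 1 && f.channels == 2 &&
           f.samplesPerSec == 22050 && f.bitsPerSample == 16;
}

// Load a whole file into memory; returns an empty vector if it cannot be read.
std::vector<unsigned char> readFileBytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}
```

A caller might run parseWavFormat(readFileBytes(path), fmt) on the saved
attachment and only keep SPSF_22kHz16BitStereo when is22kHz16BitStereo(fmt)
returns true. Whether a mismatch actually matters to BindToFile is not
something this sketch settles.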
