I would like to experiment with a very simple speech-to-text application that takes an emailed voice mail (a .WAV file attached to an email) and emails back the converted text. This should be possible with SAPI 5.4. I am aware of the difficulty of making the recognition sufficiently accurate, but that is a future consideration. What I need help with is converting a Win32 C++ sample into C#.
Finding downloads and information for SAPI on MSDN was a little difficult (development seems to have stopped in 2010, and SAPI is now bundled with the Windows SDK), but much of it is available in .NET 3.5 onwards through System.Speech. Unfortunately, my C++ experience is very limited, and there don't seem to be any complete C# examples to guide me.

A suitable starting point for me is the C++ sample "Using WAV File Input with SR Engines" <http://msdn.microsoft.com/en-us/library/ms717071%28VS.85%29.aspx>, written for SAPI 5.3 (Win32, not .NET). Would someone be kind enough to turn it into a rudimentary VS2010 C# project for me? I think this would mean finding the corresponding members in System.Speech.Recognition and mapping the constants defined in the C++ header files to their .NET equivalents. The C++ code from that page is reproduced below. The MSDN help for the .NET assembly is here: <http://msdn.microsoft.com/en-us/library/system.speech.recognition(v=vs.90).aspx>.

_____
Ian Thomas
Victoria Park, Western Australia
// Using WAV File Input with SR Engines
// http://msdn.microsoft.com/en-us/library/ms717071(VS.85).aspx
//
// Sample wav audio file input source code
// COM/C++ Developers
// C-style is very similar to C++ and COM
#include <atlbase.h>   // CComPtr
#include <sapi.h>      // core SAPI interfaces and constants
#include <sphelper.h>  // CSpStreamFormat, CSpEvent helper classes

// Place this block inside a function. COM must already be initialized on this
// thread (e.g. ::CoInitialize(NULL)). MY_WAVE_AUDIO_FILENAME (a wide string such as
// L"voicemail.wav") and MY_REASONABLE_TIMEOUT (milliseconds) are placeholders to define.
{
    HRESULT hr = S_OK;
    CComPtr<ISpStream> cpInputStream;
    CComPtr<ISpRecognizer> cpRecognizer;
    CComPtr<ISpRecoContext> cpRecoContext;
    CComPtr<ISpRecoGrammar> cpRecoGrammar;

    // Create basic SAPI stream object
    // NOTE: The helper SpBindToFile can be used to perform the following operations
    hr = cpInputStream.CoCreateInstance(CLSID_SpStream);
    // Check hr

    CSpStreamFormat sInputFormat;
    // generate WaveFormatEx structure, assuming the wav format is 22kHz, 16-bit, Stereo
    hr = sInputFormat.AssignFormat(SPSF_22kHz16BitStereo);
    // Check hr

    // set up stream object with wav file MY_WAVE_AUDIO_FILENAME
    // for read-only access, since it will only be accessed by the SR engine
    hr = cpInputStream->BindToFile(MY_WAVE_AUDIO_FILENAME, SPFM_OPEN_READONLY,
                                   &sInputFormat.FormatId(), sInputFormat.WaveFormatExPtr(),
                                   SPFEI_ALL_EVENTS);
    // Check hr

    // Create in-process speech recognition engine
    hr = cpRecognizer.CoCreateInstance(CLSID_SpInprocRecognizer);
    // Check hr

    // connect wav input to recognizer
    // SAPI will negotiate mismatched engine/input audio formats using system audio codecs,
    // so the second parameter is not important - use default of TRUE
    hr = cpRecognizer->SetInput(cpInputStream, TRUE);
    // Check hr

    // Create recognition context to receive events
    hr = cpRecognizer->CreateRecoContext(&cpRecoContext);
    // Check hr

    // Create grammar (grammars belong to the reco context), and load dictation
    // ignore grammar ID for simplicity's sake (0)
    // NOTE: Voice command apps would load CFG here
    hr = cpRecoContext->CreateGrammar(0, &cpRecoGrammar);
    // Check hr
    hr = cpRecoGrammar->LoadDictation(NULL, SPLO_STATIC);
    // Check hr

    // check for recognitions and end of stream event
    hr = cpRecoContext->SetInterest(SPFEI(SPEI_RECOGNITION) | SPFEI(SPEI_END_SR_STREAM),
                                    SPFEI(SPEI_RECOGNITION) | SPFEI(SPEI_END_SR_STREAM));
    // Check hr

    // use Win32 events for command-line style application
    hr = cpRecoContext->SetNotifyWin32Event();
    // Check hr

    // activate dictation, and begin recognition
    hr = cpRecoGrammar->SetDictationState(SPRS_ACTIVE);
    // Check hr

    // while events occur, continue processing
    // timeout should be greater than the audio stream length, or a reasonable amount of time
    // expected to pass before no more recognitions are expected in an audio stream
    BOOL fEndStreamReached = FALSE;
    while (!fEndStreamReached && S_OK == cpRecoContext->WaitForNotifyEvent(MY_REASONABLE_TIMEOUT))
    {
        CSpEvent spEvent;
        // pull all queued events from the reco context's event queue
        while (!fEndStreamReached && S_OK == spEvent.GetFrom(cpRecoContext))
        {
            // Check event type
            switch (spEvent.eEventId)
            {
                // speech recognition engine recognized some audio
                case SPEI_RECOGNITION:
                {
                    // fetch the recognized text from the result object attached to the event
                    WCHAR* pwszText = NULL;
                    hr = spEvent.RecoResult()->GetText(SP_GETWHOLEPHRASE, SP_GETWHOLEPHRASE,
                                                       TRUE, &pwszText, NULL);
                    if (SUCCEEDED(hr))
                    {
                        // TODO: log/report the recognized text in pwszText
                        ::CoTaskMemFree(pwszText);  // GetText allocates with CoTaskMemAlloc
                    }
                    break;
                }
                // end of the wav file was reached by the speech recognition engine
                case SPEI_END_SR_STREAM:
                    fEndStreamReached = TRUE;
                    break;
            }
            // clear any event data/object references
            spEvent.Clear();
        } // END event pulling loop - break on empty event queue OR end stream
    } // END event polling loop - break on event timeout OR end stream

    // deactivate dictation
    hr = cpRecoGrammar->SetDictationState(SPRS_INACTIVE);
    // Check hr
    // unload dictation topic
    hr = cpRecoGrammar->UnloadDictation();
    // Check hr
    // close the input stream, since we're done with it
    // NOTE: smart pointer will call SpStream's destructor, and consequently ::Close,
    // but code may want to check for errors on the ::Close operation
    hr = cpInputStream->Close();
    // Check hr
}
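The sample hard-codes its input format with the comment "assuming the wav format is 22kHz, 16-bit, Stereo", but an emailed voice mail may well use a different format (telephone-quality audio is often 8 kHz mono). A minimal sketch, assuming a standard RIFF/WAVE file: read the "fmt " chunk and build the name of the matching SPSF_<rate>kHz<bits>Bit<Mono|Stereo> constant to pass to AssignFormat instead. WavFormat, parseWavHeader and spsfName are hypothetical helpers, not part of SAPI. The function returns the constant's name as a string rather than the SPSTREAMFORMAT value so that it compiles without sapi.h, and it returns an empty string for non-PCM data such as mu-law, which would need one of SAPI's separate compressed-format constants.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper type: the fields of a WAV file's "fmt " chunk we care about.
struct WavFormat {
    uint16_t formatTag;      // 1 = PCM; other values are compressed formats
    uint16_t channels;       // 1 = mono, 2 = stereo
    uint32_t samplesPerSec;  // e.g. 8000, 22050
    uint16_t bitsPerSample;  // e.g. 8, 16
};

// Little-endian readers (WAV headers are little-endian).
static uint16_t readLE16(const uint8_t* p) {
    return static_cast<uint16_t>(p[0] | (p[1] << 8));
}
static uint32_t readLE32(const uint8_t* p) {
    return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

// Hypothetical helper: walk the RIFF chunks and return the "fmt " fields,
// or std::nullopt if the buffer is not a well-formed WAVE header.
std::optional<WavFormat> parseWavHeader(const std::vector<uint8_t>& data) {
    if (data.size() < 12 || std::memcmp(data.data(), "RIFF", 4) != 0 ||
        std::memcmp(data.data() + 8, "WAVE", 4) != 0)
        return std::nullopt;
    std::size_t pos = 12;  // first chunk follows "RIFF" <size> "WAVE"
    while (pos + 8 <= data.size()) {
        const uint8_t* chunk = data.data() + pos;
        uint32_t size = readLE32(chunk + 4);
        if (std::memcmp(chunk, "fmt ", 4) == 0) {
            if (size < 16 || pos + 8 + 16 > data.size()) return std::nullopt;
            const uint8_t* f = chunk + 8;
            // fmt layout: tag(0), channels(2), rate(4), byteRate(8), blockAlign(12), bits(14)
            return WavFormat{readLE16(f), readLE16(f + 2), readLE32(f + 4), readLE16(f + 14)};
        }
        pos += 8 + size + (size & 1);  // skip other chunks; chunk data is padded to even length
    }
    return std::nullopt;
}

// Hypothetical helper: name of the matching PCM SPSF_* constant, or "" if none matches.
std::string spsfName(const WavFormat& f) {
    if (f.formatTag != 1 || f.channels < 1 || f.channels > 2 ||
        (f.bitsPerSample != 8 && f.bitsPerSample != 16))
        return "";
    int khz;
    switch (f.samplesPerSec) {  // SAPI names round the rate down to whole kHz
        case 8000:  khz = 8;  break;
        case 11025: khz = 11; break;
        case 12000: khz = 12; break;
        case 16000: khz = 16; break;
        case 22050: khz = 22; break;
        case 24000: khz = 24; break;
        case 32000: khz = 32; break;
        case 44100: khz = 44; break;
        case 48000: khz = 48; break;
        default:    return "";
    }
    return "SPSF_" + std::to_string(khz) + "kHz" + std::to_string(f.bitsPerSample) + "Bit" +
           (f.channels == 1 ? "Mono" : "Stereo");
}
```

On Windows, the returned name indicates which constant to use in place of SPSF_22kHz16BitStereo in the sInputFormat.AssignFormat call, so the stream is not described to SAPI with the wrong format. Reading only the first few hundred bytes of the attachment is normally enough, since the "fmt " chunk usually precedes the audio data.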
