Sorry, I tried to download from Download Windows ph2.0 but instead of download it will go back to http://support.microsoft.com/kb/968929 again tried to download did not open download page. I am in confusion from where I have to download. Mine is WinXP(sp3) - which is automatically updated by default. With regards, -sriranga(78)
On Mon, Mar 28, 2011 at 8:40 AM, Sriranga(78yrsold) <[email protected] > wrote: > . kindly whether I have to downaload from Script centre downloads i.e. > *Download > Windows PowerShell 2.0* <http://support.microsoft.com/kb/968929>. Whether > this will work for tesseract-ocr r-578 version -due to issue No: 465? > With regards, > -sriranga(78) > > > On Sun, Mar 27, 2011 at 10:51 PM, Quan Nguyen <[email protected]> wrote: > >> I created a PowerShell script to automate language data generation for >> Tesseract 3.01. Save it as train.ps1 and put it in tesseract-3.0 >> directory. >> >> Any feedback and improvement is welcome. >> >> <# >> >> Automate Tesseract 3.01 language data pack generation process. >> >> @author: Quan Nguyen >> @date: 27 Mar 2011 >> >> The script file should be placed in the same directory as Tesseract's >> binary executables. >> >> Run PowerShell as Administrator and allow script execution by running >> the following command: >> >> PS > Set-ExecutionPolicy RemoteSigned >> >> Then execute the script by: >> >> PS > .\train.ps1 >> or >> PS > .\train.ps1 yourlang imageFolder >> >> If imageFolder is not specified, it is default to a yourlang >> subdirectory under Tesseract directory. >> >> Windows PowerShell 2.0 Download: http://support.microsoft.com/kb/968929 >> >> #> >> >> $lang = $args[0] >> if (!$lang) { >> $lang = Read-Host "Enter a language code" >> } >> >> $langDir = $lang >> >> if ($args[1]) { >> $langDir = $args[1] >> } >> >> if (!(test-path $langDir)) >> { >> throw "{0} is not a valid path" -f $langDir >> } >> >> echo "=== Generating Tesseract language data for language: $lang ===" >> >> $fullPath = [IO.Path]::GetFullPath($langDir) >> echo "** Your training images should be in ""$fullPath"" directory." >> >> $al = New-Object System.Collections.ArrayList >> >> echo "Make Box Files" >> $boxFiles >> Foreach ($entry in dir $langDir) { >> If ($entry.name.toLower().endsWith(".tif") -and >> $entry.name.startsWith($lang)) { >> echo "** Processing image: $entry" >> $nameWoExt = [IO.Path]::Combine($entry.DirectoryName, >> $entry.BaseName) >> $al.Add($nameWoExt) >> >> #Bootstrapping a new character set >> $trainCmd = ".\tesseract {0}.tif {0} -l {1} batch.nochop >> makebox" -f $nameWoExt, $lang >> #Should comment out the next line after done with editing the box >> files to prevent them from getting overwritten in repeated runs. >> Invoke-Expression $trainCmd >> $boxFiles += $nameWoExt + ".box " >> } >> } >> echo "** Box files should be edited before continuing. **" >> >> echo "Generate .tr Files" >> $trFiles >> Foreach ($entry in $al) { >> $trainCmd = ".\tesseract {0}.tif {0} nobatch box.train" -f >> $entry >> Invoke-Expression $trainCmd >> $trFiles += $entry + ".tr " >> } >> >> echo "Compute the Character Set" >> Invoke-Expression ".\unicharset_extractor -D $langDir $boxFiles" >> >> move-item -force -path $langDir\unicharset -destination $langDir\ >> $lang.unicharset >> >> echo "Clustering" >> Invoke-Expression ".\mftraining -U unicharset -O $trFiles" >> Invoke-Expression ".\cntraining $trFiles" >> >> echo "Dictionary Data" >> Invoke-Expression ".\wordlist2dawg $langdir\ >> $lang.frequent_words_list.txt $langdir\$lang.freq-dawg $langdir\ >> $lang.unicharset" >> Invoke-Expression ".\wordlist2dawg $langdir\$lang.words_list.txt >> $langdir\$lang.word-dawg $langdir\$lang.unicharset" >> >> echo "The last file (unicharambigs) -- this is to be manually edited" >> if (!(test-path $langdir\$lang.unicharambigs)) { >> new-item "$langdir\$lang.unicharambigs" -type file >> set-content -path $langdir\$lang.unicharambigs -value "v1" >> } >> >> echo "Putting it all together" >> Invoke-Expression ".\combine_tessdata $langdir\$lang." >> >> -- >> You received this message because you are subscribed to the Google Groups >> "tesseract-ocr" group. >> To post to this group, send email to [email protected]. >> To unsubscribe from this group, send email to >> [email protected]. >> For more options, visit this group at >> http://groups.google.com/group/tesseract-ocr?hl=en. >> >> > -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en.

