Kindly advise whether I have to download from the Script Center downloads, i.e.
*Download Windows PowerShell 2.0* <http://support.microsoft.com/kb/968929>.
Will this work with the tesseract-ocr r-578 version, given issue No. 465?
With regards,
-sriranga(78)

On Sun, Mar 27, 2011 at 10:51 PM, Quan Nguyen <[email protected]> wrote:

> I created a PowerShell script to automate language data generation for
> Tesseract 3.01. Save it as train.ps1 and put it in the tesseract-3.0
> directory.
>
> Any feedback or improvements are welcome.
>
> <#
>
> Automate Tesseract 3.01 language data pack generation process.
>
> @author: Quan Nguyen
> @date: 27 Mar 2011
>
> The script file should be placed in the same directory as Tesseract's
> binary executables.
>
> Run PowerShell as Administrator and allow script execution by running
> the following command:
>
> PS > Set-ExecutionPolicy RemoteSigned
>
> Then execute the script by:
>
> PS > .\train.ps1
> or
> PS > .\train.ps1 yourlang imageFolder
>
> If imageFolder is not specified, it defaults to a yourlang
> subdirectory under the Tesseract directory.
>
> Windows PowerShell 2.0 Download: http://support.microsoft.com/kb/968929
>
> #>
>
> $lang = $args[0]
> if (!$lang) {
>    $lang = Read-Host "Enter a language code"
> }
>
> $langDir = $lang
>
> if ($args[1]) {
>    $langDir = $args[1]
> }
>
> if (!(test-path $langDir))
> {
>    throw "{0} is not a valid path" -f $langDir
> }
>
> echo "=== Generating Tesseract language data for language: $lang ==="
>
> $fullPath = [IO.Path]::GetFullPath($langDir)
> echo "** Your training images should be in ""$fullPath"" directory."
>
> $al = New-Object System.Collections.ArrayList
>
> echo "Make Box Files"
> $boxFiles = ""
> Foreach ($entry in dir $langDir) {
>   If ($entry.name.toLower().endsWith(".tif") -and $entry.name.startsWith($lang)) {
>      echo "** Processing image: $entry"
>      $nameWoExt = [IO.Path]::Combine($entry.DirectoryName, $entry.BaseName)
>      # ArrayList.Add returns the new index; discard it so it is not printed.
>      [void]$al.Add($nameWoExt)
>
> #Bootstrapping a new character set
>      $trainCmd = ".\tesseract {0}.tif {0} -l {1} batch.nochop makebox" -f $nameWoExt, $lang
> # Comment out the next line once the box files have been edited,
> # so that repeated runs do not overwrite them.
>      Invoke-Expression $trainCmd
>      $boxFiles += $nameWoExt + ".box "
>   }
> }
> echo "** Box files should be edited before continuing. **"
>
> echo "Generate .tr Files"
> $trFiles = ""
> Foreach ($entry in $al) {
>      $trainCmd = ".\tesseract {0}.tif {0} nobatch box.train" -f $entry
>      Invoke-Expression $trainCmd
>      $trFiles += $entry + ".tr "
> }
>
> echo "Compute the Character Set"
> Invoke-Expression ".\unicharset_extractor -D $langDir $boxFiles"
>
> move-item -force -path $langDir\unicharset -destination $langDir\$lang.unicharset
>
> echo "Clustering"
> Invoke-Expression ".\mftraining -U unicharset -O $trFiles"
> Invoke-Expression ".\cntraining $trFiles"
>
> echo "Dictionary Data"
> Invoke-Expression ".\wordlist2dawg $langdir\$lang.frequent_words_list.txt $langdir\$lang.freq-dawg $langdir\$lang.unicharset"
> Invoke-Expression ".\wordlist2dawg $langdir\$lang.words_list.txt $langdir\$lang.word-dawg $langdir\$lang.unicharset"
>
> echo "The last file (unicharambigs) -- this is to be manually edited"
> if (!(test-path $langdir\$lang.unicharambigs)) {
>    new-item "$langdir\$lang.unicharambigs" -type file
>    set-content -path $langdir\$lang.unicharambigs -value "v1"
> }
>
> echo "Putting it all together"
> Invoke-Expression ".\combine_tessdata $langdir\$lang."
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en.
>
>

