Every week I receive a file that one of my programs processes. At first the file always arrived in UTF-8. Later the sender switched to UTF-16; however, every once in a while it comes back in UTF-8 or, rarely, in Mac OS Roman.

So every week I first open the file in BBEdit and save it back out as UTF-8 so that it matches the encoding my program expects. It seems like there should be some way to determine the encoding from within RB, but being just a hobby programmer in RB I am not sure how.

I took a copy of one of the files and used BBEdit to store it in each of the following encodings as shown by the name of the file. When I read any of these files into BBEdit it instantly reports the proper encoding for the file.

   UTF8NoBom.txt
   UTF8.txt
   MacOSRoman.txt
   WindowsLatin1.txt
   ISOLatin1.txt
   UTF16.txt
   UTF16NoBom.txt

The RB code at the end of this message puts the name of each file in the first column of a listbox and the InternetName of its encoding in the second column. With that code I see UTF-8 reported as the encoding of every one of the files.

So, how does one determine the encoding of a file programmatically? It seems that I should be able to write the code to process the file without having to care what encoding it arrives in: determine the encoding, tell RB to use that encoding, and then process the file. BBEdit can figure it out, so I ought to be able to. But how?

The code below expects the files listed above to be in a folder named "TestFiles". This is the code I have thrown into a test program to try to figure out this encoding stuff.

Sub showInfo()
  Dim f, fldr As FolderItem
  Dim t As TextInputStream
  Dim i, n, rw As Integer
  Dim enc As TextEncoding
  Dim r As String

  fldr = GetFolderItem("TestFiles")
  if fldr <> nil and fldr.exists and fldr.Directory then
    n = fldr.Count
    for i = 1 to n
      f = fldr.TrueItem(i)
      lbInfo.AddRow f.Name            // file name goes in the first column
      rw = lbInfo.LastIndex
      t = f.OpenAsTextFile
      if t <> nil then                // nil if the item could not be opened
        r = t.ReadLine
        t.Close
        enc = r.Encoding
        if enc <> nil then lbInfo.Cell(rw,1) = enc.internetName  // always displays UTF8
      end if
    next i
  else
    MsgBox "Unable to find the folder 'TestFiles' in project folder."
  end if
End Sub
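
My best guess so far, and it is only an untested sketch, is that I would have to open the file as binary and look at the first bytes for a byte order mark (BOM) before handing the file to a TextInputStream. GuessEncoding is a name I made up, and I am assuming that Encodings.UTF16BE, Encodings.UTF16LE and TextEncoding.IsValidData are available in my version of RB. Falling back to MacRoman when there is no BOM and the bytes are not valid UTF-8 is purely my assumption, based on what this sender has used so far.

```realbasic
Function GuessEncoding(f As FolderItem) As TextEncoding
  // Hypothetical helper (not part of RB): guess a file's encoding from its raw bytes.
  Dim bs As BinaryStream
  Dim data As String
  Dim b1, b2, b3 As Integer

  If f = Nil Or Not f.Exists Then Return Nil
  bs = f.OpenAsBinaryFile(False)     // False = read-only
  If bs = Nil Then Return Nil
  data = bs.Read(bs.Length)          // raw bytes of the whole file
  bs.Close

  // UTF-8 BOM is the three bytes EF BB BF
  If LenB(data) >= 3 Then
    b1 = AscB(MidB(data, 1, 1))
    b2 = AscB(MidB(data, 2, 1))
    b3 = AscB(MidB(data, 3, 1))
    If b1 = &hEF And b2 = &hBB And b3 = &hBF Then Return Encodings.UTF8
  End If

  // UTF-16 BOM is FE FF (big-endian) or FF FE (little-endian)
  If LenB(data) >= 2 Then
    b1 = AscB(MidB(data, 1, 1))
    b2 = AscB(MidB(data, 2, 1))
    If b1 = &hFE And b2 = &hFF Then Return Encodings.UTF16BE
    If b1 = &hFF And b2 = &hFE Then Return Encodings.UTF16LE
  End If

  // No BOM: accept UTF-8 if the bytes form valid UTF-8
  If Encodings.UTF8.IsValidData(data) Then Return Encodings.UTF8

  // Assumption: anything else from this sender is Mac OS Roman
  Return Encodings.MacRoman
End Function
```

If that works, I would assign the result to t.Encoding before calling ReadLine. Even then it cannot tell Mac OS Roman from Windows Latin 1 or ISO Latin 1, and a UTF-16 file without a BOM might be misreported, so BBEdit must be doing something smarter.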


=== A Mac addict in Tennessee ===
