Re: unicode and dbf files

2009-10-27 Thread John Machin
On Oct 28, 2:51 am, Ethan Furman et...@stoneleaf.us wrote:
 John Machin wrote:
  On Oct 27, 7:15 am, Ethan Furman et...@stoneleaf.us wrote:

 Let me rephrase -- say I get a dbf file with an LDID of \x0f that maps
 to a cp437, and the file came from a german oem machine... could that
 file have upper-ascii codes that will not map to anything reasonable on
 my \x01 cp437 machine?  If so, is there anything I can do about it?

  ASCII is defined over the first 128 codepoints; "upper-ascii codes" is
  meaningless. As for the rest of your question, if the file's encoded
  in cpXXX, it's encoded in cpXXX. If either the creator or the reader
  or both are lying, then all bets are off.

 My confusion is this -- is there a difference between any of the various
 cp437s?

What various cp437s???

  Going down the list at ESRI: 0x01, 0x09, 0x0b, 0x0d, 0x0f,
 0x11, 0x15, 0x18, 0x19, and 0x1b all map to cp437,

Yes, this is called a many-to-*one* relationship.

 and they have names

"they" being the Language Drivers, not the codepages.

 such as US, Dutch, Finnish, French, German, Italian, Swedish, Spanish,
 English (Britain & US)... are these all the same?

When you read the Wikipedia page on cp437, did you see any reference
to different versions for French, German, Finnish, etc? I saw only one
mapping table; how many did you see? If there are multiple language
versions of a codepage, how do you expect to handle this given Python
has only one codec per codepage?

Trying again: *ONE* attribute of a Language Driver ID (LDID) is the
character set (codepage) that it uses. Other attributes may be things
like the collating (sorting) sequence, whether they use a dot or a
comma as the decimal point, etc. Many different languages in Western
Europe can use the same codepage. Initially the common one was cp 437,
then 850, then 1252.
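
To make the many-to-one relationship concrete, here is a minimal Python 3
sketch. The LDID values are the ones quoted above from the ESRI list; the
names CP437_LDIDS, LDID_TO_CODEPAGE and codec_for are made up for
illustration and are not part of any existing library.

```python
# LDIDs quoted in this thread from the ESRI list; each one names codepage 437.
CP437_LDIDS = (0x01, 0x09, 0x0B, 0x0D, 0x0F, 0x11, 0x15, 0x18, 0x19, 0x1B)

# Hypothetical lookup table: language driver ID -> Python codec name.
LDID_TO_CODEPAGE = {ldid: "cp437" for ldid in CP437_LDIDS}


def codec_for(ldid):
    """Return the Python codec name for an LDID (hypothetical helper)."""
    return LDID_TO_CODEPAGE[ldid]


# "Gruesse" with u-umlaut and sharp s, written with escapes to stay ASCII-safe.
text = "Gr\u00fc\u00dfe"
raw = text.encode("cp437")

# Whichever of these LDIDs the header carries, the bytes decode the same way,
# because there is only one codec behind all of them.
decoded = {ldid: raw.decode(codec_for(ldid)) for ldid in CP437_LDIDS}
```

The other attributes of a language driver (collation, decimal point and so
on) would need their own tables; they play no part in decoding the bytes.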

There may possibly be different interpretations of a codepage out there
somewhere, but they are all *intended* to be the same, and I advise
you to cross the different-cp437s bridge *if* it exists and you ever
come to it.

Have you got access to files with LDID not in (0, 1) that you can try
out?

Cheers,
John
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: unicode and dbf files

2009-10-27 Thread Ethan Furman

John Machin wrote:

There may possibly be different interpretations of a codepage out there
somewhere, but they are all *intended* to be the same, and I advise
you to cross the different-cp437s bridge *if* it exists and you ever
come to it.

Have you got access to files with LDID not in (0, 1) that you can try
out?


Alas, I do not.  And I probably never will, making the whole thing academic.

Speaking of tables I do not have access to, and documentation for that 
matter, I would love to get information on db4, 5, 7, etc.


Many thanks for your time and knowledge, and my apologies for seeming so 
dense.  :)


Cheers!

~Ethan~


Re: unicode and dbf files

2009-10-26 Thread Ethan Furman

John Machin wrote:

On Oct 24, 4:14 am, Ethan Furman et...@stoneleaf.us wrote:


John Machin wrote:


On Oct 23, 3:03 pm, Ethan Furman et...@stoneleaf.us wrote:



John Machin wrote:



On Oct 23, 7:28 am, Ethan Furman et...@stoneleaf.us wrote:



Greetings, all!



I would like to add unicode support to my dbf project.  The dbf header
has a one-byte field to hold the encoding of the file.  For example,
\x03 is code-page 437 MS-DOS.



My google-fu is apparently not up to the task of locating a complete
resource that has a list of the 256 possible values and their
corresponding code pages.



What makes you imagine that all 256 possible values are mapped to code
pages?



I'm just wanting to make sure I have whatever is available, and
preferably standard.  :D



So far I have found this, plus variations: http://support.microsoft.com/kb/129631



Does anyone know of anything more complete?



That is for VFP3. Try the VFP9 equivalent.



dBase 5,5,6,7 use others which are not defined in publicly available
dBase docs AFAICT. Look for language driver ID and LDID. Secondary
source: ESRI support site.



Well, a couple hours later and still not more than I started with.
Thanks for trying, though!



Huh? You got tips to (1) the VFP9 docs (2) the ESRI site (3) search
keywords and you couldn't come up with anything??


Perhaps "nothing new" would have been a better description.  I'd already
seen the clicketyclick site (good info there)



Do you think so? My take is that it leaves out most of the codepage
numbers, and these two lines are wrong:
65h Nordic MS-DOS   code page 865
66h Russian MS-DOS  code page 866


That was the site I used to get my whole project going, so ignoring the 
unicode aspect, it has been very helpful to me.




and all I found at ESRI
were folks trying to figure it out, plus one link to a list that was no
different from the vfp3 list (or was it that the list did not give the
hex values?  Either way, of no use to me.)



Try this:
http://webhelp.esri.com/arcpad/8.0/referenceguide/


Wow.  Question, though:  all those codepages mapping to 437 and 850 -- 
are they really all the same?




I looked at dbase.com, but came up empty-handed there (not surprising,
since they are a commercial company).



MS and ESRI have docs ... does that mean that they are non-commercial
companies?


I don't know enough about ESRI to make an informed comment, so I'll just 
say I'm grateful they have them!  MS is a complete mystery... perhaps 
they are finally seeing the light?  Hard to believe, though, from a 
company that has consistently changed their file formats with every release.




I searched some more on Microsoft's site in the VFP9 section, and was
able to find the code page section this time.  Sadly, it only added
about seven codes.

At any rate, here is what I have come up with so far.  Any corrections
and/or additions greatly appreciated.

code_pages = {
'\x01' : ('ascii', 'U.S. MS-DOS'),



All of the sources say codepage 437, so why ascii instead of cp437?


Hard to say, really.  Adjusted.



'\x02' : ('cp850', 'International MS-DOS'),
'\x03' : ('cp1252', 'Windows ANSI'),
'\x04' : ('mac_roman', 'Standard Macintosh'),
'\x64' : ('cp852', 'Eastern European MS-DOS'),
'\x65' : ('cp866', 'Russian MS-DOS'),
'\x66' : ('cp865', 'Nordic MS-DOS'),
'\x67' : ('cp861', 'Icelandic MS-DOS'),
'\x68' : ('cp895', 'Kamenicky (Czech) MS-DOS'), # iffy



Indeed iffy. Python doesn't have a cp895 encoding, and it's probably
not alone. I suggest that you omit Kamenicky until someone actually
wants it.


Yeah, I noticed that.  Tentative plan was to implement it myself (more 
for practice than anything else), and also to be able to raise a more 
specific error ("Kamenicky not currently supported" or some such).




'\x69' : ('cp852', 'Mazovia (Polish) MS-DOS'),  # iffy



Look 5 lines back. cp852 is 'Eastern European MS-DOS'. Mazovia
predates and is not the same as cp852. In any case, I suggest that you
omit Masovia until someone wants it. Interesting reading:

http://www.jastra.com.pl/klub/ogonki.htm


Very interesting reading.



'\x6a' : ('cp737', 'Greek MS-DOS (437G)'),
'\x6b' : ('cp857', 'Turkish MS-DOS'),
'\x78' : ('big5', 'Traditional Chinese (Hong Kong SAR, Taiwan)\



big5 is *not* the same as cp950. The products that create DBF files
were designed for Windows. So when your source says that LDID 0xXX
maps to Windows codepage YYY, I would suggest that all you should do
is translate that without thinking to python encoding cpYYY.


Ack.  Not sure how I missed 'Windows' at the end of that description.



   Windows'),   # wag


What does "wag" mean?


wag == 'wild ass guess'



'\x79' : ('iso2022_kr', 'Korean Windows'),  # wag


Try cp949.


Done.



'\x7a' : ('iso2022_jp_2', 'Chinese Simplified (PRC, Singapore)\
   Windows'),   # wag



Very wrong. iso2022_jp_2 is supposed to include basic Japanese, basic

Re: unicode and dbf files

2009-10-26 Thread John Machin
On Oct 27, 3:22 am, Ethan Furman et...@stoneleaf.us wrote:
 John Machin wrote:
  On Oct 24, 4:14 am, Ethan Furman et...@stoneleaf.us wrote:

 John Machin wrote:

 On Oct 23, 3:03 pm, Ethan Furman et...@stoneleaf.us wrote:

 John Machin wrote:

 On Oct 23, 7:28 am, Ethan Furman et...@stoneleaf.us wrote:

  Try this:
 http://webhelp.esri.com/arcpad/8.0/referenceguide/

 Wow.  Question, though:  all those codepages mapping to 437 and 850 --
 are they really all the same?

437 and 850 *are* codepages. You mean all those language driver IDs
mapping to codepages 437 and 850. A codepage merely gives an
encoding. An LDID is like a locale; it includes other things besides
the encoding. That's why many Western European languages map to the
same codepage, first 437 then later 850 then 1252 when Windows came
along.

      '\x68' : ('cp895', 'Kamenicky (Czech) MS-DOS'),     # iffy

  Indeed iffy. Python doesn't have a cp895 encoding, and it's probably
  not alone. I suggest that you omit Kamenicky until someone actually
  wants it.

 Yeah, I noticed that.  Tentative plan was to implement it myself (more
 for practice than anything else), and also to be able to raise a more
 specific error ("Kamenicky not currently supported" or some such).

The error idea is fine, but I don't get the "implement it yourself for
practice" bit ... practice what? You plan a long and fruitful career
implementing codecs for YAGNI codepages?
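
For the error idea, something along these lines might do. This is only a
sketch in Python 3: the exception class, the UNSUPPORTED table and
check_supported are hypothetical names, and the two entries are simply the
ones marked "iffy" in the table under review.

```python
class UnsupportedCodePage(LookupError):
    """Raised when a file's LDID names a codepage Python has no codec for."""


# Hypothetical table of LDIDs we recognise but cannot decode.
UNSUPPORTED = {
    0x68: "Kamenicky (Czech) MS-DOS",
    0x69: "Mazovia (Polish) MS-DOS",
}


def check_supported(ldid):
    """Return ldid unchanged, or raise a specific error for known-unsupported IDs."""
    if ldid in UNSUPPORTED:
        raise UnsupportedCodePage(
            "%s (LDID 0x%02x) not currently supported" % (UNSUPPORTED[ldid], ldid)
        )
    return ldid
```

Subclassing LookupError keeps it in line with what codecs.lookup() raises
for an unknown encoding name, so callers can catch either with one clause.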

      '\x7b' : ('iso2022_jp', 'Japanese Windows'),        # wag

  Try cp936.

 You mean 932?

Yes.

 Very helpful indeed.  Many thanks for reviewing and correcting.

You're welcome.

 Learning to deal with unicode is proving more difficult for me than
 learning Python was to begin with!  ;D

?? As far as I can tell, the topic has been about mapping from
something like a locale to the name of an encoding, i.e. all about the
pre-Unicode mishmash and nothing to do with dealing with unicode ...

BTW, what are you planning to do with an LDID of 0x00?

Cheers,

John


Re: unicode and dbf files

2009-10-26 Thread Ethan Furman

John Machin wrote:

On Oct 27, 3:22 am, Ethan Furman et...@stoneleaf.us wrote:


John Machin wrote:


Try this:
http://webhelp.esri.com/arcpad/8.0/referenceguide/


Wow.  Question, though:  all those codepages mapping to 437 and 850 --
are they really all the same?


437 and 850 *are* codepages. You mean all those language driver IDs
mapping to codepages 437 and 850. A codepage merely gives an
encoding. An LDID is like a locale; it includes other things besides
the encoding. That's why many Western European languages map to the
same codepage, first 437 then later 850 then 1252 when Windows came
along.


Let me rephrase -- say I get a dbf file with an LDID of \x0f that maps 
to a cp437, and the file came from a german oem machine... could that 
file have upper-ascii codes that will not map to anything reasonable on 
my \x01 cp437 machine?  If so, is there anything I can do about it?




   '\x68' : ('cp895', 'Kamenicky (Czech) MS-DOS'), # iffy



Indeed iffy. Python doesn't have a cp895 encoding, and it's probably
not alone. I suggest that you omit Kamenicky until someone actually
wants it.


Yeah, I noticed that.  Tentative plan was to implement it myself (more
for practice than anything else), and also to be able to raise a more
specific error ("Kamenicky not currently supported" or some such).



The error idea is fine, but I don't get the "implement it yourself for
practice" bit ... practice what? You plan a long and fruitful career
implementing codecs for YAGNI codepages?


ROFL.  Playing with code; the unicode/code page interactions.  Possibly 
looking at constructs I might not otherwise.  Since this would almost 
certainly (I don't like saying absolutely and never -- been 
troubleshooting for too many years for that!-) be a YAGNI, implementing 
it is very low priority




   '\x7b' : ('iso2022_jp', 'Japanese Windows'),# wag



Try cp936.


You mean 932?



Yes.



Very helpful indeed.  Many thanks for reviewing and correcting.



You're welcome.



Learning to deal with unicode is proving more difficult for me than
learning Python was to begin with!  ;D



?? As far as I can tell, the topic has been about mapping from
something like a locale to the name of an encoding, i.e. all about the
pre-Unicode mishmash and nothing to do with dealing with unicode ...


You are, of course, correct.  Once it's all unicode life will be easier 
(he says, all innocent-like).  And dbf files even bigger, lol.




BTW, what are you planning to do with an LDID of 0x00?


Hmmm.  Well, logical choices seem to be either treating it as plain 
ascii, and barfing when high-ascii shows up; defaulting to \x01; or 
forcing the user to choose one on initial access.


I am definitely open to ideas!



Cheers,

John




Re: unicode and dbf files

2009-10-26 Thread John Machin
On Oct 27, 7:15 am, Ethan Furman et...@stoneleaf.us wrote:
 John Machin wrote:
  On Oct 27, 3:22 am, Ethan Furman et...@stoneleaf.us wrote:

 John Machin wrote:

 Try this:
 http://webhelp.esri.com/arcpad/8.0/referenceguide/

 Wow.  Question, though:  all those codepages mapping to 437 and 850 --
 are they really all the same?

  437 and 850 *are* codepages. You mean all those language driver IDs
  mapping to codepages 437 and 850. A codepage merely gives an
  encoding. An LDID is like a locale; it includes other things besides
  the encoding. That's why many Western European languages map to the
  same codepage, first 437 then later 850 then 1252 when Windows came
  along.

 Let me rephrase -- say I get a dbf file with an LDID of \x0f that maps
 to a cp437, and the file came from a german oem machine... could that
 file have upper-ascii codes that will not map to anything reasonable on
 my \x01 cp437 machine?  If so, is there anything I can do about it?

ASCII is defined over the first 128 codepoints; "upper-ascii codes" is
meaningless. As for the rest of your question, if the file's encoded
in cpXXX, it's encoded in cpXXX. If either the creator or the reader
or both are lying, then all bets are off.

  BTW, what are you planning to do with an LDID of 0x00?

 Hmmm.  Well, logical choices seem to be either treating it as plain
 ascii, and barfing when high-ascii shows up; defaulting to \x01; or
 forcing the user to choose one on initial access.

It would be more useful to allow the user to specify an encoding than
an LDID.

You need to be able to read files created not only by software like
VFP or dBase but also scripts using third-party libraries. It would be
useful to allow an encoding to override an LDID that is incorrect e.g.
the LDID implies cp1251 but the data is actually encoded in koi8[ru]

Read this: http://en.wikipedia.org/wiki/Code_page_437
With no LDID in the file and no encoding supplied, I'd be inclined to
make it barf if any codepoint not in range(32, 128) showed up.
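
A rough Python 3 sketch of that policy, assuming the caller has already
turned the header's LDID into a codec name (or None for LDID 0x00). The
function name decode_text and its parameters are invented for illustration.

```python
def decode_text(raw, ldid_encoding=None, encoding=None):
    """Decode a field's bytes (hypothetical helper).

    encoding      -- explicit user override; wins over the LDID
    ldid_encoding -- codec name derived from the header's LDID, or None
    """
    chosen = encoding or ldid_encoding
    if chosen is not None:
        return raw.decode(chosen)
    # No LDID and no override: accept only bytes in range(32, 128).
    bad = [b for b in raw if b not in range(32, 128)]
    if bad:
        raise ValueError(
            "no encoding known and byte 0x%02x found in data" % bad[0]
        )
    return raw.decode("ascii")
```

With this shape, a file whose LDID claims cp1251 but whose data is really
koi8-r can still be read by passing encoding="koi8_r".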

Cheers,
John


Re: unicode and dbf files

2009-10-24 Thread John Machin
On Oct 24, 4:14 am, Ethan Furman et...@stoneleaf.us wrote:
 John Machin wrote:
  On Oct 23, 3:03 pm, Ethan Furman et...@stoneleaf.us wrote:

 John Machin wrote:

 On Oct 23, 7:28 am, Ethan Furman et...@stoneleaf.us wrote:

 Greetings, all!

 I would like to add unicode support to my dbf project.  The dbf header
 has a one-byte field to hold the encoding of the file.  For example,
 \x03 is code-page 437 MS-DOS.

 My google-fu is apparently not up to the task of locating a complete
 resource that has a list of the 256 possible values and their
 corresponding code pages.

 What makes you imagine that all 256 possible values are mapped to code
 pages?

 I'm just wanting to make sure I have whatever is available, and
 preferably standard.  :D

 So far I have found this, plus 
 variations: http://support.microsoft.com/kb/129631

 Does anyone know of anything more complete?

 That is for VFP3. Try the VFP9 equivalent.

 dBase 5,5,6,7 use others which are not defined in publicly available
 dBase docs AFAICT. Look for language driver ID and LDID. Secondary
 source: ESRI support site.

 Well, a couple hours later and still not more than I started with.
 Thanks for trying, though!

  Huh? You got tips to (1) the VFP9 docs (2) the ESRI site (3) search
  keywords and you couldn't come up with anything??

 Perhaps "nothing new" would have been a better description.  I'd already
 seen the clicketyclick site (good info there)

Do you think so? My take is that it leaves out most of the codepage
numbers, and these two lines are wrong:
65h Nordic MS-DOS   code page 865
66h Russian MS-DOS  code page 866


 and all I found at ESRI
 were folks trying to figure it out, plus one link to a list that was no
 different from the vfp3 list (or was it that the list did not give the
 hex values?  Either way, of no use to me.)

Try this:
http://webhelp.esri.com/arcpad/8.0/referenceguide/



 I looked at dbase.com, but came up empty-handed there (not surprising,
 since they are a commercial company).

MS and ESRI have docs ... does that mean that they are non-commercial
companies?

 I searched some more on Microsoft's site in the VFP9 section, and was
 able to find the code page section this time.  Sadly, it only added
 about seven codes.

 At any rate, here is what I have come up with so far.  Any corrections
 and/or additions greatly appreciated.

 code_pages = {
      '\x01' : ('ascii', 'U.S. MS-DOS'),

All of the sources say codepage 437, so why ascii instead of cp437?

      '\x02' : ('cp850', 'International MS-DOS'),
      '\x03' : ('cp1252', 'Windows ANSI'),
      '\x04' : ('mac_roman', 'Standard Macintosh'),
      '\x64' : ('cp852', 'Eastern European MS-DOS'),
      '\x65' : ('cp866', 'Russian MS-DOS'),
      '\x66' : ('cp865', 'Nordic MS-DOS'),
      '\x67' : ('cp861', 'Icelandic MS-DOS'),
      '\x68' : ('cp895', 'Kamenicky (Czech) MS-DOS'),     # iffy

Indeed iffy. Python doesn't have a cp895 encoding, and it's probably
not alone. I suggest that you omit Kamenicky until someone actually
wants it.

      '\x69' : ('cp852', 'Mazovia (Polish) MS-DOS'),      # iffy

Look 5 lines back. cp852 is 'Eastern European MS-DOS'. Mazovia
predates and is not the same as cp852. In any case, I suggest that you
omit Masovia until someone wants it. Interesting reading:

http://www.jastra.com.pl/klub/ogonki.htm

      '\x6a' : ('cp737', 'Greek MS-DOS (437G)'),
      '\x6b' : ('cp857', 'Turkish MS-DOS'),
      '\x78' : ('big5', 'Traditional Chinese (Hong Kong SAR, Taiwan)\

big5 is *not* the same as cp950. The products that create DBF files
were designed for Windows. So when your source says that LDID 0xXX
maps to Windows codepage YYY, I would suggest that all you should do
is translate that without thinking to python encoding cpYYY.
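
In Python 3 terms, that mechanical translation could look like the sketch
below. python_encoding_for is a made-up helper name; codecs.lookup() is the
standard-library call that raises LookupError for a name Python does not
know.

```python
import codecs


def python_encoding_for(windows_codepage):
    """Turn a Windows codepage number into a Python codec name, e.g. 950 -> 'cp950'."""
    name = "cp%d" % windows_codepage
    codecs.lookup(name)  # raises LookupError if Python has no such codec
    return name


# 950 is the Windows codepage for Traditional Chinese; use it rather than big5.
traditional_chinese = python_encoding_for(950)
```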

                 Windows'),       # wag

What does "wag" mean?

      '\x79' : ('iso2022_kr', 'Korean Windows'),          # wag

Try cp949.


      '\x7a' : ('iso2022_jp_2', 'Chinese Simplified (PRC, Singapore)\
                 Windows'),       # wag

Very wrong. iso2022_jp_2 is supposed to include basic Japanese, basic
(1980) Chinese (GB2312) and a basic Korean kit. However, to quote from
"CJKV Information Processing" by Ken Lunde, "... from a practical
point of view, ISO-2022-JP-2 ... [is] equivalent to ISO-2022-JP-1
encoding", i.e. no Chinese support at all. Try cp936.

      '\x7b' : ('iso2022_jp', 'Japanese Windows'),        # wag

Try cp936.

      '\x7c' : ('cp874', 'Thai Windows'),                 # wag
      '\x7d' : ('cp1255', 'Hebrew Windows'),
      '\x7e' : ('cp1256', 'Arabic Windows'),
      '\xc8' : ('cp1250', 'Eastern European Windows'),
      '\xc9' : ('cp1251', 'Russian Windows'),
      '\xca' : ('cp1254', 'Turkish Windows'),
      '\xcb' : ('cp1253', 'Greek Windows'),
      '\x96' : ('mac_cyrillic', 'Russian Macintosh'),
      '\x97' : ('mac_latin2', 'Macintosh EE'),
      '\x98' : ('mac_greek', 'Greek Macintosh') }
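
Pulling the suggestions together, the table might end up something like
this. It is a sketch in Python 3 with integer keys rather than the
one-character strings used above. Kamenicky and Mazovia are omitted,
Windows codepage NNN becomes cpNNN, and Japanese Windows is shown as cp932,
which is the value agreed on elsewhere in this thread. The closing loop is
an added sanity check, not something from the original table.

```python
import codecs

# LDID -> (Python codec name, description)
code_pages = {
    0x01: ("cp437", "U.S. MS-DOS"),
    0x02: ("cp850", "International MS-DOS"),
    0x03: ("cp1252", "Windows ANSI"),
    0x04: ("mac_roman", "Standard Macintosh"),
    0x64: ("cp852", "Eastern European MS-DOS"),
    0x65: ("cp866", "Russian MS-DOS"),
    0x66: ("cp865", "Nordic MS-DOS"),
    0x67: ("cp861", "Icelandic MS-DOS"),
    0x6A: ("cp737", "Greek MS-DOS (437G)"),
    0x6B: ("cp857", "Turkish MS-DOS"),
    0x78: ("cp950", "Traditional Chinese (Hong Kong SAR, Taiwan) Windows"),
    0x79: ("cp949", "Korean Windows"),
    0x7A: ("cp936", "Chinese Simplified (PRC, Singapore) Windows"),
    0x7B: ("cp932", "Japanese Windows"),
    0x7C: ("cp874", "Thai Windows"),
    0x7D: ("cp1255", "Hebrew Windows"),
    0x7E: ("cp1256", "Arabic Windows"),
    0xC8: ("cp1250", "Eastern European Windows"),
    0xC9: ("cp1251", "Russian Windows"),
    0xCA: ("cp1254", "Turkish Windows"),
    0xCB: ("cp1253", "Greek Windows"),
    0x96: ("mac_cyrillic", "Russian Macintosh"),
    0x97: ("mac_latin2", "Macintosh EE"),
    0x98: ("mac_greek", "Greek Macintosh"),
}

# Collect any entry whose codec name this Python installation does not know.
missing = []
for ldid, (encoding, description) in sorted(code_pages.items()):
    try:
        codecs.lookup(encoding)
    except LookupError:
        missing.append((ldid, encoding, description))
```

If missing comes back non-empty, those entries are candidates for the
"not currently supported" treatment rather than a guess at a near
neighbour.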

HTH,
John


Re: unicode and dbf files

2009-10-23 Thread Ethan Furman

John Machin wrote:

On Oct 23, 3:03 pm, Ethan Furman et...@stoneleaf.us wrote:


John Machin wrote:


On Oct 23, 7:28 am, Ethan Furman et...@stoneleaf.us wrote:



Greetings, all!



I would like to add unicode support to my dbf project.  The dbf header
has a one-byte field to hold the encoding of the file.  For example,
\x03 is code-page 437 MS-DOS.



My google-fu is apparently not up to the task of locating a complete
resource that has a list of the 256 possible values and their
corresponding code pages.



What makes you imagine that all 256 possible values are mapped to code
pages?


I'm just wanting to make sure I have whatever is available, and
preferably standard.  :D



So far I have found this, plus variations: http://support.microsoft.com/kb/129631



Does anyone know of anything more complete?



That is for VFP3. Try the VFP9 equivalent.



dBase 5,5,6,7 use others which are not defined in publicly available
dBase docs AFAICT. Look for language driver ID and LDID. Secondary
source: ESRI support site.


Well, a couple hours later and still not more than I started with.
Thanks for trying, though!



Huh? You got tips to (1) the VFP9 docs (2) the ESRI site (3) search
keywords and you couldn't come up with anything??


Perhaps "nothing new" would have been a better description.  I'd already
seen the clicketyclick site (good info there), and all I found at ESRI 
were folks trying to figure it out, plus one link to a list that was no 
different from the vfp3 list (or was it that the list did not give the 
hex values?  Either way, of no use to me.)


I looked at dbase.com, but came up empty-handed there (not surprising, 
since they are a commercial company).


I searched some more on Microsoft's site in the VFP9 section, and was 
able to find the code page section this time.  Sadly, it only added 
about seven codes.


At any rate, here is what I have come up with so far.  Any corrections 
and/or additions greatly appreciated.


code_pages = {
'\x01' : ('ascii', 'U.S. MS-DOS'),
'\x02' : ('cp850', 'International MS-DOS'),
'\x03' : ('cp1252', 'Windows ANSI'),
'\x04' : ('mac_roman', 'Standard Macintosh'),
'\x64' : ('cp852', 'Eastern European MS-DOS'),
'\x65' : ('cp866', 'Russian MS-DOS'),
'\x66' : ('cp865', 'Nordic MS-DOS'),
'\x67' : ('cp861', 'Icelandic MS-DOS'),
'\x68' : ('cp895', 'Kamenicky (Czech) MS-DOS'), # iffy
'\x69' : ('cp852', 'Mazovia (Polish) MS-DOS'),  # iffy
'\x6a' : ('cp737', 'Greek MS-DOS (437G)'),
'\x6b' : ('cp857', 'Turkish MS-DOS'),

'\x78' : ('big5', 'Traditional Chinese (Hong Kong SAR, Taiwan)\
   Windows'),   # wag
'\x79' : ('iso2022_kr', 'Korean Windows'),  # wag
'\x7a' : ('iso2022_jp_2', 'Chinese Simplified (PRC, Singapore)\
   Windows'),   # wag
'\x7b' : ('iso2022_jp', 'Japanese Windows'),# wag
'\x7c' : ('cp874', 'Thai Windows'), # wag
'\x7d' : ('cp1255', 'Hebrew Windows'),
'\x7e' : ('cp1256', 'Arabic Windows'),
'\xc8' : ('cp1250', 'Eastern European Windows'),
'\xc9' : ('cp1251', 'Russian Windows'),
'\xca' : ('cp1254', 'Turkish Windows'),
'\xcb' : ('cp1253', 'Greek Windows'),
'\x96' : ('mac_cyrillic', 'Russian Macintosh'),
'\x97' : ('mac_latin2', 'Macintosh EE'),
'\x98' : ('mac_greek', 'Greek Macintosh') }

~Ethan~


unicode and dbf files

2009-10-22 Thread Ethan Furman

Greetings, all!

I would like to add unicode support to my dbf project.  The dbf header 
has a one-byte field to hold the encoding of the file.  For example, 
\x03 is code-page 437 MS-DOS.
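
For reference, a minimal Python 3 sketch of reading that byte. It assumes
the dBase III+/FoxPro layout, in which the language driver ID is the single
byte at offset 29 of the 32-byte file header; read_ldid is a made-up helper
name, not part of any existing library.

```python
import io

LDID_OFFSET = 29        # assumed: byte 29 of the header holds the language driver ID
HEADER_PREFIX_LEN = 32  # fixed-size part of the header, before the field descriptors


def read_ldid(stream):
    """Return the language driver ID (0-255) from an open binary dbf stream."""
    stream.seek(0)
    header = stream.read(HEADER_PREFIX_LEN)
    if len(header) < HEADER_PREFIX_LEN:
        raise ValueError("file too short to be a dbf table")
    return header[LDID_OFFSET]  # indexing bytes yields an int in Python 3


# Fake 32-byte header with the LDID set to 0x03, just to exercise the helper.
fake_header = bytearray(HEADER_PREFIX_LEN)
fake_header[LDID_OFFSET] = 0x03
ldid = read_ldid(io.BytesIO(bytes(fake_header)))
```

With a real table the stream would come from open(path, "rb") instead of
io.BytesIO.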


My google-fu is apparently not up to the task of locating a complete 
resource that has a list of the 256 possible values and their 
corresponding code pages.


So far I have found this, plus variations:
http://support.microsoft.com/kb/129631

Does anyone know of anything more complete?

~Ethan~


Re: unicode and dbf files

2009-10-22 Thread John Machin
On Oct 23, 7:28 am, Ethan Furman et...@stoneleaf.us wrote:
 Greetings, all!

 I would like to add unicode support to my dbf project.  The dbf header
 has a one-byte field to hold the encoding of the file.  For example,
 \x03 is code-page 437 MS-DOS.

 My google-fu is apparently not up to the task of locating a complete
 resource that has a list of the 256 possible values and their
 corresponding code pages.

What makes you imagine that all 256 possible values are mapped to code
pages?

 So far I have found this, plus 
 variations: http://support.microsoft.com/kb/129631

 Does anyone know of anything more complete?

That is for VFP3. Try the VFP9 equivalent.

dBase 5,5,6,7 use others which are not defined in publicly available
dBase docs AFAICT. Look for language driver ID and LDID. Secondary
source: ESRI support site.


Re: unicode and dbf files

2009-10-22 Thread Ethan Furman

John Machin wrote:

On Oct 23, 7:28 am, Ethan Furman et...@stoneleaf.us wrote:


Greetings, all!

I would like to add unicode support to my dbf project.  The dbf header
has a one-byte field to hold the encoding of the file.  For example,
\x03 is code-page 437 MS-DOS.

My google-fu is apparently not up to the task of locating a complete
resource that has a list of the 256 possible values and their
corresponding code pages.


What makes you imagine that all 256 possible values are mapped to code
pages?


I'm just wanting to make sure I have whatever is available, and 
preferably standard.  :D




So far I have found this, plus variations: http://support.microsoft.com/kb/129631

Does anyone know of anything more complete?


That is for VFP3. Try the VFP9 equivalent.

dBase 5,5,6,7 use others which are not defined in publicly available
dBase docs AFAICT. Look for language driver ID and LDID. Secondary
source: ESRI support site.


Well, a couple hours later and still not more than I started with. 
Thanks for trying, though!


~Ethan~