Re: [Freedos-user] Unicode and codepages in apps already bundled with FreeDOS?

2023-06-24 Thread Mateusz Viste via Freedos-user

On 24/06/2023 02:18, Michael Brutman via Freedos-user wrote:
> A centralized mapping would be nice, but then you will run into the
> question of how strict you want the code to be.


In an ideal world, one could imagine a new nlsfunc service that answers
with a best-effort match from the local codepage for any Unicode
code point. I am not saying this is a good practical idea, though. Given
the limited development activity nowadays, it is probably for the best
that each application comes with its own rules and mappings.


Mateusz


___
Freedos-user mailing list
Freedos-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freedos-user


Re: [Freedos-user] Unicode and codepages in apps already bundled with FreeDOS?

2023-06-23 Thread Michael Brutman via Freedos-user
I added some limited Unicode support to mTCP Telnet and mTCP IRCjr in the
last release a few months ago.

   - I used a text file to store the mapping.  That lets people add code
   points or make corrections if they don't like the choices I made.
   - The code uses the text file both ways: to figure out what Unicode code
   point to send for a local high-bit character and what character to display
   when a Unicode code point is detected.
   - The current mapping is pointed to by a text file.
   - I don't try to detect the current code page in use.  The user is
   responsible for pointing at the correct text file.  While simple, this is
   also flexible.
   - I used a hash table to make the mappings pretty fast.  (I've seen some
   horrible code that did linear searches of a table, and that's painful to
   sit through.)
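
To make the list above concrete, here is a minimal sketch of a text mapping loaded into two hash tables, one per direction. It is not mTCP's actual file format or code: the two-column layout, the `SAMPLE_MAP` contents and the `load_mapping` name are all assumptions made for illustration, using a few CP437 values.

```python
# Hypothetical mapping text: one "local byte, Unicode code point" pair per line.
SAMPLE_MAP = """\
# local-byte  unicode-code-point   (assumed format, not mTCP's)
0x81 0x00FC   # u with diaeresis
0x82 0x00E9   # e with acute
0xE1 0x00DF   # sharp s (first entry for 0xE1: used when sending)
0xE1 0x03B2   # Greek beta, displayed with the same glyph
"""

def load_mapping(text):
    """Build both lookup directions from one mapping text."""
    to_unicode = {}   # local byte -> Unicode code point (outgoing)
    to_local = {}     # Unicode code point -> local byte (incoming)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        local_s, uni_s = line.split()
        local, uni = int(local_s, 16), int(uni_s, 16)
        to_unicode.setdefault(local, uni)     # first listing for a byte wins
        to_local[uni] = local                 # several code points may share a byte
    return to_unicode, to_local

to_unicode, to_local = load_mapping(SAMPLE_MAP)
```

Both dictionaries give constant-time lookups on average, which is the point of avoiding a linear table search. A user who dislikes a choice only has to edit the text.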

A centralized mapping would be nice, but then you will run into the
question of how strict you want the code to be.  The conversion from the
current code page to Unicode should always be strict, as Unicode has far
more glyphs.  But incoming Unicode can be mapped loosely or strictly,
and in my case I went for loose because I wanted the output to be useful to
humans and not full of "tofu" characters.  A strict mapping can be shared,
but a loose mapping is probably best kept application-specific.
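
A small sketch of the strict versus loose distinction for incoming code points, under stated assumptions: the `STRICT` and `LOOSE` tables, the `map_incoming` name and the use of Unicode decomposition as a last-resort fallback are illustrative choices, not what mTCP does.

```python
import unicodedata

# Exact equivalents in the local codepage (CP437 values used as an example).
STRICT = {0x00E9: 0x82, 0x00FC: 0x81}
# Look-alikes only (assumed choices): U+2019 -> apostrophe, U+2013 -> hyphen.
LOOSE = {0x2019: 0x27, 0x2013: 0x2D}

def map_incoming(cp, loose=True, missing=0x3F):
    """Return the local byte to display for Unicode code point cp."""
    if cp < 0x80:
        return cp                       # plain ASCII passes through
    if cp in STRICT:
        return STRICT[cp]               # exact match, valid in both modes
    if loose:
        if cp in LOOSE:
            return LOOSE[cp]            # hand-picked approximation
        # Fallback: strip accents via decomposition and keep an ASCII base.
        base = unicodedata.normalize("NFKD", chr(cp))[0]
        if ord(base) < 0x80:
            return ord(base)
    return missing                      # '?' instead of a tofu box
```

With these tables, U+0107 (c with acute) comes out as a plain `c` in loose mode and as `?` in strict mode. Only the `STRICT` table is a candidate for sharing between applications; the rest is a matter of taste.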


-Mike


Re: [Freedos-user] Unicode and codepages in apps already bundled with FreeDOS?

2023-06-23 Thread Discussion and general questions about FreeDOS. via Freedos-user

On 23/06/2023 01:02, Eric Auer wrote:

> PS: For all things NOT mentioned above, I expect no support for
> Unicode or conversions at all. I expect those to just assume an
> 8-bit encoding in text (and file names) matching your codepage.


For the sake of completeness I will add that AMB has "some" UTF-8
support, in the sense that the human writer can create original content
in UTF-8, and then ambpack is able to translate it to a codepage and
back using mappings generated by utf8tocp.
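
As a rough illustration of such a round trip (not how ambpack or utf8tocp are implemented), the sketch below uses Python's built-in `cp852` codec. The function names are hypothetical.

```python
def utf8_to_codepage(data, codepage):
    """UTF-8 bytes -> codepage bytes; unmappable characters become '?'."""
    return data.decode("utf-8").encode(codepage, errors="replace")

def codepage_to_utf8(data, codepage):
    """Codepage bytes -> UTF-8 bytes."""
    return data.decode(codepage).encode("utf-8")

source = "Zażółć gęślą jaźń".encode("utf-8")   # Polish text, all present in CP852
packed = utf8_to_codepage(source, "cp852")     # one byte per character
restored = codepage_to_utf8(packed, "cp852")   # identical to source
```

The round trip is lossless only while every character exists in the chosen codepage; anything else is replaced and cannot be recovered on the way back.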


Mateusz




[Freedos-user] Unicode and codepages in apps already bundled with FreeDOS?

2023-06-22 Thread Eric Auer



Hi all,

as part of a mail exchange with Vacek, I made a list of apps from


https://www.ibiblio.org/pub/micro/pc-stuff/freedos/files/distributions/test/report.html

which MIGHT have some sort of Unicode and codepage awareness:

By that, I mean that those apps can process input and/or output
which are encoded using Unicode or some codepage and then show
or otherwise process it either in Unicode or using the current
codepage (which could get autodetected), or graphically, maybe
including a custom non-codepage font.

It surprises me that there are more than 30 suspects, but only
a fraction of those will ACTUALLY have the features I hope they
have. Maybe you can help me make the list more exact :-)

HTMLHELP shows input (HTML with Unicode and entity support)
using awareness of which chars exist in the current codepage.

DN2 (DOS Navigator file manager, Ritlabs and Necromancer forks)
may be able to handle file names or view files beyond simple
"treat as 8-bit, assume it fits codepage". Same for the DOSZIP
file manager and PGME (which even comes with fonts, I think).

The SQLITE database engine may still contain Unicode support
even though it may be of limited use in DOS.

Like file managers, some archivers may be aware of filenames
supporting encodings beyond the current codepage: 7ZIP just
distinguishes DOS, WIN and UTF, whatever that means. 7ZDEC
may just assume that Unicode chars 0 to 255 are your codepage?
CABEXTRACT seems to rely on ICONV for Unicode? ZIP and UNZIP
may or may not support encodings in their Infozip DOS ports?
I do not expect any of the other archivers to ponder encodings.
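
To show what the "Unicode chars 0 to 255 are your codepage" shortcut would mean in practice, here is a small sketch using Python's built-in codecs. It makes no claim about what 7ZDEC really does; it only contrasts a proper CP437 conversion with keeping the low byte of each code point.

```python
name = "café"

# Proper conversion: look up each character in the codepage.
as_cp437 = name.encode("cp437")       # e-acute becomes byte 0x82

# Shortcut: treat code points 0..255 as if they were codepage bytes.
# This is the same as encoding to Latin-1.
as_low256 = name.encode("latin-1")    # e-acute becomes byte 0xE9

# What a CP437 screen would display for the shortcut's output.
shown = as_low256.decode("cp437")     # byte 0xE9 is Greek capital theta in CP437
```

ASCII names survive either way, which is why such a shortcut can go unnoticed until an accented file name turns up.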

Some of the larger programming languages, often ports using
32-bit compilers for DOS, could support Unicode in some way:
DOJS (JavaScript), Euphoria, FreeBASIC (FBC), FreePascal (FPC),
Lua, Regina Rexx, Perl, OpenWatcom C, OpenWatcom Fortran maybe?

I suspect that filesystem drivers have Unicode or codepage awareness;
suspects are: DOSLFN, LFNDOS, NTFS, USBDOX :-)

Among text editors, MinEd seems to be as Unicode- and codepage-
aware as HTMLHELP: http://towo.net/mined/term-dos.png Blocek
even comes with a graphical Unicode font. SETEDIT, ELVIS and
VIM are powerful enough to possibly support various encodings?

The FOXTYPE viewer explicitly supports Unicode. GNUCHCP is a
bit of an alternative to the DISPLAY/MODE/CPI font ecosystem.
UNRTF converts RTF to other text formats.

Likewise, internet apps such as Arachne, Dillo, Lynx, Links,
SSHDOS and SSH2DOS could support Unicode and other encodings?
Media player MPLAYER probably does, too. Maybe also OPENCP?

Last but not least, the OPENGEM GUI distro could contain
encoding-aware apps or infrastructure?

What are your thoughts? There might be more Unicode in FreeDOS
than I had intuitively expected. Even when support is minimal,
it would be cool to know that multiple apps grasp the concept
of, say, UTF-8 and codepages being able to show a tiny subset
of Unicode space and that a few apps even come with fonts with
far more than 256 different chars already :-)

Thanks for your insights! Regards, Eric

PS: For all things NOT mentioned above, I expect no support for
Unicode or conversions at all. I expect those to just assume an
8-bit encoding in text (and file names) matching your codepage.


