Re: Text encoding Babel. Was Re: George Keremedjiev

2018-12-05 Thread Maciej W. Rozycki via cctalk
On Tue, 4 Dec 2018, Liam Proven wrote:

> >  I don't know if the unreal mode has been retained in the x86 architecture
> > to this day; as I noted above it was not officially supported.  But then
> > some originally undocumented x86 features, such as the second byte of AAD
> > and AAM instructions actually being an immediate argument that could have
> > a value different from 10, have become standardised at one point.
> 
> I know, and I was surprised that V86 mode isn't supported in x86-64.

 In the native long mode, that is.  If you run the CPU 32-bit, then VM86 
works.  I guess AMD didn't want to burden the architecture in case pure 
64-bit parts were made in the future.
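 As an aside on the AAD/AAM note quoted above, here is a minimal sketch 
(in Python, purely to illustrate the documented semantics; the real 
instructions of course operate on the AL/AH register pair) of what the 
immediate byte does:

def aam(al, base=10):
    # AAM imm8: AH = AL / imm8, AL = AL % imm8 (the byte normally encodes 10)
    return al // base, al % base            # (AH, AL)

def aad(ah, al, base=10):
    # AAD imm8: AL = (AH * imm8 + AL) & 0xFF, AH = 0
    return 0, (ah * base + al) & 0xFF       # (AH, AL)

print(aam(0x3A, 16))    # (3, 10): splits a byte into nibbles
print(aad(3, 10, 16))   # (0, 58): and recombines them

With a base of 16, for example, the pair splits and recombines nibbles 
rather than decimal digits.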

> This caused major problems for the developers of DOSEMU.

 And also for expansion-BIOS emulation, especially with graphics adapters 
(which, combined with scarce to nonexistent hardware documentation, made 
mode switching even trickier in Linux than it already was).  It looks like 
full software interpretation of the machine code, as QEMU does, is the 
only way remaining for x86-64.

  Maciej


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-12-04 Thread Liam Proven via cctalk
On Sat, 1 Dec 2018 at 02:00, Maciej W. Rozycki  wrote:

>  Be assured there were enough IBM PC clones running DOS around from 1989
> onwards for this stuff to matter,

OK, fair enough. Thanks for the info!

> and hardly anyone switched to MS Windows
> before version 95 (running Windows 3.0 with the ubiquitous HGC-compatible
> graphics adapters was sort of fun anyway, and I am not sure if Windows 3.1
> even supported it; maybe with extra drivers).

It did. Demo:

https://www.youtube.com/watch?v=0lOGPQQlxT8

Screenshot:

http://nerdlypleasures.blogspot.com/2016/12/windows-30-multimedia-edition-early.html

The difficult bit was Windows 3.0 on an 8088/8086 with VGA, I believe.
The VGA driver contained 80286 instructions because MS didn't imagine
anyone would want Win3 on such old PCs.

(This again shows that MS didn't believe Win3 would be such a big hit,
giving the lie to all the pro-OS/2 anti-MS conspiracy theories...

https://virtuallyfun.com/wordpress/2011/06/01/windows-3-0/
)

To run Win3 on an 8086 in VGA mode, you had to replace the CPU with an
NEC V20 or V30, as I heard it and faintly recall...

The driver did later get patched to work:

http://www.vcfed.org/forum/showthread.php?35593-Windows-3-0-VGA-color-driver-for-8088-XT




-- 
Liam Proven - Profile: https://about.me/liamproven
Email: lpro...@cix.co.uk - Google Mail/Hangouts/Plus: lpro...@gmail.com
Twitter/Facebook/Flickr: lproven - Skype/LinkedIn: liamproven
UK: +44 7939-087884 - ČR (+ WhatsApp/Telegram/Signal): +420 702 829 053


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-12-04 Thread Liam Proven via cctalk
On Tue, 4 Dec 2018 at 15:02, Maciej W. Rozycki via cctalk
 wrote:

>  I don't know if the unreal mode has been retained in the x86 architecture
> to this day; as I noted above it was not officially supported.  But then
> some originally undocumented x86 features, such as the second byte of AAD
> and AAM instructions actually being an immediate argument that could have
> a value different from 10, have become standardised at one point.

I know, and I was surprised that V86 mode isn't supported in x86-64.

This caused major problems for the developers of DOSEMU.


-- 
Liam Proven - Profile: https://about.me/liamproven
Email: lpro...@cix.co.uk - Google Mail/Hangouts/Plus: lpro...@gmail.com
Twitter/Facebook/Flickr: lproven - Skype/LinkedIn: liamproven
UK: +44 7939-087884 - ČR (+ WhatsApp/Telegram/Signal): +420 702 829 053


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-12-04 Thread Maciej W. Rozycki via cctalk
On Fri, 30 Nov 2018, Fred Cisin via cctalk wrote:

> > Well, ATA drives at that time should have already had the capability to
> > remap bad blocks or whole tracks transparently in the firmware, although
> 
> Not even IDE.
> Seagate ST4096  (ST506/412 MFM)  80MB formatted, which was still considered
> good size by those of us who weren't wealthy.

 Sure!  You did need a bad block list for such a drive though.

> > Of course the ability to remap bad storage areas transparently is not an
> > excuse for the OS not to handle them gracefully, it was not that time yet
> > back then when a hard drive with a bad block or a dozen was considered
> > broken like it usually is nowadays.
> 
> Yes, they still came with list of known bad blocks.  Usually taped to the
> drive.  THIS one wasn't on the manufacturer's list, and neither SpeedStor nor
> SpinRite could find it!
> There were other ways to lock out a block besides filling it with a garbage
> file, but that was easiest.

 IIRC for MS-DOS the canonical way was to mark the containing cluster as 
bad using a special code in the FAT.  Both `format' and `chkdsk' were able 
to do that, as were some third-party tools.  That ensured that disk 
maintenance tools, such as `defrag', didn't reuse the cluster for 
something else, as could happen when the bad spot was merely covered by a 
regular file (such as a SECTORS.BAD placeholder).
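 To make the mechanism concrete, here is a rough sketch (Python, not a 
DOS-era tool; it just pokes a FAT16 table that has already been read into 
memory, and 0xFFF7 is the standard FAT16 "bad cluster" entry value):

import struct

FAT16_BAD_CLUSTER = 0xFFF7   # FAT entry value: bad cluster, never allocate

def mark_cluster_bad(fat, cluster):
    # Each FAT16 entry is a little-endian 16-bit word at offset cluster*2.
    # A real tool would also verify the cluster is not referenced by any
    # file and would update every FAT copy on the disk.
    struct.pack_into('<H', fat, cluster * 2, FAT16_BAD_CLUSTER)

fat = bytearray(0x10000)             # stand-in for a FAT read off the disk
mark_cluster_bad(fat, 1234)
assert struct.unpack_from('<H', fat, 1234 * 2)[0] == FAT16_BAD_CLUSTER

Once the entry holds that value, an allocator or defragmenter simply never 
considers the cluster free, which is exactly the guarantee a placeholder 
file cannot give.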

> And, I did try to tell the Microsoft people that the OS "should recover
> gracefully from hardware errors".  In those words.

 I found switching to Linux a reasonable solution to this kind of customer 
service attitude.  There you can fix an issue yourself, or if you don't 
feel like it, you can hire someone to do it for you (or often just ask 
kindly, as engineers usually feel responsible for the code they have 
committed, including any bugs). :)

> > Did 3.1 support running in the real mode though (as opposed to switching
> > to the real mode for DOS tasks only)?  I honestly do not remember anymore,
> > and ISTR it was removed at one point.  I am sure 3.0 did.
> 
> I believe that it did.  I don't remember WHAT the program didn't like about
> 3.1, or if there were a real reason, not just an arbitrary limit.
> I don't think that the Cordata's refusal to run on 286 was based on a real
> reason.
> 
> But, the Win 3.1 installation program(s) balked at anything without A20 and a
> tiny bit of RAM above 10h I didn't have a problem with having a few
> dedicated machines (an XT with Cordata interface, an AT with Eiconscript card
> for postscript and HP PCL, an AT Win 3.0 for the font editor, a machine for
> disk duplication (no-notch disks), order entry, accounting, and lots of
> machines with lots of different floppy drive types.)  I also tested every
> release of my programs on many variants of the platform (after I discovered
> the hard way that 286 had a longer pre-fetch buffer than 8088!)

 Hmm, interesting.  I never tried any version of MS Windows on a PC/XT 
class machine and the least equipped 80286-based system I've used had at 
least 1MiB of RAM and a chipset clever enough to remap a part of it above 
1MiB.  And then that was made available via HIMEM.SYS.

 What might be unknown to some is that apart from toggling the A20 mask 
gate HIMEM.SYS also switched on the so-called "unreal mode" on processors 
that supported it.  These were at least the 80486 and possibly the 80386 
as well (but my memory has faded about it at this point), and certainly 
not the 80286 as it didn't support segment sizes beyond 64kiB.  This mode 
gave access to the whole 4GiB 32-bit address space to real mode programs, 
by setting data segment limits (sizes) to 4GiB.

 This was possible by programming segment descriptors in the protected 
mode and then switching back to the real mode without resetting the limits 
to the usual 64kiB value beforehand.  This worked because, unlike in the 
protected mode, segment register writes made in the real mode only updated 
the segment base and not the limit stored in the corresponding descriptor.  
IIRC it was not possible for the code segment to use a 4GiB limit in the 
real mode as it would malfunction (i.e. it would not work as per real mode 
expectations), so it was left at 64kiB.
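 A toy model of that descriptor-cache behaviour (Python, a deliberately 
simplified sketch of the mechanism described above rather than of any real 
CPU) may make it clearer:

class SegReg:
    # Toy segment register with its hidden descriptor cache.
    def __init__(self):
        self.base, self.limit = 0, 0xFFFF      # power-on real-mode defaults

    def load_real_mode(self, selector):
        self.base = selector << 4              # base changes, cached limit
                                               # is left untouched
    def load_protected_mode(self, base, limit):
        self.base, self.limit = base, limit    # both fields are reloaded

    def read(self, offset):
        if offset > self.limit:
            raise MemoryError("general protection fault")
        return self.base + offset              # linear address

ds = SegReg()
ds.load_real_mode(0x1234)
# ds.read(0x12345) would fault here: the cached limit is still 64kiB
ds.load_protected_mode(0, 0xFFFFFFFF)          # brief trip to protected mode
ds.load_real_mode(0x0000)                      # back in real mode, limit sticks
print(hex(ds.read(0x12345678)))                # the whole 4GiB is now reachable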

 According to Intel documentation software was required to reset segment 
sizes to 64kiB before switching back to the real mode, so this was not an 
officially supported mode of operation.  MS Windows may or may not have 
made use of this feature in its real mode of operation; I am not sure, 
although I do believe HIMEM.SYS itself did use it (or otherwise why would 
it set it in the first place?).

 I discovered it by accident in the early 1990s while experimenting with some 
assembly programming (possibly by trying to read from beyond the end of a 
segment by using an address size override prefix, a word or a doubleword 
data quantity and an offset of 0x and not seeing a trap or suchlike) 
and could not explain where this phenomenon came from as it contradicted 
the x86 processor manual I 

Re: Text encoding Babel. Was Re: George Keremedjiev

2018-12-01 Thread Jim Manley via cctalk
On Fri, Nov 30, 2018 at 3:28 PM Grant Taylor via cctalk <
cctalk@classiccmp.org> wrote:

> On 11/30/2018 02:33 PM, Jim Manley via cctalk wrote:
> > There's enough slack in the approved offerings that electives can be
> > weighted more toward the technical direction (e.g., user interface and
> > experience) or the arts direction (e.g., psychology and history).  The
> idea
> > was to close the severely-growing gap between those who know everything
> > about computing and those who need to know enough, but not everything, to
> > be truly effective in the information-dominant world we've been careening
> > toward without nearly enough preparation of future generations.
>
> I kept thinking to myself that many of the people that are considered
> pioneers in computers were actually something else by trade and learned
> how to use computers and / or created what they needed for the computer
> to be able to do their primary job.
> --
> Grant. . . .
> unix || die
>

Most people know that Newton's motivation for developing calculus was
explaining the motions of the planets, but not many know that he served as
the Warden, and then Master, of the Royal Mint, as well as being fascinated
with optics and vision (to the point where he inserted a needle into one of
his eyes!) and a closet alchemist.  His competitor, Leibniz, was motivated
to develop calculus by a strong desire to win more billiards bets from his
fellow wealthy buddies in Hanover, the financial capital of Germany at the
time, while developing the mathematics of the physics governing the
collisions of billiard balls.  Babbage was motivated to develop calculating
and computing machines to eliminate the worldwide average of seven errors
per page in astronomical, navigational, and mathematical tables of the
1820s.

Shannon and Hamming (with whom I worked - the latter, not the former!) were
motivated to represent Boolean logic in digital circuits and improve
long-distance communications by formalizing how to predictably ferret more
signal out of noise.  Turing was motivated to test his computing theories
to break the Nazi Enigma ciphers (character-oriented, vs. word-oriented
codes) and moved far beyond the mathematical underpinnings of his theories
into the engineering of Colossus and the bombes.  Hollerith was motivated
by the requirement to complete the decennial census tabulations within 10
years (the 1890 census was going to take 13 years to tabulate using
traditional manual methods within the available budget).  Mauchly and
Eckert were motivated to automate calculations for ballistics tables for
WW-II weapons systems that were being fielded faster than tables could be
produced manually.

Hopper developed the first compiler and the first programming language to
use English words, Flow-Matic, that led, in turn, to COBOL being created to
meet financial software needs.  John Backus and the other developers of
FORTRAN were likewise motivated by scientific and engineering calculation
requirements.  Kernighan, Ritchie, and Thompson were motivated by a desire
to perform an immense prank, in the form of Unix and A/B/BCPL/C, on an
unsuspecting and all-too-serious professional computing world (
http://www.stokely.com/lighter.side/unix.prank.html).  Gates and Allen were
motivated by all of the money lying around on desks, in their drawers, and
in the drawers worn by the people sitting at said desks, to foist PC/MS-DOS
and Windows on the less serious computing public.  Kildall was motivated by
the challenges of developing multi-pass compilation on systems with minimal
microcomputer hardware resources.

Meanwhile, the rest of the computing field was motivated to pursue the next
shinier pieces of higher-performance hardware, developing ever-more-bloated
programming languages, OSes, services, and applications that continue to
slow down even the latest-and-greatest systems.  Berners-Lee was motivated
to help scientists and engineers at the European Organization for Nuclear
Research (CERN - the Conseil Européen pour la Recherche Nucléaire) organize
and share their work without having to become expert software developers in
their own right.  Yang, Filo, Brin, Page, Zuckerberg, et al, were motivated
by whatever money could be scrounged from sofas used by couch-surfing,
homeless Millennials (redundant syntax fully intended), and from local news
outlets' advertising accounts.  Selling everyone's, but their own,
personally-identifiable information, probably including that of their own
mothers, has been a welcome additional cornucopia of revenue to them.

Computer science and engineering degrees weren't even offered yet when I
attended the heavily science and engineering oriented naval institution
where I earned my BS in engineering (70% of degrees awarded were in STEM
fields).  The closest you could get were math and electrical engineering
degrees, taking the very few electives offered in CS and CE disciplines.
Granted, the computer I primarily had access to was a secondhand GE-265

Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-30 Thread Fred Cisin via cctalk

I found the bad spot and put a SECTORS.BAD file there, and then was OK.


On Sat, 1 Dec 2018, Maciej W. Rozycki wrote:

Well, ATA drives at that time should have already had the capability to
remap bad blocks or whole tracks transparently in the firmware, although


Not even IDE.
Seagate ST4096  (ST506/412 MFM)  80MB formatted, which was still 
considered good size by those of us who weren't wealthy.




Of course the ability to remap bad storage areas transparently is not an
excuse for the OS not to handle them gracefully, it was not that time yet
back then when a hard drive with a bad block or a dozen was considered
broken like it usually is nowadays.


Yes, they still came with list of known bad blocks.  Usually taped to the 
drive.  THIS one wasn't on the manufacturer's list, and neither SpeedStor 
nor SpinRite could find it!
There were other ways to lock out a block besides filling it with a 
garbage file, but that was easiest.


And, I did try to tell the Microsoft people that the OS "should recover 
gracefully from hardware errors".  In those words.




I had a font editor that wouldn't tolerate 3.1, and quite a few XTs (no A20),
so I continued to keep Win 3.0 on a bunch of machines.

Did 3.1 support running in the real mode though (as opposed to switching
to the real mode for DOS tasks only)?  I honestly do not remember anymore,
and ISTR it was removed at one point.  I am sure 3.0 did.


I believe that it did.  I don't remember WHAT the program didn't like 
about 3.1, or if there were a real reason, not just an arbitrary limit.
I don't think that the Cordata's refusal to run on 286 was based on a real 
reason.


But, the Win 3.1 installation program(s) balked at anything without A20 
and a tiny bit of RAM above 10h I didn't have a problem with having a 
few dedicated machines (an XT with Cordata interface, an AT with 
Eiconscript card for postscript and HP PCL, an AT Win 3.0 for the font 
editor, a machine for disk duplication (no-notch disks), order entry, 
accounting, and lots of machines with lots of different floppy drive 
types.)  I also tested every release of my programs on many variants of 
the platform (after I discovered the hard way that 286 had a longer 
pre-fetch buffer than 8088!)


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-30 Thread Maciej W. Rozycki via cctalk
On Fri, 30 Nov 2018, Fred Cisin via cctalk wrote:

> I found the bad spot and put a SECTORS.BAD file there, and then was OK.
> The Microsoft Beta program wanted cheerleaders, and ABSOLUTELY didn't want any
> negative feedback nor bug reports, and insisted that the OS had no
> responsibility to recover from nor survive hardware problems, and that
> therefore it was not their problem.  I told them that they would soon have to
> do a recall (THAT was EXACTLY what happened with DOS 6.2x).  They did not
> invite me to participate in any more Betas.

 Well, ATA drives at that time should have already had the capability to 
remap bad blocks or whole tracks transparently in the firmware, although 
obviously it took some time for the industry to notice that and catch up 
with support for the relevant protocol requests in the software tools.  
It took many years, after all, for PC BIOS vendors to notice that ATA 
drives generally do report the C/H/S geometry they support (be it real or 
simulated; I only ever came across one early ATA HDD whose C/H/S geometry 
was real, all the rest used ZBR), so there is no need for the user to 
enter it manually for a hard drive to work.
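 For anyone curious, decoding the reported geometry is a small job; a 
sketch (Python; the word numbers follow the early ATA IDENTIFY DEVICE 
layout, i.e. default cylinders/heads/sectors in words 1, 3 and 6, plus the 
28-bit LBA sector count in words 60-61, and the 512-byte buffer is assumed 
to have been captured by some other tool):

import struct

def identify_geometry(identify_data):
    # identify_data is the raw 512-byte IDENTIFY DEVICE block,
    # i.e. 256 little-endian 16-bit words.
    words = struct.unpack('<256H', identify_data)
    cylinders, heads, sectors = words[1], words[3], words[6]
    lba_sectors = words[60] | (words[61] << 16)   # 28-bit LBA capacity
    return cylinders, heads, sectors, lba_sectors

# c, h, s, lba = identify_geometry(open('identify.bin', 'rb').read(512))
# print(c, "cylinders x", h, "heads x", s, "sectors/track,", lba, "LBA sectors")

So there really was no technical reason for the BIOS to make the user type 
the numbers in.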

 Of course the ability to remap bad storage areas transparently is not an 
excuse for the OS not to handle them gracefully, it was not that time yet 
back then when a hard drive with a bad block or a dozen was considered 
broken like it usually is nowadays.

> I had a font editor that wouldn't tolerate 3.1, and quite a few XTs (no A20),
> so I continued to keep Win 3.0 on a bunch of machines.

 Did 3.1 support running in the real mode though (as opposed to switching 
to the real mode for DOS tasks only)?  I honestly do not remember anymore, 
and ISTR it was removed at one point.  I am sure 3.0 did.

  Maciej


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-30 Thread Fred Cisin via cctalk

On Sat, 1 Dec 2018, Maciej W. Rozycki via cctalk wrote:

Be assured there were enough IBM PC clones running DOS around from 1989
onwards for this stuff to matter, and hardly anyone switched to MS Windows
before version 95 (running Windows 3.0 with the ubiquitous HGC-compatible
graphics adapters was sort of fun anyway, and I am not sure if Windows 3.1
even supported it; maybe with extra drivers).


Depending on which question you are asking, . . .
Windows 3.1 definitely did support Hercules video.  We had about 3 dozen 
such machines (386SX) in the school student homework lab.
It also supported CGA, but initially didn't come with the driver, so it 
would work if you upgraded from 3.0 to 3.1, or otherwise used the 3.0 CGA 
driver.


In August 1991, I went to a Microsoft conference in Seattle.  Although it 
was the anniversary of the 5150, Bill Gates was making appearances on the 
east coast, instead of being there.


They asked our opinion of the NEW flying ["dry rot" disintegrating] window 
logo, and couldn't believe that we did NOT love it.


I found out about, and got a copy of, a CD-ROM "International" Windows 
3.0, with many languages, including Chinese!  I loved being able to 
install from CD, instead of boxes of floppies, and was glad that they were 
at least trying to expand to the rest of the world.


They introduced Windows 3.1.  But the borrowed Toshiba laptop that I had 
with me had 1MB of contiguous RAM but no A20 support, and 3.1 "NEEDED" 
64K above 1MB for HIMEM.SYS, which "SOLVES the problem of not enough RAM".
3.1 was also the first product to force SMARTDRV.SYS.  As soon as I got 
home, I contacted the Win3.1 Beta program to tell them that write-caching 
without a way to turn it off was a BIG problem.  There was a bad spot on 
the hard drive that I was installing it to that neither SpinRite nor 
SpeedStor could find, but it consistently crashed the 3.1 installation. 
But, with the forced write caching, there was NO possible way to recover. 
(Without write caching, you just rename the file that failed, and 
manually install another copy of that one file.)

I found the bad spot and put a SECTORS.BAD file there, and then was OK.
The Microsoft Beta program wanted cheerleaders, and ABSOLUTELY didn't want 
any negative feedback nor bug reports, and insisted that the OS had no 
responsibility to recover from nor survive hardware problems, and that 
therefore it was not their problem.  I told them that they would soon 
have to do a recall (THAT was EXACTLY what happened with DOS 6.2x).  They 
did not invite me to participate in any more Betas.


I had a font editor that wouldn't tolerate 3.1, and quite a few XTs (no 
A20),  so I continued to keep Win 3.0 on a bunch of machines.




Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-30 Thread Maciej W. Rozycki via cctalk
On Sun, 25 Nov 2018, Liam Proven via cctalk wrote:

> > > For example, right now, I am in my office in Křižíkova. I can't
> > > type that name correctly without Unicode characters, because the ANSI
> > > character set doesn't contain enough letters for Czech.
> >
> > Intriguing.  Is there an old MS-DOS Code Page (or comparable technique)
> > that does encompass the necessary characters?
> 
> Don't know. But I suspect there weren't many PCs here before the
> Velvet Revolution in 1989. Democracy came around the time of Windows
> 3.0 so there may not have been much of a commerical drive.

 Be assured there were enough IBM PC clones running DOS around from 1989 
onwards for this stuff to matter, and hardly anyone switched to MS Windows 
before version 95 (running Windows 3.0 with the ubiquitous HGC-compatible 
graphics adapters was sort of fun anyway, and I am not sure if Windows 3.1 
even supported it; maybe with extra drivers).

 Anyway, MS-DOS 5.0 onwards had a complete set of code pages for various 
regions of the world.  For Czechia, Hungary, Lithuania, Poland, and other 
European countries further east whose languages use a Latin script, code 
page 852 was provided.  For France, Germany, Spain, the Nordic countries, 
etc., page 850 was provided.  There were other pages included as well, 
beyond IBM's original page 437, including Greek and Cyrillic ones, but I 
don't know the details.  It's quite likely Wikipedia has them.
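 A quick way to see the point with today's tooling (Python's bundled 
codecs rather than anything DOS-era): the Czech and Polish examples from 
this thread encode fine in code page 852 but not in IBM's original 437.

# CP852 ("Latin-2" for DOS) covers Czech and Polish letters; CP437 does not.
for word in ("Křižíkova", "Różycki"):
    print(word, "->", word.encode("cp852").hex())
    try:
        word.encode("cp437")
    except UnicodeEncodeError as e:
        print("  cp437 cannot encode:", e.object[e.start])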

 Of course the HGC didn't support switching the text-mode character set; 
however, ISA VGA clones started trickling in at one point too.  I still 
have my ISA Trident TVGA 8900C adapter from 1993 working in one of my 
machines, though I have since switched to Linux.

 NB my last name is also correctly spelled Różycki rather than Rozycki, 
and the two letters with the diacritics are completely different letters, 
with sounds that bear no resemblance to the corresponding ones without; 
i.e. these are not merely accents, which we don't have in Polish at all.  
(Polish complicates this further in that the sound of `ó' is the same as 
the sound of `u', and the sound of `ż' is the same as the sound of `rz' 
(which is BTW different from the case where the two letters are written 
separately); however the alternatives are not interchangeable, being 
either invalid or changing the meaning of a word, and many native Polish 
speakers get them wrong anyway.)

 FWIW,

  Maciej


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-30 Thread Grant Taylor via cctalk

On 11/30/2018 03:57 PM, Sean Conner via cctalk wrote:
There are several problems with this.  One, how many bits do you set 
aside per character?  8?  16?  There are potentially an open ended set 
of stylings that one might use.


I acknowledge that the idea I shared was incomplete and likely has 
shortcomings.  But I do think that it demonstrates a concept, which is 
what I was after.


Second problem---where do you store such bits?  Not to imply this is a 
bad idea, just that there are issues that need to be resolved with how 
things are done today (how does this interact with UTF-8 for instance? 
Or UCS-4?).


Ideally, I'd like to see UTF-8 / UTF-16 code points (?) for the 
different styles of a letter.  Not every letter (character ~> byte / 
double) needs the styling.  So I suspect that it would be better to 
judiciously place code points in the UTF-8 / UTF-16 space.


Sadly, when I try to search for "this", the letters aren't found in 
"푡ℎ푖푠 푖푠 푎 푠푡푟푖푛푔" or "혁헵헶혀 헶혀 헮 헰헼헺헺헲헻혁". 
Something that I think should work.
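One pragmatic workaround with today's tooling (this is not a change to 
Unicode itself): the mathematical-alphabet letters carry compatibility 
decompositions, so NFKC normalization folds them back to plain letters 
before searching.  A Python sketch, with the styled letters written as 
escapes so they survive any mail client:

import unicodedata

# "this" in MATHEMATICAL ITALIC letters (U+210E stands in for the 'h',
# as the italic 'h' slot is reserved for the Planck-constant symbol)
styled = "\U0001d461\u210e\U0001d456\U0001d460"

plain = unicodedata.normalize("NFKC", styled)
print(plain)               # -> this
print("this" in plain)     # -> True: the search works after normalization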


Also, storage of these letters can work just like it is in this email.  ;-)



--
Grant. . . .
unix || die


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-30 Thread Sean Conner via cctalk
It was thus said that the Great Keelan Lightfoot via cctalk once stated:
> > I see no reason that we can't have new control codes to convey new
> > concepts if they are needed.
> 
> I disagree with this; from a usability standpoint, control codes are
> problematic. Either the user needs to memorize them, or software needs
> to inject them at the appropriate times. There's technical problems
> too; when it comes to playing back a stream of characters, control
> characters mean that it is impossible to just start listening. It is
> difficult to fast forward and rewind in a file, because the only way
> to determine the current state is to replay the file up to that point.

  [ and further down the message ... ]

> I'm going to lavish on the unicode for this example, so those of you
> properly unequipped may not see this example:
> 
> foo := 푡ℎ푖푠 푖푠 푎 푠푡푟푖푛푔 혁헵헶혀 헶혀 헮 헰헼헺헺헲헻혁
> printf(푡ℎ푒 푠푡푟푖푛푔 푖푠 ① 푖푠푛푡 푡ℎ푎푡 푒푥푐푖푡푖푛푔, foo)
> if 혁헵헶혀 헶혀 헮 헽헼헼헿헹혆 헽헹헮헰헲헱 헰헼헺헺헲헻혁 foo ==
> 푡ℎ푖푠 푖푠 푎푙푠표 푎 푠푡푟푖푛푔, 푏푢푡 푛표푡 푡ℎ푒 푠푎푚푒
> 표푛푒 { 혁헵헶혀 헶혀 헮헹혀헼 헮 헰헼헺헺헲헻혁
> ...
> 
> An atrocious example, but a good demonstration of my point. If I had a
> toggle switch on my keyboard to switch between code, comment and
> string, it would have been much simpler to construct too!

  Somehow, the compiler will have to know that "푡ℎ푖푠 푖푠 푎 푠푡푟푖푛푔" is a
string while "혁헵헶혀 헶혀 헮 헰헼헺헺헲헻혁" is a comment to be ignored.  You lamented
the lack of a toggle switch for the two, but existing languages, like C,
already have them: '"' is the "toggle" for strings, while '/*' and '*/' are
the toggles for comments (and now '//' if you are using C99).  It's still
something you have to "type" (or "toggle" or "switch" or somehow indicate
the mode).

  The other issue is how such information is stored, and there, I only see
two solutions---in-band and out-of-band.  In-band would be included with the
text.  Something along the lines of (where <ESC> is the ASCII ESC character
27, and this is an example only):

foo := <ESC>_this is a string<ESC>\ <ESC>^this is a comment<ESC>\
printf(<ESC>_the string is <ESC>[1p isn't that exciting<ESC>\,foo)

  But this has a problem you noted above---it's a lot harder to seek through
the file to arbitrary positions.  Grant Taylor stated another way of doing
this:

> What if there were (functionally) additional bits that indicated various
> other (what I was calling) stylings?
> 
> I think that something along those lines could help avoid a concern I
> have.  Namely, how do you search for an A, whatever "style" it's in?  I
> think I could hypothetically search for bytes ~> words (characters)
> containing ( ) () 01x1 (assuming that the
> preceding don't-care bits are set appropriately) and find any format of A,
> upper case, lower case, bold, italic, underline, strike through, etc.

  There are several problems with this.  One, how many bits do you set aside
per character?  8?  16?  There are potentially an open ended set of stylings
that one might use.  Second problem---where do you store such bits?  Not to
imply this is a bad idea, just that there are issues that need to be
resolved with how things are done today (how does this interact with UTF-8
for instance?  Or UCS-4?).

Then there's out-of-band storage, which stores such information outside the
text (an example---I'm not saying this is the only way to store such
information out-of-band):

foo := this is a string this is a comment
printf(the string is 1 isn't that exciting,foo)

---

string 8-23
string 50-63
string 65-84
replacement 64
comment 25-41

  This has its own problems---namely, how do you keep the two together?  It
will either be a separate file, which could get separated, or part of the
text file, but then you run into the problem of reading Microsoft Word files
circa 1986 with today's tools.
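  A tiny sketch of that out-of-band idea (Python; the range format is 
invented for the example, and the offsets are zero-based, so they differ 
from the numbers above):

# Plain text plus a separate list of (tag, start, end) character ranges.
text = "foo := this is a string this is a comment"
annotations = [("string", 7, 22), ("comment", 24, 40)]

def spans(text, annotations):
    # Yield each annotated region of the untouched plain text.
    for tag, start, end in annotations:
        yield tag, text[start:end + 1]

for tag, chunk in spans(text, annotations):
    print(tag, repr(chunk))

Keeping the text itself untouched is what preserves seekability and 
plain-text compatibility; the price, as noted, is keeping the two in sync.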

  -spc (I like the ideas, but the implementations are harder than it first
appears ... )


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-30 Thread Grant Taylor via cctalk

On 11/30/2018 02:33 PM, Jim Manley via cctalk wrote:

There's enough slack in the approved offerings that electives can be
weighted more toward the technical direction (e.g., user interface and
experience) or the arts direction (e.g., psychology and history).  The idea
was to close the severely-growing gap between those who know everything
about computing and those who need to know enough, but not everything, to
be truly effective in the information-dominant world we've been careening
toward without nearly enough preparation of future generations.


I kept thinking to myself that many of the people that are considered 
pioneers in computers were actually something else by trade and learned 
how to use computers and / or created what they needed for the computer 
to be able to do their primary job.




--
Grant. . . .
unix || die


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-30 Thread Grant Taylor via cctalk

On 11/30/2018 11:34 AM, Keelan Lightfoot via cctalk wrote:

Thanks!


:-)

Both. In the beginning we were content, because the keyboard was well 
suited to the capabilities of the technology available at the time it was 
invented. We didn't see a better way, because when compared to using a pen 
and paper (for writing) or using toggle switches (to control a computer), 
a keyboard was a significant improvement. It's the explosive growth 
and universal adoption of computers that has locked us in to the keyboard 
as the standard.


*sigh*  Steve G.'s comments on Security Now about passwords not going 
away come to mind and seem apropos for keyboards.


There are other, likely better, things out there.  But keyboards 
themselves aren't going to go away.


I disagree with this; from a usability standpoint, control codes are 
problematic. Either the user needs to memorize them, or software needs 
to inject them at the appropriate times.


Okay.

There's technical problems too; when it comes to playing back a stream 
of characters, control characters mean that it is impossible to just 
start listening. It is difficult to fast forward and rewind in a file, 
because the only way to determine the current state is to replay the 
file up to that point.


Now I'm wondering about something akin to the differences in upper case 
and lower case.  Functionally the same code, just a different value in 
the 6th bit.
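Concretely, in ASCII the two cases of a letter differ only in the 0x20 
bit (the "6th bit" counting from one), which is what makes this sort of 
trick cheap.  A quick Python illustration, valid for plain ASCII letters 
only:

# 'A' is 0x41 and 'a' is 0x61; they differ in a single bit.
print(hex(ord('A') ^ ord('a')))          # -> 0x20

def fold_case(ch):
    # Compare letters ignoring the case bit, as old software often did.
    return chr(ord(ch) & ~0x20) if ch.isalpha() else ch

print(fold_case('a') == fold_case('A'))  # -> True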


What if there were (functionally) additional bits that indicated various 
other (what I was calling) stylings?


I think that something along those lines could help avoid a concern I 
have.  Namely, how do you search for an A, whatever "style" it's in?  I 
think I could hypothetically search for bytes ~> words (characters) 
containing ( ) () 01x1 (assuming that the 
preceding don't-care bits are set appropriately) and find any format of A, 
upper case, lower case, bold, italic, underline, strike through, etc.


The other thing that the additional bits / flags could do is allow the 
bytes (words / characters) to be read mid-stream.


Do you mean modal control codes? As in "everything after here is bold" 
and "the bold stops here"?


Yes.  That's what I was thinking when I wrote that.

We've gone backwards sadly. For a brief while, this kind of rich user 
interface stuff was provided by the OS. A text box, regardless of 
the application, would use the OS's text box control, and would have 
a universal interface for rich text.


Indeed.

But the growth of the web has resulted in an atavism. We're back to plain 
text, and using markup to style our text.


I mostly agree.  But I do wonder how true that actually is, at least on 
a technical level.  I think the text input box can be enhanced to allow 
more than just plain text.


If I want bold text in Slack, I have to use markup.  Facebook Messages and 
YouTube comments also support markup, but the syntax is slightly different 
between them.


*sigh*

Back in 1991, if I wanted bold text in any application that supported 
rich text on my SE/30, I hit command-B and I got bold text. Sure, there 
are Javascript rich text editors that can be bolted on, but they all 
have their own UI concepts, and they're all a trainwreck.


I believe that we can do better.

In addition to crusty old computers, I also enjoy the company of three 
crusty old Linotypes. In fact, that's what got me thinking about this 
stuff in the first place. The Linotype keyboard has 90 keys, which 
directly map to the 90 glyphs a Linotype can "render". The keyboard is 
laid out in three equal-sized sections: lowercase letters on the left, 
uppercase on the right, with numbers and punctuation in the middle. 
Push the button, and what's marked on the button is what ultimately 
ends up on the page. Each Linotype mat (matrix; letter mold) has two 
positions, which can be selected by flipping a little lever when they're 
being assembled into a line. The two positions are almost always used 
to select between two versions of a font; roman/bold or roman/italic 
are the most common pairings.


Intriguing.  I have a vague mental image of what you're talking about 
after watching Linotype: The Film (http://www.linotypefilm.com).  I 
found it quite entertaining and informative.


But what it means is that you can walk up to a machine with a half-typed 
line in the assembler and immediately determine its state.  Any mats 
set in the bold position are in a physically different position in 
the assembler. The position of the switch tells you if you're typing 
in bold or roman. When you push the 'A' key, you know an uppercase 'A' 
in bold will be added to the line. Additionally, the position of that 
switch can be verified without taking your eyes off of the copy. There 
is no black magic, no spooky action at a distance.  The capabilities of 
the machine are immediately apparent.


I was not aware of the physically different positions.  But either I 
don't remember, pick up on, or they 

Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-30 Thread Jim Manley via cctalk
> Back on topic, the tools exist, but they are often seen as toys and
> not serious software
> development tools. Are we at the point where the compiler for a visual
> programming
> language is written in the visual programming language?
>
> - Keelan
>

Hi Keelan,

I was going to mention this further back in the thread when visual
programming was first mentioned, but for those not aware, there has been a
shift in emphasis in teaching computing principles to newbies who have no
idea what a bit, byte, assembler, compiler, interpreter, etc., are.  UC
Berkeley's "The Beauty and Joy of Computing" (and a follow-on "The Beauty
and Joy of Data", offered at some institutions) curricula are increasingly
being taught (starting in high school advanced placement computer science,
as well as in freshman coursework in universities) to convey fundamental
computing concepts:

https://bjc.berkeley.edu

The associated courses are taught using a visual programming environment
called Snap!, where the (now browser-based, thank goodness) ease-of-use of
Scratch (drag-and-drop interface, visual metaphors for loops, conditionals,
etc., as well as easy animation tools) is combined with the power of Scheme
(first class procedures, first class lists, first class objects, and first
class continuations).

https://snap.berkeley.edu

Some universities have begun offering Bachelor of Arts degrees in CS, in
addition to BSCSs, where about half of the BACS coursework is
technically-oriented, and the remainder is oriented to more traditional
arts offerings.  The two curricula and Snap! form a bridge so that students
who ordinarily would never even consider studying CS can become
knowledgeable enough to truly comprehend and appreciate computing's
possibilities and limitations in its role in civilization (or at least
what's left of it).

There's enough slack in the approved offerings that electives can be
weighted more toward the technical direction (e.g., user interface and
experience) or the arts direction (e.g., psychology and history).  The idea
was to close the severely-growing gap between those who know everything
about computing and those who need to know enough, but not everything, to
be truly effective in the information-dominant world we've been careening
toward without nearly enough preparation of future generations.

I haven't worked with Snap! enough yet to know for sure whether it can be
used to develop itself, but I strongly suspect that is the case (it's
actually implemented in Javascript using an HTML5 canvas due to its
browser-based nature).  It wouldn't be suitable for doing systems level
development, unless optimized C code (or equivalent) could be emitted, but
it could certainly be used to demonstrate the logic principles involved in
any level of software development that most people are ever likely to need
to understand.  There's mention of Snap! programs being convertible to
mainstream programming languages such as Python, JavaScript, C, etc., but I
haven't traced to ground in documentation how that's supposed to happen,
yet.

We may be part-way there because Google's Blockly spin-off of Scratch can
already emit five scripting languages (Javascript, Python, PHP, Lua, and
Dart), and it uses a modular approach where emission of code in additional
languages could reportedly be added.  That magic word, "optimized", is the
key to whether the code is fundamentally correct and would need oodles of
hand-rewriting to improve efficiency, or there are ways to automate at
least some of the optimization.

Snap! can be run off-line in a browser, as well as on the on-line primary
and mirror sites, and standalone applications can be generated.  Scratch
has been extended to provide an easy way to control and sense physical
environments via typical robotics components, but I haven't looked to see
if Snap! has inherited those extensions.

For any doubters, note that Pacman was ported to Scratch years ago,
complete with the authentic sounds (including the "shrivel and
disappear-in-death" clip), so ... ;^)

All the Best,
Jim


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-30 Thread Keelan Lightfoot via cctalk
> Welcome.  :-)

Thanks!

> Do you think that we stopped enhancing the user input experience more
> because we were content with what we had or because we didn't see a
> better way to do what we wanted to do?

Both. In the beginning we were content, because the keyboard was well
suited to the capabilities of the technology available at the time it
was invented. We didn't see a better way, because when compared to
using a pen and paper (for writing) or using toggle switches (to
control a computer), a keyboard was a significant improvement. It's
the explosive growth and universal adoption of computers that has
locked us in to the keyboard as the standard.

> I agree that markup languages are a kludge.  But I don't know that they
> require plain text to describe higher level concepts.
>
> I see no reason that we can't have new control codes to convey new
> concepts if they are needed.

I disagree with this; from a usability standpoint, control codes are
problematic. Either the user needs to memorize them, or software needs
to inject them at the appropriate times. There's technical problems
too; when it comes to playing back a stream of characters, control
characters mean that it is impossible to just start listening. It is
difficult to fast forward and rewind in a file, because the only way
to determine the current state is to replay the file up to that point.

> Aside:  ASCII did what it needed to do at the time.  Times are different
> now.  We may need more / new / different control codes.
>
> By control codes, I'm meaning a specific binary sequence that means a
> specific thing.  I think it needs to be standardized to be compatible
> with other things -or- it needs to be considered local and proprietary
> to an application.

Do you mean modal control codes? As in "everything after here is bold"
and "the bold stops here"?

> I actually wonder how much need there is for /all/ of those utilities.
> I expect that things should have streamlined and simplified, at least
> some, in the last 30 years.

We've gone backwards sadly. For a brief while, this kind of rich user
interface stuff was provided by the OS. A text box, regardless of the
application, would use the OS's text box control, and would have a
universal interface for rich text. But the growth of the web has
resulted in an atavism. We're back to plain text, and using markup to
style our text. If I want bold text in Slack, I have to use markup.
Facebook Messages and YouTube comments also support markup, but the
syntax is slightly different between them. Back in 1991, if I wanted
bold text in any application that supported rich text on my SE/30, I
hit command-B and I got bold text. Sure, there are Javascript rich
text editors that can be bolted on, but they all have their own UI
concepts, and they're all a trainwreck.

> What would you like to do or see done differently?  Even if it turns out
> to be worse, it would still be something different and likely worth
> trying at least once.

In addition to crusty old computers, I also enjoy the company of three
crusty old Linotypes. In fact, that's what got me thinking about this
stuff in the first place. The Linotype keyboard has 90 keys, which
directly map to the 90 glyphs a Linotype can "render". The keyboard is
laid out in three equal-sized sections: lowercase letters on the left,
uppercase on the right, with numbers and punctuation in the middle.
Push the button, and what's marked on the button is what ultimately
ends up on the page. Each Linotype mat (matrix; letter mold) has two
positions, which can be selected by flipping a little lever when
they're being assembled into a line. The two positions are almost
always used to select between two versions of a font; roman/bold or
roman/italic are the most common pairings.

But what it means is that you can walk up to a machine with a
half-typed line in the assembler and immediately determine its state.
Any mats set in the bold position are in a physically different
position in the assembler. The position of the switch tells you if
you're typing in bold or roman. When you push the 'A' key, you know an
uppercase 'A' in bold will be added to the line. Additionally, the
position of that switch can be verified without taking your eyes off
of the copy. There is no black magic, no spooky action at a distance.
The capabilities of the machine are immediately apparent.

> I don't think of bold or italic or underline as second class concepts.
> I tend to think of the following attributes that can be applied to text:
>
>   · bold
> [snip]
>
> I don't think that normal is superior to the other four (five) in any
> way.  I do think that normal does occur VASTLY more frequently than the
> any combination of the others.  As such normal is what things default to
> as an optimization.  IMHO that optimization does not relegate the other
> styles to second class.

I agree. I think that they're normal enough that they should exist as
their own code points in unicode. Our 

Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-28 Thread Jim Manley via cctalk
Some computing economics history:

I'm an engineer and scientist by both education and experience, and one
major difference between the disciplines is that engineers are required to
pass coursework and demonstrate proficiency in economics.  That's because
we need to deliver things that actually do what customers think they paid
for within strict budgets and schedules, or we go hungry.  Scientists, on
the other hand, if they can accurately predict what it will cost to prove a
theory, aren't practicing science, because they have to already know the
outcome and are taking no risk.  A theoretically "superior" encoding may
not see practical use by a significant number of people because of legacy
inertia that often makes no sense, but is rooted in cultural, sociological,
emotional, and other factors, including economics.

Dvorak computer keyboards are allegedly far more efficient
speed/accuracy-wise than QWERTY computer keyboards, so they should rule the
computing world, but they don't.  Keyboards that reduce the risk of
repetitive stress injuries (e.g., carpal tunnel syndrome) should dominate
the market for very sensible health reasons, but they don't, either.
Legacy inertia is a beyotch to overcome, especially when
international-level manufacturers and investors have a strong
interest in making lots of money from the status quo.  Logic and reasoning are
simply nowhere near enough to create the conditions necessary for
widespread adoption - sometimes it's just good luck in timing (or, bad
luck, as the case may be).

ASCII was developed in an age when Teletypes and similar devices were the
only textual I/O options, with fixed-width/size/style typefaces (font
family is an attribute of a typeface - there's no such thing as a "font").
By the late 1950s, there were around 250 computer manufacturers, and none
of their products were interoperable in any form.  Until the IBM 360 was
released in 1965, IBM had 14 product _lines_ that were incompatible with
each other, despite having 20,000+ very capable scientists and engineers on
their payroll.

You can't blame the ASCII developers for lack of foresight when no one in
their right mind back then would have ever predicted we could have upwards
of a trillion bytes of memory in our pockets (e.g., the Samsung Note 9),
much less multi-megapixel touch displays with millions of colors, with
worldwide-reaching cellular/Internet access with milliseconds of round-trip
response, etc.

Someone thinking that they're going to make oodles of money from some
supposedly new-and-improved proprietary encoding "standard" that discards
five-plus decades of legacy intellectual and economic investment, is
pursuing a fool's errand.  Even companies with resources at the level of
Apple, Google, Microsoft, etc., aren't that arrogant, and they've
demonstrated some pretty heavy-duty chutzpah over time.  BTW, you won't be
able to patent what apparently amounts to a lookup table, and even if you
copyright it, it will be a simple matter of developing
functionally-equivalent code that performs a translation on-the-fly.  See
also the clever schemes where DVD encryption keys, that had been left on an
unprotected server accessible via the Internet, were transformed into prime
numbers that didn't infringe on the copyrights associated with the keys.

True standards are open nowadays - the days of proprietary "standards" are
a couple of decades behind us - even Microsoft has been publishing the
binary structure of their Office document file formats.  The specification
for Word, that includes everything going back to v 1.0, is humongous, and
even they were having fits trying to maintain the total spec, which is
reportedly why they went with XML to create the .docx, .xlsx, .pptx, etc.,
formats.  That also happened to make it possible to placate governments
(not to mention customers) that are looking for any hint of
anti-competitive behavior, and thus also made it easier for projects such
as OpenOffice and LibreOffice to flourish.

Typographical bigots, who are more interested in style than content, were
safely fenced off in the back rooms of publishing houses and printing
plants until Apple released the hounds on an unsuspecting public.  I'm
actually surprised that the style purists haven't forced Smell-o-Vision
technology on The Rest of Us to ensure that the musty smell of old books is
part of every reading "experience" (I can't stand the current common use of
that word).  At least I have the software chops to transform the visual
trash that passes for "style" these days into something pleasing to _my_
eyes (see what I did there with "severely-flawed" ASCII?  Here's how you
can do /italics/ and !bold! BTW.).

Nothing frosts me more than reading text that can't be resized and
auto-reflowed, especially on mobile devices with extremely limited display
real estate.  I'm fully able-bodied and I'm perturbed by such bad design,
so, I'm pretty sure that pages that prevent pinch-zooming, and that don't
allow for direct 

Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-28 Thread Christian Gauger-Cosgrove via cctalk
On Wed, 28 Nov 2018 at 09:27, Paul Koning via cctalk
 wrote:
> I learned it about 15 years ago (OpenAPL, running on a Solaris workstation 
> with a modified Xterm that handled the APL characters).  Nice.  It made a 
> handy tool for some cryptanalysis programs I needed to write.
>
I am interested in this cryptanalysis program...


> I wonder if current APL implementations use the Unicode characters for APL, 
> that would make things easy.
>
I can confirm that both NARS 2000 and Dyalog APL both use the Unicode
APL characters.

Regards,
Christian
-- 
Christian M. Gauger-Cosgrove
STCKON08DS0
Contact information available upon request.


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-28 Thread Paul Koning via cctalk



> On Nov 27, 2018, at 9:23 PM, Fred Cisin via cctalk  
> wrote:
> 
>>> I have long wondered if there are computer languages that aren't rooted
>>> in English / ASCII.  I feel like it's rather pompous to assume that all
>>> programming languages are rooted in English / ASCII.  I would hope that
>>> there are programming languages that are more specific to the region of
>>> the world they were developed in.  As such, I would expect that they
>>> would be stored in something other than ASCII.
> 
> On Tue, 27 Nov 2018, William Donzelli via cctalk wrote:
>> APL.
> 
> APL requires adding additional characters.  That was a major obstacle to 
> acceptance, both in terms of keyboard and type ball (my use preceded CRT), 
> but also asking the user/programmer to learn new characters.  I loved APL!

I learned it about 15 years ago (OpenAPL, running on a Solaris workstation with 
a modified Xterm that handled the APL characters).  Nice.  It made a handy tool 
for some cryptanalysis programs I needed to write.

I wonder if current APL implementations use the Unicode characters for APL, 
that would make things easy.

> I love the use of an arrow for assignment.  ...

One of the strangest programming languages I've used is POP-2, which we used in 
an AI course (Expert Systems) at the University of Illinois, in 1976.  Taught 
by a visiting prof from the University of Edinburgh, I think Donald Michie, 
but I may have the name confused.

Like APL, POP-2 had the same associativity for all operators.  Unlike APL, the 
designers decided that the majority should win so assignment would be 
left-associative like everything else -- rather than APL's rule that all the 
other operators are right-associative like assignment.  So you'd end up with 
statements like:

n + 1 -> n

More at https://en.wikipedia.org/wiki/POP-2

paul



Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-28 Thread Liam Proven via cctalk
On Tue, 27 Nov 2018 at 20:47, Grant Taylor via cctalk
 wrote:
>
> I don't think that HTML can reproduce fixed page layout like PostScript
> and PDF can.  It can make a close approximation.  But I don't think HTML
> can get there.  Nor do I think it should.

There is a wider panoply of options to consider.

For instance, Display Postscript, and come to that, arguably, NeWS.

Also, modern document-specific markups. I work in DocBook XML, which I
dislike intensely.

There's also, at another extreme, AsciiDoc (and Markdown (in various
"flavours")), Restructured Text, and similar "lightweight" MLs:

http://hyperpolyglot.org/lightweight-markup

But there are, of course, rivals. DITA is also widely-used.

And of course there are things like LyX/LaTeX/TeX, which some find
readable. I am not one of them. But I get paid to do Docbook, I don't
get paid to do TeX.

Neal Stephenson's highly enjoyable novel /Seveneves/ contains some
interesting speculations on the future of the Roman alphabet and what
close contact with Cyrillic over a period will do to it.

Aside:

[[

> I'm not personally aware of any cases where ASCII limits programming
> languages.  But my ignorance does not preclude that situation from existing.

APL and ColorForth, as others have pointed out.

> I have long wondered if there are computer languages that aren't rooted
> in English / ASCII.

https://en.wikipedia.org/wiki/Qalb_(programming_language)

More generally:

https://en.wikipedia.org/wiki/Non-English-based_programming_languages

Personally I am more interested in non-*textual* programming
languages. A trivial candidate is Scratch:

https://scratch.mit.edu/

But ones that entirely subvert the model of using linear files
containing characters that are sequentially interpreted are more
interesting to me. I blogged about one family I just discovered last
week:

https://liam-on-linux.livejournal.com/60054.html

The videos are more or less _necessary_ here, because trying to
describe this in text will fail _badly_. Well worth a couple of hours
of anyone's time.

]]

Anyway. To return to text encodings.

Again I wish to refer to a novel; to Kim Stanley Robinson's "Mars
trilogy", /Red Mars/, /Green Mars/ and /Blue Mars/. Or as a friend
called them, "RGB Mars" or even "Technicolor Mars".

A character presents an argument that if you try to summarise many
things on a scale -- e.g. for text encodings, from simplicity and
readability, to complexity and capability -- you can't encapsulate any
sophisticated system.

He urges a 4-cornered system, using the example of the "four humours":
phlegm, bile, choler and sang. The opposed corners of the diagram are
as important as the sides of the square; characteristics form the
corners, but the intersections between them are what defines us.

So. There is more than one scale here.

At one extreme, we could have the simplest possible text encoding.
Something like Morse code or Braille, which omits almost all "syntax"
-- almost no punctuation, no carriage returns or anything like that,
which are _metadata_, they are information about how to display the
content, not content themselves. Not even case is encoded: no
capitals, no minuscule letters. But of course a number of alphabets
don't have that distinction, and it's not essential in the Roman
alphabet.

Slightly richer, but littered with historical baggage from its origins
in teletypes: ASCII.

Much richer, but still not rich enough for all the
Roman-alphabet-using-languages: ANSI.

Insanely rich, but still not rich enough for all the written
languages: Unicode. (What plane? What encoding? What version, even?)

At the other extreme, markup languages that either weren't really
intended for humans but often are written by them -- e.g. the SGML/XML
family -- or are only usable by relatively few humans -- e.g. the TeX
family -- or that are almost never used by humans, e.g. PostScript, or
HP PCL.

And what I find a fairly happy medium -- AsciiDoc, say. Perfectly
readable by untrained people as plain ASCII, can be written with mere
hours of study, if that, but also can be processed and rendered into
something much prettier.

The richer the encoding, the harder it is for *humans* to read, and
the more complex the software to handle it needs to be.

So, yes, ASCII is perhaps too minimal. ANSI is just a superset.

But I'd argue that there _should_ be a separation between at least 2,
maybe 3 levels, and arguably more.

#1 Plain text encoding. Ideally able to handle all the characters in
all forms of the Latin alphabet, and single-byte based. Drop ASCII
legacy baggage such as backspace, bell, etc.

#2 Richer text, with simple markup, but human-readable and
human-writable without needing much skill or knowledge. Along the
lines of Markdown or *traditional* /email/ _formatting_ perhaps.

#3 Formatted text, with embedded control codes. The Oberon OS does this.

#4 Full 1980s word-processor-style document, with control codes,
formatting, font and page layout features, etc.

#5 

Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-28 Thread Liam Proven via cctalk
On Wed, 28 Nov 2018 at 08:05, Fred Cisin via cctalk
 wrote:
>
> He also created the Canon Cat.
>
> His idea of a user interface included that the program should KNOW
> (assume) what the user wanted to do.

One of my heroes.

I've never used a Cat or his other software UIs, but the demos I've
seen are enough to make me wonder at how much we have lost already,
and secondarily, if it would be possible to code up a Raskin-style
editor in Emacs.

It's about the only editor I know that's smart enough and programmable
enough. Unfortunately, I also find it horrible to use and don't know
how to do this.

-- 
Liam Proven - Profile: https://about.me/liamproven
Email: lpro...@cix.co.uk - Google Mail/Hangouts/Plus: lpro...@gmail.com
Twitter/Facebook/Flickr: lproven - Skype/LinkedIn: liamproven
UK: +44 7939-087884 - ČR (+ WhatsApp/Telegram/Signal): +420 702 829 053


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-27 Thread Fred Cisin via cctalk

Why not a language even more self-documenting than COBOL, wherein the main
body is text, and special markers to identify the CODE that corresponds?

On Wed, 28 Nov 2018, Sean Conner wrote:

 In the book _Programmers at Work_ there's a picture of a program Jef
Raskin [1] wrote that basically embeds BASIC into a word processor document.
[1] He started the Macintosh project at Apple.  It was later taken over
by Steve Jobs and taken in a different direction.


He also created the Canon Cat.

His idea of a user interface included that the program should KNOW 
(assume) what the user wanted to do.


I showed him WHY the OS shouldn't go ahead (without asking for 
confirmation!) and format a disk that it couldn't read.  I do not know 
whether that change got made before commercial release.
(The Cat, incidentally, was SS 512 bytes per sector, with 10 sectors per 
track.  Sometimes described as 256K (because use was primarily for 
imaging 256K of RAM), but sometimes [more accurately] as 384K)


Before his death, I almost ended up with his electric van (Subaru 600 
based).  There was not enough computational capability in it for that to 
be on-topic here.




Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-27 Thread ben via cctalk

On 11/27/2018 9:11 PM, Sean Conner via cctalk wrote


   But I can still load and read circa-1968-plain-text files without issue,
on a computer that didn't even exist at the time, using tools that didn't
exist at the time.  The same can't be said for a circa-1988-Microsoft-word
file.  It requires either the software of the time, or specialized software
that understands the format.



But where do you find the 1968 plain text files?
Right now I am looking for free online books on computers
and computer science books in the 1971 to 1977 year range.
a fictional example "HAL 9000 programing" AI BOOTSTRAPPING WITH A LISP
1st edition. Useful knowledge for back then. "HAL 9000 programing"
HOW the AI BOOTSTRAPS windows 1000 in HOT JAVA 2001 edition. Not so 
useful for historic knowledge.


Looking to write a simple integer language as I have no floating point
yet on 1973-1974 ish paper computer design. And yes it is 18 bits and
TTL. Right now I am programming in C for what little quick and dirty 
software I have written and digging around for ideas.


It would be nice if bitsavers could have the old 1st edition books.
The latest may sell but old knowledge is being lost.
Ben.



Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-27 Thread Sean Conner via cctalk
It was thus said that the Great Fred Cisin via cctalk once stated:
>
> >>I like the C comment example; Why do I need to call out a comment with
> >>a special sequence of letters? Why can't a comment exist as a comment?
> 
> Why not a language even more self-documenting than COBOL, wherein the main 
> body is text, and special markers to identify the CODE that corresponds?

  In the book _Programmers at Work_ there's a picture of a program Jef
Raskin [1] wrote that basically embeds BASIC into a word processor document.

  -spc

[1] He started the Macintosh project at Apple.  It was later taken over
by Steve Jobs and taken in a different direction.


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-27 Thread Sean Conner via cctalk
It was thus said that the Great Keelan Lightfoot via cctalk once stated:
> I'm a bit dense for weighing in on this as my first post, but what the heck.
> 
> Our problem isn't ASCII or Unicode, our problem is how we use computers.
> 
> Going back in time a bit, the first keyboards only recorded letters
> and spaces, even line breaks required manual intervention. As things
> developed, we upgraded our input capabilities a little bit (return
> keys! delete keys! arrow keys!), but then, some time before graphical
> displays came along, we stopped upgrading. We stopped increasing the
> capabilities of our input, and instead focused on kludges to make them
> do more. We created markup languages, modifier keys, and page
> description languages, all because our input devices and display
> devices lacked the ability to comprehend anything more than letters.
> Now we're in a position where we have computers with rich displays
> bolted to a keyboard that has remained unchanged for 150 years.

  Do you have anything in particular in mind?

> Unpopular opinion time: Markup languages are a kludge, relying on
> plain text to describe higher level concepts. TeX has held us back.
> It's a crutch so religiously embraced by the people that make our
> software that the concept of markup has come to be accepted "the way".
> I worked with some university students recently, who wasted a
> ridiculous amount of time learning to use LaTeX to document their
> projects. Many of them didn't even know that page layout software
> existed, they thought there was this broad valley in capabilities with
> TeX on one side, and Microsoft Word on the other. They didn't realize
> that there is a whole world of purpose built tools in between. Rather
> than working on developing and furthering our input capabilities,
> we've been focused on keeping them the same. Markup languages aren't
> the solution. They are a clumsy bridge between 150 year old input
> technology and modern display capabilities.
> 
> Bold or italic or underlined text shouldn't be a second class concept,
> they have meaning that can be lost when text is conveyed in
> circa-1868-plain-text. 

  But I can still load and read circa-1968-plain-text files without issue,
on a computer that didn't even exist at the time, using tools that didn't
exist at the time.  The same can't be said for a circa-1988-Microsoft-word
file.  It requires either the software of the time, or specialized software
that understands the format.

> I've read many letters that predate the
> invention of the typewriter, emphasis is often conveyed using
> underlines or darkened letters. We've drawn this arbitrary line in the
> sand, where only letters that can be typed on a typewriter are "text",
> Everything else is fluff that has been arbitrarily decided to convey
> no meaning. I think it's a safe argument to make that the primary
> reason we've painted ourselves into this unexpressive corner is
> because of a dogged insistence that we cling to the keyboard.

  There were conventions developed for typewriters to get around this. 
Underlining text indicated italicized text (if the typewriter didn't have
the capability---some did).

  In fact, typewriters have more flexibility than computers do even today. 
Within the restriction of a typewriter (only characters and spaces) you
could use the back-space key (which did not erase the previous
character) and re-type the same character to get a bold effect.  You could
back-space and hit the underscore to get underlined text.  You could
back-space and hit the ` key to get a grave accent, and the ' to get an
acute accent.  With a bit more fiddling with the back-space and adjusting
the paper via the platen, you could get umlauts (either via the . or '
keys).

  I think the original intent of the BS control character in ASCII was to
facilitate this behavior, but alas, nothing ever did.  Shame, it's a neat
concept.
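
(A small C sketch of that overstrike idea, as it survived into
hardcopy-terminal output: print the character, backspace, and strike it
again for bold, or print an underscore, backspace, then the character for
underlining. The helper names are mine, purely for illustration; a pager
such as less will usually render the effect on screen.)

#include <stdio.h>

/* Print each character, backspace, then strike it again: the hardcopy
   convention for bold. */
static void overstrike_bold(const char *s)
{
    for (; *s; s++)
        printf("%c\b%c", *s, *s);
}

/* Underscore, backspace, then the character: the convention for underline. */
static void overstrike_underline(const char *s)
{
    for (; *s; s++)
        printf("_\b%c", *s);
}

int main(void)
{
    overstrike_bold("bold");
    printf("  ");
    overstrike_underline("underlined");
    printf("\n");
    return 0;
}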

> I like the C comment example; Why do I need to call out a comment with
> a special sequence of letters? Why can't a comment exist as a comment?

  The smart-ass answer is "because the compiler only looks at a stream of
text and needs a special marker" but I get the deeper question---is a plain
text file the only way to program?

  No.  There are other ways.  There are many attempts at so-called "visual
languages" but none of them have been used to any real extent.  Yes, there
are languages like Visual Basic or Smalltalk, but even with those, you still
type text for the computer to run.

  The only really alternative programming language I know of is Excel. 
Seriously.  That's about the closest thing you get to a comment existing as
a comment without special markers, because you don't include those as part
of the program (specifically, you will exclude those cells from the
computation lest you get an error).

> Why is a comment a second class concept? When I take notes in the
> margin, I don't explicitly need to call them out as notes. This
> extends to 

Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-27 Thread Chuck Guzis via cctalk
On 11/27/18 6:23 PM, Fred Cisin via cctalk wrote:

> I love the use of an arrow for assignment.  In teaching, a student's
> FIRST encounter with programming can be daunting.  Use of an equal sign
> immediately runs up against the long in-grained concept of commutative
> equality.  You would be surprised how many first time students try to
> say 3 = X .  Then, of course,
> N = 1
> N = N + 1
> is a mathematical "proof by induction" that all numbers are equal!
> (Don't let a mathematician see that, or the universe will cease to
> exist, and be replaced by something even more inexplicable!)

It's worth noting that in 1963 ASCII, hex 5E was the up-arrow (now the
circumflex) and hex 5F was the left-arrow (now underline).

It's also worth noting that in the original CDC 6-bit display code,
there were symbols not only for the left-to-right arrow, but also for
not-equals, logical OR and AND, up- and down-arrow, equivalence, logical
NOT, less-than-or-equal, and greater-than-or-equal--pretty much the
original Algol-60 special characters.

--Chuck



Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-27 Thread Sean Conner via cctalk
It was thus said that the Great Grant Taylor via cctalk once stated:
> On 11/27/2018 04:43 PM, Keelan Lightfoot via cctalk wrote:
> >
> >Unpopular opinion time: Markup languages are a kludge, relying on plain 
> >text to describe higher level concepts.
> 
> I agree that markup languages are a kludge.  But I don't know that they 
> require plain text to describe higher level concepts.
> 
> I see no reason that we can't have new control codes to convey new 
> concepts if they are needed.
> 
> Aside:  ASCII did what it needed to do at the time.  Times are different 
> now.  We may need more / new / different control codes.
> 
> By control codes, I'm meaning a specific binary sequence that means a 
> specific thing.  I think it needs to be standardized to be compatible 
> with other things -or- it needs to be considered local and proprietary 
> to an application.

  [ snip ]

> I don't think of bold or italic or underline as second class concepts. 
> I tend to think of the following attributes that can be applied to text:
> 
>  · bold
>  · italic
>  · overline
>  · strike through
>  · underline
>  · superscript exclusive or subscript
>  · uppercase exclusive or lowercase
>  · opposing case
>  · normal (none of the above)

  But there are defined control codes for that (or most of that list
anyway).  It's not ANSI, but an ISO standard.  Let's see ... 

^[[1m bold
^[[3m italic
^[[53m overline
^[[9m strike through
^[[4m underline
^[[0m normal

  The superscript/subscript could be done via another font

^[[11m ... ^[[19m

  Maybe even the opposing case case ... um ... yeah.

  By the way, ^[ is a single character representing the ASCII ESC character
(27).  
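
(A minimal C sketch of those SGR sequences, assuming an ECMA-48/ISO 6429
terminal; whether italic, overline or strike-through actually render
depends entirely on the terminal emulator in use.)

#include <stdio.h>

#define ESC "\x1b"   /* the ^[ character, ASCII 27 */

int main(void)
{
    /* Select Graphic Rendition: ESC [ n m */
    printf(ESC "[1m"  "bold"           ESC "[0m" "  ");
    printf(ESC "[3m"  "italic"         ESC "[0m" "  ");
    printf(ESC "[4m"  "underline"      ESC "[0m" "  ");
    printf(ESC "[9m"  "strike through" ESC "[0m" "  ");
    printf(ESC "[53m" "overline"       ESC "[0m" "\n");
    return 0;
}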

> I see no reason that the keyboard can't have keys / glyphs added to it.
> 
> I'm personally contemplating adding additional keys (via an add on 
> keyboard) that are programmed to produce additional symbols.  I 
> frequently use the following symbols and wish I had keys for easier 
> access to them:  ≈, ·, ¢, ©, °, …, —, ≥, ∞, ‽, ≤, µ, 
> ≠, Ω, ½, ¼, ⅓, ¶, ±, ®, §, ¾, ™, ⅔, ¿, ⊕.

  Years ago I came across an IBM Model M keyboard that had the APL character
set on the keyboard, along with the normal characters one finds.  I would
have bought it on the spot if it weren't for a friend of mine who saw it 10
seconds before I did.

  I did recently get another IBM Model M keyboard (an SSK model) that had
additional labels on the keys:

http://boston.conman.org/2018/10/31.2

The nice thing about the IBM Model M is the keycaps are easy to replace.

> I will concede that many computers and / or programming languages do 
> behave based on text.  But I am fairly confident that there are some 
> programming languages (I don't know about computers) that work 
> differently.  Specifically, simple objects are included as part of the 
> language and then more complex objects are built using the simpler 
> objects.  Dia and (what I understand of) Minecraft come to mind.

  You might be thinking of Smalltalk.

  -spc



Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-27 Thread Fred Cisin via cctalk

I have long wondered if there are computer languages that aren't rooted
in English / ASCII.  I feel like it's rather pompous to assume that all
programming languages are rooted in English / ASCII.  I would hope that
there are programming languages that are more specific to the region of
the world they were developed in.  As such, I would expect that they
would be stored in something other than ASCII.


On Tue, 27 Nov 2018, William Donzelli via cctalk wrote:

APL.


APL requires adding additional characters.  That was a major obstacle to 
acceptance, both in terms of keyboard and type ball (my use preceded CRT), 
but also asking the user/programmer to learn new characters.  I loved APL!


I love the use of an arrow for assignment.  In teaching, a student's FIRST 
encounter with programming can be daunting.  Use of an equal sign 
immediately runs up against the long in-grained concept of commutative 
equality.  You would be surprised how many first time students try to say 
3 = X .  Then, of course,

N = 1
N = N + 1
is a mathematical "proof by induction" that all numbers are equal!
(Don't let a mathematician see that, or the universe will cease to 
exist, and be replaced by something even more inexplicable!)


Even the archaic keyword "LET" in BASIC helped clarify that.
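
(A minimal C illustration of the same trap, for what it's worth: = assigns
and == compares, and nothing in the notation stops a beginner from
confusing the two.)

#include <stdio.h>

int main(void)
{
    int n = 1;       /* assignment: n now holds 1 */
    n = n + 1;       /* assignment again: n now holds 2; */
                     /* this is not a claim that 1 equals 2 */

    printf("n is %d\n", n);                    /* prints 2 */
    printf("n == n + 1 is %d\n", n == n + 1);  /* comparison: prints 0 (false) */
    return 0;
}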

We tend to be dismissive of such problems, declaring that 
students "need to LEARN the right way".



I remember a cartoon in a publication, that might have been Interface Age,
where an archeologist looking at hieroglyphics says that it looks like a 
subset of APL.



But, I think that the comment was more in regards to programming by 
non-English speaking programmers.  While FORTRAN, COBOL, BASIC can be 
almost trivially adapted to Spanish, Italian, German, etc.,

What about Chinese? Japanese?
Yes, there IS a Chinese COBOL!
But, THOSE programmers essentially have to learn English before they can 
program!
Surely a Chinese or Japanese based programming language could be 
developed.



--
Grumpy Ol' Fred ci...@xenosoft.com




Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-27 Thread Toby Thain via cctalk
On 2018-11-27 8:33 PM, Grant Taylor via cctalk wrote:
> ...
>> Bold or italic or underlined text shouldn't be a second class concept,
>> they have meaning that can be lost when text is conveyed in
>> circa-1868-plain-text. I've read many letters that predate the
>> invention of the typewriter, emphasis is often conveyed using
>> underlines or darkened letters.
> 
> I don't think of bold or italic or underline as second class concepts. I
> tend to think of the following attributes that can be applied to text:
> 
>  · bold
>  · italic
>  · overline
>  · strike through
>  · underline
>  · superscript exclusive or subscript
>  · uppercase exclusive or lowercase
>  · opposing case
>  · normal (none of the above)
> 

This covers only a small fraction of the Latin-centric typographic
palette - much of which has existed for 500 years in print (non-Latin
much older). Computerisation has only impoverished that palette, and
this is how it happens: Checklists instead of research.

Work with typographers when trying to represent typography in a
computer. The late Hermann Zapf was Knuth's close friend. That's the
kind of expertise you need on your team.

--Toby


> I don't think that normal is superior to the other four (five) in any
> way.  I do think that normal does occur VASTLY more frequently than
> any combination of the others.  As such normal is what things default to
> as an optimization.  IMHO that optimization does not relegate the other
> styles to second class.
> ...





Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-27 Thread Grant Taylor via cctalk

On 11/27/2018 04:43 PM, Keelan Lightfoot via cctalk wrote:
I'm a bit dense for weighing in on this as my first post, but what 
the heck.


Welcome.  :-)


Our problem isn't ASCII or Unicode, our problem is how we use computers.


Okay.

Going back in time a bit, the first keyboards only recorded letters 
and spaces, even line breaks required manual intervention. As things 
developed, we upgraded our input capabilities a little bit (return 
keys! delete keys! arrow keys!), but then, some time before graphical 
displays came along, we stopped upgrading. We stopped increasing the 
capabilities of our input, and instead focused on kludges to make them do 
more.


Do you think that we stopped enhancing the user input experience more 
because we were content with what we had or because we didn't see a 
better way to do what we wanted to do?


We created markup languages, modifier keys, and page description 
languages, all because our input devices and display devices lacked 
the ability to comprehend anything more than letters.  Now we're in a 
position where we have computers with rich displays bolted to a keyboard 
that has remained unchanged for 150 years.


Hum

Unpopular opinion time: Markup languages are a kludge, relying on plain 
text to describe higher level concepts.


I agree that markup languages are a kludge.  But I don't know that they 
require plain text to describe higher level concepts.


I see no reason that we can't have new control codes to convey new 
concepts if they are needed.


Aside:  ASCII did what it needed to do at the time.  Times are different 
now.  We may need more / new / different control codes.


By control codes, I'm meaning a specific binary sequence that means a 
specific thing.  I think it needs to be standardized to be compatible 
with other things -or- it needs to be considered local and proprietary 
to an application.


TeX has held us back.  It's a crutch so religiously embraced by the 
people that make our software that the concept of markup has come to be 
accepted "the way".  I worked with some university students recently, 
who wasted a ridiculous amount of time learning to use LaTeX to document 
their projects. Many of them didn't even know that page layout software 
existed, they thought there was this broad valley in capabilities with 
TeX on one side, and Microsoft Word on the other. They didn't realize 
that there is a whole world of purpose built tools in between.


I actually wonder how much need there is for /all/ of those utilities. 
I expect that things should have streamlined and simplified, at least 
some, in the last 30 years.


Rather than working on developing and furthering our input capabilities, 
we've been focused on keeping them the same. Markup languages aren't the 
solution. They are a clumsy bridge between 150 year old input technology 
and modern display capabilities.


What would you like to do or see done differently?  Even if it turns out 
to be worse, it would still be something different and likely worth 
trying at least once.


Bold or italic or underlined text shouldn't be a second class 
concept, they have meaning that can be lost when text is conveyed in 
circa-1868-plain-text. I've read many letters that predate the invention 
of the typewriter, emphasis is often conveyed using underlines or darkened 
letters.


I don't think of bold or italic or underline as second class concepts. 
I tend to think of the following attributes that can be applied to text:


 · bold
 · italic
 · overline
 · strike through
 · underline
 · superscript exclusive or subscript
 · uppercase exclusive or lowercase
 · opposing case
 · normal (none of the above)

I don't think that normal is superior to the other four (five) in any 
way.  I do think that normal does occur VASTLY more frequently than 
any combination of the others.  As such normal is what things default to 
as an optimization.  IMHO that optimization does not relegate the other 
styles to second class.


We've drawn this arbitrary line in the sand, where only letters that 
can be typed on a typewriter are "text", Everything else is fluff that 
has been arbitrarily decided to convey no meaning.


I don't agree that the decision was made (by most people).  At least not 
consciously.


I will say that some people probably decided what a minimum viable 
product is when selling typewriters, and consciously chose to omit the 
other options.


I think it's a safe argument to make that the primary reason we've 
painted ourselves into this unexpressive corner is because of a dogged 
insistence that we cling to the keyboard.


I see no reason that the keyboard can't have keys / glyphs added to it.

I'm personally contemplating adding additional keys (via an add on 
keyboard) that are programmed to produce additional symbols.  I 
frequently use the following symbols and wish I had keys for easier 
access to them:  ≈, ·, ¢, ©, °, …, —, ≥, ∞, ‽, ≤, µ, ≠, Ω, ½, ¼, ⅓, ¶, 
±, ®, §, ¾, ™, ⅔, ¿, ⊕.



Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-27 Thread Keelan Lightfoot via cctalk
I'm a bit dense for weighing in on this as my first post, but what the heck.

Our problem isn't ASCII or Unicode, our problem is how we use computers.

Going back in time a bit, the first keyboards only recorded letters
and spaces, even line breaks required manual intervention. As things
developed, we upgraded our input capabilities a little bit (return
keys! delete keys! arrow keys!), but then, some time before graphical
displays came along, we stopped upgrading. We stopped increasing the
capabilities of our input, and instead focused on kludges to make them
do more. We created markup languages, modifier keys, and page
description languages, all because our input devices and display
devices lacked the ability to comprehend anything more than letters.
Now we're in a position where we have computers with rich displays
bolted to a keyboard that has remained unchanged for 150 years.

Unpopular opinion time: Markup languages are a kludge, relying on
plain text to describe higher level concepts. TeX has held us back.
It's a crutch so religiously embraced by the people that make our
software that the concept of markup has come to be accepted "the way".
I worked with some university students recently, who wasted a
ridiculous amount of time learning to use LaTeX to document their
projects. Many of them didn't even know that page layout software
existed, they thought there was this broad valley in capabilities with
TeX on one side, and Microsoft Word on the other. They didn't realize
that there is a whole world of purpose built tools in between. Rather
than working on developing and furthering our input capabilities,
we've been focused on keeping them the same. Markup languages aren't
the solution. They are a clumsy bridge between 150 year old input
technology and modern display capabilities.

Bold or italic or underlined text shouldn't be a second class concept,
they have meaning that can be lost when text is conveyed in
circa-1868-plain-text. I've read many letters that predate the
invention of the typewriter, emphasis is often conveyed using
underlines or darkened letters. We've drawn this arbitrary line in the
sand, where only letters that can be typed on a typewriter are "text",
Everything else is fluff that has been arbitrarily decided to convey
no meaning. I think it's a safe argument to make that the primary
reason we've painted ourselves into this unexpressive corner is
because of a dogged insistence that we cling to the keyboard.

I like the C comment example; Why do I need to call out a comment with
a special sequence of letters? Why can't a comment exist as a comment?
Why is a comment a second class concept? When I take notes in the
margin, I don't explicitly need to call them out as notes. This
extends to strings, why do I need to use quotes? I know it's a string
why can't the computer remember that too? Why do I have to use the
capabilities of a typewriter to describe that to the computer? There
seems to be confusion that computers are inherently text based. They
are only that way because we program them and use them that way, and
because we've done it the same way since the day of the teletype, and
it's _how it's done._

"Classic" Macs are a great example of breaking this pattern. There was
no way to force the computer into a text mode of operating, it didn't
exist. Right down to the core the operating system was graphical. When
you click an icon, the computer doesn't issue a text command, it
doesn't call a function by name, it merely alters the flow of some
binary stuff flowing through the CPU in response to some other bits
changing. Yes, the program describing that was written in text, but
that text is not what the computer is interpreting.

I'm getting a bit philosophical, so I'll shut up now, but it's an
interesting discussion.

- Keelan


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-27 Thread ben via cctalk

On 11/27/2018 12:47 PM, Grant Taylor via cctalk wrote:

ASCII is a common way of encoding characters and control codes in the 
same binary pattern.


File formats are what collections of ASCII characters / control codes 
mean / do.


It also was designed for hard copy. Over strikes don't work well on a 
CRT screen.

Ben.




Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-27 Thread William Donzelli via cctalk
> I have long wondered if there are computer languages that aren't rooted
> in English / ASCII.  I feel like it's rather pompous to assume that all
> programming languages are rooted in English / ASCII.  I would hope that
> there are programming languages that are more specific to the region of
> the world they were developed in.  As such, I would expect that they
> would be stored in something other than ASCII.

APL.

--
Will


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-27 Thread Grant Taylor via cctalk

On 11/27/2018 03:05 AM, Guy Dunphy wrote:
It was a core of the underlying philosophy, that html would NOT allow any 
kind of fixed formatting. The reasoning was that it could be displayed 
on any kind of system, so had to be free-format and quite abstract.


That's one of the reasons that I like HTML as much as I do.

Which is great, until you actually want to represent a real printed page, 
or book. Like Postscript can. Thus html was doomed to be inadequate for 
capture of printed works.


I feel like trying to accurately represent fixed page layout in HTML is 
a questionable idea.  I would think that it would be better to use a 
different type of file.


That was a disaster. There wasn't any real reason it could not be 
both. Just an academic's insistence on enforcing his ideology.  Then of 
course, over time html has morphed to include SOME forms of absolute 
layout, because there was a real demand for that. But the result is 
a hodge-podge.


I don't think that HTML can reproduce fixed page layout like PostScript 
and PDF can.  It can make a close approximation.  But I don't think HTML 
can get there.  Nor do I think it should.



Yes, it should be capable of that. But not enforce 'only that way'.


I question if people are choosing to use HTML to store documentation 
because it's so popular and then getting upset when they want to do 
things that HTML is not meant to do.  Or in some cases is actually meant 
to /not/ do.


Use the tool for the job.  Don't alter the wrong tool for your 
particular job.


IMHO true page layout doesn't belong in HTML.  Loosely laying out the 
same content in approximately the same layout is okay.


By 'html' I mean the kludge of html-css-js. The three-cat herd. (Ignoring 
all the _other_ web cats.)  Now it's way too late to fix it properly 
with patches.


I don't agree with that.  HTML (and XML) has markup that can be used, 
and changed, to define how the HTML is meant to be interpreted.


The fact that people don't do so correctly is mostly independent of the 
fact that it has the ability.  I say mostly because there is some small 
amount of wiggle room for discussion of does the functionality actually 
work or not.


I meant there's no point trying to determine why they were so deluded, 
and failed to recognise that maybe some users (Ed) would want to just 
type two spaces.


I /do/ believe that there /is/ a point in trying to understand why 
someone did what they did.



now 'we' (the world) are stuck with it for legacy compatibility reasons.


Our need to be able to read it does not translate to our need to 
continue to use it.



Any extensions have to be retro-compatible.


I disagree.

I see zero reason why we couldn't come up with something new and 
completely different.


Granted, there should be ways to translate from one to the other.  Much 
like how ASCII and EBCDIC are still in use today.


What I'm talking about is not that. It's about how to create a coding 
scheme that serves ALL the needs we are now aware of. (Just one of 
which is for old ASCII files to still make sense.) This involves both 
re-definition of some of the ASCII control codes, AND defining sequential 
structure standards.  For eg UTF-8 is a sequential structure. So are 
all the html and css codings, all programming languages, etc. There's a 
continuum of encoding...structure...syntax.  The ASCII standard didn't 
really consider that continuum.


I don't think that ASCII was even trying to answer / solve the problems 
that you're talking about.


ASCII was a solution for a different problem for a different time.

There is no reason we can't move on to something else.


Which exceptions would those be? (That weren't built on top of ASCII!)


It is subject to the meaning of "back to the roots" and not worth taking 
more time.


I assume you're thinking that ASCII serves just fine for program source 
code?


I'm not personally aware of any cases where ASCII limits programming 
languages.  But my ignorance does not preclude that situation from existing.


I do believe that there are a number of niche programming languages (if 
you will) that store things as binary data (I'm thinking PLCs and the 
likes) but occasionally have said data represented (as a hexadecimal 
dump) in ASCII.  But the fact that ASCII can or can't easily display the 
data is immaterial to the system being programmed.


I have long wondered if there are computer languages that aren't rooted 
in English / ASCII.  I feel like it's rather pompous to assume that all 
programming languages are rooted in English / ASCII.  I would hope that 
there are programming languages that are more specific to the region of 
the world they were developed in.  As such, I would expect that they 
would be stored in something other than ASCII.


Could the sequence of bytes be displayed as ASCII?  Sure.  Would it make 
much sense?  Not likely.


This is a bandwagon/normalcy bias effect. "Everyone does it that way 
and always has, so it must be 

Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-27 Thread Peter Corlett via cctalk
On Tue, Nov 27, 2018 at 01:21:52AM +1100, Guy Dunphy via cctalk wrote:
[...]
> Oh yes, tell me about the html 'there is no such thing as hard formatting and
> you can't have any even when you want it' concept. Thank you Tim Berners Lee.

Sure you can! Pick one of:

a) If you're not using HTML features, don't bother wrapping the text in a HTML
   document. Just serve up a bog standard text/plain document with all of your
   favourite ASCII art and hard formatting as you please.

b) Go old-school and use HTML <pre>.

c) Go lah-di-dah new-school and use the CSS white-space: property to fine-tune
   the exact formatting behaviour you desire.



Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-27 Thread Liam Proven via cctalk
On Mon, 26 Nov 2018 at 15:21, Guy Dunphy via cctalk
 wrote:

> Defects in the ASCII code table. This was a great improvement at the time, 
> but fails to implement several utterly essential concepts. The lack of these 
> concepts in the character coding scheme underlying virtually all information 
> processing since the 1960s, was unfortunate. Just one (of many) bad 
> consequences has been the proliferation of 'patch-up' text coding schemes 
> such as proprietry document formats (MS Word for eg), postscript, pdf, html 
> (and its even more nutty academia-gone-mad variants like XML), UTF-8, unicode 
> and so on.

This is fascinating stuff and I am very interested to see how it comes
out, but I think there is a problem here which I wanted to highlight.

The thing is this. You seem to be discussing what you perceive as
_general_ defects in ASCII, but they are I think not _general_
defects. They are specific to your purpose, and I don't know what that
is exactly, but I have a feeling it is not a general overall universal
goal.

Just consider what "A.S.C.I.I." stands for.

[1] it's American. Yes it has lots of issues internationally, but it
does the job well for American English. As a native English speaker I
rue the absence of £, but Americans are so unfamiliar with the symbol
that they even appropriate its name for the unrelated #, which already
had a perfectly good name of its own. Still, ASCII is American and
Americans don't use £. Fine.

[2] The "I.I." bit. Historical accidents aside, vestigial traces of
specific obsolete hardware implementations, it's _not a markup
language_. Its function is unrelated to those of HTML or XML or
anything like that. It's for "information interchange". That means
from computer or program to other computer or program. It's an
encoding and that's all. We needed a standard one. We got it. It has
flaws, many flaws, but it worked.

No it doesn't contain æ and å and ä and ø and ö. That's a problem for
Scandinavians.

It doesn't contain š and č and ṡ and ý (among others) and that's a
problem for Roman-alphabet-using Slavs.

Even broadening the discussion to 8-bit ANSI...

It does have a very poor way of encoding é and à and so on, which
indicates the relative importance of Latin-language users in the
Americas, compared to Slavs and so on.

But markup languages, formatting, control signalling, all that sort of
stuff is a separate discussion to encoding standards.

Attempt to bring them into encoding systems and the problem explodes
in complexity and becomes insoluble.

Additionally, it also makes a bit of a mockery of OSes focussed on raw
text streams, such as Unix, and whereas I am no great lover of Unix,
it does provide me with a job, and less headaches than Windows.

So, overall, all I wanted to say was: identify the problem domain
specifically and how to separate that from other, *overlapping* domains
before attacking ASCII for weaknesses that are not actually weaknesses
at all but indeed strengths for a lot of its use-cases.

Saying that,  I'd really like to read more about this project. It
looks like it peripherally intersects with one of my own big ones.

-- 
Liam Proven - Profile: https://about.me/liamproven
Email: lpro...@cix.co.uk - Google Mail/Hangouts/Plus: lpro...@gmail.com
Twitter/Facebook/Flickr: lproven - Skype/LinkedIn: liamproven
UK: +44 7939-087884 - ČR (+ WhatsApp/Telegram/Signal): +420 702 829 053


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-27 Thread Liam Proven via cctalk
On Mon, 26 Nov 2018 at 23:39, Christian Gauger-Cosgrove
 wrote:
>
> On Mon, 26 Nov 2018 at 03:44, Liam Proven via cctalk
>  wrote:
> > If it's in Roman, Cyrillic, or Greek, they're alphabets, so it's a letter.
> >
> Correct, Latin, Greek, and Cyrillic are alphabets, so each
> letter/character can be a consonant or vowel.
>
> > I can't read Arabic or Hebrew but I believe they're alphabets too.
> >
> Hebrew, Arabic, Syriac, Punic, Aramaic, Ugaritic, et cetera are
> abjads, meaning that each character represents a consonant sound,
> vowel sounds are either derived from context and knowledge of the
> language, or can be added in via diacritics.
>
> Devanagari and Thai (and Tibetan, Khmer, Sudanese, Balinese...) are
> abugidas, where each character is a consonant-vowel pair, with the
> "base" character being one particular vowel sound, and alternates
> being indicated by modifications (example in Devanagari: "क" is "ka",
> while "कि" is "ki"; another example using Canadian Aboriginal
> Syllabics "ᕓ" is "vai" whereas "ᕗ" is "vu").
>
> > I don't know anything about any Asian scripts except a tiny bit of
> > Japanese and Chinese, and they get called different things, but
> > "character" is probably most common.
> >
> Japanese actually uses three different scripts. Chinese characters
> (the kanji script of Japanese, and the hanja script of Korean) are
> logograms.
>
> Japanese also has two syllabic scripts, katakana and hiragana where
> each character represents a specific consonant vowel pair.
>
> Korean hangul (or if you happen to be from the DPRK, chosŏn'gŭl) is a
> mix of alphabet and syllabary, where individual characters consist of
> sub parts stacked in a specific pattern. Stealing Wikipedia's example,
> "kkulbeol" is written as "꿀벌", not the individual parts "ㄲㅜㄹㅂㅓㄹ".
>
>
> And now for even more fun, Egyptian hieroglyphics and cuneiform (which
> started with Sumerian, and then used by the Assyrians/Babylonians and
> others) are a delightful mix of logographic, syllabic and alphabetic
> characters. Because while China loathes you, Babylon has a truly deep
> hatred of you and wishes to revel in your suffering.

Um. Yes. Thank you for that. Very informative, interesting, and I did
actually know most of it already but maybe others didn't.

The thing is that it's not actually very germane to the question I was
addressing, which was "what do you call the individual units in
different scripts?" I.e. "letter" vs "glyph" vs "character" vs
"ideogram" vs "grapheme", etc... :-)

-- 
Liam Proven - Profile: https://about.me/liamproven
Email: lpro...@cix.co.uk - Google Mail/Hangouts/Plus: lpro...@gmail.com
Twitter/Facebook/Flickr: lproven - Skype/LinkedIn: liamproven
UK: +44 7939-087884 - ČR (+ WhatsApp/Telegram/Signal): +420 702 829 053


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-26 Thread Fred Cisin via cctalk
Oh yes, tell me about the html 'there is no such thing as hard formatting 
and you can't have any even when you want it' concept. Thank you Tim 
Berners Lee.

I've not delved too deeply into the lack of hard formatting in HTML.


The HTML <pre> . . . </pre> tag helps a bit.
Before I found THAT, I was having serious difficulties with too much of 
what I tried to do with HTML.


Obvious examples include ASCII art, but also program source code.
I should NOT have to create a "table" for that, nor have difficulty having 
a string literal in code that contains varying numbers of space 
characters!


For some reason, a few decades ago, I had substantial difficulty finding 
out about the existence of the <pre> tag, and at that time, did not find 
the  tag, nor CSS.   Now, it seems to be pretty easy to find.



--
Grumpy Ol' Fred ci...@xenosoft.com



Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-26 Thread Grant Taylor via cctalk

On 11/26/18 7:21 AM, Guy Dunphy wrote:
I was speaking poetically. Perhaps "the mail software he uses was written 
by morons" is clearer.


;-)

Oh yes, tell me about the html 'there is no such thing 
as hard formatting and you can't have any even when 
you want it' concept. Thank you Tim Berners Lee.


I've not delved too deeply into the lack of hard formatting in HTML.

I've also always considered HTML to be what you want displayed, with 
minimal information about how you want it displayed.  IMHO CSS helps 
significantly with the latter part.



http://everist.org/NobLog/20130904_Retarded_ideas_in_comp_sci.htm
http://everist.org/NobLog/20140427_p-term_is_retarded.htm


Intriguing.  $readingList++.

Except that 'non-breaking space' is mostly about inhibiting line wrap at 
that word gap.


I wouldn't have thought "mostly" or "inhibiting line wrap".  I view the 
non-breaking space as a way to glue two parts of text together and treat 
them as one unit, particularly for display and partially for selection.


Granted, much of the breaking is done when the text can not continue (in 
its natural direction), frequently needing to start anew on the next line.


But anyway, there's little point trying to psychoanalyze the writers of 
that software. Probably involved pointy-headed bosses.


I like to understand why things have been done the way they were. 
Hopefully I can learn from the reasons.


Of course not. It was for American English only. This is one of the 
major points of failure in the history of information processing.


Looking backwards, (I think) I can understand why you say that.  But 
based on my (possibly limited) understanding of the time, I think that 
ASCII was one of the primordial building blocks that was necessary.  It 
was a standard (one of many emerging standards of the time) that allowed 
computers from different manufacturers interoperate and represent 
characters with the same binary pattern.  Something that we now (mostly) 
take for granted and something that could not be assured at the time or 
before.


Containing extended Unicode character sets via UTF-8, doesn't make it a 
non-hard-formatted medium. In ASCII a space is a space, and multi-spaces 
DON'T collapse. White space collapse is a feature of html, and whether 
an email is html or not is determined by the sending utility.


Having read the rest of your email and now replying, I feel that we may 
be talking about two different things.  One being ASCII's standard 
definition of how to represent different letters / glyphs in a 
consistent binary pattern.  The other being how information is stored in 
an (un)structured sequence of ASCII characters.


As you see, this IS NOT HTML, since those extra spaces and your diagram 
below would have collapsed if it was html. Also saving it as text and 
opening in a plain text ed or hex editor absolutely reveals what it is.


I feel it is important to acknowledge your point and to state that I'm 
moving on.


Hmm... the problem is it's intended to be serious, but is still far from 
exposure-ready.  So if I talk about it now, I risk having specific terms 
I've coined in the doco (including the project name) getting meme-jammed 
or trademarked by others. The plan is to release it all in one go, 
eventually. Definitely will be years before that happens, if ever.


Fair enough.

However, here's a cut-n-paste (in plain text) of a section of the 
Introduction (html with diags.)


ACK


--

Almost always, a first attempt at some unfamiliar, complex task produces 
a less than optimal result. Only with the knowledge gained from actually 
doing a new thing, can one look back and see the mistakes made. It usually 
takes at least one more cycle of doing it over from scratch to produce 
something that is optimal for the needs of the situation. Sometimes, 
especially where deep and subtle conceptual innovations are involved, 
it takes many iterations.


Part way through the first large (for me at the time) project that I 
worked on, I decided that the project (and likely others) needed three 
versions before being production ready:


1)  First whack at solving the problem.  LOTS about the problem is 
learned, including the true requirements and the unknown dependencies 
along the way.  This will not be the final shipping version.  -  Think 
of this as the Alpha release.
2)  This is a complete re-write of the project based on what was learned 
in #1.  -  Think of this as the Beta release.
3)  This is less of a re-write and more of a bug fix for version 2.  - 
Think of this as the shipping release.


Human development of computing science (including information coding 
schemes) has been effectively a 'first time effort', since we kept on 
developing new stuff built on top of earlier work. We almost never went 
back to the roots and rebuilt everything, applying insights gained from 
the many mistakes made.


With few notable (partial) exceptions, I largely agree.

In reviewing the evolution of 

A modest side project : redefining text encoding (Was: Text encoding Babel. Was Re: George Keremedjiev

2018-11-26 Thread Fred Cisin via cctalk

On Tue, 27 Nov 2018, Guy Dunphy via cctalk wrote:

Hmm... the problem is it's intended to be serious, but is still far from
exposure-ready. So if I talk about it now, I risk having specific terms
I've coined in the doco (including the project name) getting meme-jammed
or trademarked by others. The plan is to release it all in one go,
eventually. Definitely will be years before that happens, if ever.
However, here's a cut-n-paste (in plain text) of a section of the 
Introduction (html with diags.)


Without pushing too hard to get you to reveal more than you are 
comfortable with, I really like what you wrote, and hope that someday we 
can participate in some aspects.



I would like to see some acknowledgement that some things are truly flaws 
in the original design, whereas some others are ideas for further 
expansion and enhancement.


It's probably not going to be possible to objectively differentiate which 
are which.



And, as typified by Intel X86 V Motorola 68000, incremental kludges 
permit compatibility and trivial ease of migration, whereas a design from 
scratch permits correcting aspects that would otherwise be stuck, at the 
expense of massive software re-creation.




Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-26 Thread Christian Gauger-Cosgrove via cctalk
On Mon, 26 Nov 2018 at 03:44, Liam Proven via cctalk
 wrote:
> If it's in Roman, Cyrillic, or Greek, they're alphabets, so it's a letter.
>
Correct, Latin, Greek, and Cyrillic are alphabets, so each
letter/character can be a consonant or vowel.

> I can't read Arabic or Hebrew but I believe they're alphabets too.
>
Hebrew, Arabic, Syriac, Punic, Aramaic, Ugaritic, et cetera are
abjads, meaning that each character represents a consonant sound,
vowel sounds are either derived from context and knowledge of the
language, or can be added in via diacritics.

Devanagari and Thai (and Tibetan, Khmer, Sudanese, Balinese...) are
abugidas, where each character is a consonant-vowel pair, with the
"base" character being one particular vowel sound, and alternates
being indicated by modifications (example in Devanagari: "क" is "ka",
while "कि" is "ki"; another example using Canadian Aboriginal
Syllabics "ᕓ" is "vai" whereas "ᕗ" is "vu").

> I don't know anything about any Asian scripts except a tiny bit of
> Japanese and Chinese, and they get called different things, but
> "character" is probably most common.
>
Japanese actually uses three different scripts. Chinese characters
(the kanji script of Japanese, and the hanja script of Korean) are
logograms.

Japanese also has two syllabic scripts, katakana and hiragana where
each character represents a specific consonant vowel pair.

Korean hangul (or if you happen to be from the DPRK, chosŏn'gŭl) is a
mix of alphabet and syllabary, where individual characters consist of
sub parts stacked in a specific pattern. Stealing Wikipedia's example,
"kkulbeol" is written as "꿀벌", not the individual parts "ㄲㅜㄹㅂㅓㄹ".


And now for even more fun, Egyptian hieroglyphics and cuneiform (which
started with Sumerian, and then used by the Assyrians/Babylonians and
others) are a delightful mix of logographic, syllabic and alphabetic
characters. Because while China loathes you, Babylon has a truly deep
hatred of you and wishes to revel in your suffering.


Regards,
Christian
-- 
Christian M. Gauger-Cosgrove
STCKON08DS0
Contact information available upon request.


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-26 Thread ben via cctalk

On 11/26/2018 9:26 AM, Charles Anthony via cctalk wrote:

On Mon, Nov 26, 2018 at 4:28 AM Peter Corlett via cctalk <
cctalk@classiccmp.org> wrote:


On Sun, Nov 25, 2018 at 07:59:13PM -0800, Fred Cisin via cctalk wrote:
[...]

Alas, "current" computers use 8, 16, 32. They totally fail to understand

the

intrinsic benefits of 9, 12, 18, 24, and 36 bits.


Oh go on then, I'm curious. What are the benefits? Is it just that there
are
useful prime factors for bit-packing hacks? And if so, why not 30?



As I understand it, 36 bits was used as it could represent a signed 10
digit decimal number in binary; the Friden 10-digit calculator was the
"gold standard" of banking and financial institutions, so to compete in
that market, your computer had to be able to match the arithmetic standards.

-- Charles



I say 20 bits needs to be used more often.
Did anything really use all the control codes in ASCII?
Back then you got what the TTY printed.
Did anyone ever come up with a character set for ALGOL?
Ben.





Re: Windows Accessibility Settings. RE: George Keremedjiev

2018-11-26 Thread Grant Taylor via cctalk

On 11/26/2018 02:53 PM, Dave Wade via cctalk wrote:
Just in case anyone isn't aware, and who gets duplicate characters input 
because they have some un-steadiness, and are using a Windows/10 PC 
(I think 7 as well) there are some options in the "Ease of Access" 
settings, under "Filter Keys" => "bounce keys", that may help with 
your typing. These set a configurable delay that will ignore repeated 
keypresses for a very short period of time.  The default is 0.5 of 
a second but it's configurable. You need to enable "Filter Keys" to see the 
"Bounce Keys" option. There is also a "slow keys" option.


I've found that there are a number of features that land under 
accessibility / ease of access settings that can make the computer quite 
a bit nicer.


So, if you've ever thought that "I don't need anything under 
'Accessibility' or 'Ease of Access' settings." you may be missing out. 
Go check.


I'm extensively using these assistants for a number of things, not the 
least of which is I'm lazy and I want my iPhone to auto-correct scsi to 
SCSI, or pppoe to PPPoE, or shruggie to ¯\_(ツ)_/¯, or ... to …, or … or 
… or


I hope this helps, and I am sorry if you knew this already and it 
doesn't 


I think it's always good to share neat ~> helpful features with others. 
Especially if it's done in the positive sense of "this is really cool" 
and not negative "oh, you need some help, go look here."




--
Grant. . . .
unix || die


Windows Accessibility Settings. RE: George Keremedjiev

2018-11-26 Thread Dave Wade via cctalk
Just in case anyone isn't aware, and who gets duplicate characters input 
because they have some un-steadiness, and are using a Windows/10 PC (I think 7 
as well): there are some options in the "Ease of Access" settings, under 
"Filter Keys" => "bounce keys", that may help with your typing. These set a 
configurable delay that will ignore repeated keypresses for a very short 
period of time. The default is 0.5 of a second but it's configurable. You need 
to enable "Filter Keys" to see the "Bounce Keys" option. There is also a 
"slow keys" option.

I hope this helps, and I am sorry if you knew this already and it doesn't 

Dave
G4UGM

> -Original Message-
> From: cctalk  On Behalf Of ED SHARPE via
> cctalk
> Sent: 26 November 2018 18:30
> To: lpro...@gmail.com; cctalk@classiccmp.org
> Subject: Re: George Keremedjiev
> 
> i use email i  use and suggest   you   use a delete key.  no  loss no  gain...
> 
> 
> In a message dated 11/26/2018 11:16:07 AM US Mountain Standard Time,
> lpro...@gmail.com writes:
> 
> 
> On Mon, 26 Nov 2018 at 17:54, ED SHARPE < couryho...@aol.com> wrote:
> >
> > pay attention it us,probaby my hand which adds,Xtra spaces as stated
> > before, please feel free to use the delete key
> 
> Are you saying that you have motor control problems, such as Parkinson's
> Disease or something? If so, I am really sorry -- but you have never said that
> before, to my recollection.
> 
> But you have never commented to anyone who has asked why you don't
> switch to a proper local email client, which would fix the quoting and so on.
> Do you not have access to your own computer, or something? If so I am sure
> someone could give you a machine, if that would help...
> 
> --
> Liam Proven - Profile: https://about.me/liamproven
> Email: lpro...@cix.co.uk - Google Mail/Hangouts/Plus: lpro...@gmail.com
> Twitter/Facebook/Flickr: lproven - Skype/LinkedIn: liamproven
> UK: +44 7939-087884 - ČR (+ WhatsApp/Telegram/Signal): +420 702 829 053



Re: George Keremedjiev

2018-11-26 Thread ED SHARPE via cctalk
i use email i  use and suggest   you   use a delete key.  no  loss no  gain...


In a message dated 11/26/2018 11:16:07 AM US Mountain Standard Time, 
lpro...@gmail.com writes:

 
On Mon, 26 Nov 2018 at 17:54, ED SHARPE <
couryho...@aol.com> wrote:
>
> pay attention it us,probaby my hand which adds,Xtra spaces as stated before,
> please feel free to use the delete key

Are you saying that you have motor control problems, such as
Parkinson's Disease or something? If so, I am really sorry -- but you
have never said that before, to my recollection.

But you have never commented to anyone who has asked why you don't
switch to a proper local email client, which would fix the quoting and
so on. Do you not have access to your own computer, or something? If
so I am sure someone could give you a machine, if that would help...

-- 
Liam Proven - Profile: https://about.me/liamproven
Email: lpro...@cix.co.uk - Google Mail/Hangouts/Plus: lpro...@gmail.com
Twitter/Facebook/Flickr: lproven - Skype/LinkedIn: liamproven
UK: +44 7939-087884 - ČR (+ WhatsApp/Telegram/Signal): +420 702 829 053


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-26 Thread Charles Anthony via cctalk
On Mon, Nov 26, 2018 at 4:28 AM Peter Corlett via cctalk <
cctalk@classiccmp.org> wrote:

> On Sun, Nov 25, 2018 at 07:59:13PM -0800, Fred Cisin via cctalk wrote:
> [...]
> > Alas, "current" computers use 8, 16, 32. They totally fail to understand the
> > intrinsic benefits of 9, 12, 18, 24, and 36 bits.
>
> Oh go on then, I'm curious. What are the benefits? Is it just that there are
> useful prime factors for bit-packing hacks? And if so, why not 30?
>
>
As I understand it, 36 bits was used as it could represent a signed 10
digit decimal number in binary; the Friden 10-digit calculator was the
"gold standard" of banking and financial institutions, so to compete in
that market, your computer had to be able to match the arithmetic standards.
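
(For the record, and assuming that is the fit being described: 2^35 =
34,359,738,368, which is just over 9,999,999,999, so a 36-bit word -- one
sign bit plus 35 magnitude bits -- covers the full range of a signed
ten-digit decimal number.)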

-- Charles

-- 
X-Clacks-Overhead: GNU Terry Pratchett


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-26 Thread Guy Dunphy via cctalk
At 10:52 PM 25/11/2018 -0700, you wrote:


>> Then adds a plain ASCII space 0x20 just to be sure.
>
>I don't think it's adding a plain ASCII space 0x20 just to be sure. 
>Looking at the source of the message, I see =C2=A0, which is the UTF-8 
>representation followed by the space.  My MUA that understands UTF-8 
>shows that "=C2=A0 " translates to "  ".  Further, "=C2=A0 =C2=A0" 
>translates to "   ".

I was speaking poetically. Perhaps "the mail software he uses was
written by morons" is clearer.

>Some of the reading that I did indicates that many things, HTML 
>included, use white space compaction (by default), which means that 
>multiple white space characters are reduced to a single white space 
>character.

Oh yes, tell me about the html 'there is no such thing as hard formatting
and you can't have any even when you want it' concept. Thank you Tim Berners 
Lee.
  http://everist.org/NobLog/20130904_Retarded_ideas_in_comp_sci.htm
  http://everist.org/NobLog/20140427_p-term_is_retarded.htm

>  So, when Ed wants multiple white spaces, his MUA has to do 
>something to state that two consecutive spaces can't be compacted. 
>Hence the non-breaking space.

Except that 'non-breaking space' is mostly about inhibiting line wrap at
that word gap. But anyway, there's little point trying to psychoanalyze
the writers of that software. Probably involved pointy-headed bosses.


>As stated in another reply, I don't think ASCII was ever trying to be 
>the Babel fish.  (Thank you Douglas Adams.)

Of course not. It was for American English only. This is one of the major
points of failure in the history of information processing.

>> Takeaway: Ed, one space is enough. I don't know how you got the idea 
>> people might miss seeing a single space, and so you need to type two or 
>> more.
>
>I wondered if it wasn't a typo or keyboard sensitivity issue.  I 
>remember I had to really slow down the double click speed for my grandpa 
>(R.I.P.) so that he could use the mouse.  Maybe some users actuate keys 
>slowly enough that the computer thinks that it's repeated keys.  ¯\_(ツ)_/¯

Well now he's flaunting it in his latest posts. Never mind. :)

>> And since plain ASCII is hard-formatted, extra spaces are NOT ignored 
>> and make for wider spacing between words.
>
>It seems as if you made an assumption.  Just because the underlying 
>character set is ASCII (per RFC 821 & 822, et al) does not mean that the 
>data that they are carrying is also ASCII.  As is evident by the 
>Content-Type: header stating the character set of UTF-8.

Containing extended Unicode character sets via UTF-8, doesn't make it a
non-hard-formatted medium. In ASCII a space is a space, and multi-spaces
DON'T collapse. White space collapse is a feature of html, and whether an
email is html or not is determined by the sending utility.


>Especially when textual white space compression does exactly that, 
>ignore extra white spaces.
>
>> Which  looksvery   odd, even if your mail utility didn't try to 
>> do something 'special' with your unusual user input.

As you see, this IS NOT HTML, since those extra spaces and your diagram below 
would have collapsed if it was html. Also saving it as text and opening it in 
a plain text editor or hex editor absolutely reveals what it is.


>I frequently use multiple spaces with ASCII diagrams.
>
>+--+
>| This |
>|  is  |
>|   a  |
>|  box |
>+--+


>> Btw, I changed the subject line, because this is a wider topic. I've been 
>> meaning to start a conversation about the original evolution of ASCII, 
>> and various extensions. Related to a side project of mine.
>
>I'm curious to know more about your side project.

Hmm... the problem is it's intended to be serious, but is still far from 
exposure-ready. So if I talk about it now, I risk having specific terms I've 
coined in the doco (including the project name) getting meme-jammed or 
trademarked by others. The plan is to release it all in one go, eventually. 
Definitely will be years before that happens, if ever.

However, here's a cut-n-paste (in plain text) of a section of the Introduction 
(html with diags.)
--
Almost always, a first attempt at some unfamiliar, complex task produces a less 
than optimal result. Only with the knowledge gained from actually doing a new 
thing, can one look back and see the mistakes made. It usually takes at least 
one more cycle of doing it over from scratch to produce something that is 
optimal for the needs of the situation. Sometimes, especially where deep and 
subtle conceptual innovations are involved, it takes many iterations.

Human development of computing science (including information coding schemes) 
has been effectively a 'first time effort', since we kept on developing new 
stuff built on top of earlier work. We almost never went back to the roots and 
rebuilt everything, applying insights gained from the many mistakes made.

In reviewing the evolution of information coding schemes since very early 
stages 

Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-26 Thread Peter Corlett via cctalk
On Sun, Nov 25, 2018 at 03:06:29PM -0800, Chuck Guzis via cctalk wrote:
[...]
> I routinely get Turkish and Greek spam in my mailbox--and I've gotten
> Cyrillic-alphabet stuff as well.

I had started to get slightly paranoid about the fact that there was a sudden
increase in Dutch-language spam and wondered how they had figured out my
physical location. On reflection, it's probably just that I now receive enough
legitimate-ish email that the Bayesian filter has adjusted and no longer
assumes that the correct response to Dutch text is "dat kan niet" ("that is not possible").



Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-26 Thread Peter Corlett via cctalk
On Sun, Nov 25, 2018 at 07:59:13PM -0800, Fred Cisin via cctalk wrote:
[...]
> Alas, "current" computers use 8, 16, 32. They totally fail to understand the
> intrinsic benefits of 9, 12, 18, 24, and 36 bits.

Oh go on then, I'm curious. What are the benefits? Is it just that there are
useful prime factors for bit-packing hacks? And if so, why not 30?



RE: George Keremedjiev

2018-11-26 Thread Dave Wade via cctalk
> 
> #4 _You_ appear to have some "very old mail program" (to use your own
> phrase) because it is screwing up your posting _and_ screwing up double
> spaces.
> 

There are no "double spaces" but for some reason he has a space and a UTF-8 
non-breaking space next to each other. Most odd...
It looks like the mail client is the AOL webmail client. Headers say:-

Message-Id: <1674dba424c-1ec3-5...@webjas-vad199.srv.aolmail.net>
X-MB-Message-Source: WebUI
X-MB-Message-Type: User
X-Mailer: JAS DWEB

Dave

> So it is you causing the problems here, I'm sorry to say.
> 
> --
> Liam Proven - Profile: https://about.me/liamproven
> Email: lpro...@cix.co.uk - Google Mail/Hangouts/Plus: lpro...@gmail.com
> Twitter/Facebook/Flickr: lproven - Skype/LinkedIn: liamproven
> UK: +44 7939-087884 - ČR (+ WhatsApp/Telegram/Signal): +420 702 829 053



Re: George Keremedjiev

2018-11-26 Thread Liam Proven via cctalk
On Mon, 26 Nov 2018 at 12:17, ED SHARPE via cctalk
 wrote:
>
> seems only the  very old   mail programs  do not adapt  to all character sets?

Maybe so, Ed, but it's basic good manners to both (a) not make your
emails unnecessarily difficult for others to read, and (b) respect the
etiquette of the forum that you're posting in.

You do neither.

So, for instance, in your message to which I am replying, you:
#1 top-post, against general mailing-list etiquette
#2 fail to capitalise the sentence, against basic English rules
#3 insert unnecessary double-spaces into "the very", "old mail",
"programs do", and "adapt to".

*And*

#4 _You_ appear to have some "very old mail program" (to use your own
phrase) because it is screwing up your posting _and_ screwing up
double spaces.

So it is you causing the problems here, I'm sorry to say.

-- 
Liam Proven - Profile: https://about.me/liamproven
Email: lpro...@cix.co.uk - Google Mail/Hangouts/Plus: lpro...@gmail.com
Twitter/Facebook/Flickr: lproven - Skype/LinkedIn: liamproven
UK: +44 7939-087884 - ČR (+ WhatsApp/Telegram/Signal): +420 702 829 053


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-26 Thread Liam Proven via cctalk
On Mon, 26 Nov 2018 at 01:00, Grant Taylor via cctalk
 wrote:
>
> If they are not seen as separate letters, then do their meanings
> change?  Or is the different accent more for pronunciation?

No, mainly, it changes alphabetical order and it makes asking questions tricky.

I see š as an s-with-a-haček and if I forget the haček, I may
pronounce it as an s; š = ``sh'' in English. ``č'' = "ch" in English.

But that isn't how Czechs think. It's as impossible to misread or
mispronounce Š as S as it would be nonsense to mispronounce ``T''
as ``M'' in English, so people find it very hard to guess what I mean.
To me, the diacritic modifies a letter, and in a word with 4 or 5
diacritics, they pile up in my head, I overload and may drop one or 2
of them. That renders the world as babel in Czech.

(I chose T/M because, incredibly to me, hand-written T in Russian is
written as M. Mind you, handwritten almost everything in Russian
becomes mMmmmMMmm. I can read printed Cyrillic but I find
handwritten stuff impossible.)


> I assume that they have different meanings (if that applies to letters)
> and are as different in use as "A" and "q".

Yes.

> > Czech is like that. Š and Č and Ž and many more that my Mac can't
> > readily type are _extra letters_ which come after the unmodified form
> > in the alphabet.
>
> ~twitch~

Yep. The Scandinavians have just 3 extras.

Czech has about a dozen.

https://en.wikipedia.org/wiki/Czech_orthography

42 letters (!).

> I don't even know how to properly describe something that visually looks
> like letters (glyphs?) to me, but may be an imprecise simplification on
> my part.

If it's in Roman, Cyrillic, or Greek, they're alphabets, so it's a letter.

I can't read Arabic or Hebrew but I believe they're alphabets too.

I don't know anything about any Asian scripts except a tiny bit of
Japanese and Chinese, and they get called different things, but
"character" is probably most common.


> I had to zoom my font to see enough detail in Křižíkova, but it does
> look like things came through just like you describe.  (They even made
> it through my shell script that I use to re-flow text in replies.)

Good!

-- 
Liam Proven - Profile: https://about.me/liamproven
Email: lpro...@cix.co.uk - Google Mail/Hangouts/Plus: lpro...@gmail.com
Twitter/Facebook/Flickr: lproven - Skype/LinkedIn: liamproven
UK: +44 7939-087884 - ČR (+ WhatsApp/Telegram/Signal): +420 702 829 053


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-25 Thread Grant Taylor via cctalk
Not to beat a dead horse, but I ran across "Â Â Â  " in a text file when 
read via a web browser this evening and wanted to share my findings as 
they seemed timely.


On 11/22/18 5:55 PM, Guy Dunphy via cctalk wrote:
Anyway, I was wondering how Ed's emails (and sometimes others elsewhere) 
acquired that odd corruption.


IMHO it's not corruption as much as it is incompatibility.

Answer: Ed's email util … interpret the user typing space twice in 
succession, as meaning "I really, really want there to be a space here, 
no matter what." So it inserts a 'no-break space' unicode character, 
which of course requires a 2-byte UTF-8 encoding.


What I'm not sure of is how the 0xC2 0xA0 translates to 0xC3 0xA2, which 
is the â character.


I think that the 0xC2 0xA0 pair is treated as two independent 
characters.  Thus 0xC2 is "Â", and 0xA0 is a non-breaking space.


I don't know what happens to the non-breaking space, but the Â and the 
space (0x20) that is after 0xC2 0xA0 (three byte sequence being 0xC2 
0xA0 0x20) is included and becomes "Â " which is what we see in reply 
text.  (Encoded as 0xC3 0x83 0x20.)


So, arguably, improperly processed / translated text that results in 
0xC3 0x83 0x20 / "Â " should have been a non-breaking space followed by 
a space.
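
One common way this shape of mojibake arises can be sketched in a few lines 
of Python 3 (this illustrates the general double-decoding mechanism only; the 
exact byte values above come from Ed's particular messages): UTF-8 bytes for 
a no-break space get re-read as Latin-1 and then re-encoded.

    nbsp = "\u00a0"                       # no-break space
    utf8 = nbsp.encode("utf-8")           # b'\xc2\xa0'
    mangled = utf8.decode("latin-1")      # misread as two Latin-1 chars: 'Â' + nbsp
    print(mangled + " ")                  # renders roughly as 'Â' followed by two blanks
    print(mangled.encode("utf-8").hex())  # c382c2a0: the double-encoded byte soup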


This jibes with both Ed's email and the document that I was reading that 
prompted this email.



Then adds a plain ASCII space 0x20 just to be sure.


I don't think it's adding a plain ASCII space 0x20 just to be sure. 
Looking at the source of the message, I see =C2=A0, which is the UTF-8 
representation followed by the space.  My MUA that understands UTF-8 
shows that "=C2=A0 " translates to "  ".  Further, "=C2=A0 =C2=A0" 
translates to "   ".


Some of the reading that I did indicates that many things, HTML 
included, use white space compaction (by default), which means that 
multiple white space characters are reduced to a single white space 
character.  So, when Ed wants multiple white spaces, his MUA has to do 
something to state that two consecutive spaces can't be compacted. 
Hence the non-breaking space.


=C2=A0 quite literally translates to a space character that can't be 
compacted.  Thus "=C2=A0 =C2=A0" is really " " or "   ".


Multiple successive spaces will need to be a mixture of space and 
non-breaking space characters.


So, the plain ASCII space 0x20 after (or before) =C2=A0 is not there 
just to be sure.
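
The quoted-printable side is easy to poke at with Python 3's standard quopri 
module (a small sketch; the =C2=A0 sequences are the ones visible in the raw 
message source quoted above):

    import quopri

    raw = b"=C2=A0 =C2=A0"                 # as seen in the message source
    decoded = quopri.decodestring(raw)     # b'\xc2\xa0 \xc2\xa0'
    text = decoded.decode("utf-8")         # no-break space, space, no-break space
    print([hex(ord(c)) for c in text])     # ['0xa0', '0x20', '0xa0']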


Personally I find it more interesting than annoying. Just another example 
of the gradual chaotic devolution of ASCII, into a Babel of incompatible 
encodings. Not that ASCII was all that great in the first place.


As stated in another reply, I don't think ASCII was ever trying to be 
the Babel fish.  (Thank you Douglas Adams.)


Takeaway: Ed, one space is enough. I don't know how you got the idea 
people might miss seeing a single space, and so you need to type two or 
more.


I wondered if it wasn't a typo or keyboard sensitivity issue.  I 
remember I had to really slow down the double click speed for my grandpa 
(R.I.P.) so that he could use the mouse.  Maybe some users actuate keys 
slowly enough that the computer thinks that it's repeated keys.  ¯\_(ツ)_/¯


But it isn't so. The normal convention in plain text is one space 
character between each word.


The operative word is "convention", as in commonly accepted but not 
always the case behavior.  ;-)


And since plain ASCII is hard-formatted, extra spaces are NOT ignored 
and make for wider spacing between words.


It seems as if you made an assumption.  Just because the underlying 
character set is ASCII (per RFC 821 & 822, et al) does not mean that the 
data that they are carrying is also ASCII.  As is evident by the 
Content-Type: header stating the character set of UTF-8.


Especially when textual white space compression does exactly that, 
ignore extra white spaces.


Which  looksvery   odd, even if your mail utility didn't try to 
do something 'special' with your unusual user input.


I frequently use multiple spaces with ASCII diagrams.

+--+
| This |
|  is  |
|   a  |
|  box |
+--+

That will not look like I intended it with white space compression.

Btw, I changed the subject line, because this is a wider topic. I've been 
meaning to start a conversation about the original evolution of ASCII, 
and various extensions. Related to a side project of mine.


I'm curious to know more about your side project.



--
Grant. . . .
unix || die


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-25 Thread Fred Cisin via cctalk

Therefore, for use with current computers, 32 bits would be needed.
Some games can be played with mixing sizes by doing things like setting 
high bit, for 128 7 bit characters plus 32768 15 bit characters, and 
2147483648 31 bit characters.


On Sun, 25 Nov 2018, ben via cctalk wrote:

REAL COMPUTERS USE 18 BITS...  RUNS
BEN.


Alas, "current" computers use 8, 16, 32.
They totally fail to understand the intrinsic benefits of 9, 12, 18, 24, 
and 36 bits.




Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-25 Thread ben via cctalk

On 11/25/2018 6:34 PM, Fred Cisin via cctalk wrote:

On Mon, 26 Nov 2018, Tomasz Rola via cctalk wrote:

To supply this train of thought with some numbers:

- my copy of Common Lisp HyperSpec claims 978 symbols (i.e. words) on
  its alphabetical index; many words have modifiers (a.k.a. keyword
  options, with default values) which increases the number at least
  twofold, IMHO, if one agrees that each combo should be counted as
  different word, to which I would say yes

- I have read somewhere that Japanese pupil after graduating from
  elementary school is supposed to know 1000 kanjis by heart (there
  is a standardised set, I have a book)


Would those "modifiers of words" qualify as ADJECTIVES?


The Japanese phonetic alphabets, Katakana and Hiragana, have 46 letters 
each, almost twice that with diacritics.
I have heard that Japanese Kanji has more than 50,000 words/characters 
(for which 16 bits would fit, but be a little risky).  But 1100 to 2000 
of them cover most common usage.  Wikipedia says that as of 2010, the 
student requirement is 2136.


Japanese Kanji and Chinese have substantial overlap, but there is no way 
that you could squeeze both into 16 bits, without leaving out important 
stuff.


Therefore, for use with current computers, 32 bits would be needed.
Some games can be played with mixing sizes by doing things like setting 
high bit, for 128 7 bit characters plus 32768 15 bit characters, and 
2147483648 31 bit characters.





REAL COMPUTERS USE 18 BITS...  RUNS
BEN.



Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-25 Thread ED SHARPE via cctalk
ASL is   quite  different than  English... you can sign in English or  you can 
sign in ASL  The  ASL  has a different sentence structure. When I  was  first 
learning  about  the  Deaf Teletype  revolution  (We have a collection of  a 
diverse group of  TTY both  mechanical and  CRT and portable  and ...  I would  
correspond  via  email  with a young  person that  sold  us   some ttys  and 
wondered  why it  was almost  a different  sentence  structure, almost  like  
Yoda  but  if  you  look at  both closely  not  really  the  exact  same.  Hard 
 to  explain... but  English and ASL  utilize  2  different   Sentence 
structuring ... or  so it  appears  to me.

 
If  you  learn  ASL and Signing  well there is a  good need  for  excellent 
interpreters out there.
 
And  yes,  always  looking for  ANYTHING  related to the  history of  TTY  and 
other  assistive  communications  devices.
 
Ed#
 
 
In a message dated 11/25/2018 5:46:55 PM US Mountain Standard Time, 
cctalk@classiccmp.org writes:

 
There are still MANY schools arguing about whether to accept ASL (American 

Sign Language, as used by Deaf people). I would think that therefore, BSL 
(British Sign Language) should qualify


Re: e-mail, character sets, encodings (was Re: George Keremedjiev)

2018-11-25 Thread Toby Thain via cctalk
On 2018-11-25 7:45 PM, Bill Gunshannon via cctalk wrote:
> It's not a mailing list problem.  It's not even a mail problem. It's a
> 
> Mail User Agent problem.  It is a display problem.  It is up to the
> 
> users mail program to display the email as it was sent.  Unless the
> 


Did you really double space this email like a high school essay? Don't
see that every day.

--T



Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-25 Thread Fred Cisin via cctalk

On Mon, 26 Nov 2018, Tomasz Rola via cctalk wrote:

To supply this train of thought with some numbers:

- my copy of Common Lisp HyperSpec claims 978 symbols (i.e. words) on
  its alphabetical index; many words have modifiers (a.k.a. keyword
  options, with default values) which increases the number at least
  twofold, IMHO, if one agrees that each combo should be counted as
  different word, to which I would say yes

- I have read somewhere that Japanese pupil after graduating from
  elementary school is supposed to know 1000 kanjis by heart (there
  is a standardised set, I have a book)


Would those "modifiers of words" qualify as ADJECTIVES?


The Japanese phonetic alphabets, Katakana and Hiragana, have 46 letters 
each, almost twice that with diacritics.
I have heard that Japanese Kanji has more than 50,000 words/characters 
(for which 16 bits would fit, but be a little risky).  But 1100 to 2000 of 
them cover most common usage.  Wikipedia says that as of 2010, the student 
requirement is 2136.


Japanese Kanji and Chinese have substantial overlap, but there is no way 
that you could squeeze both into 16 bits, without leaving out important 
stuff.


Therefore, for use with current computers, 32 bits would be needed.
Some games can be played with mixing sizes by doing things like setting 
high bit, for 128 7 bit characters plus 32768 15 bit characters, and 
2147483648 31 bit characters.
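
Purely as an illustration, here is one way to read that "set the high bit" 
idea as a decoder, sketched in Python 3. This is a hypothetical two-tier 
variant (7-bit and 15-bit units only); a third 31-bit tier would need a 
second flag bit to be distinguishable, so the counts above are slightly 
optimistic.

    def decode(buf):
        """Decode bytes under a hypothetical mixed-width scheme: a byte with
        the high bit clear is a 7-bit character; a byte with the high bit set
        starts a two-byte unit whose remaining 15 bits are the character code."""
        codes, i = [], 0
        while i < len(buf):
            if buf[i] < 0x80:                  # 0xxxxxxx: 7-bit character
                codes.append(buf[i])
                i += 1
            else:                              # 1xxxxxxx xxxxxxxx: 15-bit character
                codes.append(((buf[i] & 0x7F) << 8) | buf[i + 1])
                i += 2
        return codes

    print(decode(bytes([0x41, 0x80, 0x41, 0x7A])))   # [65, 65, 122]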




Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-25 Thread Tomasz Rola via cctalk
On Sun, Nov 25, 2018 at 04:46:50PM -0800, Fred Cisin via cctalk wrote:
[...]
> Is FORTRAN considered modern enough?
[...]
> What about APL?  Although its structure is fairly straight-forward,
> it does, indeed, have a unique character set.

To supply this train of thought with some numbers:

 - my copy of Common Lisp HyperSpec claims 978 symbols (i.e. words) on
   its alphabetical index; many words have modifiers (a.k.a. keyword
   options, with default values) which increases the number at least
   twofold, IMHO, if one agrees that each combo should be counted as
   different word, to which I would say yes

 - I have read somewhere that Japanese pupil after graduating from
   elementary school is supposed to know 1000 kanjis by heart (there
   is a standardised set, I have a book)

-- 
Regards,
Tomasz Rola

--
** A C programmer asked whether computer had Buddha's nature.  **
** As the answer, master did "rm -rif" on the programmer's home**
** directory. And then the C programmer became enlightened...  **
** **
** Tomasz Rola  mailto:tomasz_r...@bigfoot.com **


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-25 Thread Fred Cisin via cctalk

On Sun, 25 Nov 2018, Frank McConnell via cctalk wrote:
I have been told that in the 1960s taking a course in FORTRAN 
programming fulfilled the foreign language requirement at UC Berkeley.


Not currently, and I have some doubt about then.

But, there are conflicting statements.
One section requires that it be a MODERN language, but with specific 
exceptions for ASL and "classical languages, such as Latin and Greek".

Is FORTRAN considered modern enough?

There are still MANY schools arguing about whether to accept ASL (American 
Sign Language, as used by Deaf people). I would think that therefore, BSL 
(British Sign Language) should qualify.



What about APL?  Although its structure is fairly straight-forward, it 
does, indeed, have a unique character set.





Re: e-mail, character sets, encodings (was Re: George Keremedjiev)

2018-11-25 Thread Bill Gunshannon via cctalk
It's not a mailing list problem.  It's not even a mail problem. It's a

Mail User Agent problem.  It is a display problem.  It is up to the

users mail program to display the email as it was sent.  Unless the

user doesn't want to see anything in character sets other than

their favorite.  Nothing along the way should change anything

in an email message.  The endpoint should receive whatever the

beginning point sent out and either handle it or not.  But it is the

endpoints responsibility to try to display it accurately.  I often

send emails (and post on USENET) characters that are not a part

of ASCII or the English alphabet.  I certainly don't want someone

in between to modify what I send.


bill


On 11/25/18 7:00 PM, ED SHARPE via cctalk wrote:
> Hi  Frank  and  others-
> Yea it  is  only here   we  have  the  problem. or  at leased this is the 
> only  list  serve that  does not  like  it.
>   
> I  wondered if  something  could be handled at the listserv  end  or  not  
> but I have littleknowledge of list serves alas...
>   
> Sad  when people   spent  more   time  on characters  rather than  George the 
>   museum archivist that passed away.
>
>   
> George  worked his  ass off to achieve what  he  did.
>   
> Google him  and  read  about  his  early days. You will be  surprised and  
> you might   find yourself  thankful  for  how easy  you  had  it.
>   
> I did not  know  him  all that  well  but I did  provide his  PDP-8  classic  
> with the  plexis  when He was  first starting up It  was a  beauty and in the 
> 200 serial number  range as  I  remember. We kept  #18 classic  Plexi for  
> SMECC
>   
> I  had  not  planned on selling it  as   always  handy to have a  #2  for an 
> offsite display and you do not have to disturb the in-house  display but 
> George seems  so  focused and  intense on  making a  museum  too  so  who  
> could  say no to that? I  wish I  had.  traveled to  see his  effort  up  
> close.
>   
> Project this  week is  to  find  someone  one  with a UNIVAC  422 or   the 
> predecessor  UNIVAC Digital trainer.  I can NOT BELIEVE I am fortunate enough 
>  to be the only one   with a UNIVAC  422'
>   
> That is all for now...  I  think  I  hear   a half of  turkey and leftover 
> dressing in the refrig  wailing to  be consumed.
>   
> Ed#  www.smecc.org
>   
>   
>   
> In a message dated 11/25/2018 4:32:34 PM US Mountain Standard Time, 
> cctalk@classiccmp.org writes:
>
>   
> Most mail servers sending inbound messages to the list include the encoding
>
> scheme in the header. The mailer program should process and translate the
> email message body accordingly...in theory anyway. The set up and testing
> of a sampling of encoding variations would reveal which interpreters were
> missing in our particular list's relay process. Someone could create tests
> with the most common 20 or so encoding schemes and a character set dump and
> document the results etc. Anyone have the time for that? I dont really
> think asking persons to fix their email program is the solution, it's a
> mailing list fix/enhancement. I bet there is documentation on such a
> procedure I can't imagine we are the first to encounter this problem. It's
> fixable
> B
>
> On Sun, Nov 25, 2018, 3:24 PM Frank McConnell via cctalk <
> cctalk@classiccmp.org wrote:
>
>> Very old mail programs indeed have no understanding whatsoever of
>> character sets or encoding. They simply display data from the e-mail file
>> on stdout or equivalent. If you are lucky, the character set and encoding
>> in the e-mail match the character set and encoding used by your terminal.
>>
>> The early-to-mid-1990s MIME work was in some part about allowing e-mail to
>> indicate its character set and encoding, because at that point in time
>> there were many character sets and multiple encodings. Before that, you
>> had to figure them out from your correspondent's e-mail address and the
>> mess on your screen or printout.
>>
>> And really it's not just about the mail program, it's about the host
>> operating system and the hardware on which it runs and which you are using
>> to view e-mail. Heavy-metal characters are likely to look funny on a
>> terminal built to display US-ASCII like an HP 2645. Your chances get
>> better if the software has enough understanding of various Roman-language
>> text encodings and you are using an HP 2622 with HP-ROMAN8 character
>> support and the connection between your host and terminal is
>> eight-bit-clean. But then you get something that uses Cyrillic and now
>> you're looking at having another HP 2645 set up to do Russian. And hoping
>> your host software knows how to deal with those character sets and
>> encodings too!
>>
>> -Frank McConnell
>>
>> On Nov 25, 2018, at 9:55, ED SHARPE via cctalk wrote:
>>> seems only the very old mail programs do not adapt to all character
>> sets?
>>>
>>> In a message dated 11/25/2018 6:19:52 AM US Mountain Standard Time,
>> cctalk@classiccmp.org writes:
>>>
>>>

Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-25 Thread Fred Cisin via cctalk
We have a tendency to be remarkably ethnocentric.  When you apply for a 
job, do you send them a copy of your RESUME?

There is an exit on 280 for "La Canada" road.

For most European languages (I did say MOST), an 8 bit extended ASCII 
could be adequate.


"Recently" (1981), I was disappointed in IBM's character extensions for 
the 5150.  We got smiley faces, but not even pound-sterling nor Yen!


16 bits would presumably be adequate for designing a character set for 
most phonetic alphabets. (I did say MOST).


When I got my Epson HC-20's (like the HX-20, but including Katakana), and 
my Epson RC-20 (wristwatch, Z80 like, with RAM, ROM, and serial port)
I started to try to learn a little Japanese.  I didn't get very far, but I 
did at least learn the sounds of Katakana, and could sound out words 
written in it (a LOT of computer materials use Katakana for non-Japanese 
words, such as "monitor")


But, full inclusion of pictographic languages (Kanji, etc.) would require 
more than 16 bits.
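
A one-line Python 3 illustration of that threshold (just a sample code point, 
not a census of CJK usage): CJK Unified Ideographs Extension B starts at 
U+20000, which already needs more than 16 bits.

    ch = "\U00020000"                      # first code point of CJK Ext. B
    print(hex(ord(ch)), ord(ch) > 0xFFFF)  # 0x20000 True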



--
Grumpy Ol' Fred ci...@xenosoft.com


Re: e-mail, character sets, encodings (was Re: George Keremedjiev)

2018-11-25 Thread ED SHARPE via cctalk
Hi  Frank  and  others-
Yea it  is  only here   we  have  the  problem. or  at leased this is the only  
list  serve that  does not  like  it.
 
I  wondered if  something  could be handled at the listserv  end  or  not  but 
I have littleknowledge of list serves alas...
 
Sad  when people   spent  more   time  on characters  rather than  George the   
museum archivist that passed away. 

 
George  worked his  ass off to achieve what  he  did.
 
Google him  and  read  about  his  early days. You will be  surprised and  you 
might   find yourself  thankful  for  how easy  you  had  it. 
 
I did not  know  him  all that  well  but I did  provide his  PDP-8  classic  
with the  plexis  when He was  first starting up It  was a  beauty and in the 
200 serial number  range as  I  remember. We kept  #18 classic  Plexi for  SMECC
 
I  had  not  planned on selling it  as   always  handy to have a  #2  for an 
offsite display and you do not have to disturb the in-house  display but George 
seems  so  focused and  intense on  making a  museum  too  so  who  could  say 
no to that? I  wish I  had.  traveled to  see his  effort  up  close. 
 
Project this  week is  to  find  someone  one  with a UNIVAC  422 or   the 
predecessor  UNIVAC Digital trainer.  I can NOT BELIEVE I am fortunate enough  
to be the only one   with a UNIVAC  422'
 
That is all for now...  I  think  I  hear   a half of  turkey and leftover 
dressing in the refrig  wailing to  be consumed.
 
Ed#  www.smecc.org 
 
 
 
In a message dated 11/25/2018 4:32:34 PM US Mountain Standard Time, 
cctalk@classiccmp.org writes:

 
Most mail servers sending inbound messages to the list include the encoding

scheme in the header. The mailer program should process and translate the
email message body accordingly...in theory anyway. The set up and testing
of a sampling of encoding variations would reveal which interpreters were
missing in our particular list's relay process. Someone could create tests
with the most common 20 or so encoding schemes and a character set dump and
document the results etc. Anyone have the time for that? I dont really
think asking persons to fix their email program is the solution, it's a
mailing list fix/enhancement. I bet there is documentation on such a
procedure I can't imagine we are the first to encounter this problem. It's
fixable
B

On Sun, Nov 25, 2018, 3:24 PM Frank McConnell via cctalk <
cctalk@classiccmp.org wrote:

> Very old mail programs indeed have no understanding whatsoever of
> character sets or encoding. They simply display data from the e-mail file
> on stdout or equivalent. If you are lucky, the character set and encoding
> in the e-mail match the character set and encoding used by your terminal.
>
> The early-to-mid-1990s MIME work was in some part about allowing e-mail to
> indicate its character set and encoding, because at that point in time
> there were many character sets and multiple encodings. Before that, you
> had to figure them out from your correspondent's e-mail address and the
> mess on your screen or printout.
>
> And really it's not just about the mail program, it's about the host
> operating system and the hardware on which it runs and which you are using
> to view e-mail. Heavy-metal characters are likely to look funny on a
> terminal built to display US-ASCII like an HP 2645. Your chances get
> better if the software has enough understanding of various Roman-language
> text encodings and you are using an HP 2622 with HP-ROMAN8 character
> support and the connection between your host and terminal is
> eight-bit-clean. But then you get something that uses Cyrillic and now
> you're looking at having another HP 2645 set up to do Russian. And hoping
> your host software knows how to deal with those character sets and
> encodings too!
>
> -Frank McConnell
>
> On Nov 25, 2018, at 9:55, ED SHARPE via cctalk wrote:
> >
> > seems only the very old mail programs do not adapt to all character
> sets?
> >
> >
> > In a message dated 11/25/2018 6:19:52 AM US Mountain Standard Time,
> cctalk@classiccmp.org writes:
> >
> >
> >
> >
> >> On Nov 21, 2018, at 4:46 PM, Bill Gunshannon via cctalk <
> cctalk@classiccmp.org> wrote:
> >>
> >>
> >>> On 11/21/18 5:19 PM, Fred Cisin via cctalk wrote:
> >>> Ed,
> >>> It is YOUR mail program that is doing the extraneous insertions, and
> >>> then not showing them to you when you view your own messages.
> >>>
> >>> ALL of us see either extraneous characters, or extraneous spaces in
> >>> everything that you send!
> >>> I use PINE in a shell account, and they show up as a whole bunch of
> >>> inappropriate spaces.
> >>>
> >>> Seriously, YOUR mail program is inserting extraneous stuff.
> >>> Everybody? but you sees it.
> >>>
> >>
> >> I don't. I didn't see it until someone replied with a
> >>
> >> copy of the offending text included.
> >>
> >>
> >> bill
> >>
> > same here. i didnt see them until some replies included the text.
> >
> > kelly
> >
>
>


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-25 Thread Grant Taylor via cctalk

On 11/25/18 3:53 PM, Liam Proven wrote:

It's been enlightening!


:-)


Some I was ready for.

E.g. In French or Spanish, both of which I can speak to some extent, 
letters  like á or ó are not seen as separate letters: French would call 
them a-acute, an a with an acute accent. Ç is a c with a cedilla.  Etc.


If they are not seen as separate letters, then do their meanings 
change?  Or is the different accent more for pronunciation?


But in Swedish/Norwegian/Danish -- I speak basic Norwegian and rudimentary 
Swedish -- ø and å and ä and so on are not a or o with accents on: 
they are _different letters_ that come at the end of the alphabet.


I assume that they have different meanings (if that applies to letters) 
and are as different in use as "A" and "q".


Czech is like that. Š and Č and Ž and many more that my Mac can't 
readily type are _extra letters_ which come after the unmodified form 
in the alphabet.


~twitch~

I don't even know how to properly describe something that visually looks 
like letters (glyphs?) to me, but may be an imprecise simplification on 
my part.


Without them, you can't write correct Czech. It's worse than writing 
English without the letter E.


Usually you can guess but not always.

Byt means flat, apartment; b y-acute t means the verb "to be".

You can probably work that out, but you can't always. A restaurant 
menu would be hopelessly corrupted as both "raw" and "with cheese" 
are quite likely.


Indeed.


Sure, my office street name:  Křižíkova

K, r haček, i, z haček, i acute, k o v a.


I had to zoom my font to see enough detail in Křižíkova, but it does 
look like things came through just like you describe.  (They even made 
it through my shell script that I use to re-flow text in replies.)



A hacek is like an upside down circumflex: ^

Also known as a caron.


ACK


Oh yes. It's quite a minefield.


/me blinks and shakes his head.

Czech keyboards have so many extra letters, the *numbers* are on shift 
combinations!


~chuckle~


Well yes.

I believe Mr Corlett here rejects all mail from gmail.com -- except 
mine... ;-)


¯\_(ツ)_/¯



--
Grant. . . .
unix || die


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-25 Thread Frank McConnell via cctalk
On Nov 25, 2018, at 15:44, Sean Conner wrote:
>  I even heard of a high school in Tennessee who said computer languages
> fulfill the "foreign language requirements" ... who'da thunk?

I have been told that in the 1960s taking a course in FORTRAN programming 
fulfilled the foreign language requirement at UC Berkeley.

-Frank McConnell



Re: e-mail, character sets, encodings (was Re: George Keremedjiev)

2018-11-25 Thread Grant Taylor via cctalk

On 11/25/18 4:32 PM, Bill Degnan via cctalk wrote:
Most mail servers sending inbound messages to the list include the 
encoding scheme in the header.  The mailer program should process 
and translate the email message body accordingly...in theory anyway. 


Most email handling programs don't need to bother with what the data is, 
as they just move the data.  This largely includes email list managers. 
This really only becomes a concern if something is modifying part of the 
message (data) as it moves through the system.


The set up and testing of a sampling of encoding variations would reveal 
which interpreters were missing in our particular list's relay process. 


cctalk is using Mailman, and I'm fairly sure that Mailman does handle 
this properly.  Or if there is a bug it has likely been found & 
resolved.  In the event that a bug is found, I think that it would be 
best to report it upstream to Mailman so they can fix it, and then 
install the updates when they are released.


Someone could create tests with the most common 20 or so encoding schemes 
and a character set dump and document the results etc.  Anyone have the 
time for that?


I doubt that this is necessary.

Based on what I've seen, Mailman is handling the message (data) just 
fine.  It's passing Ed's messages with the UTF-8 =C2=A0 
(quoted-printable) encoded parts just fine.


I dont really think asking persons to fix their email program is 
the solution


I think that asking an end user to fix their email client is the 
most viable solution.



it's a mailing list fix/enhancement.


I disagree.

I'm not convinced that this is a problem in email.

I question how many people are seeing the symptoms -and- what email 
client they are using.


If someone knowingly chooses to use an email client that doesn't support 
UTF-8, then ¯\_(ツ)_/¯  That's their choice.  I just hope that they are 
informed in their choice.


I bet there is documentation on such a procedure I can't imagine we are 
the first to encounter this problem.  It's fixable


If you really do think that this is a problem with the mailing list, I'd 
suggest bringing the problem up on the Mailman mailing list.  Mark S. is 
very responsive and can help people fix problems / configurations in 
short order.




--
Grant. . . .
unix || die


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-25 Thread Sean Conner via cctalk
It was thus said that the Great Bill Gunshannon via cctalk once stated:
> 
> On 11/25/18 5:42 PM, Grant Taylor via cctalk wrote:
> > On 11/23/18 5:52 AM, Peter Corlett via cctalk wrote:
> >> Worse than that, it's *American* ignorance and cultural snobbery 
> >> which also affects various English-speaking countries.
> >
> > Please do not ascribe such ignorance with such a broad brush, at least 
> > not without qualifiers that account for people that do try to respect 
> > other people's cultures.
> >
> >
> Q.  What do you call someone who speaks three languages?
> 
> A. Trilingual.
> 
> Q.  What do you call someone who speaks two languages?
> 
> A. Bilingual.
> 
> Q.  What do you call someone who speaks one language?
> 
> A. American.

  A friend of mine from Sweden (who himself speaks at least three
languages) considered me, an American, multilingual.  Of course, my other
languages are BASIC, Assembly, C, Forth ...

  I even heard of a high school in Tennessee who said computer languages
fulfill the "foreign language requirements" ... who'da thunk?

> OK, it's a joke. (I'm American and speak 4 languages.)

  -spc (Who speaks English and perhaps a dozen words in German, but
plenty of computer languages ... )



Re: e-mail, character sets, encodings (was Re: George Keremedjiev)

2018-11-25 Thread Bill Degnan via cctalk
Most mail servers sending inbound messages to the list include the encoding
scheme in the header.  The mailer program should process and translate the
email message body accordingly...in theory anyway.  The set up and testing
of a sampling of encoding variations would reveal which interpreters were
missing in our particular list's relay process.  Someone could create tests
with the most common 20 or so encoding schemes and a character set dump and
document the results etc.  Anyone have the time for that?  I don't really
think asking persons to fix their email program is the solution; it's a
mailing-list fix/enhancement.  I bet there is documentation on such a
procedure; I can't imagine we are the first to encounter this problem.  It's
fixable
B

On Sun, Nov 25, 2018, 3:24 PM Frank McConnell via cctalk <
cctalk@classiccmp.org wrote:

> Very old mail programs indeed have no understanding whatsoever of
> character sets or encoding.  They simply display data from the e-mail file
> on stdout or equivalent.  If you are lucky, the character set and encoding
> in the e-mail match the character set and encoding used by your terminal.
>
> The early-to-mid-1990s MIME work was in some part about allowing e-mail to
> indicate its character set and encoding, because at that point in time
> there were many character sets and multiple encodings.  Before that, you
> had to figure them out from your correspondent's e-mail address and the
> mess on your screen or printout.
>
> And really it's not just about the mail program, it's about the host
> operating system and the hardware on which it runs and which you are using
> to view e-mail.  Heavy-metal characters are likely to look funny on a
> terminal built to display US-ASCII like an HP 2645.  Your chances get
> better if the software has enough understanding of various Roman-language
> text encodings and you are using an HP 2622 with HP-ROMAN8 character
> support and the connection between your host and terminal is
> eight-bit-clean.  But then you get something that uses Cyrillic and now
> you're looking at having another HP 2645 set up to do Russian. And hoping
> your host software knows how to deal with those character sets and
> encodings too!
>
> -Frank McConnell
>
> On Nov 25, 2018, at 9:55, ED SHARPE via cctalk wrote:
> >
> > seems only the  very old   mail programs  do not adapt  to all character
> sets?
> >
> >
> > In a message dated 11/25/2018 6:19:52 AM US Mountain Standard Time,
> cctalk@classiccmp.org writes:
> >
> >
> >
> >
> >> On Nov 21, 2018, at 4:46 PM, Bill Gunshannon via cctalk <
> cctalk@classiccmp.org> wrote:
> >>
> >>
> >>> On 11/21/18 5:19 PM, Fred Cisin via cctalk wrote:
> >>> Ed,
> >>> It is YOUR mail program that is doing the extraneous insertions, and
> >>> then not showing them to you when you view your own messages.
> >>>
> >>> ALL of us see either extraneous characters, or extraneous spaces in
> >>> everything that you send!
> >>> I use PINE in a shell account, and they show up as a whole bunch of
> >>> inappropriate spaces.
> >>>
> >>> Seriously, YOUR mail program is inserting extraneous stuff.
> >>> Everybody? but you sees it.
> >>>
> >>
> >> I don't. I didn't see it until someone replied with a
> >>
> >> copy of the offending text included.
> >>
> >>
> >> bill
> >>
> > same here. i didnt see them until some replies included the text.
> >
> > kelly
> >
>
>


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-25 Thread Grant Taylor via cctalk

On 11/25/18 3:51 PM, Bill Gunshannon via cctalk wrote:

Q.  What do you call someone who speaks three languages?

A. Trilingual.

Q.  What do you call someone who speaks two languages?

A. Bilingual.

Q.  What do you call someone who speaks one language?

A. American.


Monolingual.


OK, it's a joke. (I'm American and speak 4 languages.)


I've heard it before.  I know there are a LOT of monolingual people in 
the world that don't live in the U.S.A.  But I'll guess that percentage 
wise, the U.S.A. is probably up there for monolingual people.




--
Grant. . . .
unix || die



Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-25 Thread Bill Gunshannon via cctalk

On 11/25/18 6:06 PM, Chuck Guzis via cctalk wrote:
> On 11/25/18 2:53 PM, Liam Proven via cctalk wrote:
>> On Sun, 25 Nov 2018 at 23:42, Grant Taylor via cctalk
>>  wrote:
>>
>>> I bet you see all sorts of things that I'm ignorant of.
>> It's been enlightening!
> I routinely get Turkish and Greek spam in my mailbox--and I've gotten
> Cyrillic-alphabet stuff as well.
>
> Shrug.  We all live on the same planet.
>
I live in the US and while I see less of it now than I used to,

at the University I used to get SPAM in Korean, Chinese,

Japanese, Cyrillic, Arabic, Hebrew and a couple of time

even Amharic.  Thus the reason ASCII is no longer the

"standard".


bill




Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-25 Thread Chuck Guzis via cctalk
On 11/25/18 2:53 PM, Liam Proven via cctalk wrote:
> On Sun, 25 Nov 2018 at 23:42, Grant Taylor via cctalk
>  wrote:
> 
>> I bet you see all sorts of things that I'm ignorant of.
> 
> It's been enlightening!

I routinely get Turkish and Greek spam in my mailbox--and I've gotten
Cyrillic-alphabet stuff as well.

Shrug.  We all live on the same planet.

--Chuck



Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-25 Thread Liam Proven via cctalk
On Sun, 25 Nov 2018 at 23:42, Grant Taylor via cctalk
 wrote:

> I bet you see all sorts of things that I'm ignorant of.

It's been enlightening!

Some I was ready for.

E.g. In French or Spanish, both of which I can speak to some extent,
letters  like á or ó are not seen as separate letters: French would
call them a-acute, an a with an acute accent. Ç is a c with a cedilla.
Etc.

But in Swedish/Norwegian/Danish -- I speak basic Norwegian and
rudimentary Swedish -- ø and å and ä and so on are not a or o with
accents on: they are _different letters_ that come at the end of the
alphabet.

Czech is like that. Š and Č and Ž and many more that my Mac can't
readily type are _extra letters_ which come after the unmodified form
in the alphabet.

Without them, you can't write correct Czech. It's worse than writing
English without the letter E.

Usually you can guess but not always.

Byt means flat, apartment; b y-acute t means the verb "to be".

You can probably work that out, but you can't always. A restaurant
menu would be hopelessly corrupted as both "raw" and "with cheese" are
quite likely.
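
A tiny Python 3 sketch of why "close enough" is lossy here, using the words 
above (illustrative only; the strip_accents helper is just for this example): 
folding the diacritics away makes the raw/cheesy pair, and byt/být, literally 
indistinguishable.

    import unicodedata

    def strip_accents(s):
        # what a "close enough" ASCII fallback effectively does
        return "".join(c for c in unicodedata.normalize("NFD", s)
                       if not unicodedata.combining(c))

    print(strip_accents("sýrové") == strip_accents("syrové"))  # True: cheesy vs raw, gone
    print(strip_accents("být") == strip_accents("byt"))        # True: "to be" vs "flat", gone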

> > For example, right now, I am in my office in Křižíkova. I can't
> > type that name correctly without Unicode characters, because the ANSI
> > character set doesn't contain enough letters for Czech.
>
> Intriguing.  Is there an old MS-DOS Code Page (or comparable technique)
> that does encompass the necessary characters?

Don't know. But I suspect there weren't many PCs here before the
Velvet Revolution in 1989. Democracy came around the time of Windows
3.0 so there may not have been much of a commercial drive.


> Would you please provide an example?

Sure, my office street name:  Křižíkova

> (I'm curious if my email client
> will display things properly.)

K, r haček, i, z haček, i acute, k o v a.

A hacek is like an upside down circumflex: ^

Also known as a caron.

> Oh my.  I had no idea that accent characters made such a difference. But
> I consider that to be my personal ignorance living in the U.S.A.  I do
> NOT think it's anybody's fault but my own.  I'll defend others if someone
> tries to say that their native / local regional norm is the problem.

Oh yes. It's quite a minefield.

Czech keyboards have so many extra letters, the *numbers* are on shift
combinations!

> I will say that I think everybody has their own individual prerogative
> to filter email as they see fit.  They just need to know what they are
> doing and own the fact that they might be causing unintentional harm.
>
> P.S.  Resending from the correct email address.  —  A recent Thunderbird
> update broke the Correct-Identity add-on.  :-(

Well yes.

I believe Mr Corlett here rejects all mail from gmail.com -- except mine... ;-)

-- 
Liam Proven - Profile: https://about.me/liamproven
Email: lpro...@cix.co.uk - Google Mail/Hangouts/Plus: lpro...@gmail.com
Twitter/Facebook/Flickr: lproven - Skype/LinkedIn: liamproven
UK: +44 7939-087884 - ČR (+ WhatsApp/Telegram/Signal): +420 702 829 053


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-25 Thread Bill Gunshannon via cctalk

On 11/25/18 5:42 PM, Grant Taylor via cctalk wrote:
> On 11/23/18 5:52 AM, Peter Corlett via cctalk wrote:
>> Worse than that, it's *American* ignorance and cultural snobbery 
>> which also affects various English-speaking countries.
>
> Please do not ascribe such ignorance with such a broad brush, at least 
> not without qualifiers that account for people that do try to respect 
> other people's cultures.
>
>
Q.  What do you call someone who speaks three languages?

A. Trilingual.

Q.  What do you call someone who speaks two languages?

A. Bilingual.

Q.  What do you call someone who speaks one language?

A. American.



OK, it's a joke. (I'm American and speak 4 languages.)


bill




Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-25 Thread Grant Taylor via cctalk

On 11/23/18 11:27 AM, Tomasz Rola via cctalk wrote:
Well, that was low hanging fruit. But if he indeed turns it off and the 
problem is not gone, that will be a bit of puzzle. Will require some way 
to compare mailboxes in search of pattern in missing emails... Which may 
or may not be obvious... which will lead to more puzzles... oy maybe I 
should have stayed muted and let others do the job...


I'd question modern anti-spam techniques like DMARC and DKIM.  I'd 
suggest checking the mailing list to see if there is any information 
about bounces.


You can probably see crumbs of missing messages in message flow (likely 
already happening), the References: & In-Reply-To: headers, and the list 
archive.




--
Grant. . . .
unix || die


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-25 Thread Grant Taylor via cctalk

On 11/23/18 4:12 AM, Liam Proven via cctalk wrote:

That's English-language cultural snobbery.


I don't think I'd go that far.

I'd suspect it's an unfortunate false positive of a spam filtering 
technique that Guy uses.


Does the technique have some negative side effects?  Sure.

Are said side effects intentional?  I doubt it.

I'm a native Anglophone but I live in a non-English speaking country, 
Czechia.


I bet you see all sorts of things that I'm ignorant of.

For example, right now, I am in my office in Křižíkova. I can't 
type that name correctly without Unicode characters, because the ANSI 
character set doesn't contain enough letters for Czech.


Intriguing.  Is there an old MS-DOS Code Page (or comparable technique) 
that does encompass the necessary characters?


It can cope with some Western European letters needed for Spanish, 
French etc., but not even enough for the Norwegian letter ``ø''. So 
I can type the name of the district of Prague I'm in -- Karlín -- 
and you'll probably see that, but the street name, I am guessing not.


Would you please provide an example?  (I'm curious if my email client 
will display things properly.)  Feel free to pick any example that you 
like so that you don't have to reveal information you might want to keep 
private.


"Krizikova" is usually close enough but it's not correct. Those letters 
are important. E.g. "sýrové" means cheesy, but "syrové" means 
raw. That's a significant difference.


Oh my.  I had no idea that accent characters made such a difference. But 
I consider that to be my personal ignorance living in the U.S.A.  I do 
NOT think it's anybody's fault but my own.  I'll defend others if someone 
tries to say that their native / local regional norm is the problem.


It matters to me and I'm not even Czech and don't speak it particularly 
well...


Fair enough.

So if you tried to mail me something at work -- the address I normally 
use, for instance for the Alphasmart Dana Wireless on the way to to 
me from Baltimore right now -- and you get a reply saying "package for 
[streetname] undeliverable" in the subject -- you'd just reject it.


That's basically discriminating against people who don't speak your 
language, and in my book, that's not OK.


I will say that I think everybody has their own individual prerogative 
to filter email as they see fit.  They just need to know what they are 
doing and own the fact that they might be causing unintentional harm.


P.S.  Resending from the correct email address.  —  A recent Thunderbird 
update broke the Correct-Identity add-on.  :-(




--
Grant. . . .
unix || die


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-25 Thread Grant Taylor via cctalk

On 11/23/18 5:52 AM, Peter Corlett via cctalk wrote:
Worse than that, it's *American* ignorance and cultural snobbery which 
also affects various English-speaking countries.


Please do not ascribe such ignorance with such a broad brush, at least 
not without qualifiers that account for people that do try to respect 
other people's cultures.


The pound sign is not in US-ASCII, and the euro sign is not in ISO-8859-1, 
for example.


Well, seeing as how ASCII, the /American/ Standard Code for Information 
Interchange, is inherently /American/, I don't personally fault it for 
not having currency symbols for other languages / regions.


Instead, I consider ASCII to be a limited standard.  Hence why so much 
effort has gone into other standards to overcome this, and other, 
limitation(s).


I do not know for sure, but I'm confident that other character sets 
don't have characters / glyphs from other languages.


I'm sure that there is room for a discussion of why ASCII is used as the 
underlying character set for network services and the burden that it 
imposes on international friends and colleagues.


Amusingly, peering through my inbox in which I have mail in both Dutch 
and English, the only one with a UTF-8 subject line is in English. It 
was probably composed on a Windows box which "helpfully" turned a hyphen 
into an en-dash.


I'm trying to NOT search my mailbox.

I'd be more curious about the number of bodies that contain UTF-8 or 
UTF-16 that can encode more characters / glyphs.  It's my understanding 
that without some special quite modern extensions, non-ASCII is shunned 
in headers, including the Subject: header.
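
Those header extensions are the RFC 2047 "encoded words". A short Python 3 
sketch of what they look like in practice (illustrative only, borrowing the 
street name from earlier in the thread):

    from email.header import Header, decode_header

    # Non-ASCII Subject text travels as an ASCII-only "encoded word"
    print(Header("Křižíkova", "utf-8").encode())       # e.g. =?utf-8?...?=

    # Decoding one that arrives in a raw header
    (payload, charset), = decode_header("=?UTF-8?Q?Caf=C3=A9_menu?=")
    print(payload.decode(charset))                     # Café menu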


P.S.  Resending from the correct email address.  —  A recent Thunderbird 
update broke the Correct-Identity add-on.  :-(




--
Grant. . . .
unix || die


e-mail, character sets, encodings (was Re: George Keremedjiev)

2018-11-25 Thread Frank McConnell via cctalk
Very old mail programs indeed have no understanding whatsoever of character 
sets or encoding.  They simply display data from the e-mail file on stdout or 
equivalent.  If you are lucky, the character set and encoding in the e-mail 
match the character set and encoding used by your terminal.

The early-to-mid-1990s MIME work was in some part about allowing e-mail to 
indicate its character set and encoding, because at that point in time there 
were many character sets and multiple encodings.  Before that, you had to 
figure them out from your correspondent's e-mail address and the mess on your 
screen or printout.
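
For anyone curious what that MIME machinery boils down to, a minimal Python 3 
sketch (an illustration, not any particular client): the headers declare the 
character set and transfer encoding, and the receiving side uses them to 
recover the original text.

    import email
    from email import policy

    raw = (b"Content-Type: text/plain; charset=utf-8\r\n"
           b"Content-Transfer-Encoding: quoted-printable\r\n"
           b"\r\n"
           b"K=C5=99i=C5=BE=C3=ADkova\r\n")

    msg = email.message_from_bytes(raw, policy=policy.default)
    print(msg.get_content_type(), msg.get("Content-Transfer-Encoding"))
    print(msg.get_content())   # decoded per the declared charset: Křižíkova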

And really it's not just about the mail program, it's about the host operating 
system and the hardware on which it runs and which you are using to view 
e-mail.  Heavy-metal characters are likely to look funny on a terminal built to 
display US-ASCII like an HP 2645.  Your chances get better if the software has 
enough understanding of various Roman-language text encodings and you are using 
an HP 2622 with HP-ROMAN8 character support and the connection between your 
host and terminal is eight-bit-clean.  But then you get something that uses 
Cyrillic and now you're looking at having another HP 2645 set up to do Russian. 
And hoping your host software knows how to deal with those character sets and 
encodings too!

-Frank McConnell

On Nov 25, 2018, at 9:55, ED SHARPE via cctalk wrote:
> 
> seems only the  very old   mail programs  do not adapt  to all character 
> sets? 
> 
> 
> In a message dated 11/25/2018 6:19:52 AM US Mountain Standard Time, 
> cctalk@classiccmp.org writes:
> 
>  
> 
> 
>> On Nov 21, 2018, at 4:46 PM, Bill Gunshannon via cctalk 
>>  wrote:
>> 
>> 
>>> On 11/21/18 5:19 PM, Fred Cisin via cctalk wrote:
>>> Ed,
>>> It is YOUR mail program that is doing the extraneous insertions, and 
>>> then not showing them to you when you view your own messages.
>>> 
>>> ALL of us see either extraneous characters, or extraneous spaces in 
>>> everything that you send!
>>> I use PINE in a shell account, and they show up as a whole bunch of 
>>> inappropriate spaces.
>>> 
>>> Seriously, YOUR mail program is inserting extraneous stuff.
>>> Everybody? but you sees it.
>>> 
>> 
>> I don't. I didn't see it until someone replied with a
>> 
>> copy of the offending text included.
>> 
>> 
>> bill
>> 
> same here. i didnt see them until some replies included the text.
> 
> kelly
> 



Re: George Keremedjiev

2018-11-25 Thread ED SHARPE via cctalk
seems only the  very old   mail programs  do not adapt  to all character sets? 


In a message dated 11/25/2018 6:19:52 AM US Mountain Standard Time, 
cctalk@classiccmp.org writes:

 


> On Nov 21, 2018, at 4:46 PM, Bill Gunshannon via cctalk 
>  wrote:
> 
> 
>> On 11/21/18 5:19 PM, Fred Cisin via cctalk wrote:
>> Ed,
>> It is YOUR mail program that is doing the extraneous insertions, and 
>> then not showing them to you when you view your own messages.
>> 
>> ALL of us see either extraneous characters, or extraneous spaces in 
>> everything that you send!
>> I use PINE in a shell account, and they show up as a whole bunch of 
>> inappropriate spaces.
>> 
>> Seriously, YOUR mail program is inserting extraneous stuff.
>> Everybody? but you sees it.
>> 
> 
> I don't. I didn't see it until someone replied with a
> 
> copy of the offending text included.
> 
> 
> bill
> 
same here. i didnt see them until some replies included the text.

kelly



Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-25 Thread Guy Dunphy via cctalk
At 07:27 PM 23/11/2018 +0100, you wrote:
>On Fri, Nov 23, 2018 at 07:01:17PM +0100, Liam Proven wrote:
>> On Fri, 23 Nov 2018 at 18:54, Tomasz Rola via cctalk
>>  wrote:
>> >
>> > Turn off trashing mails with Unicode in Subject and see if this solves
>> > a problem?
>> 
>> *Loud laughter in the office*
>> 
>> Well _played_, sir!
>
>Well, that was low hanging fruit.

Yes, I should have pre-empted that one. But glad it gave someone a laugh.

>But if he indeed turns it off and
>the problem is not gone, that will be a bit of puzzle.

It's not related. My cctalk filter runs before the UTF-8 trash filter,
and I check the trashbin regularly.

>Will require
>some way to compare mailboxes in search of pattern in missing
>emails... Which may or may not be obvious... which will lead to more
>puzzles... oy maybe I should have stayed muted and let others do the
>job...

Here's one check. See attached screen-cap of cctalk emails. Usually many per
day, but only one per day on the 15th & 16th Nov, none at all on the 17th.
Did the list actually go silent then? It's possible by random ebb and flow,
or maybe everyone was in shock over the awful Paradise fire death toll.
Which may be over 1000, unless a lot of people listed as missing do turn up.

Guy



Re: George Keremedjiev

2018-11-25 Thread Kelly Fergason via cctalk



> On Nov 21, 2018, at 4:46 PM, Bill Gunshannon via cctalk 
>  wrote:
> 
> 
>> On 11/21/18 5:19 PM, Fred Cisin via cctalk wrote:
>> Ed,
>> It is YOUR mail program that is doing the extraneous insertions, and 
>> then not showing them to you when you view your own messages.
>> 
>> ALL of us see either extraneous characters, or extraneous spaces in 
>> everything that you send!
>> I use PINE in a shell account, and they show up as a whole bunch of 
>> inappropriate spaces.
>> 
>> Seriously, YOUR mail program is inserting extraneous stuff.
>> Everybody? but you sees it.
>> 
> 
> I don't.  I didn't see it until someone replied with a
> 
> copy of the offending text included.
> 
> 
> bill
> 
same here.  i didnt see them until some replies included the text.

kelly



Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-23 Thread Tomasz Rola via cctalk
On Fri, Nov 23, 2018 at 11:44:23PM +0100, Tomasz Rola wrote:
[...]
> Just my wet phantasies about how such things work or might work. It
> only requires one lousy admin to make it true, or a good one fired and
> never to be heard from again.
> 
> Perhaps asking your ISP could give you some clues. Perhaps this is
> even more horrific (micro black holes? aliens tuning in?) and wetter
> than my wettest dreams.

The huge problem with wet phantasies is that they take over and
distract the dreamer. The first thing I should have asked: is this
problem limited only to mails from cctalk? If yes, then the most
probable culprit would be the list's server.

-- 
Regards,
Tomasz Rola

--
** A C programmer asked whether computer had Buddha's nature.  **
** As the answer, master did "rm -rif" on the programmer's home**
** directory. And then the C programmer became enlightened...  **
** **
** Tomasz Rola  mailto:tomasz_r...@bigfoot.com **


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-23 Thread Tomasz Rola via cctalk
On Sat, Nov 24, 2018 at 08:56:09AM +1100, Guy Dunphy wrote:
> Resend, just in case that screen-cap image attachment fails. It is also here:
>   http://everist.org/6F2a/cctalk_rcvd.png
> 
> >Will require
> >some way to compare mailboxes in search of pattern in missing
> >emails... Which may or may not be obvious... which will lead to more
> >puzzles... oy maybe I should have stayed muted and let others do the
> >job...
> 
> Here's one check. See attached screen-cap of cctalk emails. Usually many per
> day, but only one per day on the 15th & 16th Nov, none at all on the 17th.
> Did the list actually go silent then? It's possible by random ebb and flow,
> or maybe everyone was in shock over the awful Paradise fire death toll.
> Which may be over 1000, unless a lot of people listed as missing do turn up.

Ok, here is a copy-pasted fragment from my mutt's index view, limited to
messages from cctalk & cctech (which hopefully shows what I
expect). The first column is the message number in my mailbox; the numbers
are not consecutive because in between I got messages from other mailing
lists and spammers:

3091 O   Nov 13 Jon Elson via c (  10) Re: Font for DEC indicator panels
3092 Nov 13 systems_glitch  (  60) Re: Looking for optical grid mouse pad
3106 O   Nov 13 Jason Howe via  (  22) Re: Swap clarification (Was: bill was my
3166 O   Nov 14 systems_glitch  (  40) Re: desoldering (was Re: VAX 9440)
3173 O   Nov 14 Bill Degnan via (  48) Re: desoldering (was Re: VAX 9440)
3192 O   Nov 14 Ethan Dicks via (  28) Re: TU58 tape formatter (was Re: rebuildi
3196 O   Nov 14 William Sudbrin (  15) RE: desoldering (was Re: VAX 9440)
3208 O   Nov 14 Eric Smith via  (  17) Re: TU58 tape formatter (was Re: rebuildi
3216 O   Nov 14 allison via cct (  70) Re: TU58 tape formatter (was Re: rebuildi
3227 Nov 14 ED SHARPE via c (   5) The fundamental building block of modern
3229 O   Nov 14 Ethan Dicks via (  17) Re: TU58 tape formatter (was Re: rebuildi
3277 Nov 14 Kevin Bowling v (  10) HP 88780B density 
3388 O   Nov 15 Noel Chiappa vi (  19) Re: Font for DEC indicator panels
3473 O   Nov 16 Andrew Luke Nes (  75) Re: early ANSI C drafts, pre-1989 standar
3816 O   Nov 18 Toby Thain via  (  39) Re: Font for DEC indicator panels
3835 O   Nov 18 Jerome H. Fine  ( 137) Re: RT-11 DY install
3845 O   Nov 18 Michael Brutman (  40) VCF PNW 2019: Exhibitors needed!
3887 O   Nov 19 Patrick Finnega (   6) IBM 3270 Emulation Adapter (ISA)
3889 O   Nov 18 jim stephens vi (  26) Re: IBM 3270 Emulation Adapter (ISA)
3940 O   Nov 19 Jim Brain via c (  10) IND
3944 O   Nov 19 Al Kossow via c (  20) Re: IBM 3270 Emulation Adapter (ISA)
3953 Nov 19 dwight via ccta (   9) What is windoes doing?
3954 Nov 19 Ethan via cctal (  11) Re: What is windoes doing?
3965 Nov 19 geneb via cctal (  27) Re: What is windoes doing?
3989 Nov 19 Bill Degnan via (  40) Re: What is windoes doing?
3997 Nov 19 Alan Perry via  (  25) Removing PVA from a CRT
3999 Nov 19 Peter Coghlan v (  17) Re: What is windoes doing?
4041 Nov 19 Alan Perry via  (  50) Re: Removing PVA from a CRT
4046 O   Nov 19 jim stephens vi (  38) Re: IND
4052 Nov 19 Sean Conner via (  19) IEFBR14 (was Re: IND)
4053 O   Nov 19 Sven Schnelle v (  17) Re: HP-Apollo 9000/425t RAM
4054 O   Nov 19 Dennis Boone vi (  14) Re: IND
4066 Nov 19 dwight via ccta (  25) Re: What is windoes doing?
4071 O   Nov 19 dwight via ccta (  45) Re: What is windoes doing?
4083 O   Nov 19 Al Kossow via c (  12) Battery warning in Falco terminals
4088 O   Nov 19 Al Kossow via c (  16) Re: Battery warning in Falco terminals
4095 O   Nov 19 Eric Smith via  (  15) Re: IEFBR14 (was Re: IND)
4100 Nov 19 Alan Perry via  (  32) Re: Removing PVA from a CRT
4102 Nov 19 Alan Perry via  (  83) Re: Removing PVA from a CRT
4103 O   Nov 19 ben via cctalk  (  19) Re: IEFBR14 (was Re: IND)
4113 O   Nov 19 Douglas Taylor  (  11) Missing FORRTL
4118 O   Nov 19 Jon Elson via c (  10) Re: IND
4122 O   Nov 19 Kevin McQuiggin (  16) Re: IND

A quick comparison by eye suggests you are missing, for example, messages
no. 3277 and 4083:

no 3277:

  -- From: Kevin Bowling via cctalk 
  -- To: "General Discussion: On-Topic and Off-Topic Posts" 

  -- Subject: HP 88780B density

I have a dual density 88780B. Is it possible to upgrade to quad density
by acquiring/swapping boards?

Or does someone have an 800bpi 9-track on SCSI I can borrow or buy?

I have a pair of 1984 pdp11/70 UNIX SysV (R0, R1?) tapes that need to be
archived.

Regards,
Kevin

and no 4083:

  -- From: Al Kossow via cctalk 
  -- To: "General Discussion: On-Topic and Off-Topic Posts" 

  -- Subject: Battery warning in Falco terminals

I've been helping the MAME guys simulate a TS-2624, which is a block mode
HP emulating terminal. I had bought this a while ago, and never dumped the
firmware. Unfortunately there is a large NiCd battery right in the middle
of the board that leaked all over. I've taken some pictures which are up
under falco on 

Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-23 Thread Guy Dunphy via cctalk
Resend, just in case that screen-cap image attachment fails. It is also here:
  http://everist.org/6F2a/cctalk_rcvd.png

>Will require
>some way to compare mailboxes in search of pattern in missing
>emails... Which may or may not be obvious... which will lead to more
>puzzles... oy maybe I should have stayed muted and let others do the
>job...

Here's one check. See attached screen-cap of cctalk emails. Usually many per
day, but only one per day on the 15th & 16th Nov, none at all on the 17th.
Did the list actually go silent then? It's possible by random ebb and flow,
or maybe everyone was in shock over the awful Paradise fire death toll.
Which may be over 1000, unless a lot of people listed as missing do turn up.

Guy




Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-23 Thread Guy Dunphy via cctalk
At 06:54 PM 23/11/2018 +0100, you wrote:
>On Fri, Nov 23, 2018 at 11:55:18AM +1100, Guy Dunphy via cctalk wrote:
>[...]
>> 
>> I see them because I'm using an old email client - Eudora 3 (1997.)
>> I stick with this specifically _because_ it doesn't understand UTF-8
>> or any other non-ASCII coding, especially in the header, and hence
>> simply ignores any executables in the headers or email body. Which
>> makes it totally virus proof, unlike Microsoft's intentionally
>
>Totally say totally.

Except it turns out some feel that rejecting UTF-8 is culturally insensitive.
I agree they have a point. But for my practical purposes, the 'UTF-8 in
header' messages that end up in my trash folder are all, always, spam. I
do check.
(And now someone's going to start posting cctalk messages with UTF-8 in Subject,
just watch.)


>> open-backdoor junk like Outlook. And most other email 'modern
>> wonders.'  Eudora barely even understands html in emails, and I'm
>> fine with that. Also I have it configured to dust-bin any incomimg
>> mail containing UTF-8 chars in the Subject header. Avoids a lot of
>> time-wasting.
>[...]
>> 
>> But first, I'm having a problem with some portion of cctalk posts
>> going missing, ie I don't receive all messages.  The ratio seems to
>> vary day to day. Sometimes no obvious missing, sometimes a lot.
>> Still don't know why, or how to fix this. Any suggestions?
>
>Turn off trashing mails with Unicode in Subject and see if this solves
>a problem?

Ha, I knew someone would say that. But no, I do check the email trash bin
regularly (before emptying it) and so far no cctalk or cctech emails are
being diverted to there. My filter for them runs before the UTF-filter (last.)
I'm guessing it's an overly picky spam filter somewhere in the network
routes into Australia.

Guy



Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-23 Thread Tomasz Rola via cctalk
On Fri, Nov 23, 2018 at 07:01:17PM +0100, Liam Proven wrote:
> On Fri, 23 Nov 2018 at 18:54, Tomasz Rola via cctalk
>  wrote:
> >
> > Turn off trashing mails with Unicode in Subject and see if this solves
> > a problem?
> 
> *Loud laughter in the office*
> 
> Well _played_, sir!

Well, that was low hanging fruit. But if he indeed turns it off and
the problem is not gone, that will be a bit of puzzle. Will require
some way to compare mailboxes in search of pattern in missing
emails... Which may or may not be obvious... which will lead to more
puzzles... oy maybe I should have stayed muted and let others do the
job...

-- 
Regards,
Tomasz Rola

--
** A C programmer asked whether computer had Buddha's nature.  **
** As the answer, master did "rm -rif" on the programmer's home**
** directory. And then the C programmer became enlightened...  **
** **
** Tomasz Rola  mailto:tomasz_r...@bigfoot.com **


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-23 Thread Liam Proven via cctalk
On Fri, 23 Nov 2018 at 18:54, Tomasz Rola via cctalk
 wrote:
>
> Turn off trashing mails with Unicode in Subject and see if this solves
> a problem?

*Loud laughter in the office*

Well _played_, sir!


-- 
Liam Proven - Profile: https://about.me/liamproven
Email: lpro...@cix.co.uk - Google Mail/Hangouts/Plus: lpro...@gmail.com
Twitter/Facebook/Flickr: lproven - Skype/LinkedIn: liamproven
UK: +44 7939-087884 - ČR (+ WhatsApp/Telegram/Signal): +420 702 829 053


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-23 Thread Tomasz Rola via cctalk
On Fri, Nov 23, 2018 at 11:55:18AM +1100, Guy Dunphy via cctalk wrote:
[...]
> 
> I see them because I'm using an old email client - Eudora 3 (1997.)
> I stick with this specifically _because_ it doesn't understand UTF-8
> or any other non-ASCII coding, especially in the header, and hence
> simply ignores any executables in the headers or email body. Which
> makes it totally virus proof, unlike Microsoft's intentionally

Totally say totally.

> open-backdoor junk like Outlook. And most other email 'modern
> wonders.'  Eudora barely even understands html in emails, and I'm
> fine with that. Also I have it configured to dust-bin any incomimg
> mail containing UTF-8 chars in the Subject header. Avoids a lot of
> time-wasting.
[...]
> 
> But first, I'm having a problem with some portion of cctalk posts
> going missing, ie I don't receive all messages.  The ratio seems to
> vary day to day. Sometimes no obvious missing, sometimes a lot.
> Still don't know why, or how to fix this. Any suggestions?

Turn off trashing mails with Unicode in Subject and see if this solves
a problem?

-- 
Regards,
Tomasz Rola

--
** A C programmer asked whether computer had Buddha's nature.  **
** As the answer, master did "rm -rif" on the programmer's home**
** directory. And then the C programmer became enlightened...  **
** **
** Tomasz Rola  mailto:tomasz_r...@bigfoot.com **


Re: George Keremedjiev

2018-11-23 Thread Tomasz Rola via cctalk
On Wed, Nov 21, 2018 at 07:20:25PM -0500, ED SHARPE via cctalk wrote:
> wrong not everybody sees it this is the only list serve problems...
> I suppose modern email programs either do not see or know what to do
> with the characters... please consider using the delete key and not
> reading things frI'm me if it bothers,you

> thanks ed#
> 
> Sent from AOL Mobile Mail

To me, the problem is not with your emails (or anybody else's from
this list), but the slow invasion performed by offending
software. If you pressed space once, it should be entered as a single
space, 0x20 in ASCII. If you pressed space twice, it should be entered
into the email written by you as two 0x20 bytes, and that is what should
show up on my side. My software receives some extra stuff from you, but
not in a consistent manner, i.e. some ASCII spaces are prepended with
two extra bytes and some are not. I was not conscious of it - I thought
you had some peculiar space-pressing habit, or that a text postprocessor
(like fmt) made double spaces in order to fit your lines into a
130-character width (because your lines were not folded at 79 or
anywhere close).

(In other words, it looks like everybody gets those extra bytes, only
some programs choose to not show them, which - for me - is another
problem and should be examined in due time).
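
Those extra bytes (identified elsewhere in the thread as the UTF-8 
no-break space, the byte pair C2 A0) are easy to confirm by looking at the 
raw messages rather than at whatever a client chooses to render. A rough 
Python sketch; "archive.mbox" is a placeholder path, not anything from 
this thread:

    import mailbox

    # Count the UTF-8 no-break-space byte pair (C2 A0) in each raw message.
    for key, msg in mailbox.mbox("archive.mbox").items():
        hits = msg.as_bytes().count(b"\xc2\xa0")
        if hits:
            print(key, hits, msg.get("Subject", ""))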

If what you press and what is being sent out to your recipients
differs, then this is a problem, with potential security implications
(as I learn with some horror, just anything in modern computer can
turn against the owner, if he could be called owner at all). A
software that mangles your input is not a friend. It should be
terminated. Just MHO.

-- 
Regards,
Tomasz Rola

--
** A C programmer asked whether computer had Buddha's nature.  **
** As the answer, master did "rm -rif" on the programmer's home**
** directory. And then the C programmer became enlightened...  **
** **
** Tomasz Rola  mailto:tomasz_r...@bigfoot.com **


Re: George Keremedjiev

2018-11-23 Thread Kevin Anderson via cctalk
These ? characters often show up for users like me who read via the e-mailed 
digests.

Kevin Anderson


Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-23 Thread Peter Corlett via cctalk
On Fri, Nov 23, 2018 at 12:12:32PM +0100, Liam Proven via cctalk wrote:
> On Fri, 23 Nov 2018 at 01:55, Guy Dunphy via cctalk 
> wrote:
[...]
>> Also I have it configured to dust-bin any incomimg mail containing UTF-8
>> chars in the Subject header. Avoids a lot of time-wasting.
> That's English-language cultural snobbery. I'm a native Anglophone but I live
> in a non-English speaking country, Czechia.

Worse than that, it's *American* ignorance and cultural snobbery which also
affects various English-speaking countries. The pound sign is not in US-ASCII,
and the euro sign is not in ISO-8859-1, for example.
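
Both claims are easy to verify from a Python prompt, using the standard 
codec names; a minimal check:

    # '£' is U+00A3: in ISO-8859-1 (Latin-1) but not in US-ASCII.
    # '€' is U+20AC: absent from ISO-8859-1; ISO-8859-15 added it at 0xA4.
    print("£".encode("latin-1"))        # b'\xa3'
    print("€".encode("iso-8859-15"))    # b'\xa4'
    for char, codec in (("£", "ascii"), ("€", "latin-1")):
        try:
            char.encode(codec)
        except UnicodeEncodeError:
            print(f"{char!r} has no code point in {codec}")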

Amusingly, peering through my inbox in which I have mail in both Dutch and
English, the only one with a UTF-8 subject line is in English. It was probably
composed on a Windows box which "helpfully" turned a hyphen into an en-dash.



Re: Text encoding Babel. Was Re: George Keremedjiev

2018-11-23 Thread Liam Proven via cctalk
On Fri, 23 Nov 2018 at 01:55, Guy Dunphy via cctalk
 wrote:

> Also I have it configured to
> dust-bin any incomimg mail containing UTF-8 chars in the Subject header. 
> Avoids a lot of time-wasting.

That's English-language cultural snobbery. I'm a native Anglophone but
I live in a non-English speaking country, Czechia.

For example, right now, I am in my office in Křižíkova. I can't type
that name correctly without Unicode characters, because the ANSI
character set doesn't contain enough letters for Czech. It can cope
with some Western European letters needed for Spanish, French etc.,
but not even enough for the Norwegian letter ``ø''. So I can type the
name of the district of Prague I'm in -- Karlín -- and you'll probably
see that, but the street name, I am guessing not.

"Krizikova" is usually close enough but it's not correct. Those
letters are important. E.g. "sýrové" means cheesy, but "syrové" means
raw. That's a significant difference. It matters to me and I'm not
even Czech and don't speak it particularly well...

So if you tried to mail me something at work -- the address I normally
use, for instance for the Alphasmart Dana Wireless on the way to me
from Baltimore right now -- and you get a reply saying "package for
[streetname] undeliverable" in the subject -- you'd just reject it.

That's basically discriminating against people who don't speak your
language, and in my book, that's not OK.

> Takeaway: Ed, one space is enough.

Look, we haven't even been able to get him to quote correctly, so I
suspect changing his typing habits is right out!

-- 
Liam Proven - Profile: https://about.me/liamproven
Email: lpro...@cix.co.uk - Google Mail/Hangouts/Plus: lpro...@gmail.com
Twitter/Facebook/Flickr: lproven - Skype/LinkedIn: liamproven
UK: +44 7939-087884 - ČR (+ WhatsApp/Telegram/Signal): +420 702 829 053


Re: George Keremedjiev

2018-11-23 Thread Christian Corti via cctalk

On Thu, 22 Nov 2018, Robert Feldman wrote:
BTW, we went through this about 6 months ago. Someone pointed out the 
strange characters in Ed's posts. No change resulted from that, however, 
and I doubt this thread will cause any change.


Yup, Ed is resistant to any form of advice. He could just install a real 
mail client on his mobile phone instead of using the crappy AOL client. 
;-)


Christian


Text encoding Babel. Was Re: George Keremedjiev

2018-11-22 Thread Guy Dunphy via cctalk
At 10:33 PM 21/11/2018 -0500, ED SHARPE wrote:
>if I type an extra space I am sure every one sees it. but the chars not 
>everyone sees them. 
>what I do figure us the older email programs are not accepting of all charter 
>sets? ( dunno if I am using the right term)
>
>Sent from AOL Mobile Mail

Ah ha! Mystery explained. I'm another who sees funny characters where Ed's
mails contain "c2 a0". This is the UTF-8 encoding of a 'no-break space'
character, which is NOT in the original ASCII set.
See https://apps.timwhitlock.info/unicode/inspect/hex/c2/a0

I see them because I'm using an old email client - Eudora 3 (1997.) I stick
with this specifically _because_ it doesn't understand UTF-8 or any other
non-ASCII coding, especially in the header, and hence simply ignores any
executables in the headers or email body. Which makes it totally virus
proof, unlike Microsoft's intentionally open-backdoor junk like Outlook.
And most other email 'modern wonders.' Eudora barely even understands html
in emails, and I'm fine with that. Also I have it configured to dust-bin
any incoming mail containing UTF-8 chars in the Subject header. Avoids a
lot of time-wasting.

Anyway, I was wondering how Ed's emails (and sometimes others elsewhere)
acquired that odd corruption. Answer: Ed's email util (AOL Mobile Mail, and
probably various other 'content enhanced' email clients) interprets the
user typing space twice in succession as meaning "I really, really want
there to be a space here, no matter what." So it inserts a 'no-break space'
unicode character, which of course requires a 2-byte UTF-8 encoding. Then
adds a plain ASCII space 0x20 just to be sure.
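
For anyone who wants to reproduce that byte arithmetic, a minimal Python
sketch (nothing AOL-specific here, just the encoding round trip):

    nbsp = "\u00a0"                        # the no-break space the client inserts
    wire = (nbsp + " ").encode("utf-8")    # what actually gets sent
    print(wire)                            # b'\xc2\xa0 ' -- the "c2 a0" plus a plain space
    # A reader that assumes ISO-8859-1 treats each byte as one character,
    # so 0xC2 shows up as 'Â' followed by odd spacing:
    print(wire.decode("latin-1"))          # 'Â\xa0 '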

Personally I find it more interesting than annoying. Just another example
of the gradual chaotic devolution of ASCII into a Babel of incompatible
encodings. Not that ASCII was all that great in the first place. It's also
interesting that even on cctalk, where you'd think everyone would be aware
of the differences between ASCII and later 'extensions', low level coding
schemes, and the desirability of sticking to common standards, some are
not.

Takeaway: Ed, one space is enough. I don't know how you got the idea people
might miss seeing a single space, and so you need to type two or more. But
it isn't so. The normal convention in plain text is one space character
between each word. And since plain ASCII is hard-formatted, extra spaces
are NOT ignored and make for wider spacing between words. Which  looks
very    odd, even if your mail utility didn't try to do something 'special'
with your unusual user input.


Btw, I changed the subject line, because this is a wider topic. I've been
meaning to start a conversation about the original evolution of ASCII, and
various extensions. Related to a side project of mine.

But first, I'm having a problem with some portion of cctalk posts going
missing, ie I don't receive all messages. The ratio seems to vary day to
day. Sometimes no obvious missing, sometimes a lot. Still don't know why,
or how to fix this. Any suggestions?

Guy



>On Wednesday, November 21, 2018 Fred Cisin  wrote:
>Ed,
>It is YOUR mail program that is doing the extraneous insertions, and 
>then not showing them to you when you view your own messages.
>
>ALL of us see either extraneous characters, or extraneous spaces in 
>everything that you send!
>I use PINE in a shell account, and they show up as a whole bunch of 
>inappropriate spaces.
>
>Seriously, YOUR mail program is inserting extraneous stuff.
>Everybody? but you sees it.
>
>> who  knows?   what  mail program  are  you using that   does that?
>It is YOUR mail program that is "doing that"!!
>
>
>On Wed, 21 Nov 2018, ED SHARPE via cctalk wrote:
>
>> who  knows?   what  mail program  are  you using that   does that?
>>
>>
>> In a message dated 11/21/2018 1:25:08 PM US Mountain Standard Time, 
>> cctalk@classiccmp.org writes:
>>
>>  
>> At 02:03 PM 11/21/2018, ED SHARPE via cctalk wrote:
>>
>>> I sold him my extra classic 8 with the plexi covers on it... sn 
>>> 200 series we kept sn #18
>>
>> Side question: What process is turning non-blanking spaces into ISO-8859-1
>> circumflex-A for you?
>>
>> I see 'Â' all throughout your emails.
>>
>> - John
>
>


Re: George Keremedjiev

2018-11-22 Thread Robert Feldman via cctalk
>Message: 10
>Date: Wed, 21 Nov 2018 16:17:27 -0500
>From: ED SHARPE 
>To: jfo...@threedee.com, cctalk@classiccmp.org, cctalk@classiccmp.org
>Subject: Re: George Keremedjiev
>Message-ID: <16738228ce4-1ebf-2...@webjas-vad240.srv.aolmail.net>
>Content-Type: text/plain; charset=utf-8
>
>who? knows?? ?what? mail program? are? you using that? ?does that?
>
>
>In a message dated 11/21/2018 1:25:08 PM US Mountain Standard Time, 
>cctalk@classiccmp.org writes:
>
>?
>At 02:03 PM 11/21/2018, ED SHARPE via cctalk wrote:
>
>>I? sold? him my? extra classic 8? with the plexi covers on it... sn 200? 
>>series? we? kept? sn #18
>
>Side question: What process is turning non-blanking spaces into ISO-8859-1
>circumflex-A for you?
>
>I see '?' all throughout your emails.
>
>- John

I get CCTalk in digest form and see the "?" in Ed's posts. Almost all (but 
strangely not all) of his posts are like that. I might occasionally see a 
strange extra character in someone else's post, but only rarely and then they 
usually are some non-English diacritical mark.

BTW, we went through this about 6 months ago. Someone pointed out the strange 
characters in Ed's posts. No change resulted from that, however, and I doubt 
this thread will cause any change.

Bob


Re: George Keremedjiev

2018-11-22 Thread Mike Stein via cctalk


- Original Message - 
From: "geneb via cctalk" 
To: "ED SHARPE" ; "General Discussion: On-Topic and 
Off-Topic Posts" 
Sent: Thursday, November 22, 2018 11:45 AM
Subject: Re: George Keremedjiev


> On Wed, 21 Nov 2018, ED SHARPE via cctalk wrote:
> 
>> not much adjustments... may be easier if you just bypass my messages?
>>
>> Sent from AOL Mobile Mail
>>
> Maybe it's because many of us don't use a point-and-drool interface that 
> would give the user the chance to skip the message before being forced to 
> read it.
> 
> Look, I get that you've decided that hundreds of people are wrong and it's 
> not your fault.  How about we work on getting you to stop top posting 
> instead? ;)
> 
> g.
> 
> 

And proofreading a bit before pressing 'send'...


Re: George Keremedjiev

2018-11-22 Thread geneb via cctalk

On Wed, 21 Nov 2018, ED SHARPE via cctalk wrote:


not much adjustments... may be easier if you just bypass my messages?

Sent from AOL Mobile Mail

Maybe it's because many of us don't use a point-and-drool interface that 
would give the user the chance to skip the message before being forced to 
read it.


Look, I get that you've decided that hundreds of people are wrong and it's 
not your fault.  How about we work on getting you to stop top posting 
instead? ;)


g.

--
Proud owner of F-15C 80-0007
http://www.f15sim.com - The only one of its kind.
http://www.diy-cockpits.org/coll - Go Collimated or Go Home.
Some people collect things for a hobby.  Geeks collect hobbies.

ScarletDME - The red hot Data Management Environment
A Multi-Value database for the masses, not the classes.
http://scarlet.deltasoft.com - Get it _today_!


Re: George Keremedjiev

2018-11-22 Thread geneb via cctalk

On Wed, 21 Nov 2018, ED SHARPE via cctalk wrote:


who  knows?   what  mail program  are  you using that   does that?


In a message dated 11/21/2018 1:25:08 PM US Mountain Standard Time, 
cctalk@classiccmp.org writes:

 
At 02:03 PM 11/21/2018, ED SHARPE via cctalk wrote:


I sold him my extra classic 8 with the plexi covers on it... sn 200 
series we kept sn #18


Side question: What process is turning non-blanking spaces into ISO-8859-1
circumflex-A for you?

I see 'Â' all throughout your emails.

It's not his email client that's the problem, it's yours.  It constantly 
inserts weird characters between words.  I see the same problem in Alpine, 
and I've never seen the issue from any other sender.


g.

--
Proud owner of F-15C 80-0007
http://www.f15sim.com - The only one of its kind.
http://www.diy-cockpits.org/coll - Go Collimated or Go Home.
Some people collect things for a hobby.  Geeks collect hobbies.

ScarletDME - The red hot Data Management Environment
A Multi-Value database for the masses, not the classes.
http://scarlet.deltasoft.com - Get it _today_!


Re: George Keremedjiev

2018-11-21 Thread ED SHARPE via cctalk
not much adjustments... may be easier if you just bypass my messages?

Sent from AOL Mobile Mail

On Wednesday, November 21, 2018 Fred Cisin  wrote:
On Wed, 21 Nov 2018, ED SHARPE via cctalk wrote:
> wrong not everybody sees it this is the only list serve problems... I 
> suppose modern email programs either do not see or know what to do with 
> the characters... please consider using the delete key and not reading 
> things frI'm me if it bothers,you
> thanks ed#

That is a very good hypothesis.
"Modern" (bordering on profanity in this list) email programs might insert 
characters that we are not intended to notice in support of "features" 
(also bordering on profanity). When they encounter those special 
characters, they know to activate that "feature", and suppress their 
display.
But email programs from "LAST MONTH" (prior to the "10 year rule"?) do NOT 
recognize, respect, nor understand those "modern" "control" characters.
("Modern" companies, such as Microsoft, Apple, AOL, etc. deprecate the 
use of any software or hardware that is not "current")
Email seems to be being handled like word processor file formats - what 
happens when you try to load a document from a current version program 
into a copy of a previous version of the program?

You would never know there was an issue if everybody that you associate is 
using the same current programs.


Q: is line wrap ON or OFF in the program?
Q: is "format: flowed" ON or OFF?
Either/both might insert "non-breaking spaces".

These do not seem to be adequately documented in this context -
(differentiation between "bug" and "feature").




Re: George Keremedjiev

2018-11-21 Thread ED SHARPE via cctalk
if I type an extra space I am sure every one sees it. but the chars not 
everyone sees them. 
what I do figure us the older email programs are not accepting of all charter 
sets? ( dunno if I am using the right term)

Sent from AOL Mobile Mail

On Wednesday, November 21, 2018 Fred Cisin  wrote:
Ed,
It is YOUR mail program that is doing the extraneous insertions, and 
then not showing them to you when you view your own messages.

ALL of us see either extraneous characters, or extraneous spaces in 
everything that you send!
I use PINE in a shell account, and they show up as a whole bunch of 
inappropriate spaces.

Seriously, YOUR mail program is inserting extraneous stuff.
Everybody? but you sees it.

> who  knows?   what  mail program  are  you using that   does that?
It is YOUR mail program that is "doing that"!!


On Wed, 21 Nov 2018, ED SHARPE via cctalk wrote:

> who  knows?   what  mail program  are  you using that   does that?
>
>
> In a message dated 11/21/2018 1:25:08 PM US Mountain Standard Time, 
> cctalk@classiccmp.org writes:
>
>  
> At 02:03 PM 11/21/2018, ED SHARPE via cctalk wrote:
>
>> I sold him my extra classic 8 with the plexi covers on it... sn 200 
>> series we kept sn #18
>
> Side question: What process is turning non-blanking spaces into ISO-8859-1
> circumflex-A for you?
>
> I see 'Â' all throughout your emails.
>
> - John



Re: George Keremedjiev

2018-11-21 Thread ED SHARPE via cctalk
some blank spaces whereas us 2 instead of one is some times bad mr. hand

Sent from AOL Mobile Mail

On Wednesday, November 21, 2018 Fred Cisin  wrote:
Ed,
It is YOUR mail program that is doing the extraneous insertions, and 
then not showing them to you when you view your own messages.

ALL of us see either extraneous characters, or extraneous spaces in 
everything that you send!
I use PINE in a shell account, and they show up as a whole bunch of 
inappropriate spaces.

Seriously, YOUR mail program is inserting extraneous stuff.
Everybody? but you sees it.

> who  knows?   what  mail program  are  you using that   does that?
It is YOUR mail program that is "doing that"!!


On Wed, 21 Nov 2018, ED SHARPE via cctalk wrote:

> who  knows?   what  mail program  are  you using that   does that?
>
>
> In a message dated 11/21/2018 1:25:08 PM US Mountain Standard Time, 
> cctalk@classiccmp.org writes:
>
>  
> At 02:03 PM 11/21/2018, ED SHARPE via cctalk wrote:
>
>> I sold him my extra classic 8 with the plexi covers on it... sn 200 
>> series we kept sn #18
>
> Side question: What process is turning non-blanking spaces into ISO-8859-1
> circumflex-A for you?
>
> I see 'Â' all throughout your emails.
>
> - John



Re: George Keremedjiev

2018-11-21 Thread Fred Cisin via cctalk

On Wed, 21 Nov 2018, ED SHARPE via cctalk wrote:
wrong not everybody sees it this is the only list serve problems...  I 
suppose modern email programs either do not see or know what to do with 
the characters... please consider using the delete key and not reading 
things frI'm me if it bothers,you

thanks ed#


That is a very good hypothesis.
"Modern" (bordering on profanity in this list) email programs might insert 
characters that we are not intended to notice in support of "features" 
(also bordering on profanity).  When they encounter those special 
characters, they know to activate that "feature", and suppress their 
display.
But email programs from "LAST MONTH" (prior to the "10 year rule"?) do NOT 
recognize, respect, nor understand those "modern" "control" characters.
("Modern" companies, such as Microsoft, Apple, AOL, etc. deprecate the 
use of any software or hardware that is not "current")
Email seems to be being handled like word processor file formats - what 
happens when you try to load a document from a current version program 
into a copy of a previous version of the program?


You would never know there was an issue if everybody that you associate is 
using the same current programs.



Q: is line wrap ON or OFF in the program?
Q: is "format: flowed" ON or OFF?
Either/both might insert "non-breaking spaces".

These do not seem to be adequately documented in this context -
(differentiation between "bug" and "feature").



