Re: Text encoding Babel. Was Re: George Keremedjiev
On Tue, 4 Dec 2018, Liam Proven wrote:

> > I don't know if the unreal mode has been retained in the x86 architecture
> > to this day; as I noted above it was not officially supported. But then
> > some originally undocumented x86 features, such as the second byte of the
> > AAD and AAM instructions actually being an immediate argument that could
> > have a value different from 10, became standardised at one point.
>
> I know, and was surprised, that v86 mode isn't supported in x86-64.

In the native long mode, that is. If you run the CPU in 32-bit mode, then VM86 works. I guess AMD didn't want to burden the architecture in case pure 64-bit parts were made in the future.

> This caused major problems for the developers of DOSEMU.

And also for expansion-BIOS emulation, especially with graphics adapters (which, together with scarce-to-nonexistent hardware documentation, made mode switching even trickier in Linux than it already was). It looks like fully-software machine-code interpretation, as with QEMU, is the only way remaining for x86-64.

 Maciej
Re: Text encoding Babel. Was Re: George Keremedjiev
On Sat, 1 Dec 2018 at 02:00, Maciej W. Rozycki wrote:

> Be assured there were enough IBM PC clones running DOS around from 1989
> onwards for this stuff to matter,

OK, fair enough. Thanks for the info!

> and hardly anyone switched to MS Windows before version 95 (running
> Windows 3.0 with the ubiquitous HGC-compatible graphics adapters was sort
> of fun anyway, and I am not sure if Windows 3.1 even supported it; maybe
> with extra drivers).

It did.

Demo: https://www.youtube.com/watch?v=0lOGPQQlxT8
Screenshot: http://nerdlypleasures.blogspot.com/2016/12/windows-30-multimedia-edition-early.html

The difficult bit was Windows 3.0 on an 8088/8086 with VGA, I believe. The VGA driver contained 80286 instructions because MS didn't imagine anyone would want Win3 on such old PCs. (This again shows that MS didn't believe Win3 would be such a big hit, giving the lie to all the pro-OS/2 anti-MS conspiracy theories... https://virtuallyfun.com/wordpress/2011/06/01/windows-3-0/ )

As I heard it and faintly recall, to run Win3 on an 8086 in VGA mode you had to replace the CPU with an NEC V20 or V30... The driver did later get patched to work: http://www.vcfed.org/forum/showthread.php?35593-Windows-3-0-VGA-color-driver-for-8088-XT

-- Liam Proven - Profile: https://about.me/liamproven Email: lpro...@cix.co.uk - Google Mail/Hangouts/Plus: lpro...@gmail.com Twitter/Facebook/Flickr: lproven - Skype/LinkedIn: liamproven UK: +44 7939-087884 - ČR (+ WhatsApp/Telegram/Signal): +420 702 829 053
Re: Text encoding Babel. Was Re: George Keremedjiev
On Tue, 4 Dec 2018 at 15:02, Maciej W. Rozycki via cctalk wrote:

> I don't know if the unreal mode has been retained in the x86 architecture
> to this day; as I noted above it was not officially supported. But then
> some originally undocumented x86 features, such as the second byte of the
> AAD and AAM instructions actually being an immediate argument that could
> have a value different from 10, became standardised at one point.

I know, and was surprised, that v86 mode isn't supported in x86-64. This caused major problems for the developers of DOSEMU.

-- Liam Proven - Profile: https://about.me/liamproven
Re: Text encoding Babel. Was Re: George Keremedjiev
On Fri, 30 Nov 2018, Fred Cisin via cctalk wrote:

> > Well, ATA drives at that time should have already had the capability to
> > remap bad blocks or whole tracks transparently in the firmware, although
>
> Not even IDE.
> Seagate ST4096 (ST506/412 MFM) 80MB formatted, which was still considered
> a good size by those of us who weren't wealthy.

Sure! You did need a bad block list for such a drive though.

> > Of course the ability to remap bad storage areas transparently is not an
> > excuse for the OS not to handle them gracefully; back then it was not
> > yet the era when a hard drive with a bad block or a dozen was considered
> > broken, as it usually is nowadays.
>
> Yes, they still came with a list of known bad blocks. Usually taped to the
> drive. THIS one wasn't on the manufacturer's list, and neither SpeedStor
> nor SpinRite could find it!
> There were other ways to lock out a block besides filling it with a
> garbage file, but that was easiest.

IIRC for MS-DOS the canonical way was to mark the containing cluster as bad using a special code in the FAT. Both `format' and `chkdsk' were able to do that, as were some third-party tools. That ensured that disk maintenance tools, such as `defrag', didn't reuse the cluster for something else, as could happen if the cluster were simply assigned to a real file.

> And, I did try to tell the Microsoft people that the OS "should recover
> gracefully from hardware errors". In those words.

I found switching to Linux a reasonable solution to this kind of customer service attitude. There you can fix an issue yourself or, if you don't feel like it, you can hire someone to do it for you (or often just ask kindly, as engineers usually feel responsible for code they have committed, including any bugs). :)

> > Did 3.1 support running in the real mode though (as opposed to switching
> > to the real mode for DOS tasks only)? I honestly do not remember anymore,
> > and ISTR it was removed at one point. I am sure 3.0 did.
> I believe that it did. I don't remember WHAT the program didn't like
> about 3.1, or if there was a real reason, not just an arbitrary limit.
> I don't think that the Cordata's refusal to run on 286 was based on a
> real reason.
>
> But, the Win 3.1 installation program(s) balked at anything without A20
> and a tiny bit of RAM above 10h. I didn't have a problem with having a
> few dedicated machines (an XT with Cordata interface, an AT with
> Eiconscript card for PostScript and HP PCL, an AT with Win 3.0 for the
> font editor, a machine for disk duplication (no-notch disks), order
> entry, accounting, and lots of machines with lots of different floppy
> drive types.) I also tested every release of my programs on many
> variants of the platform (after I discovered the hard way that the 286
> had a longer pre-fetch buffer than the 8088!)

Hmm, interesting. I never tried any version of MS Windows on a PC/XT class machine, and the least equipped 80286-based system I've used had at least 1MiB of RAM and a chipset clever enough to remap a part of it above 1MiB. That memory was then made available via HIMEM.SYS.

What might be unknown to some is that, apart from toggling the A20 mask gate, HIMEM.SYS also switched on the so-called "unreal mode" on processors that supported it. These were at least the 80486 and possibly the 80386 as well (my memory has faded on this point), and certainly not the 80286, as it didn't support segment sizes beyond 64kiB. This mode gave real mode programs access to the whole 4GiB 32-bit address space, by setting data segment limits (sizes) to 4GiB. This was possible by programming segment descriptors in the protected mode and then switching back to the real mode without first resetting the limits to the usual 64kiB value. It worked because, unlike in the protected mode, segment register writes made in the real mode only updated the segment base and not the limit stored in the corresponding descriptor.
IIRC it was not possible for the code segment to use a 4GiB limit in the real mode, as it would malfunction (i.e. it would not work as per real mode expectations), so it was left at 64kiB. According to Intel documentation software was required to reset segment sizes to 64kiB before switching back to the real mode, so this was not an officially supported mode of operation. MS Windows may or may not have made use of this feature in its real mode of operation; I am not sure, although I do believe HIMEM.SYS itself did use it (or otherwise why would it set it up in the first place?). I discovered it by accident in the early 1990s while experimenting with some assembly programming (possibly by trying to read from beyond the end of a segment by using an address size override prefix, a word or a doubleword data quantity and an offset of 0x and not seeing a trap or suchlike) and could not explain where this phenomenon came from, as it contradicted the x86 processor manual I
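The descriptor-cache asymmetry described above is the whole trick behind "unreal mode", and it can be modelled in a few lines. The following is a toy Python sketch, not real hardware semantics; the class and method names are invented for illustration:

```python
# Toy model of the hidden x86 segment-descriptor cache behaviour behind
# "unreal mode", as described in the post above.  The one rule that
# matters: a real-mode segment-register load recomputes the cached base
# (selector * 16) but leaves the cached limit untouched, so a 4 GiB
# limit set up in protected mode survives the switch back to real mode.

class SegReg:
    def __init__(self):
        # Real-mode reset defaults: base 0, 64 KiB limit.
        self.base, self.limit = 0, 0xFFFF

    def load_protected(self, base, limit):
        # Protected-mode load: the descriptor supplies base AND limit.
        self.base, self.limit = base, limit

    def load_real(self, selector):
        # Real-mode load: only the base is recomputed; limit is kept.
        self.base = selector << 4

ds = SegReg()
ds.load_protected(0, 0xFFFFFFFF)  # set a 4 GiB limit in protected mode
ds.load_real(0x1234)              # back in real mode, reload DS
assert ds.base == 0x12340
assert ds.limit == 0xFFFFFFFF     # the big limit persists: "unreal mode"
```

The point is only the asymmetry: the protected-mode load writes both cached fields, while the real-mode load rewrites just the base, which is exactly why the 4GiB data-segment limit survives.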
Re: Text encoding Babel. Was Re: George Keremedjiev
On Fri, Nov 30, 2018 at 3:28 PM Grant Taylor via cctalk < cctalk@classiccmp.org> wrote: > On 11/30/2018 02:33 PM, Jim Manley via cctalk wrote: > > There's enough slack in the approved offerings that electives can be > > weighted more toward the technical direction (e.g., user interface and > > experience) or the arts direction (e.g., psychology and history). The > idea > > was to close the severely-growing gap between those who know everything > > about computing and those who need to know enough, but not everything, to > > be truly effective in the information-dominant world we've been careening > > toward without nearly enough preparation of future generations. > > I kept thinking to myself that many of the people that are considered > pioneers in computers were actually something else by trade and learned > how to use computers and / or created what they needed for the computer > to be able to do their primary job. > -- > Grant. . . . > unix || die > Most people know that Newton's motivation for developing calculus was explaining the motions of the planets, but not many know that he served as the Warden, and then Master, of the Royal Mint, as well as being fascinated with optics and vision (to the point where he inserted a needle into one of his eyes!) and a closet alchemist. His competitor, Leibniz, was motivated to develop calculus by a strong desire to win more billiards bets from his fellow wealthy buddies in Hanover, the financial capital of Germany at the time, while developing the mathematics of the physics governing the collisions of billiard balls. Babbage was motivated to develop calculating and computing machines to eliminate the worldwide average of seven errors per page in astronomical, navigational, and mathematical tables of the 1820s. Shannon and Hamming (with whom I worked - the latter, not the former!) 
were motivated to represent Boolean logic in digital circuits and improve long-distance communications by formalizing how to predictably ferret more signal out of noise. Turing was motivated to test his computing theories to break the Nazi Enigma ciphers (character-oriented, vs. word-oriented codes) and moved far beyond the mathematical underpinnings of his theories into the engineering of the bombes (Colossus, which attacked the Lorenz teleprinter cipher rather than Enigma, was chiefly Tommy Flowers's engineering). Hollerith was motivated by the requirement to complete the decennial census tabulations within 10 years (the 1890 census was going to take 13 years to tabulate using traditional manual methods within the available budget). Mauchly and Eckert were motivated to automate calculations for ballistics tables for WW-II weapons systems that were being fielded faster than tables could be produced manually. Hopper developed the first compiler and the first programming language to use English words, Flow-Matic, which led, in turn, to COBOL being created to meet financial software needs. John Backus and the other developers of FORTRAN were likewise motivated by scientific and engineering calculation requirements. Kernighan, Ritchie, and Thompson were motivated by a desire to perform an immense prank, in the form of Unix and A/B/BCPL/C, on an unsuspecting and all-too-serious professional computing world ( http://www.stokely.com/lighter.side/unix.prank.html). Gates and Allen were motivated by all of the money lying around on desks, in their drawers, and in the drawers worn by the people sitting at said desks, to foist PC/MS-DOS and Windows on the less serious computing public. Kildall was motivated by the challenges of developing multi-pass compilation on systems with minimal microcomputer hardware resources.
Meanwhile, the rest of the computing field was motivated to pursue the next shinier pieces of higher-performance hardware, developing ever-more-bloated programming languages, OSes, services, and applications that continue to slow down even the latest-and-greatest systems. Berners-Lee was motivated to help scientists and engineers at the European Organization for Nuclear Research (CERN - the Conseil Européen pour la Recherche Nucléaire) organize and share their work without having to become expert software developers in their own right. Yang, Filo, Brin, Page, Zuckerberg, et al., were motivated by whatever money could be scrounged from sofas used by couch-surfing, homeless Millennials (redundant syntax fully intended), and from local news outlets' advertising accounts. Selling everyone's, but their own, personally-identifiable information, probably including that of their own mothers, has been a welcome additional cornucopia of revenue to them. Computer science and engineering degrees weren't even offered yet when I attended the heavily science and engineering oriented naval institution where I earned my BS in engineering (70% of degrees awarded were in STEM fields). The closest you could get were math and electrical engineering degrees, taking the very few electives offered in CS and CE disciplines. Granted, the computer I primarily had access to was a secondhand GE-265
Re: Text encoding Babel. Was Re: George Keremedjiev
I found the bad spot and put a SECTORS.BAD file there, and then was OK.

On Sat, 1 Dec 2018, Maciej W. Rozycki wrote:

> Well, ATA drives at that time should have already had the capability to
> remap bad blocks or whole tracks transparently in the firmware, although

Not even IDE. Seagate ST4096 (ST506/412 MFM) 80MB formatted, which was still considered a good size by those of us who weren't wealthy.

> Of course the ability to remap bad storage areas transparently is not an
> excuse for the OS not to handle them gracefully; back then it was not yet
> the era when a hard drive with a bad block or a dozen was considered
> broken, as it usually is nowadays.

Yes, they still came with a list of known bad blocks, usually taped to the drive. THIS one wasn't on the manufacturer's list, and neither SpeedStor nor SpinRite could find it! There were other ways to lock out a block besides filling it with a garbage file, but that was easiest. And, I did try to tell the Microsoft people that the OS "should recover gracefully from hardware errors". In those words.

I had a font editor that wouldn't tolerate 3.1, and quite a few XTs (no A20), so I continued to keep Win 3.0 on a bunch of machines.

> Did 3.1 support running in the real mode though (as opposed to switching
> to the real mode for DOS tasks only)? I honestly do not remember anymore,
> and ISTR it was removed at one point. I am sure 3.0 did.

I believe that it did. I don't remember WHAT the program didn't like about 3.1, or if there was a real reason, not just an arbitrary limit. I don't think that the Cordata's refusal to run on 286 was based on a real reason.
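For reference, the FAT-level version of "locking out a block" discussed in this thread comes down to writing a reserved marker value into the affected cluster's FAT entry (0xFFF7 on FAT16), so allocators and defragmenters skip it. A toy Python sketch, with a small in-memory list standing in for the on-disk table:

```python
# Sketch: marking a cluster "bad" in a FAT16 table.  The in-memory list
# stands in for the on-disk FAT; real tools (format, chkdsk) would read
# and rewrite the actual FAT sectors.

FAT16_BAD = 0xFFF7   # reserved "bad cluster" marker
FAT16_EOC = 0xFFFF   # end-of-chain marker

def mark_cluster_bad(fat, cluster):
    """Mark one cluster unusable so nothing ever allocates it again."""
    fat[cluster] = FAT16_BAD

def free_clusters(fat):
    """Clusters from 2 up holding 0x0000 are free; bad ones never appear."""
    return [c for c, v in enumerate(fat) if c >= 2 and v == 0x0000]

fat = [0x0000] * 16               # tiny toy FAT
fat[0], fat[1] = 0xFFF8, 0xFFFF   # reserved entries 0 and 1
mark_cluster_bad(fat, 5)
assert 5 not in free_clusters(fat)
```

Unlike the garbage-file trick, the bad-cluster marker survives deletion and defragmentation, which is exactly why it was the canonical method.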
But, the Win 3.1 installation program(s) balked at anything without A20 and a tiny bit of RAM above 10h. I didn't have a problem with having a few dedicated machines (an XT with Cordata interface, an AT with Eiconscript card for PostScript and HP PCL, an AT with Win 3.0 for the font editor, a machine for disk duplication (no-notch disks), order entry, accounting, and lots of machines with lots of different floppy drive types.) I also tested every release of my programs on many variants of the platform (after I discovered the hard way that the 286 had a longer pre-fetch buffer than the 8088!)
Re: Text encoding Babel. Was Re: George Keremedjiev
On Fri, 30 Nov 2018, Fred Cisin via cctalk wrote:

> I found the bad spot and put a SECTORS.BAD file there, and then was OK.
> The Microsoft Beta program wanted cheerleaders, and ABSOLUTELY didn't want
> any negative feedback nor bug reports, and insisted that the OS had no
> responsibility to recover from nor survive hardware problems, and that
> therefore it was not their problem. I told them that they would soon have
> to do a recall (THAT was EXACTLY what happened with DOS 6.2x). They did
> not invite me to participate in any more Betas.

Well, ATA drives at that time should have already had the capability to remap bad blocks or whole tracks transparently in the firmware, although obviously it took some time for the industry to notice that and catch up with support for the relevant protocol requests in the software tools. It took many years, after all, for PC BIOS vendors to notice that ATA drives generally do report a supported C/H/S geometry (be it real or simulated; I only ever came across one early ATA HDD whose C/H/S geometry was real, all the rest were ZBR), so there is no need for the user to enter it manually for a hard drive to work.

Of course the ability to remap bad storage areas transparently is not an excuse for the OS not to handle them gracefully; back then it was not yet the era when a hard drive with a bad block or a dozen was considered broken, as it usually is nowadays.

> I had a font editor that wouldn't tolerate 3.1, and quite a few XTs (no
> A20), so I continued to keep Win 3.0 on a bunch of machines.

Did 3.1 support running in the real mode though (as opposed to switching to the real mode for DOS tasks only)? I honestly do not remember anymore, and ISTR it was removed at one point. I am sure 3.0 did.

 Maciej
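The C/H/S geometry that those old BIOSes wanted typed in relates to linear block numbers by simple arithmetic; a sketch of the standard conversion (the geometry constants below are illustrative, not from any particular drive):

```python
# Standard C/H/S <-> LBA arithmetic behind the geometry an old BIOS
# wanted entered by hand.  Sector numbers are 1-based by convention;
# cylinders and heads are 0-based.

def chs_to_lba(c, h, s, heads, sectors):
    return (c * heads + h) * sectors + (s - 1)

def lba_to_chs(lba, heads, sectors):
    c, rem = divmod(lba, heads * sectors)
    h, s = divmod(rem, sectors)
    return c, h, s + 1

heads, sectors = 16, 63              # illustrative geometry
lba = chs_to_lba(2, 3, 4, heads, sectors)
assert lba_to_chs(lba, heads, sectors) == (2, 3, 4)  # round-trips
```

A ZBR drive has no single true sectors-per-track value, which is why the geometry it reports is simulated: any (heads, sectors) pair that covers the capacity works, as long as both sides use the same one.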
Re: Text encoding Babel. Was Re: George Keremedjiev
On Sat, 1 Dec 2018, Maciej W. Rozycki via cctalk wrote:

> Be assured there were enough IBM PC clones running DOS around from 1989
> onwards for this stuff to matter, and hardly anyone switched to MS Windows
> before version 95 (running Windows 3.0 with the ubiquitous HGC-compatible
> graphics adapters was sort of fun anyway, and I am not sure if Windows 3.1
> even supported it; maybe with extra drivers).

Depending on which question you are asking, . . . Windows 3.1 definitely did support Hercules video. We had about 3 dozen such machines (386SX) in the school student homework lab. It also supported CGA, but initially didn't come with the driver, so it would work if you upgraded from 3.0 to 3.1, or otherwise used the 3.0 CGA driver.

In August 1991, I went to a Microsoft conference in Seattle. Although it was the anniversary of the 5150, Bill Gates was making appearances on the east coast, instead of being there. They asked our opinion of the NEW flying ["dry rot" disintegrating] window logo, and couldn't believe that we did NOT love it. I found out about, and got a copy of, a CD-ROM "International" Windows 3.0, with many languages, including Chinese! I loved being able to install from CD, instead of boxes of floppies, and was glad that they were at least trying to expand to the rest of the world.

They introduced Windows 3.1. But the borrowed Toshiba laptop that I had with me had 1MB of contiguous RAM, but not A20 support, and 3.1 "NEEDED" 64K above 1MB for HIMEM.SYS, which "SOLVES the problem of not enough RAM". 3.1 also was the first product to force SMARTDRV.SYS. As soon as I got home, I contacted the Win3.1 Beta program to tell them that write-caching without a way to turn it off was a BIG problem. There was a bad spot on the hard drive that I was installing it to that neither SpinRite nor SpeedStor could find, but it consistently crashed the 3.1 installation. But, with the forced write caching, there was NO possible way to recover.
(Without write-caching, you just rename the file that failed, and manually install another copy of that one file.) I found the bad spot and put a SECTORS.BAD file there, and then was OK.

The Microsoft Beta program wanted cheerleaders, and ABSOLUTELY didn't want any negative feedback nor bug reports, and insisted that the OS had no responsibility to recover from nor survive hardware problems, and that therefore it was not their problem. I told them that they would soon have to do a recall (THAT was EXACTLY what happened with DOS 6.2x). They did not invite me to participate in any more Betas.

I had a font editor that wouldn't tolerate 3.1, and quite a few XTs (no A20), so I continued to keep Win 3.0 on a bunch of machines.
Re: Text encoding Babel. Was Re: George Keremedjiev
On Sun, 25 Nov 2018, Liam Proven via cctalk wrote:

> > > For example, right now, I am in my office in Křižíkova. I can't
> > > type that name correctly without Unicode characters, because the ANSI
> > > character set doesn't contain enough letters for Czech.
> >
> > Intriguing. Is there an old MS-DOS Code Page (or comparable technique)
> > that does encompass the necessary characters?
>
> Don't know. But I suspect there weren't many PCs here before the
> Velvet Revolution in 1989. Democracy came around the time of Windows
> 3.0 so there may not have been much of a commercial drive.

Be assured there were enough IBM PC clones running DOS around from 1989 onwards for this stuff to matter, and hardly anyone switched to MS Windows before version 95 (running Windows 3.0 with the ubiquitous HGC-compatible graphics adapters was sort of fun anyway, and I am not sure if Windows 3.1 even supported it; maybe with extra drivers).

Anyway, MS-DOS 5.0 onwards had a complete set of code pages for various regions of the world. For Czechia, Hungary, Lithuania, Poland, and other European countries located towards the east and using languages written in a Latin script, code page 852 was provided. For France, Germany, Spain, the Nordic countries, etc., page 850 was provided. There were other pages included as well, beyond IBM's original page 437, including Greek and Cyrillic ones, but I don't know the details. It's quite likely Wikipedia has them. Of course the HGC didn't support text mode character set switching; however, ISA VGA clones started trickling in at one point too. I still have my ISA Trident TVGA 8900C adapter from 1993 working in one of my machines, though I have since switched to Linux.

NB my last name is also correctly spelled Różycki rather than Rozycki, and the two letters with the diacritics are completely different letters, with sounds that bear no resemblance to the corresponding ones without, i.e. these are not merely accents, which we don't have in Polish at all. (Polish complicates this further in that `ó' sounds the same as `u', and `ż' sounds the same as `rz', which is in turn different from the case where the two letters are written separately; yet the alternatives are not interchangeable, being either invalid or changing the meaning of a word, and many native Polish speakers get them wrong anyway.)

 FWIW, Maciej
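The code-page point can be checked directly with the DOS codecs that ship with Python: CP852 ("Latin-2" DOS) round-trips the Czech and Polish names from this thread, while CP437, the original IBM PC page, cannot encode them at all.

```python
# CP852 covers the Czech and Polish letters discussed above;
# CP437 simply has no code points for them.

czech = "Křižíkova"
polish = "Różycki"

# Round-trip through CP852 preserves both names exactly.
assert czech.encode("cp852").decode("cp852") == czech
assert polish.encode("cp852").decode("cp852") == polish

# CP437 has no 'ř', so encoding the Czech name fails.
try:
    czech.encode("cp437")
except UnicodeEncodeError:
    pass  # expected: 'ř' is not in CP437
else:
    raise AssertionError("expected CP437 to fail on Czech text")
```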
Re: Text encoding Babel. Was Re: George Keremedjiev
On 11/30/2018 03:57 PM, Sean Conner via cctalk wrote:

> There are several problems with this. One, how many bits do you set aside
> per character? 8? 16? There are potentially an open ended set of stylings
> that one might use.

I acknowledge that the idea I shared was incomplete and likely has shortcomings. But I do think that it demonstrates a concept, which is what I was after.

> Second problem---where do you store such bits? Not to imply this is a bad
> idea, just that there are issues that need to be resolved with how things
> are done today (how does this interact with UTF-8 for instance? Or
> UCS-4?).

Ideally, I'd like to see UTF-8 / UTF-16 code points (?) for the different styles of a letter. Not every letter (character ~> byte / double) needs the styling. So I suspect that it would be better to judiciously place code points in the UTF-8 / UTF-16 space.

Sadly, when I try to search for "this", the letters aren't found in "푡ℎ푖푠 푖푠 푎 푠푡푟푖푛푔" or "혁헵헶혀 헶혀 헮 헰헼헺헺헲헻혁". That's something that I think should work. Also, storage of these letters can work just like it does in this email. ;-)

-- Grant. . . . unix || die
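The search failure described above happens because the "styled" letters are distinct code points from Unicode's Mathematical Alphanumeric Symbols block, not styled versions of a-z. Unicode compatibility normalization (NFKC) folds them back to plain letters, though at the cost of discarding the styling, which is exactly the trade-off this thread is wrestling with. A quick Python check:

```python
# A literal search for "this" fails against mathematical-italic letters,
# because they are separate code points; NFKC normalization folds them
# back to plain ASCII (and throws the styling away in the process).

import unicodedata

styled = "\U0001D461\u210E\U0001D456\U0001D460"  # mathematical-italic "this"
assert "this" not in styled                       # literal search fails
folded = unicodedata.normalize("NFKC", styled)
assert folded == "this"                           # NFKC recovers the letters
```

(Note the second styled string in the post uses sans-serif bold Hangul-range... no, mathematical sans-serif bold letters; NFKC folds those the same way.)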
Re: Text encoding Babel. Was Re: George Keremedjiev
It was thus said that the Great Keelan Lightfoot via cctalk once stated:

> > I see no reason that we can't have new control codes to convey new
> > concepts if they are needed.
>
> I disagree with this; from a usability standpoint, control codes are
> problematic. Either the user needs to memorize them, or software needs
> to inject them at the appropriate times. There's technical problems
> too; when it comes to playing back a stream of characters, control
> characters mean that it is impossible to just start listening. It is
> difficult to fast forward and rewind in a file, because the only way
> to determine the current state is to replay the file up to that point.

[ and further down the message ... ]

> I'm going to lavish on the unicode for this example, so those of you
> properly unequipped may not see this example:
>
> foo := 푡ℎ푖푠 푖푠 푎 푠푡푟푖푛푔 혁헵헶혀 헶혀 헮 헰헼헺헺헲헻혁
> printf(푡ℎ푒 푠푡푟푖푛푔 푖푠 ① 푖푠푛푡 푡ℎ푎푡 푒푥푐푖푡푖푛푔, foo)
> if 혁헵헶혀 헶혀 헮 헽헼헼헿헹혆 헽헹헮헰헲헱 헰헼헺헺헲헻혁 foo ==
> 푡ℎ푖푠 푖푠 푎푙푠표 푎 푠푡푟푖푛푔, 푏푢푡 푛표푡 푡ℎ푒 푠푎푚푒
> 표푛푒 { 혁헵헶혀 헶혀 헮헹혀헼 헮 헰헼헺헺헲헻혁
> ...
>
> An atrocious example, but a good demonstration of my point. If I had a
> toggle switch on my keyboard to switch between code, comment and
> string, it would have been much simpler to construct too!

Somehow, the compiler will have to know that "푡ℎ푖푠 푖푠 푎 푠푡푟푖푛푔" is a string while "혁헵헶혀 헶혀 헮 헰헼헺헺헲헻혁" is a comment to be ignored. You lamented the lack of a toggle switch for the two, but existing languages, like C, already have them: '"' is the "toggle" for strings, while '/*' and '*/' are the toggles for comments (and now '//' if you are using C99). It's still something you have to "type" (or "toggle" or "switch" or somehow indicate the mode).

The other issue is how such information is stored, and there, I only see two solutions---in-band and out-of-band. In-band would be included with the text.
Something along the lines of (where each marker is introduced by the ASCII ESC character, 27, and this is an example only):

foo := _this is a string\ ^this is a comment\ printf(_the string is [1p isn't that exciting\,foo)

But this has a problem you noted above---it's a lot harder to seek through the file to arbitrary positions. Grant Taylor stated another way of doing this:

> What if there were (functionally) additional bits that indicated various
> other (what I was calling) stylings?
>
> I think that something along those lines could help avoid a concern I
> have. Namely, how do I search for an A, whatever "style" it's in. I
> think I could hypothetically search for bytes ~> words (characters)
> containing the bit pattern 01x1 in the appropriate positions (assuming
> that the preceding don't-cares are set appropriately) and find any
> format of A, upper case, lower case, bold, italic, underline, strike
> through, etc.

There are several problems with this. One, how many bits do you set aside per character? 8? 16? There are potentially an open ended set of stylings that one might use. Second problem---where do you store such bits? Not to imply this is a bad idea, just that there are issues that need to be resolved with how things are done today (how does this interact with UTF-8 for instance? Or UCS-4?).

Then there's out-of-band storage, which stores such information outside the text (an example---I'm not saying this is the only way to store such information out-of-band):

foo := this is a string this is a comment printf(the string is 1 isn't that exciting,foo)

---

string 8-23
string 50-63
string 65-84
replacement 64
comment 25-41

This has its own problems---namely, how do you keep the two together? It will either be a separate file, which could get separated, or part of the text file, but then you run into the problem of reading Microsoft Word files circa 1986 with today's tools.

-spc (I like the ideas, but the implementations are harder than it first appears ... )
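The out-of-band layout described above can be sketched concretely: the text stays plain and searchable, and the roles live in a side table of (kind, start, end) ranges. The offsets and role names below are illustrative, following the style of the example given:

```python
# Out-of-band annotation: plain text plus a side table of ranges.
# Searching the text needs no knowledge of the annotations at all.

text = 'foo := this is a string this is a comment'
annotations = [
    ("string",  7, 23),   # covers 'this is a string'
    ("comment", 24, 41),  # covers 'this is a comment'
]

def spans(text, annotations, kind):
    """Return the text covered by every annotation of the given kind."""
    return [text[a:b] for k, a, b in annotations if k == kind]

# Plain-text search still works, because the text carries no markers:
assert "this is" in text
assert spans(text, annotations, "comment") == ["this is a comment"]
```

The downside is exactly the one -spc names: the ranges are offsets into one specific version of the text, so any edit to the text invalidates the table unless both are updated together.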
Re: Text encoding Babel. Was Re: George Keremedjiev
On 11/30/2018 02:33 PM, Jim Manley via cctalk wrote: There's enough slack in the approved offerings that electives can be weighted more toward the technical direction (e.g., user interface and experience) or the arts direction (e.g., psychology and history). The idea was to close the severely-growing gap between those who know everything about computing and those who need to know enough, but not everything, to be truly effective in the information-dominant world we've been careening toward without nearly enough preparation of future generations. I kept thinking to myself that many of the people that are considered pioneers in computers were actually something else by trade and learned how to use computers and / or created what they needed for the computer to be able to do their primary job. -- Grant. . . . unix || die
Re: Text encoding Babel. Was Re: George Keremedjiev
On 11/30/2018 11:34 AM, Keelan Lightfoot via cctalk wrote: Thanks! :-) Both. In the beginning we were content, because the keyboard was well suited to the capabilities of the technology available at the time it was invented. We didn't see a better way, because when compared to using a pen and paper (for writing) or using toggle switches (to control a computer), a keyboard was a significant improvement. It's the explosive growth and universal adoption of computers that has locked us in to the keyboard as the standard. *sigh* Steve G. from Security Now's comments about passwords not going away come to mind and seem apropos for keyboards. There are other, likely better, things out there. But keyboards themselves aren't going to go away. I disagree with this; from a usability standpoint, control codes are problematic. Either the user needs to memorize them, or software needs to inject them at the appropriate times. Okay. There's technical problems too; when it comes to playing back a stream of characters, control characters mean that it is impossible to just start listening. It is difficult to fast forward and rewind in a file, because the only way to determine the current state is to replay the file up to that point. Now I'm wondering about something akin to the differences in upper case and lower case. Functionally the same code, just a different value in the 6th bit. What if there were (functionally) additional bits that indicated various other (what I was calling) stylings? I think that something along those lines could help avoid a concern I have. Namely, how do I search for an A, whatever "style" it's in. I think I could hypothetically search for bytes ~> words (characters) containing the bit pattern 01x1 in the appropriate positions (assuming that the preceding don't-cares are set appropriately) and find any format of A, upper case, lower case, bold, italic, underline, strike through, etc.
The other thing that the additional bits / flags could do is allow the bytes (words / characters) to be read mid-stream. Do you mean modal control codes? As in "everything after here is bold" and "the bold stops here"? Yes. That's what I was thinking when I wrote that. We've gone backwards sadly. For a brief while, this kind of rich user interface stuff was provided by the OS. A text box, regardless of the application, would use the OS's text box control, and would have a universal interface for rich text. Indeed. But the growth of the web has resulted in an atavism. We're back to plain text, and using markup to style our text. I mostly agree. But I do wonder how true that actually is, at least on a technical level. I think the text input box can be enhanced to allow more than just plain text. If I want bold text in Slack, I have to use markup. Facebook Messages and YouTube comments also support markup, but the syntax is slightly different between them. *sigh* Back in 1991, if I wanted bold text in any application that supported rich text on my SE/30, I hit command-B and I got bold text. Sure, there are Javascript rich text editors that can be bolted on, but they all have their own UI concepts, and they're all a trainwreck. I believe that we can do better. In addition to crusty old computers, I also enjoy the company of three crusty old Linotypes. In fact, that's what got me thinking about this stuff in the first place. The Linotype keyboard has 90 keys, which directly map to the 90 glyphs a Linotype can "render". The keyboard is laid out in three equal-sized sections: lowercase letters on the left, uppercase on the right, with numbers and punctuation in the middle. Push the button, and what's marked on the button is what ultimately ends up on the page. Each Linotype mat (matrix; letter mold) has two positions, which can be selected by flipping a little lever when they're being assembled into a line.
The two positions are almost always used to select between two versions of a font; roman/bold or roman/italic are the most common pairings. Intriguing. I have a vague mental image of what you're talking about after watching Linotype: The Film (http://www.linotypefilm.com). I found it quite entertaining and informative. But what it means is that you can walk up to a machine with a half-typed line in the assembler and immediately determine its state. Any mats set in the bold position are in a physically different position in the assembler. The position of the switch tells you if you're typing in bold or roman. When you push the 'A' key, you know an uppercase 'A' in bold will be added to the line. Additionally, the position of that switch can be verified without taking your eyes off of the copy. There is no black magic, no spooky action at a distance. The capabilities of the machine are immediately apparent. I was not aware of the physically different positions. But either I don't remember, pick up on, or they
Re: Text encoding Babel. Was Re: George Keremedjiev
> Back on topic, the tools exist, but they are often seen as toys and > not serious software > development tools. Are we at the point where the compiler for a visual > programming > language is written in the visual programming language? > > - Keelan > Hi Keelan, I was going to mention this further back in the thread when visual programming was first mentioned, but for those not aware, there has been a shift in emphasis in teaching computing principles to newbies who have no idea what a bit, byte, assembler, compiler, interpreter, etc., are. UC Berkeley's "The Beauty and Joy of Computing" (and a follow-on "The Beauty and Joy of Data", offered at some institutions) curricula are increasingly being taught (starting in high school advanced placement computer science, as well as in freshman coursework in universities) to convey fundamental computing concepts: https://bjc.berkeley.edu The associated courses are taught using a visual programming environment called Snap!, where the (now browser-based, thank goodness) ease-of-use of Scratch (drag-and-drop interface, visual metaphors for loops, conditionals, etc., as well as easy animation tools) is combined with the power of Scheme (first class procedures, first class lists, first class objects, and first class continuations). https://snap.berkeley.edu Some universities have begun offering Bachelor of Arts degrees in CS, in addition to BSCSs, where about half of the BACS coursework is technically-oriented, and the remainder is oriented to more traditional arts offerings. These curricula and Snap! form a bridge so that students who ordinarily would never even consider studying CS can become knowledgeable enough to truly comprehend and appreciate computing's possibilities and limitations in its role in civilization (or at least what's left of it). 
There's enough slack in the approved offerings that electives can be weighted more toward the technical direction (e.g., user interface and experience) or the arts direction (e.g., psychology and history). The idea was to close the rapidly widening gap between those who know everything about computing and those who need to know enough, but not everything, to be truly effective in the information-dominant world we've been careening toward without nearly enough preparation of future generations. I haven't worked with Snap! enough yet to know for sure whether it can be used to develop itself, but I strongly suspect that is the case (it's actually implemented in JavaScript using an HTML5 canvas due to its browser-based nature). It wouldn't be suitable for doing systems level development, unless optimized C code (or equivalent) could be emitted, but it could certainly be used to demonstrate the logic principles involved in any level of software development that most people are ever likely to need to understand. There's mention of Snap! programs being convertible to mainstream programming languages such as Python, JavaScript, C, etc., but I haven't traced to ground in documentation how that's supposed to happen, yet. We may be part-way there because Google's Blockly spin-off of Scratch can already emit five scripting languages (JavaScript, Python, PHP, Lua, and Dart), and it uses a modular approach where emission of code in additional languages could reportedly be added. That magic word, "optimized", is the key to whether the code is fundamentally correct and would need oodles of hand-rewriting to improve efficiency, or there are ways to automate at least some of the optimization. Snap! can be run off-line in a browser, as well as on the on-line primary and mirror sites, and standalone applications can be generated. 
Scratch has been extended to provide an easy way to control and sense physical environments via typical robotics components, but I haven't looked to see if Snap! has inherited those extensions. For any doubters, note that Pacman was ported to Scratch years ago, complete with the authentic sounds (including the "shrivel and disappear-in-death" clip), so ... ;^) All the Best, Jim
Re: Text encoding Babel. Was Re: George Keremedjiev
> Welcome. :-) Thanks! > Do you think that we stopped enhancing the user input experience more > because we were content with what we had or because we didn't see a > better way to do what we wanted to do? Both. In the beginning we were content, because the keyboard was well suited to the capabilities of the technology available at the time it was invented. We didn't see a better way, because when compared to using a pen and paper (for writing) or using toggle switches (to control a computer), a keyboard was a significant improvement. It's the explosive growth and universal adoption of computers that has locked us in to the keyboard as the standard. > I agree that markup languages are a kludge. But I don't know that they > require plain text to describe higher level concepts. > > I see no reason that we can't have new control codes to convey new > concepts if they are needed. I disagree with this; from a usability standpoint, control codes are problematic. Either the user needs to memorize them, or software needs to inject them at the appropriate times. There are technical problems too; when it comes to playing back a stream of characters, control characters mean that it is impossible to just start listening. It is difficult to fast forward and rewind in a file, because the only way to determine the current state is to replay the file up to that point. > Aside: ASCII did what it needed to do at the time. Times are different > now. We may need more / new / different control codes. > > By control codes, I'm meaning a specific binary sequence that means a > specific thing. I think it needs to be standardized to be compatible > with other things -or- it needs to be considered local and proprietary > to an application. Do you mean modal control codes? As in "everything after here is bold" and "the bold stops here"? > I actually wonder how much need there is for /all/ of those utilities. 
> I expect that things should have streamlined and simplified, at least > some, in the last 30 years. We've gone backwards sadly. For a brief while, this kind of rich user interface stuff was provided by the OS. A text box, regardless of the application, would use the OS's text box control, and would have a universal interface for rich text. But the growth of the web has resulted in an atavism. We're back to plain text, and using markup to style our text. If I want bold text in Slack, I have to use markup. Facebook Messages and YouTube comments also support markup, but the syntax is slightly different between them. Back in 1991, if I wanted bold text in any application that supported rich text on my SE/30, I hit command-B and I got bold text. Sure, there are JavaScript rich text editors that can be bolted on, but they all have their own UI concepts, and they're all a trainwreck. > What would you like to do or see done differently? Even if it turns out > to be worse, it would still be something different and likely worth > trying at least once. In addition to crusty old computers, I also enjoy the company of three crusty old Linotypes. In fact, that's what got me thinking about this stuff in the first place. The Linotype keyboard has 90 keys, which directly map to the 90 glyphs a Linotype can "render". The keyboard is laid out in three equal-sized sections: lowercase letters on the left, uppercase on the right, with numbers and punctuation in the middle. Push the button, and what's marked on the button is what ultimately ends up on the page. Each Linotype mat (matrix; letter mold) has two positions, which can be selected by flipping a little lever when they're being assembled into a line. The two positions are almost always used to select between two versions of a font; roman/bold or roman/italic are the most common pairings. But what it means is that you can walk up to a machine with a half-typed line in the assembler and immediately determine its state. 
Any mats set in the bold position are in a physically different position in the assembler. The position of the switch tells you if you're typing in bold or roman. When you push the 'A' key, you know an uppercase 'A' in bold will be added to the line. Additionally, the position of that switch can be verified without taking your eyes off of the copy. There is no black magic, no spooky action at a distance. The capabilities of the machine are immediately apparent. > I don't think of bold or italic or underline as second class concepts. > I tend to think of the following attributes that can be applied to text: > > · bold > [snip] > > I don't think that normal is superior to the other four (five) in any > way. I do think that normal does occur VASTLY more frequently than > any combination of the others. As such normal is what things default to > as an optimization. IMHO that optimization does not relegate the other > styles to second class. I agree. I think that they're normal enough that they should exist as their own code points in Unicode. Our
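As it happens, Unicode already encodes a limited version of this wish: the Mathematical Alphanumeric Symbols block (U+1D400 onward) assigns distinct code points to bold, italic, and other letter styles, and NFKC normalization folds them back to plain letters for searching. A short Python sketch (the `to_math_bold` helper is my own illustration):

```python
import unicodedata

BOLD_A = 0x1D400  # U+1D400 MATHEMATICAL BOLD CAPITAL A; A-Z are contiguous

def to_math_bold(ch: str) -> str:
    """Map A-Z onto the Unicode mathematical bold capitals; pass others through."""
    if 'A' <= ch <= 'Z':
        return chr(BOLD_A + ord(ch) - ord('A'))
    return ch

bold = ''.join(to_math_bold(c) for c in 'HELLO')
print(bold)                                 # 𝐇𝐄𝐋𝐋𝐎
print(unicodedata.normalize('NFKC', bold))  # HELLO
```

So a search that normalizes first finds "any format of A", exactly the property wanted earlier in the thread, though these code points are officially intended for mathematical notation, not general styling.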
Re: Text encoding Babel. Was Re: George Keremedjiev
Some computing economics history: I'm an engineer and scientist by both education and experience, and one major difference between the disciplines is that engineers are required to pass coursework and demonstrate proficiency in economics. That's because we need to deliver things that actually do what customers think they paid for within strict budgets and schedules, or we go hungry. Scientists, on the other hand, if they can accurately predict what it will cost to prove a theory, aren't practicing science, because they have to already know the outcome and are taking no risk. A theoretically "superior" encoding may not see practical use by a significant number of people because of legacy inertia that often makes no sense, but is rooted in cultural, sociological, emotional, and other factors, including economics. Dvorak computer keyboards are allegedly far more efficient speed/accuracy-wise than QWERTY computer keyboards, so they should rule the computing world, but they don't. Keyboards that reduce the risk of repetitive stress injuries (e.g., carpal tunnel syndrome) should dominate the market for very sensible health reasons, but they don't, either. Legacy inertia is a beyotch to overcome, especially when international-level manufacturers and investors have a strong interest in making lots of money from the status quo. Logic and reasoning are simply nowhere near enough to create the conditions necessary for widespread adoption - sometimes it's just good luck in timing (or bad luck, as the case may be). ASCII was developed in an age when Teletypes and similar devices were the only textual I/O options, with fixed-width/size/style typefaces (font family is an attribute of a typeface - there's no such thing as a "font"). By the late 1950s, there were around 250 computer manufacturers, and none of their products were interoperable in any form. 
Until the IBM 360 was released in 1965, IBM had 14 product _lines_ that were incompatible with each other, despite having 20,000+ very capable scientists and engineers on their payroll. You can't blame the ASCII developers for lack of foresight when no one in their right mind back then would have ever predicted we could have upwards of a trillion bytes of storage in our pockets (e.g., the Samsung Note 9), much less multi-megapixel touch displays with millions of colors, with worldwide-reaching cellular/Internet access with milliseconds of round-trip response, etc. Someone thinking that they're going to make oodles of money from some supposedly new-and-improved proprietary encoding "standard" that discards five-plus decades of legacy intellectual and economic investment is pursuing a fool's errand. Even companies with resources at the level of Apple, Google, Microsoft, etc., aren't that arrogant, and they've demonstrated some pretty heavy-duty chutzpah over time. BTW, you won't be able to patent what apparently amounts to a lookup table, and even if you copyright it, it will be a simple matter of developing functionally-equivalent code that performs a translation on-the-fly. See also the clever schemes where DVD encryption keys that had been left on an unprotected server accessible via the Internet were transformed into prime numbers that didn't infringe on the copyrights associated with the keys. True standards are open nowadays - the days of proprietary "standards" are a couple of decades behind us - even Microsoft has been publishing the binary structure of their Office document file formats. The specification for Word, which includes everything going back to v 1.0, is humongous, and even they were having fits trying to maintain the total spec, which is reportedly why they went with XML to create the .docx, .xlsx, .pptx, etc., formats. 
That also happened to make it possible to placate governments (not to mention customers) that are looking for any hint of anti-competitive behavior, and thus also made it easier for projects such as OpenOffice and LibreOffice to flourish. Typographical bigots, who are more interested in style than content, were safely fenced off in the back rooms of publishing houses and printing plants until Apple released the hounds on an unsuspecting public. I'm actually surprised that the style purists haven't forced Smell-o-Vision technology on The Rest of Us to ensure that the musty smell of old books is part of every reading "experience" (I can't stand the current common use of that word). At least I have the software chops to transform the visual trash that passes for "style" these days into something pleasing to _my_ eyes (see what I did there with "severely-flawed" ASCII? Here's how you can do /italics/ and !bold! BTW.). Nothing frosts me more than reading text that can't be resized and auto-reflowed, especially on mobile devices with extremely limited display real estate. I'm fully able-bodied and I'm perturbed by such bad design, so, I'm pretty sure that pages that prevent pinch-zooming, and that don't allow for direct
Re: Text encoding Babel. Was Re: George Keremedjiev
On Wed, 28 Nov 2018 at 09:27, Paul Koning via cctalk wrote: > I learned it about 15 years ago (OpenAPL, running on a Solaris workstation > with a modified Xterm that handled the APL characters). Nice. It made a > handy tool for some cryptanalysis programs I needed to write. > I am interested in this cryptanalysis program... > I wonder if current APL implementations use the Unicode characters for APL, > that would make things easy. > I can confirm that both NARS 2000 and Dyalog APL both use the Unicode APL characters. Regards, Christian -- Christian M. Gauger-Cosgrove STCKON08DS0 Contact information available upon request.
Re: Text encoding Babel. Was Re: George Keremedjiev
> On Nov 27, 2018, at 9:23 PM, Fred Cisin via cctalk > wrote: > >>> I have long wondered if there are computer languages that aren't rooted >>> in English / ASCII. I feel like it's rather pompous to assume that all >>> programming languages are rooted in English / ASCII. I would hope that >>> there are programming languages that are more specific to the region of >>> the world they were developed in. As such, I would expect that they >>> would be stored in something other than ASCII. > > On Tue, 27 Nov 2018, William Donzelli via cctalk wrote: >> APL. > > APL requires adding additional characters. That was a major obstacle to > acceptance, both in terms of keyboard and type ball (my use preceded CRT), > but also asking the user/programmer to learn new characters. I loved APL! I learned it about 15 years ago (OpenAPL, running on a Solaris workstation with a modified Xterm that handled the APL characters). Nice. It made a handy tool for some cryptanalysis programs I needed to write. I wonder if current APL implementations use the Unicode characters for APL; that would make things easy. > I love the use of an arrow for assignment. ... One of the strangest programming languages I've used is POP-2, which we used in an AI course (Expert Systems) at the University of Illinois, in 1976. Taught by a visiting prof from the University of Edinburgh, I think Donald Michie but I may have the name confused. Like APL, POP-2 had the same associativity for all operators. Unlike APL, the designers decided that the majority should win so assignment would be left-associative like everything else -- rather than APL's rule that all the other operators are right-associative like assignment. So you'd end up with statements like: n + 1 -> n More at https://en.wikipedia.org/wiki/POP-2 paul
Re: Text encoding Babel. Was Re: George Keremedjiev
On Tue, 27 Nov 2018 at 20:47, Grant Taylor via cctalk wrote: > > I don't think that HTML can reproduce fixed page layout like PostScript > and PDF can. It can make a close approximation. But I don't think HTML > can get there. Nor do I think it should. There is a wider panoply of options to consider. For instance, Display PostScript, and come to that, arguably, NeWS. Also, modern document-specific markups. I work in DocBook XML, which I dislike intensely. There's also, at another extreme, AsciiDoc (and Markdown (in various "flavours")), reStructuredText, and similar "lightweight" MLs: http://hyperpolyglot.org/lightweight-markup But there are, of course, rivals. DITA is also widely used. And of course there are things like LyX/LaTeX/TeX, which some find readable. I am not one of them. But I get paid to do DocBook, I don't get paid to do TeX. Neal Stephenson's highly enjoyable novel /Seveneves/ contains some interesting speculations on the future of the Roman alphabet and what close contact with Cyrillic over a period will do to it. Aside: [[ > I'm not personally aware of any cases where ASCII limits programming > languages. But my ignorance does not preclude that situation from existing. APL and ColorForth, as others have pointed out. > I have long wondered if there are computer languages that aren't rooted > in English / ASCII. https://en.wikipedia.org/wiki/Qalb_(programming_language) More generally: https://en.wikipedia.org/wiki/Non-English-based_programming_languages Personally I am more interested in non-*textual* programming languages. A trivial candidate is Scratch: https://scratch.mit.edu/ But ones that entirely subvert the model of using linear files containing characters that are sequentially interpreted are more interesting to me. I blogged about one family I just discovered last week: https://liam-on-linux.livejournal.com/60054.html The videos are more or less _necessary_ here, because trying to describe this in text will fail _badly_. 
Well worth a couple of hours of anyone's time. ]] Anyway. To return to text encodings. Again I wish to refer to a novel; to Kim Stanley Robinson's "Mars trilogy", /Red Mars/, /Green Mars/ and /Blue Mars/. Or as a friend called them, "RGB Mars" or even "Technicolor Mars". A character presents an argument that if you try to summarise many things on a scale -- e.g. for text encodings, from simplicity and readability, to complexity and capability -- you can't encapsulate any sophisticated system. He urges a 4-cornered system, using the example of the "four humours": phlegm, bile, choler and sang. The opposed corners of the diagram are as important as the sides of the square; characteristics form the corners, but the intersections between them are what defines us. So. There is more than one scale here. At one extreme, we could have the simplest possible text encoding. Something like Morse code or Braille, which omits almost all "syntax" -- almost no punctuation, no carriage returns or anything like that, which are _metadata_, they are information about how to display the content, not content themselves. Not even case is encoded: no capitals, no minuscule letters. But of course a number of alphabets don't have that distinction, and it's not essential in the Roman alphabet. Slightly richer, but littered with historical baggage from its origins in teletypes: ASCII. Much richer, but still not rich enough for all the Roman-alphabet-using-languages: ANSI. Insanely rich, but still not rich enough for all the written languages: Unicode. (What plane? What encoding? What version, even?) At the other extreme, markup languages that either weren't really intended for humans but often are written by them -- e.g. the SGML/XML family -- or are only usable by relatively few humans -- e.g. the TeX family -- or that are almost never used by humans, e.g. PostScript, or HP PCL. And what I find a fairly happy medium -- AsciiDoc, say. 
Perfectly readable by untrained people as plain ASCII, can be written with mere hours of study, if that, but also can be processed and rendered into something much prettier. The richer the encoding, the harder it is for *humans* to read, and the more complex the software to handle it needs to be. So, yes, ASCII is perhaps too minimal. ANSI is just a superset. But I'd argue that there _should_ be a separation between at least 2, maybe 3 levels, and arguably more. #1 Plain text encoding. Ideally able to handle all the characters in all forms of the Latin alphabet, and single-byte based. Drop ASCII legacy baggage such as backspace, bell, etc. #2 Richer text, with simple markup, but human-readable and human-writable without needing much skill or knowledge. Along the lines of Markdown or *traditional* /email/ _formatting_ perhaps. #3 Formatted text, with embedded control codes. The Oberon OS does this. #4 Full 1980s word-processor-style document, with control codes, formatting, font and page layout features, etc. #5
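Level #2 of the scheme above is cheap to prototype. Here is a minimal Python sketch (the regex conventions are my own ad-hoc choices, not any standard) that renders *bold*, /italic/ and _underline_ email-style markup using ECMA-48 SGR escape sequences:

```python
import re

# Traditional email emphasis markers mapped to SGR on/off sequences
SGR = {'*': ('\x1b[1m', '\x1b[0m'),   # bold
       '/': ('\x1b[3m', '\x1b[0m'),   # italic
       '_': ('\x1b[4m', '\x1b[0m')}   # underline

def render(text: str) -> str:
    """Render *bold*, /italic/ and _underline_ markup as terminal escapes."""
    for mark, (on, off) in SGR.items():
        pattern = re.escape(mark) + r'(\w+)' + re.escape(mark)
        text = re.sub(pattern, on + r'\1' + off, text)
    return text

print(render('this is *important* and /subtle/'))
```

The point is how little machinery level #2 needs compared to level #3 or #4: a handful of visible, typeable marker characters, readable as plain ASCII even when not rendered.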
Re: Text encoding Babel. Was Re: George Keremedjiev
On Wed, 28 Nov 2018 at 08:05, Fred Cisin via cctalk wrote: > > He also created the Canon Cat. > > His idea of a user interface included that the program should KNOW > (assume) what the user wanted to do. One of my heroes. I've never used a Cat or his other software UIs, but the demos I've seen are enough to make me wonder at how much we have lost already, and secondarily, if it would be possible to code up a Raskin-style editor in Emacs. It's about the only editor I know that's smart enough and programmable enough. Unfortunately, I also find it horrible to use and don't know how to do this. -- Liam Proven - Profile: https://about.me/liamproven Email: lpro...@cix.co.uk - Google Mail/Hangouts/Plus: lpro...@gmail.com Twitter/Facebook/Flickr: lproven - Skype/LinkedIn: liamproven UK: +44 7939-087884 - ČR (+ WhatsApp/Telegram/Signal): +420 702 829 053
Re: Text encoding Babel. Was Re: George Keremedjiev
Why not a language even more self-documenting than COBOL, wherein the main body is text, and special markers to identify the CODE that corresponds? On Wed, 28 Nov 2018, Sean Conner wrote: In the book _Programmers at Work_ there's a picture of a program Jef Raskin [1] wrote that basically embeds BASIC into a word processor document. [1] He started the Macintosh project at Apple. It was later taken over by Steve Jobs and taken in a different direction. He also created the Canon Cat. His idea of a user interface included that the program should KNOW (assume) what the user wanted to do. I showed him WHY the OS shouldn't go ahead (without asking for confirmation!) and format a disk that it couldn't read. I do not know whether that change got made before commercial release. (The Cat, incidentally, was SS 512 bytes per sector, with 10 sectors per track. Sometimes described as 256K (because use was primarily for imaging 256K of RAM), but sometimes [more accurately] as 384K) Before his death, I almost ended up with his electric van (Subaru 600 based). There was not enough computational capability in it for that to be on-topic here.
Re: Text encoding Babel. Was Re: George Keremedjiev
On 11/27/2018 9:11 PM, Sean Conner via cctalk wrote: But I can still load and read circa-1968-plain-text files without issue, on a computer that didn't even exist at the time, using tools that didn't exist at the time. The same can't be said for a circa-1988-Microsoft-word file. It requires either the software of the time, or specialized software that understands the format. But where do I find the 1968 plain text files? Right now I am looking for free online books on computers and computer science books in the 1971 to 1977 year range. A fictional example: "HAL 9000 programming" AI BOOTSTRAPPING WITH A LISP 1st edition. Useful knowledge for back then. "HAL 9000 programming" HOW the AI BOOTSTRAPS windows 1000 in HOT JAVA 2001 edition. Not so useful for historic knowledge. Looking to write a simple integer language, as I have no floating point yet on my 1973-1974-ish paper computer design. And yes it is 18 bits and TTL. Right now I am programming in C for what little quick and dirty software I have written and digging around for ideas. It would be nice if bitsavers could have the old 1st edition books. The latest may sell but old knowledge is being lost. Ben.
Re: Text encoding Babel. Was Re: George Keremedjiev
It was thus said that the Great Fred Cisin via cctalk once stated: > > >>I like the C comment example; Why do I need to call out a comment with > >>a special sequence of letters? Why can't a comment exist as a comment? > > Why not a language even more self-documenting than COBOL, wherein the main > body is text, and special markers to identify the CODE that corresponds? In the book _Programmers at Work_ there's a picture of a program Jef Raskin [1] wrote that basically embeds BASIC into a word processor document. -spc [1] He started the Macintosh project at Apple. It was later taken over by Steve Jobs and taken in a different direction.
Re: Text encoding Babel. Was Re: George Keremedjiev
It was thus said that the Great Keelan Lightfoot via cctalk once stated: > I'm a bit dense for weighing in on this as my first post, but what the heck. > > Our problem isn't ASCII or Unicode, our problem is how we use computers. > > Going back in time a bit, the first keyboards only recorded letters > and spaces, even line breaks required manual intervention. As things > developed, we upgraded our input capabilities a little bit (return > keys! delete keys! arrow keys!), but then, some time before graphical > displays came along, we stopped upgrading. We stopped increasing the > capabilities of our input, and instead focused on kludges to make them > do more. We created markup languages, modifier keys, and page > description languages, all because our input devices and display > devices lacked the ability to comprehend anything more than letters. > Now we're in a position where we have computers with rich displays > bolted to a keyboard that has remained unchanged for 150 years. Do you have anything in particular in mind? > Unpopular opinion time: Markup languages are a kludge, relying on > plain text to describe higher level concepts. TeX has held us back. > It's a crutch so religiously embraced by the people that make our > software that the concept of markup has come to be accepted "the way". > I worked with some university students recently, who wasted a > ridiculous amount of time learning to use LaTeX to document their > projects. Many of them didn't even know that page layout software > existed, they thought there was this broad valley in capabilities with > TeX on one side, and Microsoft Word on the other. They didn't realize > that there is a whole world of purpose built tools in between. Rather > than working on developing and furthering our input capabilities, > we've been focused on keeping them the same. Markup languages aren't > the solution. They are a clumsy bridge between 150 year old input > technology and modern display capabilities. 
> > Bold or italic or underlined text shouldn't be a second class concept, > they have meaning that can be lost when text is conveyed in > circa-1868-plain-text. But I can still load and read circa-1968-plain-text files without issue, on a computer that didn't even exist at the time, using tools that didn't exist at the time. The same can't be said for a circa-1988-Microsoft-word file. It requires either the software of the time, or specialized software that understands the format. > I've read many letters that predate the > invention of the typewriter, emphasis is often conveyed using > underlines or darkened letters. We've drawn this arbitrary line in the > sand, where only letters that can be typed on a typewriter are "text", > Everything else is fluff that has been arbitrarily decided to convey > no meaning. I think it's a safe argument to make that the primary > reason we've painted ourselves into this unexpressive corner is > because of a dogged insistence that we cling to the keyboard. There were conventions developed for typewriters to get around this. Underlining text indicated italicized text (if the typewriter didn't have the capability---some did). In fact, typewriters have more flexibility than computers do even today. Within the restriction of a typewriter (only characters and spaces) you could use the back-space key (which did not erase the previous character) and re-type the same character to get a bold effect. You could back-space and hit the underscore to get underlined text. You could back-space and hit the ` key to get a grave accent, and the ' to get an acute accent. With a bit more fiddling with the back-space and adjusting the paper via the platen, you could get umlauts (either via the . or ' keys). I think the original intent of the BS control character in ASCII was to facilitate this behavior, but alas, nothing ever did. Shame, it's a neat concept. 
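The typewriter overstrike conventions described above can be emulated in a few lines with ASCII BS. The encoding sketched here is the same one nroff output uses (character, BS, character for bold; underscore, BS, character for underline), though the function names are mine:

```python
BS = '\b'  # ASCII BS (0x08): back up without erasing, as on a typewriter

def overstrike_bold(text: str) -> str:
    """Strike each character twice in place, typewriter-style."""
    return ''.join(c + BS + c for c in text)

def overstrike_underline(text: str) -> str:
    """Underscore, back-space, then the character itself."""
    return ''.join('_' + BS + c for c in text)

print(repr(overstrike_bold('Hi')))       # 'H\x08Hi\x08i'
print(repr(overstrike_underline('Hi')))  # '_\x08H_\x08i'
```

Piped through a pager that understands overstriking, the output displays as bold and underlined text; on a printing terminal it would literally strike the characters twice.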
> I like the C comment example; Why do I need to call out a comment with > a special sequence of letters? Why can't a comment exist as a comment? The smart-ass answer is "because the compiler only looks at a stream of text and needs a special marker" but I get the deeper question---is a plain text file the only way to program? No. There are other ways. There are many attempts at so-called "visual languages" but none of them have been used to any real extent. Yes, there are languages like Visual Basic or Smalltalk, but even with those, you still type text for the computer to run. The only truly alternative programming language I know of is Excel. Seriously. That's about the closest thing you get to a comment existing as a comment without special markers, because you don't include those as part of the program (specifically, you will exclude those cells from the computation lest you get an error). > Why is a comment a second class concept? When I take notes in the > margin, I don't explicitly need to call them out as notes. This > extends to
Re: Text encoding Babel. Was Re: George Keremedjiev
On 11/27/18 6:23 PM, Fred Cisin via cctalk wrote: > I love the use of an arrow for assignment. In teaching, a student's > FIRST encounter with programming can be daunting. Use of an equal sign > immediately runs up against the long in-grained concept of commutative > equality. You would be surprised how many first time students try to > say 3 = X . Then, of course,
> N = 1
> N = N + 1
> is a mathematical "proof by induction" that all numbers are equal! > (Don't let a mathematician see that, or the universe will cease to > exist, and be replaced by something even more inexplicable!) It's worth noting that in 1963 ASCII, hex 5E was the up-arrow (now the circumflex) and hex 5F was the left-arrow (now underline). It's also worth noting that in the original CDC 6-bit display code, there were symbols not only for the left-arrow, but also for not-equals, logical OR and AND, up- and down-arrows, equivalence, logical NOT, less-than-or-equal, and greater-than-or-equal--pretty much the original Algol 60 special characters. --Chuck
Re: Text encoding Babel. Was Re: George Keremedjiev
It was thus said that the Great Grant Taylor via cctalk once stated: > On 11/27/2018 04:43 PM, Keelan Lightfoot via cctalk wrote: > > > >Unpopular opinion time: Markup languages are a kludge, relying on plain > >text to describe higher level concepts. > > I agree that markup languages are a kludge. But I don't know that they > require plain text to describe higher level concepts. > > I see no reason that we can't have new control codes to convey new > concepts if they are needed. > > Aside: ASCII did what it needed to do at the time. Times are different > now. We may need more / new / different control codes. > > By control codes, I'm meaning a specific binary sequence that means a > specific thing. I think it needs to be standardized to be compatible > with other things -or- it needs to be considered local and proprietary > to an application. [ snip ] > I don't think of bold or italic or underline as second class concepts. > I tend to think of the following attributes that can be applied to text: > > · bold > · italic > · overline > · strike through > · underline > · superscript exclusive or subscript > · uppercase exclusive or lowercase > · opposing case > · normal (none of the above) But there are defined control codes for that (or most of that list anyway). It's not ANSI, but an ISO standard. Let's see ... ^[[1m bold ^[[3m italic ^[[53m overline ^[[9m strike through ^[[4m underline ^[[0m normal The superscript/subscript could be done via another font ^[[11m ... ^[[19m Maybe even the opposing case case ... um ... yeah. By the way, ^[ is a single character representing the ASCII ESC character (27). > I see no reason that the keyboard can't have keys / glyphs added to it. > > I'm personally contemplating adding additional keys (via an add on > keyboard) that are programmed to produce additional symbols. 
I > frequently use the following symbols and wish I had keys for easier > access to them: ≈, ·, ¢, ©, °, …, —, ≥, ∞, ‽, ≤, µ, > ≠, Ω, ½, ¼, ⅓, ¶, ±, ®, §, ¾, ™, ⅔, ¿, ⊕. Years ago I came across an IBM Model M keyboard that had the APL character set on the keyboard, along with the normal characters one finds. I would have bought it on the spot if it weren't for a friend of mine who saw it 10 seconds before I did. I did recently get another IBM Model M keyboard (an SSK model) that had additional labels on the keys: http://boston.conman.org/2018/10/31.2 The nice thing about the IBM Model M is the keycaps are easy to replace. > I will concede that many computers and / or programming languages do > behave based on text. But I am fairly confident that there are some > programming languages (I don't know about computers) that work > differently. Specifically, simple objects are included as part of the > language and then more complex objects are built using the simpler > objects. Dia and (what I understand of) Minecraft come to mind. You might be thinking of Smalltalk. -spc
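The SGR sequences listed above can be generated mechanically. A small Python sketch (mine, not from the thread) that builds the ECMA-48 / ISO/IEC 6429 escape sequences quoted in the message; whether each attribute actually renders is entirely up to the terminal -- 53 (overline) in particular is rarely honoured:

```python
# Build the ^[[<n>m sequences from the message (ECMA-48 SGR).
# ESC is the single character 27 that "^[" denotes above.
ESC = "\x1b"

def sgr(n: int) -> str:
    """Return the SGR escape sequence for parameter n."""
    return f"{ESC}[{n}m"

# Parameter numbers exactly as listed in the message.
styles = {"bold": 1, "italic": 3, "underline": 4,
          "strike through": 9, "overline": 53}

for name, n in styles.items():
    # Wrap a sample word in the attribute, then reset with SGR 0 ("normal").
    print(f"{sgr(n)}{name}{sgr(0)}")
```

On a capable terminal each line prints its own name in the named style.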
Re: Text encoding Babel. Was Re: George Keremedjiev
I have long wondered if there are computer languages that aren't rooted in English / ASCII. I feel like it's rather pompous to assume that all programming languages are rooted in English / ASCII. I would hope that there are programming languages that are more specific to the region of the world they were developed in. As such, I would expect that they would be stored in something other than ASCII. On Tue, 27 Nov 2018, William Donzelli via cctalk wrote: APL. APL requires adding additional characters. That was a major obstacle to acceptance, both in terms of keyboard and type ball (my use preceded CRT) and in asking the user/programmer to learn new characters. I loved APL! I love the use of an arrow for assignment. In teaching, a student's FIRST encounter with programming can be daunting. Use of an equal sign immediately runs up against the long-ingrained concept of commutative equality. You would be surprised how many first time students try to say 3 = X . Then, of course, N = 1 N = N + 1 is a mathematical "proof by induction" that all numbers are equal! (Don't let a mathematician see that, or the universe will cease to exist, and be replaced by something even more inexplicable!) Even the archaic keyword "LET" in BASIC helped clarify that. We tend to be dismissive of such problems, declaring that students "need to LEARN the right way". I remember a cartoon in a publication that might have been Interface Age, where an archeologist looking at hieroglyphics says that it looks like a subset of APL. But, I think that the comment was more in regards to programming by non-English speaking programmers. While FORTRAN, COBOL, BASIC can be almost trivially adapted to Spanish, Italian, German, etc., what about Chinese? Japanese? Yes, there IS a Chinese COBOL! But, THOSE programmers essentially have to learn English before they can program! Surely a Chinese or Japanese based programming language could be developed. -- Grumpy Ol' Fred ci...@xenosoft.com
Re: Text encoding Babel. Was Re: George Keremedjiev
On 2018-11-27 8:33 PM, Grant Taylor via cctalk wrote: > ... >> Bold or italic or underlined text shouldn't be a second class concept, >> they have meaning that can be lost when text is conveyed in >> circa-1868-plain-text. I've read many letters that predate the >> invention of the typewriter, emphasis is often conveyed using >> underlines or darkened letters. > > I don't think of bold or italic or underline as second class concepts. I > tend to think of the following attributes that can be applied to text: > > · bold > · italic > · overline > · strike through > · underline > · superscript exclusive or subscript > · uppercase exclusive or lowercase > · opposing case > · normal (none of the above) > This covers only a small fraction of the Latin-centric typographic palette - much of which has existed for 500 years in print (non-Latin much older). Computerisation has only impoverished that palette, and this is how it happens: Checklists instead of research. Work with typographers when trying to represent typography in a computer. The late Hermann Zapf was Knuth's close friend. That's the kind of expertise you need on your team. --Toby > I don't think that normal is superior to the other four (five) in any > way. I do think that normal does occur VASTLY more frequently than the > any combination of the others. As such normal is what things default to > as an optimization. IMHO that optimization does not relegate the other > styles to second class. > ...
Re: Text encoding Babel. Was Re: George Keremedjiev
On 11/27/2018 04:43 PM, Keelan Lightfoot via cctalk wrote: I'm a bit dense for weighing in on this as my first post, but what the heck. Welcome. :-) Our problem isn't ASCII or Unicode, our problem is how we use computers. Okay. Going back in time a bit, the first keyboards only recorded letters and spaces, even line breaks required manual intervention. As things developed, we upgraded our input capabilities a little bit (return keys! delete keys! arrow keys!), but then, some time before graphical displays came along, we stopped upgrading. We stopped increasing the capabilities of our input, and instead focused on kludges to make them do more. Do you think that we stopped enhancing the user input experience more because we were content with what we had or because we didn't see a better way to do what we wanted to do? We created markup languages, modifier keys, and page description languages, all because our input devices and display devices lacked the ability to comprehend anything more than letters. Now we're in a position where we have computers with rich displays bolted to a keyboard that has remained unchanged for 150 years. Hum Unpopular opinion time: Markup languages are a kludge, relying on plain text to describe higher level concepts. I agree that markup languages are a kludge. But I don't know that they require plain text to describe higher level concepts. I see no reason that we can't have new control codes to convey new concepts if they are needed. Aside: ASCII did what it needed to do at the time. Times are different now. We may need more / new / different control codes. By control codes, I'm meaning a specific binary sequence that means a specific thing. I think it needs to be standardized to be compatible with other things -or- it needs to be considered local and proprietary to an application. TeX has held us back. It's a crutch so religiously embraced by the people that make our software that the concept of markup has come to be accepted "the way". 
I worked with some university students recently, who wasted a ridiculous amount of time learning to use LaTeX to document their projects. Many of them didn't even know that page layout software existed, they thought there was this broad valley in capabilities with TeX on one side, and Microsoft Word on the other. They didn't realize that there is a whole world of purpose built tools in between. I actually wonder how much need there is for /all/ of those utilities. I expect that things should have streamlined and simplified, at least some, in the last 30 years. Rather than working on developing and furthering our input capabilities, we've been focused on keeping them the same. Markup languages aren't the solution. They are a clumsy bridge between 150 year old input technology and modern display capabilities. What would you like to do or see done differently? Even if it turns out to be worse, it would still be something different and likely worth trying at least once. Bold or italic or underlined text shouldn't be a second class concept, they have meaning that can be lost when text is conveyed in circa-1868-plain-text. I've read many letters that predate the invention of the typewriter, emphasis is often conveyed using underlines or darkened letters. I don't think of bold or italic or underline as second class concepts. I tend to think of the following attributes that can be applied to text: · bold · italic · overline · strike through · underline · superscript exclusive or subscript · uppercase exclusive or lowercase · opposing case · normal (none of the above) I don't think that normal is superior to the other four (five) in any way. I do think that normal does occur VASTLY more frequently than the any combination of the others. As such normal is what things default to as an optimization. IMHO that optimization does not relegate the other styles to second class. 
We've drawn this arbitrary line in the sand, where only letters that can be typed on a typewriter are "text", Everything else is fluff that has been arbitrarily decided to convey no meaning. I don't agree that the decision was made (by most people). At least not consciously. I will say that some people probably decided what a minimum viable product is when selling typewriters, and consciously chose to omit the other options. I think it's a safe argument to make that the primary reason we've painted ourselves into this unexpressive corner is because of a dogged insistence that we cling to the keyboard. I see no reason that the keyboard can't have keys / glyphs added to it. I'm personally contemplating adding additional keys (via an add on keyboard) that are programmed to produce additional symbols. I frequently use the following symbols and wish I had keys for easier access to them: ≈, ·, ¢, ©, °, …, —, ≥, ∞, ‽, ≤, µ, ≠, Ω, ½, ¼, ⅓, ¶, ±, ®, §, ¾, ™, ⅔, ¿, ⊕.
Re: Text encoding Babel. Was Re: George Keremedjiev
I'm a bit dense for weighing in on this as my first post, but what the heck. Our problem isn't ASCII or Unicode, our problem is how we use computers. Going back in time a bit, the first keyboards only recorded letters and spaces, even line breaks required manual intervention. As things developed, we upgraded our input capabilities a little bit (return keys! delete keys! arrow keys!), but then, some time before graphical displays came along, we stopped upgrading. We stopped increasing the capabilities of our input, and instead focused on kludges to make them do more. We created markup languages, modifier keys, and page description languages, all because our input devices and display devices lacked the ability to comprehend anything more than letters. Now we're in a position where we have computers with rich displays bolted to a keyboard that has remained unchanged for 150 years. Unpopular opinion time: Markup languages are a kludge, relying on plain text to describe higher level concepts. TeX has held us back. It's a crutch so religiously embraced by the people that make our software that the concept of markup has come to be accepted "the way". I worked with some university students recently, who wasted a ridiculous amount of time learning to use LaTeX to document their projects. Many of them didn't even know that page layout software existed, they thought there was this broad valley in capabilities with TeX on one side, and Microsoft Word on the other. They didn't realize that there is a whole world of purpose built tools in between. Rather than working on developing and furthering our input capabilities, we've been focused on keeping them the same. Markup languages aren't the solution. They are a clumsy bridge between 150 year old input technology and modern display capabilities. Bold or italic or underlined text shouldn't be a second class concept, they have meaning that can be lost when text is conveyed in circa-1868-plain-text. 
I've read many letters that predate the invention of the typewriter; emphasis is often conveyed using underlines or darkened letters. We've drawn this arbitrary line in the sand, where only letters that can be typed on a typewriter are "text"; everything else is fluff that has been arbitrarily decided to convey no meaning. I think it's a safe argument to make that the primary reason we've painted ourselves into this unexpressive corner is because of a dogged insistence that we cling to the keyboard. I like the C comment example; Why do I need to call out a comment with a special sequence of letters? Why can't a comment exist as a comment? Why is a comment a second class concept? When I take notes in the margin, I don't explicitly need to call them out as notes. This extends to strings: why do I need to use quotes? I know it's a string; why can't the computer remember that too? Why do I have to use the capabilities of a typewriter to describe that to the computer? There seems to be confusion that computers are inherently text based. They are only that way because we program them and use them that way, and because we've done it the same way since the day of the teletype, and it's _how it's done._ "Classic" Macs are a great example of breaking this pattern. There was no way to force the computer into a text mode of operating, it didn't exist. Right down to the core the operating system was graphical. When you click an icon, the computer doesn't issue a text command, it doesn't call a function by name, it merely alters the flow of some binary stuff flowing through the CPU in response to some other bits changing. Yes, the program describing that was written in text, but that text is not what the computer is interpreting. I'm getting a bit philosophical, so I'll shut up now, but it's an interesting discussion. - Keelan
Re: Text encoding Babel. Was Re: George Keremedjiev
On 11/27/2018 12:47 PM, Grant Taylor via cctalk wrote: ASCII is a common way of encoding characters and control codes in the same binary pattern. File formats are what collections of ASCII characters / control codes mean / do. It also was designed for hard copy. Overstrikes don't work well on a CRT screen. Ben.
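Ben's overstrike point deserves a concrete illustration. On a printing terminal, character + backspace + character lands both glyphs in the same cell, which is how nroff-era software produced bold and underline on hard copy. A small sketch (my own, as an illustration):

```python
# Hardcopy-era overstriking: BS (0x08) backs the carriage up one cell,
# so the next glyph strikes over the previous one.
BS = "\b"

def overstrike_bold(text: str) -> str:
    """Print each glyph twice in the same cell (nroff-style bold)."""
    return "".join(c + BS + c for c in text)

def overstrike_underline(text: str) -> str:
    """Strike an underscore, back up, then print the glyph."""
    return "".join("_" + BS + c for c in text)

# On paper these look bold/underlined; on a CRT the glyphs just
# replace each other, which is exactly Ben's complaint.
print(repr(overstrike_bold("hi")))
print(repr(overstrike_underline("hi")))
```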
Re: Text encoding Babel. Was Re: George Keremedjiev
> I have long wondered if there are computer languages that aren't rooted > in English / ASCII. I feel like it's rather pompous to assume that all > programming languages are rooted in English / ASCII. I would hope that > there are programming languages that are more specific to the region of > the world they were developed in. As such, I would expect that they > would be stored in something other than ASCII. APL. -- Will
Re: Text encoding Babel. Was Re: George Keremedjiev
On 11/27/2018 03:05 AM, Guy Dunphy wrote: It was a core of the underlying philosophy, that html would NOT allow any kind of fixed formatting. The reasoning was that it could be displayed on any kind of system, so had to be free-format and quite abstract. That's one of the reasons that I like HTML as much as I do. Which is great, until you actually want to represent a real printed page, or book. Like Postscript can. Thus html was doomed to be inadequate for capture of printed works. I feel like trying to accurately represent fixed page layout in HTML is a questionable idea. I would think that it would be better to use a different type of file. That was a disaster. There wasn't any real reason it could not be both. Just an academic's insistence on enforcing his ideology. Then of course, over time html has morphed to include SOME forms of absolute layout, because there was a real demand for that. But the result is a hodge-podge. I don't think that HTML can reproduce fixed page layout like PostScript and PDF can. It can make a close approximation. But I don't think HTML can get there. Nor do I think it should. Yes, it should be capable of that. But not enforce 'only that way'. I question if people are choosing to use HTML to store documentation because it's so popular and then getting upset when they want to do things that HTML is not meant to do. Or, in some cases, is actually meant /not/ to do. Use the tool for the job. Don't alter the wrong tool for your particular job. IMHO true page layout doesn't belong in HTML. Loosely laying out the same content in approximately the same layout is okay. By 'html' I mean the kludge of html-css-js. The three-cat herd. (Ignoring all the _other_ web cats.) Now it's way too late to fix it properly with patches. I don't agree with that. HTML (and XML) has markup that can be used, and changed, to define how the HTML is meant to be interpreted. 
The fact that people don't do so correctly is mostly independent of the fact that it has the ability. I say mostly because there is some small amount of wiggle room for discussion of does the functionality actually work or not. I meant there's no point trying to determine why they were so deluded, and failed to recognise that maybe some users (Ed) would want to just type two spaces. I /do/ believe that there /is/ a point in trying to understand why someone did what they did. now 'we' (the world) are stuck with it for legacy compatibility reasons. Our need to be able to read it does not translate to our need to continue to use it. Any extensions have to be retro-compatible. I disagree. I see zero reason why we couldn't come up with something new and completely different. Granted, there should be ways to translate from one to the other. Much like how ASCII and EBCDIC are still in use today. What I'm talking about is not that. It's about how to create a coding scheme that serves ALL the needs we are now aware of. (Just one of which is for old ASCII files to still make sense.) This involves both re-definition of some of the ASCII control codes, AND defining sequential structure standards. For eg UTF-8 is a sequential structure. So are all the html and css codings, all programming languages, etc. There's a continuum of encoding...structure...syntax. The ASCII standard didn't really consider that continuum. I don't think that ASCII was even trying to answer / solve the problems that you're talking about. ASCII was a solution for a different problem for a different time. There is no reason we can't move on to something else. Which exceptions would those be? (That weren't built on top of ASCII!) It is subject to the meaning of "back to the roots" and not worth taking more time. I assume you're thinking that ASCII serves just fine for program source code? I'm not personally aware of any cases where ASCII limits programming languages. 
But my ignorance does not preclude that situation from existing. I do believe that there are a number of niche programming languages (if you will) that store things as binary data (I'm thinking PLCs and the likes) but occasionally have said data represented (as a hexadecimal dump) in ASCII. But the fact that ASCII can or can't easily display the data is immaterial to the system being programmed. I have long wondered if there are computer languages that aren't rooted in English / ASCII. I feel like it's rather pompous to assume that all programming languages are rooted in English / ASCII. I would hope that there are programming languages that are more specific to the region of the world they were developed in. As such, I would expect that they would be stored in something other than ASCII. Could the sequence of bytes be displayed as ASCII? Sure. Would it make much sense? Not likely. This is a bandwagon/normalcy bias effect. "Everyone does it that way and always has, so it must be
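The "binary data represented as a hexadecimal dump in ASCII" idea above is worth making concrete: the bytes themselves were never ASCII, yet the *rendering* of them is pure ASCII text. A minimal hex-dump sketch (my own; the sample bytes are purely illustrative, not a real PLC image):

```python
# Render arbitrary binary data as an ASCII hex dump.
def hexdump(data: bytes, width: int = 8) -> str:
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hexpart = " ".join(f"{b:02X}" for b in chunk)
        # Printable ASCII is 0x20..0x7E; anything else shows as '.'.
        text = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        lines.append(f"{off:04X}  {hexpart:<{width * 3}} {text}")
    return "\n".join(lines)

print(hexdump(b"\x01\x02ABC\xff"))
```

The dump is readable on anything that speaks ASCII, even though the underlying program never was.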
Re: Text encoding Babel. Was Re: George Keremedjiev
On Tue, Nov 27, 2018 at 01:21:52AM +1100, Guy Dunphy via cctalk wrote: [...] > Oh yes, tell me about the html 'there is no such thing as hard formatting and > you can't have any even when you want it' concept. Thank you Tim Berners Lee. Sure you can! Pick one of: a) If you're not using HTML features, don't bother wrapping the text in an HTML document. Just serve up a bog standard text/plain document with all of your favourite ASCII art and hard formatting as you please. b) Go old-school and use HTML <pre>. c) Go lah-di-dah new-school and use the CSS white-space: property to fine-tune the exact formatting behaviour you desire.
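The difference between options (a)/(b) and the HTML default can be modelled in a few lines. This is a deliberately simplified toy model (my own, not a real CSS engine): `normal` collapses whitespace runs the way HTML text does by default, while `pre` keeps them verbatim, which is why `<pre>` and `white-space:` preserve hard formatting:

```python
import re

def render(text: str, white_space: str = "normal") -> str:
    """Toy model of CSS white-space handling (grossly simplified)."""
    if white_space == "pre":
        return text  # every space and newline kept, as in <pre>
    # Default HTML behaviour: runs of whitespace collapse to one space.
    return re.sub(r"\s+", " ", text).strip()

art = "o  o\n \\/"
print(render(art, "pre"))     # ASCII art survives
print(render(art, "normal"))  # ASCII art is flattened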
Re: Text encoding Babel. Was Re: George Keremedjiev
On Mon, 26 Nov 2018 at 15:21, Guy Dunphy via cctalk wrote: > Defects in the ASCII code table. This was a great improvement at the time, > but fails to implement several utterly essential concepts. The lack of these > concepts in the character coding scheme underlying virtually all information > processing since the 1960s, was unfortunate. Just one (of many) bad > consequences has been the proliferation of 'patch-up' text coding schemes > such as proprietary document formats (MS Word for eg), postscript, pdf, html > (and its even more nutty academia-gone-mad variants like XML), UTF-8, unicode > and so on. This is fascinating stuff and I am very interested to see how it comes out, but I think there is a problem here which I wanted to highlight. The thing is this. You seem to be discussing what you perceive as _general_ defects in ASCII, but they are I think not _general_ defects. They are specific to your purpose, and I don't know what that is exactly, but I have a feeling it is not a general overall universal goal. Just consider what "A.S.C.I.I." stands for. [1] it's American. Yes it has lots of issues internationally, but it does the job well for American English. As a native English speaker I rue the absence of £, and the fact that Americans are so unfamiliar with the symbol that they even appropriated its name for the unrelated #, which already had a perfectly good name of its own. But ASCII is American and Americans don't use £. Fine. [2] The "I.I." bit. Historical accidents aside, vestigial traces of specific obsolete hardware implementations, it's _not a markup language_. Its function is unrelated to those of HTML or XML or anything like that. It's for "information interchange". That means from computer or program to other computer or program. It's an encoding and that's all. We needed a standard one. We got it. It has flaws, many flaws, but it worked. No it doesn't contain æ and å and ä and ø and ö. That's a problem for Scandinavians. 
It doesn't contain š and č and ṡ and ý (among others) and that's a problem for Roman-alphabet-using Slavs. Even broadening the discussion to 8-bit ANSI... It does have a very poor way of encoding é and à and so on, which indicates the relative importance of Latin-language users in the Americas, compared to Slavs and so on. But markup languages, formatting, control signalling, all that sort of stuff is a separate discussion to encoding standards. Attempt to bring them into encoding systems and the problem explodes in complexity and becomes insoluble. Additionally, it makes a bit of a mockery of OSes focussed on raw text streams, such as Unix, and whereas I am no great lover of Unix, it does provide me with a job, and fewer headaches than Windows. So, overall, all I wanted to say was: identify the problem domain specifically and how to separate that from other, *overlapping* domains before attacking ASCII for weaknesses that are not actually weaknesses at all but indeed strengths for a lot of its use-cases. Saying that, I'd really like to read more about this project. It looks like it peripherally intersects with one of my own big ones. -- Liam Proven - Profile: https://about.me/liamproven Email: lpro...@cix.co.uk - Google Mail/Hangouts/Plus: lpro...@gmail.com Twitter/Facebook/Flickr: lproven - Skype/LinkedIn: liamproven UK: +44 7939-087884 - ČR (+ WhatsApp/Telegram/Signal): +420 702 829 053
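The £ sign Liam mentions is a tidy demonstration of the encoding layers under discussion: absent from 7-bit ASCII, a single byte in 8-bit Latin-1, two bytes in UTF-8. A quick check (my own illustration):

```python
# The pound sign across three encodings discussed in the thread.
pound = "\u00a3"  # £

try:
    pound.encode("ascii")
except UnicodeEncodeError:
    print("no code for £ in 7-bit ASCII")

print(pound.encode("latin-1"))  # one byte in ISO 8859-1
print(pound.encode("utf-8"))    # two bytes in UTF-8
```

Latin-1 gives `b'\xa3'` and UTF-8 gives `b'\xc2\xa3'`; the ASCII attempt raises, since no 7-bit code point exists for it.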
Re: Text encoding Babel. Was Re: George Keremedjiev
On Mon, 26 Nov 2018 at 23:39, Christian Gauger-Cosgrove wrote: > > On Mon, 26 Nov 2018 at 03:44, Liam Proven via cctalk > wrote: > > If it's in Roman, Cyrillic, or Greek, they're alphabets, so it's a letter. > > > Correct, Latin, Greek, and Cyrillic are alphabets, so each > letter/character can be a consonant or vowel. > > > I can't read Arabic or Hebrew but I believe they're alphabets too. > > > Hebrew, Arabic, Syriac, Punic, Aramaic, Ugaritic, et cetera are > abjads, meaning that each character represents a consonant sound, > vowel sounds are either derived from context and knowledge of the > language, or can be added in via diacritics. > > Devanagari and Thai (and Tibetan, Khmer, Sundanese, Balinese...) are > abugidas, where each character is a consonant-vowel pair, with the > "base" character being one particular vowel sound, and alternates > being indicated by modifications (example in Devanagari: "क" is "ka", > while "कि" is "ki"; another example using Canadian Aboriginal > Syllabics "ᕓ" is "vai" whereas "ᕗ" is "vu"). > > > I don't know anything about any Asian scripts except a tiny bit of > > Japanese and Chinese, and they get called different things, but > > "character" is probably most common. > > > Japanese actually uses three different scripts. Chinese characters > (the kanji script of Japanese, and the hanja script of Korean) are > logograms. > > Japanese also has two syllabic scripts, katakana and hiragana where > each character represents a specific consonant vowel pair. > > Korean hangul (or if you happen to be from the DPRK, chosŏn'gŭl) is a > mix of alphabet and syllabary, where individual characters consist of > sub parts stacked in a specific pattern. Stealing Wikipedia's example, > "kkulbeol" is written as "꿀벌", not the individual parts "ㄲㅜㄹㅂㅓㄹ". 
> > > And now for even more fun, Egyptian hieroglyphics and cuneiform (which > started with Sumerian, and then used by the Assyrians/Babylonians and > others) are a delightful mix of logographic, syllabic and alphabetic > characters. Because while China loathes you, Babylon has a truly deep > hatred of you and wishes to revel in your suffering. Um. Yes. Thank you for that. Very informative, interesting, and I did actually know most of it already but maybe others didn't. The thing is that it's not actually very germane to the question I was addressing, which was "what do you call the individual units in different scripts?" I.e. "letter" vs "glyph" vs "character" vs "ideogram" vs "grapheme", etc... :-) -- Liam Proven - Profile: https://about.me/liamproven Email: lpro...@cix.co.uk - Google Mail/Hangouts/Plus: lpro...@gmail.com Twitter/Facebook/Flickr: lproven - Skype/LinkedIn: liamproven UK: +44 7939-087884 - ČR (+ WhatsApp/Telegram/Signal): +420 702 829 053
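Christian's Hangul example can be checked mechanically: under Unicode normalization form NFD, each precomposed syllable decomposes into its conjoining jamo parts, so the "stacked sub parts" he describes are recoverable from the composed form. A quick sketch (mine):

```python
import unicodedata

word = "꿀벌"  # two precomposed Hangul syllables, "kkulbeol"

# NFD splits each syllable into its conjoining jamo (initial/vowel/final).
jamo = unicodedata.normalize("NFD", word)
print(len(word), "syllables ->", len(jamo), "jamo")

# NFC recomposes them back into the original syllables.
print(unicodedata.normalize("NFC", jamo) == word)
```

The two syllables decompose into six jamo and recompose losslessly, which is why Unicode can treat Hangul as both an alphabet and a syllabary.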
Re: Text encoding Babel. Was Re: George Keremedjiev
Oh yes, tell me about the html 'there is no such thing as hard formatting and you can't have any even when you want it' concept. Thank you Tim Berners Lee. I've not delved too deeply into the lack of hard formatting in HTML. The HTML <pre> . . . </pre> tag helps a bit. Before I found THAT, I was having serious difficulties with too much of what I tried to do with HTML. Obvious examples include ASCII art, but also program source code. I should NOT have to create a "table" for that, nor have difficulty having a string literal in code that contains varying numbers of space characters! For some reason, a few decades ago, I had substantial difficulty finding out about the existence of the <pre> tag, and at that time, did not find the tag, nor CSS. Now, it seems to be pretty easy to find. -- Grumpy Ol' Fred ci...@xenosoft.com
Re: Text encoding Babel. Was Re: George Keremedjiev
On 11/26/18 7:21 AM, Guy Dunphy wrote: I was speaking poetically. Perhaps "the mail software he uses was written by morons" is clearer. ;-) Oh yes, tell me about the html 'there is no such thing as hard formatting and you can't have any even when you want it' concept. Thank you Tim Berners Lee. I've not delved too deeply into the lack of hard formatting in HTML. I've also always considered HTML to be what you want displayed, with minimal information about how you want it displayed. IMHO CSS helps significantly with the latter part. http://everist.org/NobLog/20130904_Retarded_ideas_in_comp_sci.htm http://everist.org/NobLog/20140427_p-term_is_retarded.htm Intriguing. $readingList++. Except that 'non-breaking space' is mostly about inhibiting line wrap at that word gap. I wouldn't have thought "mostly" or "inhibiting line wrap". I view the non-breaking space as a way to glue two parts of text together and treat them as one unit, particularly for display and partially for selection. Granted, much of the breaking is done when the text cannot continue (in its natural direction), frequently needing to start anew on the next line. But anyway, there's little point trying to psychoanalyze the writers of that software. Probably involved pointy-headed bosses. I like to understand why things have been done the way they were. Hopefully I can learn from the reasons. Of course not. It was for American English only. This is one of the major points of failure in the history of information processing. Looking backwards, (I think) I can understand why you say that. But based on my (possibly limited) understanding of the time, I think that ASCII was one of the primordial building blocks that was necessary. It was a standard (one of many emerging standards of the time) that allowed computers from different manufacturers to interoperate and represent characters with the same binary pattern. 
Something that we now (mostly) take for granted and something that could not be assured at the time or before. Containing extended Unicode character sets via UTF-8, doesn't make it a non-hard-formatted medium. In ASCII a space is a space, and multi-spaces DON'T collapse. White space collapse is a feature of html, and whether an email is html or not is determined by the sending utility. Having read the rest of your email and now replying, I feel that we may be talking about two different things. One being ASCII's standard definition of how to represent different letters / glyphs in a consistent binary pattern. The other being how information is stored in an (un)structured sequence of ASCII characters. As you see, this IS NOT HTML, since those extra spaces and your diagram below would have collapsed if it was html. Also saving it as text and opening in a plain text ed or hex editor absolutely reveals what it is. I feel it is important to acknowledge your point and to state that I'm moving on. Hmm... the problem is it's intended to be serious, but is still far from exposure-ready. So if I talk about it now, I risk having specific terms I've coined in the doco (including the project name) getting meme-jammed or trademarked by others. The plan is to release it all in one go, eventually. Definitely will be years before that happens, if ever. Fair enough. However, here's a cut-n-paste (in plain text) of a section of the Introduction (html with diags.) ACK -- Almost always, a first attempt at some unfamiliar, complex task produces a less than optimal result. Only with the knowledge gained from actually doing a new thing, can one look back and see the mistakes made. It usually takes at least one more cycle of doing it over from scratch to produce something that is optimal for the needs of the situation. Sometimes, especially where deep and subtle conceptual innovations are involved, it takes many iterations. 
Part way through the first large (for me at the time) project that I worked on, I decided that the project (and likely others) needed three versions before being production ready:

1) First whack at solving the problem. LOTS about the problem is learned, including the true requirements and the unknown dependencies along the way. This will not be the final shipping version. - Think of this as the Alpha release.

2) This is a complete re-write of the project based on what was learned in #1. - Think of this as the Beta release.

3) This is less of a re-write and more of a bug fix for version 2. - Think of this as the shipping release.

Human development of computing science (including information coding schemes) has been effectively a 'first time effort', since we kept on developing new stuff built on top of earlier work. We almost never went back to the roots and rebuilt everything, applying insights gained from the many mistakes made. With few notable (partial) exceptions, I largely agree. In reviewing the evolution of
A modest side project : redefining text encoding (Was: Text encoding Babel. Was Re: George Keremedjiev
On Tue, 27 Nov 2018, Guy Dunphy via cctalk wrote: Hmm... the problem is it's intended to be serious, but is still far from exposure-ready. So if I talk about it now, I risk having specific terms I've coined in the doco (including the project name) getting meme-jammed or trademarked by others. The plan is to release it all in one go, eventually. Definitely will be years before that happens, if ever. However, here's a cut-n-paste (in plain text) of a section of the Introduction (html with diags.) Without pushing too hard to get you to reveal more than you are comfortable with, I really like what you wrote, and hope that someday we can participate in some aspects. I would like to see some acknowledgement that some things are truly flaws in the original design, whereas some others are ideas for further expansion and enhancement. It's probably not going to be possible to objectively differentiate which are which. And, as typified by Intel X86 V Motorola 68000, incremental kludges permit compatability and trivial ease of migration, whereas a design from scratch permits correcting aspects that would otherwise be stuck, at the expense of massive software re-creation.
Re: Text encoding Babel. Was Re: George Keremedjiev
On Mon, 26 Nov 2018 at 03:44, Liam Proven via cctalk wrote: > If it's in Roman, Cyrillic, or Greek, they're alphabets, so it's a letter. > Correct, Latin, Greek, and Cyrillic are alphabets, so each letter/character can be a consonant or vowel. > I can't read Arabic or Hebrew but I believe they're alphabets too. > Hebrew, Arabic, Syriac, Punic, Aramaic, Ugaritic, et cetera are abjads, meaning that each character represents a consonant sound; vowel sounds are either derived from context and knowledge of the language, or can be added in via diacritics. Devanagari and Thai (and Tibetan, Khmer, Sundanese, Balinese...) are abugidas, where each character is a consonant-vowel pair, with the "base" character being one particular vowel sound, and alternates being indicated by modifications (example in Devanagari: "क" is "ka", while "कि" is "ki"; another example using Canadian Aboriginal Syllabics "ᕓ" is "vai" whereas "ᕗ" is "vu"). > I don't know anything about any Asian scripts except a tiny bit of > Japanese and Chinese, and they get called different things, but > "character" is probably most common. > Japanese actually uses three different scripts. Chinese characters (the kanji script of Japanese, and the hanja script of Korean) are logograms. Japanese also has two syllabic scripts, katakana and hiragana, where each character represents a specific consonant-vowel pair. Korean hangul (or if you happen to be from the DPRK, chosŏn'gŭl) is a mix of alphabet and syllabary, where individual characters consist of sub-parts stacked in a specific pattern. Stealing Wikipedia's example, "kkulbeol" is written as "꿀벌", not the individual parts "ㄲㅜㄹㅂㅓㄹ". And now for even more fun, Egyptian hieroglyphics and cuneiform (which started with Sumerian, and then used by the Assyrians/Babylonians and others) are a delightful mix of logographic, syllabic and alphabetic characters. Because while China loathes you, Babylon has a truly deep hatred of you and wishes to revel in your suffering. 
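As an aside, Python's standard `unicodedata` module makes these script categories concrete. A small sketch (the particular character choices are mine, taken from the examples above):

```python
import unicodedata

# An abugida in practice: Devanagari "ki" is the consonant letter KA plus a
# dependent vowel sign, i.e. two code points rendered as one glyph.
for ch in "\u0915\u093F":  # "कि"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+0915 DEVANAGARI LETTER KA
# U+093F DEVANAGARI VOWEL SIGN I

# Hangul syllables decompose into their stacked jamo parts under NFD:
kkul = "\uAFC0"  # the first syllable of "kkulbeol"
jamo = unicodedata.normalize("NFD", kkul)
print(len(jamo))  # 3 conjoining jamo for the one precomposed syllable
```

(Note that NFD yields the *conjoining* jamo block, not the standalone compatibility jamo "ㄲㅜㄹ" quoted above; they are distinct code points for the same letters.)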
Regards, Christian -- Christian M. Gauger-Cosgrove STCKON08DS0 Contact information available upon request.
Re: Text encoding Babel. Was Re: George Keremedjiev
On 11/26/2018 9:26 AM, Charles Anthony via cctalk wrote: On Mon, Nov 26, 2018 at 4:28 AM Peter Corlett via cctalk < cctalk@classiccmp.org> wrote: On Sun, Nov 25, 2018 at 07:59:13PM -0800, Fred Cisin via cctalk wrote: [...] Alas, "current" computers use 8, 16, 32. They totally fail to understand the intrinsic benefits of 9, 12, 18, 24, and 36 bits. Oh go on then, I'm curious. What are the benefits? Is it just that there are useful prime factors for bit-packing hacks? And if so, why not 30? As I understand it, 36 bits was used as it could represent a signed 10 digit decimal number in binary; the Friden 10 digit calculator was the "gold standard" of banking and financial institutions, so to compete in that market, your computer had to be able to match the arithmetic standards. -- Charles I say 20 bits needs to be used more often. Did anything really use all the control codes in ASCII? Back then you got what the TTY printed. Did anyone ever come up with a character set for ALGOL? Ben.
Re: Windows Accessibility Settings. RE: George Keremedjiev
On 11/26/2018 02:53 PM, Dave Wade via cctalk wrote: Just in case anyone isn't aware, and who gets duplicate characters input because they have some un-steadiness, and are using a Windows/10 PC (I think 7 as well) there are some options in the "Ease of Access" settings "Filter Keys" settings => "bounce keys" that may help with your typing. These set a configurable delay that will ignore repeated keypresses for a very short period of time. The default is 0.5 of a second, but it's configurable. You need to enable "Filter Keys" to see the "Bounce Keys" option. There is also a "slow keys" option. I've found that there are a number of features that land under accessibility / ease of access settings that can make the computer quite a bit nicer. So, if you've ever thought that "I don't need anything under 'Accessibility' or 'Ease of Access' settings." you may be missing out. Go check. I'm extensively using these assistants for a number of things, not the least of which is I'm lazy and I want my iPhone to auto-correct scsi to SCSI, or pppoe to PPPoE, or shruggie to ¯\_(ツ)_/¯, or ... to …, or … or … or I hope this helps, and I am sorry if you knew this already and it doesn't I think it's always good to share neat ~> helpful features with others. Especially if it's done in the positive sense of "this is really cool" and not negative "oh, you need some help, go look here." -- Grant. . . . unix || die
Windows Accessibility Settings. RE: George Keremedjiev
Just in case anyone isn't aware, and who gets duplicate characters input because they have some un-steadiness, and are using a Windows/10 PC (I think 7 as well) there are some options in the "Ease of Access" settings "Filter Keys" settings => "bounce keys" that may help with your typing. These set a configurable delay that will ignore repeated keypresses for a very short period of time. The default is 0.5 of a second, but it's configurable. You need to enable "Filter Keys" to see the "Bounce Keys" option. There is also a "slow keys" option. I hope this helps, and I am sorry if you knew this already and it doesn't. Dave G4UGM > -Original Message- > From: cctalk On Behalf Of ED SHARPE via > cctalk > Sent: 26 November 2018 18:30 > To: lpro...@gmail.com; cctalk@classiccmp.org > Subject: Re: George Keremedjiev > > i use email i use and suggest you use a delete key. no loss no gain... > > > In a message dated 11/26/2018 11:16:07 AM US Mountain Standard Time, > lpro...@gmail.com writes: > > > On Mon, 26 Nov 2018 at 17:54, ED SHARPE < couryho...@aol.com> wrote: > > > > pay attention it us,probaby my hand which adds,Xtra spaces as stated > > before, please feel free to use the delete key > > Are you saying that you have motor control problems, such as Parkinson's > Disease or something? If so, I am really sorry -- but you have never said that > before, to my recollection. > > But you have never commented to anyone who has asked why you don't > switch to a proper local email client, which would fix the quoting and so on. > Do you not have access to your own computer, or something? If so I am sure > someone could give you a machine, if that would help... > > -- > Liam Proven - Profile: https://about.me/liamproven > Email: lpro...@cix.co.uk - Google Mail/Hangouts/Plus: lpro...@gmail.com > Twitter/Facebook/Flickr: lproven - Skype/LinkedIn: liamproven > UK: +44 7939-087884 - ČR (+ WhatsApp/Telegram/Signal): +420 702 829 053
Re: George Keremedjiev
i use email i use and suggest you use a delete key. no loss no gain... In a message dated 11/26/2018 11:16:07 AM US Mountain Standard Time, lpro...@gmail.com writes: On Mon, 26 Nov 2018 at 17:54, ED SHARPE < couryho...@aol.com> wrote: > > pay attention it us,probaby my hand which adds,Xtra spaces as stated before, > please feel free to use the delete key Are you saying that you have motor control problems, such as Parkinson's Disease or something? If so, I am really sorry -- but you have never said that before, to my recollection. But you have never commented to anyone who has asked why you don't switch to a proper local email client, which would fix the quoting and so on. Do you not have access to your own computer, or something? If so I am sure someone could give you a machine, if that would help... -- Liam Proven - Profile: https://about.me/liamproven Email: lpro...@cix.co.uk - Google Mail/Hangouts/Plus: lpro...@gmail.com Twitter/Facebook/Flickr: lproven - Skype/LinkedIn: liamproven UK: +44 7939-087884 - ČR (+ WhatsApp/Telegram/Signal): +420 702 829 053
Re: Text encoding Babel. Was Re: George Keremedjiev
On Mon, Nov 26, 2018 at 4:28 AM Peter Corlett via cctalk < cctalk@classiccmp.org> wrote: > On Sun, Nov 25, 2018 at 07:59:13PM -0800, Fred Cisin via cctalk wrote: > [...] > > Alas, "current" computers use 8, 16, 32. They totally fail to understand > the > > intrinsic benefits of 9, 12, 18, 24, and 36 bits. > > Oh go on then, I'm curious. What are the benefits? Is it just that there > are > useful prime factors for bit-packing hacks? And if so, why not 30? > > As I understand it, 36 bits was used as it could represent a signed 10 digit decimal number in binary; the Friden 10 digit calculator was the "gold standard" of banking and financial institutions, so to compete in that market, your computer had to be able to match the arithmetic standards. -- Charles -- X-Clacks-Overhead: GNU Terry Pratchett
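Charles's arithmetic checks out; a one-line sketch of the claim (variable names are mine):

```python
# A signed 10-digit decimal must represent magnitudes up to 9,999,999,999.
# A 36-bit word gives one sign bit plus 35 magnitude bits.
largest_10_digit = 10**10 - 1      # 9_999_999_999
magnitude_bits = 35                # 36 bits minus the sign bit
assert largest_10_digit < 2**magnitude_bits   # 2**35 = 34_359_738_368, fits
# A 32-bit word would not do: 2**31 - 1 is only 2_147_483_647,
# far short of 9_999_999_999.
print(f"2**35 = {2**35:,} > {largest_10_digit:,}")
```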
Re: Text encoding Babel. Was Re: George Keremedjiev
At 10:52 PM 25/11/2018 -0700, you wrote: >> Then adds a plain ASCII space 0x20 just to be sure. > >I don't think it's adding a plain ASCII space 0x20 just to be sure. >Looking at the source of the message, I see =C2=A0, which is the UTF-8 >representation followed by the space. My MUA that understands UTF-8 >shows that "=C2=A0 " translates to " ". Further, "=C2=A0 =C2=A0" >translates to " ". I was speaking poetically. Perhaps "the mail software he uses was written by morons" is clearer. >Some of the reading that I did indicates that many things, HTML >included, use white space compaction (by default), which means that >multiple white space characters are reduced to a single white space >character. Oh yes, tell me about the html 'there is no such thing as hard formatting and you can't have any even when you want it' concept. Thank you Tim Berners Lee. http://everist.org/NobLog/20130904_Retarded_ideas_in_comp_sci.htm http://everist.org/NobLog/20140427_p-term_is_retarded.htm > So, when Ed wants multiple white spaces, his MUA has to do >something to state that two consecutive spaces can't be compacted. >Hence the non-breaking space. Except that 'non-breaking space' is mostly about inhibiting line wrap at that word gap. But anyway, there's little point trying to psychoanalyze the writers of that software. Probably involved pointy-headed bosses. >As stated in another reply, I don't think ASCII was ever trying to be >the Babel fish. (Thank you Douglas Adams.) Of course not. It was for American English only. This is one of the major points of failure in the history of information processing. >> Takeaway: Ed, one space is enough. I don't know how you got the idea >> people might miss seeing a single space, and so you need to type two or >> more. > >I wondered if it wasn't a typo or keyboard sensitivity issue. I >remember I had to really slow down the double click speed for my grandpa >(R.I.P.) so that he could use the mouse. 
Maybe some users actuate keys >slowly enough that the computer thinks that it's repeated keys. ¯\_(ツ)_/¯ Well now he's flaunting it in his latest posts. Never mind. :) >> And since plain ASCII is hard-formatted, extra spaces are NOT ignored >> and make for wider spacing between words. > >It seems as if you made an assumption. Just because the underlying >character set is ASCII (per RFC 821 & 822, et al) does not mean that the >data that they are carrying is also ASCII. As is evident by the >Content-Type: header stating the character set of UTF-8. Containing extended Unicode character sets via UTF-8, doesn't make it a non-hard-formatted medium. In ASCII a space is a space, and multi-spaces DON'T collapse. White space collapse is a feature of html, and whether an email is html or not is determined by the sending utility. >Especially when textual white space compression does exactly that, >ignore extra white spaces. > >> Which looks    very odd, even if your mail utility didn't try to >> do something 'special' with your unusual user input. As you see, this IS NOT HTML, since those extra spaces and your diagram below would have collapsed if it was html. Also saving it as text and opening in a plain text ed or hex editor absolutely reveals what it is. >I frequently use multiple spaces with ASCII diagrams. > >+--+ >| This | >| is | >| a | >| box | >+--+ >> Btw, I changed the subject line, because this is a wider topic. I've been >> meaning to start a conversation about the original evolution of ASCII, >> and various extensions. Related to a side project of mine. > >I'm curious to know more about your side project. Hmm... the problem is it's intended to be serious, but is still far from exposure-ready. So if I talk about it now, I risk having specific terms I've coined in the doco (including the project name) getting meme-jammed or trademarked by others. The plan is to release it all in one go, eventually. Definitely will be years before that happens, if ever. 
However, here's a cut-n-paste (in plain text) of a section of the Introduction (html with diags.) -- Almost always, a first attempt at some unfamiliar, complex task produces a less than optimal result. Only with the knowledge gained from actually doing a new thing, can one look back and see the mistakes made. It usually takes at least one more cycle of doing it over from scratch to produce something that is optimal for the needs of the situation. Sometimes, especially where deep and subtle conceptual innovations are involved, it takes many iterations. Human development of computing science (including information coding schemes) has been effectively a 'first time effort', since we kept on developing new stuff built on top of earlier work. We almost never went back to the roots and rebuilt everything, applying insights gained from the many mistakes made. In reviewing the evolution of information coding schemes since very early stages
Re: Text encoding Babel. Was Re: George Keremedjiev
On Sun, Nov 25, 2018 at 03:06:29PM -0800, Chuck Guzis via cctalk wrote: [...] > I routinely get Turkish and Greek spam in my mailbox--and I've gotten > Cyrillic-alphabet stuff as well. I had started to get slightly paranoid about the fact that there was a sudden increase in Dutch-language spam and wondered how they had figured out my physical location. On reflection, it's probably just that I now receive enough legitimate-ish email that the Bayesian filter has adjusted and no longer assumes that the correct response to Dutch text is "dat kan niet" ("that's not possible").
Re: Text encoding Babel. Was Re: George Keremedjiev
On Sun, Nov 25, 2018 at 07:59:13PM -0800, Fred Cisin via cctalk wrote: [...] > Alas, "current" computers use 8, 16, 32. They totally fail to understand the > intrinsic benefits of 9, 12, 18, 24, and 36 bits. Oh go on then, I'm curious. What are the benefits? Is it just that there are useful prime factors for bit-packing hacks? And if so, why not 30?
RE: George Keremedjiev
> > #4 _You_ appear to have some "very old mail program" (to use your own > phrase) because it is screwing up your posting _and_ screwing up double > spaces. > There are no "double spaces" but for some reason he has a space and a UTF-8 non-breaking space next to each other. Most odd... It looks like the mail client is the AOL webmail client. Headers say:- Message-Id: <1674dba424c-1ec3-5...@webjas-vad199.srv.aolmail.net> X-MB-Message-Source: WebUI X-MB-Message-Type: User X-Mailer: JAS DWEB Dave > So it is you causing the problems here, I'm sorry to say. > > -- > Liam Proven - Profile: https://about.me/liamproven > Email: lpro...@cix.co.uk - Google Mail/Hangouts/Plus: lpro...@gmail.com > Twitter/Facebook/Flickr: lproven - Skype/LinkedIn: liamproven > UK: +44 7939-087884 - ČR (+ WhatsApp/Telegram/Signal): +420 702 829 053
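For anyone wanting to do the same forensics, the header fields Dave quotes can be pulled out with Python's standard `email` module. The raw message below is a made-up fragment echoing those header names, not the actual post:

```python
import email

# Hypothetical raw message; header names mirror the ones quoted above.
raw = (
    "Message-Id: <example@webjas-vad199.srv.aolmail.net>\r\n"
    "X-MB-Message-Source: WebUI\r\n"
    "X-Mailer: JAS DWEB\r\n"
    'Content-Type: text/plain; charset="utf-8"\r\n'
    "\r\n"
    "body text\r\n"
)
msg = email.message_from_string(raw)
print(msg["X-Mailer"])            # the client that generated the mail
print(msg.get_content_charset())  # the declared character set, "utf-8"
```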
Re: George Keremedjiev
On Mon, 26 Nov 2018 at 12:17, ED SHARPE via cctalk wrote: > > seems only the very old mail programs do not adapt to all character sets? Maybe so, Ed, but it's basic good manners to both (a) not make your emails unnecessarily difficult for others to read, and (b) respect the etiquette of the forum that you're posting in. You do neither. So, for instance, in your message to which I am replying, you: #1 top-post, against general mailing-list etiquette #2 fail to capitalise the sentence, against basic English rules #3 insert unnecessary double-spaces into "the very", "old mail", "programs do", and "adapt to". *And* #4 _You_ appear to have some "very old mail program" (to use your own phrase) because it is screwing up your posting _and_ screwing up double spaces. So it is you causing the problems here, I'm sorry to say. -- Liam Proven - Profile: https://about.me/liamproven Email: lpro...@cix.co.uk - Google Mail/Hangouts/Plus: lpro...@gmail.com Twitter/Facebook/Flickr: lproven - Skype/LinkedIn: liamproven UK: +44 7939-087884 - ČR (+ WhatsApp/Telegram/Signal): +420 702 829 053
Re: Text encoding Babel. Was Re: George Keremedjiev
On Mon, 26 Nov 2018 at 01:00, Grant Taylor via cctalk wrote: > > If they are not seen as separate letters, then do their meanings > change? Or is the different accent more for pronunciation? No, mainly, it changes alphabetical order and it makes asking questions tricky. I see š as an s-with-a-haček and if I forget the haček, I may pronounce it as an s; š = ``sh'' in English. ``č'' = "ch" in English. But that isn't how Czechs think. It's as impossible to misread or mispronounce Š as S as it would be nonsense to mispronounce ``T'' as ``M'' in English, so people find it very hard to guess what I mean. To me, the diacritic modifies a letter, and in a word with 4 or 5 diacritics, they pile up in my head, I overload and may drop one or 2 of them. That renders the word as babel in Czech. (I chose T/M because, incredibly to me, hand-written T in Russian is written as M. Mind you, handwritten almost everything in Russian becomes mMmmmMMmm. I can read printed Cyrillic but I find handwritten stuff impossible.) > I assume that they have different meanings (if that applies to letters) > and are used as different as "A" and "q". Yes. > > Czech is like that. Š and Č and Ž and many more that my Mac can't > > readily type are _extra letters_ which come after the unmodified form > > in the alphabet. > > ~twitch~ Yep. The Scandinavians have just 3 extras. Czech has about a dozen. https://en.wikipedia.org/wiki/Czech_orthography 42 letters (!). > I don't even know how to properly describe something that visually looks > like letters (glyphs?) to me, but may be an imprecise simplification on > my part. If it's in Roman, Cyrillic, or Greek, they're alphabets, so it's a letter. I can't read Arabic or Hebrew but I believe they're alphabets too. I don't know anything about any Asian scripts except a tiny bit of Japanese and Chinese, and they get called different things, but "character" is probably most common. 
> I had to zoom my font to see enough detail in Křižíkova, but it does > look like things came through just like you describe. (They even made > it through my shell script that I use to re-flow text in replies.) Good! -- Liam Proven - Profile: https://about.me/liamproven Email: lpro...@cix.co.uk - Google Mail/Hangouts/Plus: lpro...@gmail.com Twitter/Facebook/Flickr: lproven - Skype/LinkedIn: liamproven UK: +44 7939-087884 - ČR (+ WhatsApp/Telegram/Signal): +420 702 829 053
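Liam's point that the haček is, at the encoding level, separable from the base letter can be sketched with Unicode normalization (Python stdlib; the example is mine, not from the thread):

```python
import unicodedata

# Czech "š" is one letter, but under NFD it decomposes into a base "s"
# plus U+030C COMBINING CARON (the haček).
hacek_s = "\u0161"  # š
decomposed = unicodedata.normalize("NFD", hacek_s)
for c in decomposed:
    print(f"U+{ord(c):04X} {unicodedata.name(c)}")

# The two spellings are canonically equivalent but not byte-identical,
# which is exactly why text comparison needs normalization:
assert hacek_s != "s\u030C"
assert unicodedata.normalize("NFC", "s\u030C") == hacek_s
```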
Re: Text encoding Babel. Was Re: George Keremedjiev
Not to beat a dead horse, but I ran across "   " in a text file when read via a web browser this evening and wanted to share my findings as they seemed timely. On 11/22/18 5:55 PM, Guy Dunphy via cctalk wrote: Anyway, I was wondering how Ed's emails (and sometimes others elsewhere) acquired that odd corruption. IMHO it's not corruption as much as it is incompatibility. Answer: Ed's email util … interpret the user typing space twice in succession, as meaning "I really, really want there to be a space here, no matter what." So it inserts a 'no-break space' unicode character, which of course requires a 2-byte UTF-8 encoding. What I'm not sure of is how the 0xC2 0xA0 translates to 0xC3 0xA2 that is the  character. I think that the 0xC2 0xA0 pair is treated as two independent characters. Thus 0xC2 is "Â", and 0xA0 is a non-breaking space. I don't know what happens to the non-breaking space, but the  and the space (0x20) that is after 0xC2 0xA0 (three byte sequence being 0xC2 0xA0 0x20) is included and becomes " " which is what we see in reply text. (Encoded as 0xC3 0x83 0x20.) So, arguably, improperly processed / translated text that results in 0xC3 0x83 0x20 / " " should have been a non-breaking space followed by a space. This jives with both Ed's email and the document that I was reading that prompted this email. Then adds a plain ASCII space 0x20 just to be sure. I don't think it's adding a plain ASCII space 0x20 just to be sure. Looking at the source of the message, I see =C2=A0, which is the UTF-8 representation followed by the space. My MUA that understands UTF-8 shows that "=C2=A0 " translates to " ". Further, "=C2=A0 =C2=A0" translates to " ". Some of the reading that I did indicates that many things, HTML included, use white space compaction (by default), which means that multiple white space characters are reduced to a single white space character. 
So, when Ed wants multiple white spaces, his MUA has to do something to state that two consecutive spaces can't be compacted. Hence the non-breaking space. =C2=A0 quite literally translates to a space character that can't be compacted. Thus "=C2=A0 =C2=A0" is really " " or " ". Multiple successive spaces will need to be a mixture of space and non-breaking space characters. So, the plain ASCII space 0x20 after (or before) =C2=A0 is not there just to be sure. Personally I find it more interesting than annoying. Just another example of the gradual chaotic devolution of ASCII, into a Babel of incompatible encodings. Not that ASCII was all that great in the first place. As stated in another reply, I don't think ASCII was ever trying to be the Babel fish. (Thank you Douglas Adams.) Takeaway: Ed, one space is enough. I don't know how you got the idea people might miss seeing a single space, and so you need to type two or more. I wondered if it wasn't a typo or keyboard sensitivity issue. I remember I had to really slow down the double click speed for my grandpa (R.I.P.) so that he could use the mouse. Maybe some users actuate keys slowly enough that the computer thinks that it's repeated keys. ¯\_(ツ)_/¯ But it isn't so. The normal convention in plain text is one space character between each word. The operative word is "convention", as in commonly accepted but not always the case behavior. ;-) And since plain ASCII is hard-formatted, extra spaces are NOT ignored and make for wider spacing between words. It seems as if you made an assumption. Just because the underlying character set is ASCII (per RFC 821 & 822, et al) does not mean that the data that they are carrying is also ASCII. As is evident by the Content-Type: header stating the character set of UTF-8. Especially when textual white space compression does exactly that, ignores extra white spaces. Which looks    very odd, even if your mail utility didn't try to do something 'special' with your unusual user input. 
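The byte analysis above can be checked mechanically with Python's standard `quopri` module. The "Â" characters seen in replies are what those same bytes look like when misread as Latin-1 instead of UTF-8 (my reconstruction of the failure mode, not a trace of the actual mail path):

```python
import quopri

# "=C2=A0" is quoted-printable for the two-byte UTF-8 encoding of
# U+00A0 NO-BREAK SPACE.
raw = b"=C2=A0 =C2=A0"
decoded = quopri.decodestring(raw)
assert decoded == b"\xc2\xa0 \xc2\xa0"
assert decoded.decode("utf-8") == "\u00a0 \u00a0"  # NBSP, space, NBSP

# The stray A-circumflex appears when the same bytes are mis-read as
# Latin-1: 0xC2 becomes "Â" and 0xA0 becomes a bare NBSP.
mojibake = decoded.decode("latin-1")
assert mojibake == "\u00c2\u00a0 \u00c2\u00a0"  # "Â", NBSP, space, "Â", NBSP
```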
I frequently use multiple spaces with ASCII diagrams.

+------+
| This |
| is   |
| a    |
| box  |
+------+

That will not look like I intended it with white space compression. Btw, I changed the subject line, because this is a wider topic. I've been meaning to start a conversation about the original evolution of ASCII, and various extensions. Related to a side project of mine. I'm curious to know more about your side project. -- Grant. . . . unix || die
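The white-space compaction discussed above, and the reason a non-breaking space survives it, can be sketched in a few lines (the collapse rule below is a simplification of what HTML renderers actually do):

```python
import re

# HTML-style collapsing folds runs of ordinary white space into one space,
# which is what flattens ASCII diagrams. U+00A0 NO-BREAK SPACE is not in the
# collapsed set, so it survives: hence the MUA trick of alternating NBSP and
# space to preserve runs of blanks.
def collapse(text: str) -> str:
    return re.sub(r"[ \t\r\n]+", " ", text)

assert collapse("two  spaces") == "two spaces"              # run is folded
assert collapse("two\u00a0 \u00a0spaces") == "two\u00a0 \u00a0spaces"  # survives
```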
Re: Text encoding Babel. Was Re: George Keremedjiev
Therefore, for use with current computers, 32 bits would be needed. Some games can be played with mixing sizes by doing things like setting high bit, for 128 7 bit characters plus 32768 15 bit characters, and 2147483648 31 bit characters. On Sun, 25 Nov 2018, ben via cctalk wrote: REAL COMPUTERS USE 18 BITS... RUNS BEN. Alas, "current" computers use 8, 16, 32. They totally fail to understand the intrinsic benefits of 9, 12, 18, 24, and 36 bits.
Re: Text encoding Babel. Was Re: George Keremedjiev
On 11/25/2018 6:34 PM, Fred Cisin via cctalk wrote: On Mon, 26 Nov 2018, Tomasz Rola via cctalk wrote: To supply this train of thought with some numbers: - my copy of Common Lisp HyperSpec claims 978 symbols (i.e. words) on its alphabetical index; many words have modifiers (a.k.a. keyword options, with default values) which increases the number at least twofold, IMHO, if one agrees that each combo should be counted as different word, to which I would say yes - I have read somewhere that a Japanese pupil after graduating from elementary school is supposed to know 1000 kanjis by heart (there is a standardised set, I have a book) Would those "modifiers of words" qualify as ADJECTIVES? The Japanese phonetic alphabets, Katakana and Hiragana, have 46 letters each, almost twice that with diacritics. I have heard that Japanese Kanji has more than 50,000 words/characters (for which 16 bits would fit, but be a little risky). But, in practice, 1100 to 2000 words comprise most of common usage. Wikipedia says that as of 2010, the student requirement is 2136. Japanese Kanji and Chinese have substantial overlap, but there is no way that you could squeeze both into 16 bits, without leaving out important stuff. Therefore, for use with current computers, 32 bits would be needed. Some games can be played with mixing sizes by doing things like setting high bit, for 128 7 bit characters plus 32768 15 bit characters, and 2147483648 31 bit characters. REAL COMPUTERS USE 18 BITS... RUNS BEN.
Re: Text encoding Babel. Was Re: George Keremedjiev
ASL is quite different than English... you can sign in English or you can sign in ASL. The ASL has a different sentence structure. When I was first learning about the Deaf Teletype revolution (We have a collection of a diverse group of TTY both mechanical and CRT and portable and ... I would correspond via email with a young person that sold us some ttys and wondered why it was almost a different sentence structure, almost like Yoda but if you look at both closely not really the exact same. Hard to explain... but English and ASL utilize 2 different Sentence structuring ... or so it appears to me. If you learn ASL and Signing well there is a good need for excellent interpreters out there. And yes, always looking for ANYTHING related to the history of TTY and other assistive communications devices. Ed# In a message dated 11/25/2018 5:46:55 PM US Mountain Standard Time, cctalk@classiccmp.org writes: There are still MANY schools arguing about whether to accept ASL (American Sign Language, as used by Deaf people). I would think that therefore, BSL (British Sign Language) should qualify
Re: e-mail, character sets, encodings (was Re: George Keremedjiev)
On 2018-11-25 7:45 PM, Bill Gunshannon via cctalk wrote:
> It's not a mailing list problem. It's not even a mail problem. It's a
>
> Mail User Agent problem. It is a display problem. It is up to the
>
> users mail program to display the email as it was sent. Unless the

Did you really double space this email like a high school essay? Don't see that every day. --T
Re: Text encoding Babel. Was Re: George Keremedjiev
On Mon, 26 Nov 2018, Tomasz Rola via cctalk wrote: To supply this train of thought with some numbers: - my copy of Common Lisp HyperSpec claims 978 symbols (i.e. words) on its alphabetical index; many words have modifiers (a.k.a. keyword options, with default values) which increases the number at least twofold, IMHO, if one agrees that each combo should be counted as different word, to which I would say yes - I have read somewhere that a Japanese pupil after graduating from elementary school is supposed to know 1000 kanjis by heart (there is a standardised set, I have a book) Would those "modifiers of words" qualify as ADJECTIVES? The Japanese phonetic alphabets, Katakana and Hiragana, have 46 letters each, almost twice that with diacritics. I have heard that Japanese Kanji has more than 50,000 words/characters (for which 16 bits would fit, but be a little risky). But, in practice, 1100 to 2000 words comprise most of common usage. Wikipedia says that as of 2010, the student requirement is 2136. Japanese Kanji and Chinese have substantial overlap, but there is no way that you could squeeze both into 16 bits, without leaving out important stuff. Therefore, for use with current computers, 32 bits would be needed. Some games can be played with mixing sizes by doing things like setting high bit, for 128 7 bit characters plus 32768 15 bit characters, and 2147483648 31 bit characters.
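Fred's "set the high bit" game can be made concrete. Note that his quoted counts (128 + 32768 + 2147483648) gloss over the framing: making the code self-delimiting costs one flag bit per extension, so the sketch below (my framing, not Fred's) yields 7-, 14-, and 30-bit characters instead:

```python
# A self-delimiting variant of the high-bit mixed-width scheme:
#   0xxxxxxx                      -> 7-bit character, 1 byte
#   10xxxxxx xxxxxxxx             -> 14-bit character, 2 bytes
#   11xxxxxx + three more bytes   -> 30-bit character, 4 bytes
def encode_char(cp: int) -> bytes:
    if cp < 1 << 7:
        return bytes([cp])
    if cp < 1 << 14:
        return bytes([0x80 | (cp >> 8), cp & 0xFF])
    if cp < 1 << 30:
        return bytes([0xC0 | (cp >> 24), (cp >> 16) & 0xFF,
                      (cp >> 8) & 0xFF, cp & 0xFF])
    raise ValueError("code point out of range")

def decode(buf: bytes) -> list:
    out, i = [], 0
    while i < len(buf):
        b = buf[i]
        if b < 0x80:                                  # 1-byte form
            out.append(b); i += 1
        elif b < 0xC0:                                # 2-byte form
            out.append(((b & 0x3F) << 8) | buf[i + 1]); i += 2
        else:                                         # 4-byte form
            out.append(((b & 0x3F) << 24) | (buf[i + 1] << 16)
                       | (buf[i + 2] << 8) | buf[i + 3]); i += 4
    return out

sample = [0x41, 0x3041, 0x2A6B2]  # 'A', a hiragana, a rare CJK ideograph
assert decode(b"".join(map(encode_char, sample))) == sample
```

This is essentially the design pressure that produced UTF-8, which spends even more framing bits so that a decoder can also resynchronize mid-stream.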
Re: Text encoding Babel. Was Re: George Keremedjiev
On Sun, Nov 25, 2018 at 04:46:50PM -0800, Fred Cisin via cctalk wrote: [...] > Is FORTRAN considered modern enough? [...] > What about APL? Although its structure is fairly straight-forward, > it does, indeed, have a unique character set. To supply this train of thought with some numbers: - my copy of Common Lisp HyperSpec claims 978 symbols (i.e. words) on its alphabetical index; many words have modifiers (a.k.a. keyword options, with default values) which increases the number at least twofold, IMHO, if one agrees that each combo should be counted as different word, to which I would say yes - I have read somewhere that Japanese pupil after graduating from elementary school is supposed to know 1000 kanjis by heart (there is a standardised set, I have a book) -- Regards, Tomasz Rola -- ** A C programmer asked whether computer had Buddha's nature. ** ** As the answer, master did "rm -rif" on the programmer's home** ** directory. And then the C programmer became enlightened... ** ** ** ** Tomasz Rola mailto:tomasz_r...@bigfoot.com **
Re: Text encoding Babel. Was Re: George Keremedjiev
On Sun, 25 Nov 2018, Frank McConnell via cctalk wrote: I have been told that in the 1960s taking a course in FORTRAN programming fulfilled the foreign language requirement at UC Berkeley. Not currently, and I have some doubt about then. But, there are conflicting statements. One section requires that it be a MODERN language, but with specific exceptions for ASL and "classical languages, such as Latin and Greek". Is FORTRAN considered modern enough? There are still MANY schools arguing about whether to accept ASL (American Sign Language, as used by Deaf people). I would think that therefore, BSL (British Sign Language) should qualify. What about APL? Although its structure is fairly straight-forward, it does, indeed, have a unique character set.
Re: e-mail, character sets, encodings (was Re: George Keremedjiev)
It's not a mailing list problem. It's not even a mail problem. It's a Mail User Agent problem. It is a display problem. It is up to the user's mail program to display the email as it was sent. Unless the user doesn't want to see anything in character sets other than their favorite. Nothing along the way should change anything in an email message. The endpoint should receive whatever the beginning point sent out and either handle it or not. But it is the endpoint's responsibility to try to display it accurately. I often include in emails (and USENET posts) characters that are not part of ASCII or the English alphabet. I certainly don't want someone in between to modify what I send. bill On 11/25/18 7:00 PM, ED SHARPE via cctalk wrote: > Hi Frank and others- > Yea it is only here we have the problem. or at least this is the > only listserv that does not like it. > > I wondered if something could be handled at the listserv end or not > but I have little knowledge of listservs alas... > > Sad when people spent more time on characters rather than George the > museum archivist that passed away. > > > George worked his ass off to achieve what he did. > > Google him and read about his early days. You will be surprised and > you might find yourself thankful for how easy you had it. > > I did not know him all that well but I did provide his PDP-8 classic > with the plexis when He was first starting up It was a beauty and in the > 200 serial number range as I remember. We kept #18 classic Plexi for > SMECC > > I had not planned on selling it as always handy to have a #2 for an > offsite display and you do not have to disturb the in-house display but > George seems so focused and intense on making a museum too so who > could say no to that? I wish I had traveled to see his effort up > close. > > Project this week is to find someone with a UNIVAC 422 or the > predecessor UNIVAC Digital trainer. 
I can NOT BELIEVE I am fortunate enough > to be the only one with a UNIVAC 422! > > That is all for now... I think I hear a half of turkey and leftover > dressing in the refrig wailing to be consumed. > > Ed# www.smecc.org > > > > In a message dated 11/25/2018 4:32:34 PM US Mountain Standard Time, > cctalk@classiccmp.org writes: > > > Most mail servers sending inbound messages to the list include the encoding > > scheme in the header. The mailer program should process and translate the > email message body accordingly...in theory anyway. The set up and testing > of a sampling of encoding variations would reveal which interpreters were > missing in our particular list's relay process. Someone could create tests > with the most common 20 or so encoding schemes and a character set dump and > document the results etc. Anyone have the time for that? I don't really > think asking persons to fix their email program is the solution, it's a > mailing list fix/enhancement. I bet there is documentation on such a > procedure; I can't imagine we are the first to encounter this problem. It's > fixable > B
>> >> And really it's not just about the mail program, it's about the host >> operating system and the hardware on which it runs and which you are using >> to view e-mail. Heavy-metal characters are likely to look funny on a >> terminal built to display US-ASCII like an HP 2645. Your chances get >> better if the software has enough understanding of various Roman-language >> text encodings and you are using an HP 2622 with HP-ROMAN8 character >> support and the connection between your host and terminal is >> eight-bit-clean. But then you get something that uses Cyrillic and now >> you're looking at having another HP 2645 set up to do Russian. And hoping >> your host software knows how to deal with those character sets and >> encodings too! >> >> -Frank McConnell >> >> On Nov 25, 2018, at 9:55, ED SHARPE via cctalk wrote: >>> seems only the very old mail programs do not adapt to all character >> sets? >>> >>> In a message dated 11/25/2018 6:19:52 AM US Mountain Standard Time, >> cctalk@classiccmp.org writes: >>> >>>
Re: Text encoding Babel. Was Re: George Keremedjiev
We have a tendency to be remarkably ethnocentric. When you apply for a job, do you send them a copy of your RESUME? There is an exit on 280 for "La Canada" road. For most European languages (I did say MOST), an 8 bit extended ASCII could be adequate. "Recently" (1981), I was disappointed in IBM's character extensions for the 5150. We got smiley faces, but not even pound-sterling nor Yen! 16 bits would presumably be adequate for designing a character set for most phonetic alphabets. (I did say MOST). When I got my Epson HC-20's (like the HX-20, but including Katakana), and my Epson RC-20 (wristwatch, Z80-like, with RAM, ROM, and serial port), I started to try to learn a little Japanese. I didn't get very far, but I did at least learn the sounds of Katakana, and could sound out words written in it (a LOT of computer materials use Katakana for non-Japanese words, such as "monitor"). But full inclusion of pictographic languages (Kanji, etc.) would require more than 16 bits. -- Grumpy Ol' Fred ci...@xenosoft.com
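Fred's 16-bit estimate can be checked directly: the Katakana spelling of "monitor" that he mentions fits comfortably in 16 bits per character, while rarer CJK ideographs spill past U+FFFF. A short Python check (the specific example characters are my own picks):

```python
# Katakana spelling of "monitor" -- every code point fits in 16 bits
katakana_monitor = "モニター"
assert all(ord(ch) <= 0xFFFF for ch in katakana_monitor)

# A CJK Extension B ideograph lies beyond the 16-bit range
rare_ideograph = chr(0x20B9F)
assert ord(rare_ideograph) > 0xFFFF

print([hex(ord(c)) for c in katakana_monitor], hex(ord(rare_ideograph)))
```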
Re: e-mail, character sets, encodings (was Re: George Keremedjiev)
Hi Frank and others- Yea it is only here we have the problem. or at least this is the only listserv that does not like it. I wondered if something could be handled at the listserv end or not but I have little knowledge of listservs alas... Sad when people spent more time on characters rather than George the museum archivist that passed away. George worked his ass off to achieve what he did. Google him and read about his early days. You will be surprised and you might find yourself thankful for how easy you had it. I did not know him all that well but I did provide his PDP-8 classic with the plexis when He was first starting up It was a beauty and in the 200 serial number range as I remember. We kept #18 classic Plexi for SMECC I had not planned on selling it as always handy to have a #2 for an offsite display and you do not have to disturb the in-house display but George seems so focused and intense on making a museum too so who could say no to that? I wish I had traveled to see his effort up close. Project this week is to find someone with a UNIVAC 422 or the predecessor UNIVAC Digital trainer. I can NOT BELIEVE I am fortunate enough to be the only one with a UNIVAC 422! That is all for now... I think I hear a half of turkey and leftover dressing in the refrig wailing to be consumed. Ed# www.smecc.org In a message dated 11/25/2018 4:32:34 PM US Mountain Standard Time, cctalk@classiccmp.org writes: Most mail servers sending inbound messages to the list include the encoding scheme in the header. The mailer program should process and translate the email message body accordingly...in theory anyway. The set up and testing of a sampling of encoding variations would reveal which interpreters were missing in our particular list's relay process. Someone could create tests with the most common 20 or so encoding schemes and a character set dump and document the results etc. Anyone have the time for that? 
I dont really think asking persons to fix their email program is the solution, it's a mailing list fix/enhancement. I bet there is documentation on such a procedure I can't imagine we are the first to encounter this problem. It's fixable B On Sun, Nov 25, 2018, 3:24 PM Frank McConnell via cctalk < cctalk@classiccmp.org wrote: > Very old mail programs indeed have no understanding whatsoever of > character sets or encoding. They simply display data from the e-mail file > on stdout or equivalent. If you are lucky, the character set and encoding > in the e-mail match the character set and encoding used by your terminal. > > The early-to-mid-1990s MIME work was in some part about allowing e-mail to > indicate its character set and encoding, because at that point in time > there were many character sets and multiple encodings. Before that, you > had to figure them out from your correspondent's e-mail address and the > mess on your screen or printout. > > And really it's not just about the mail program, it's about the host > operating system and the hardware on which it runs and which you are using > to view e-mail. Heavy-metal characters are likely to look funny on a > terminal built to display US-ASCII like an HP 2645. Your chances get > better if the software has enough understanding of various Roman-language > text encodings and you are using an HP 2622 with HP-ROMAN8 character > support and the connection between your host and terminal is > eight-bit-clean. But then you get something that uses Cyrillic and now > you're looking at having another HP 2645 set up to do Russian. And hoping > your host software knows how to deal with those character sets and > encodings too! > > -Frank McConnell > > On Nov 25, 2018, at 9:55, ED SHARPE via cctalk wrote: > > > > seems only the very old mail programs do not adapt to all character > sets? 
> > > > > > In a message dated 11/25/2018 6:19:52 AM US Mountain Standard Time, > cctalk@classiccmp.org writes: > > > > > > > > > >> On Nov 21, 2018, at 4:46 PM, Bill Gunshannon via cctalk < > cctalk@classiccmp.org> wrote: > >> > >> > >>> On 11/21/18 5:19 PM, Fred Cisin via cctalk wrote: > >>> Ed, > >>> It is YOUR mail program that is doing the extraneous insertions, and > >>> then not showing them to you when you view your own messages. > >>> > >>> ALL of us see either extraneous characters, or extraneous spaces in > >>> everything that you send! > >>> I use PINE in a shell account, and they show up as a whole bunch of > >>> inappropriate spaces. > >>> > >>> Seriously, YOUR mail program is inserting extraneous stuff. > >>> Everybody? but you sees it. > >>> > >> > >> I don't. I didn't see it until someone replied with a > >> > >> copy of the offending text included. > >> > >> > >> bill > >> > > same here. i didnt see them until some replies included the text. > > > > kelly > > > >
Re: Text encoding Babel. Was Re: George Keremedjiev
On 11/25/18 3:53 PM, Liam Proven wrote: It's been enlightening! :-) Some I was ready for. E.g. In French or Spanish, both of which I can speak to some extent, letters like á or ó are not seen as separate letters: French would call them a-acute, an a with an acute accent. Ç is a c with a cedilla. Etc. If they are not seen as separate letters, then do their meanings change? Or is the different accent more for pronunciation? But in Swedish/Norwegian/Danish -- I speak basic Norwegian and rudimentary Swedish -- ø and å and ä and so on are not a or o with accents on: they are _different letters_ that come at the end of the alphabet. I assume that they have different meanings (if that applies to letters) and are as different as "A" and "q". Czech is like that. Š and Č and Ž and many more that my Mac can't readily type are _extra letters_ which come after the unmodified form in the alphabet. ~twitch~ I don't even know how to properly describe something that visually looks like letters (glyphs?) to me, but may be an imprecise simplification on my part. Without them, you can't write correct Czech. It's worse than writing English without the letter E. Usually you can guess but not always. Byt means flat, apartment; b y-acute t means the verb "to be". You can probably work that out, but you can't always. A restaurant menu would be hopelessly corrupted as both "raw" and "with cheese" are quite likely. Indeed. Sure, my office street name: Křižíkova K, r haček, i, z haček, i acute, k o v a. I had to zoom my font to see enough detail in Křižíkova, but it does look like things came through just like you describe. (They even made it through my shell script that I use to re-flow text in replies.) A hacek is like an upside down circumflex: ^ Also known as a caron. ACK Oh yes. It's quite a minefield. /me blinks and shakes his head. Czech keyboards have so many extra letters, the *numbers* are on shift combinations! ~chuckle~ Well yes. 
I believe Mr Corlett here rejects all mail from gmail.com -- except mine... ;-) ¯\_(ツ)_/¯ -- Grant. . . . unix || die
Re: Text encoding Babel. Was Re: George Keremedjiev
On Nov 25, 2018, at 15:44, Sean Conner wrote: > I even heard of a high school in Tennessee who said computer languages > fulfill the "foreign language requirements" ... who'da thunk? I have been told that in the 1960s taking a course in FORTRAN programming fulfilled the foreign language requirement at UC Berkeley. -Frank McConnell
Re: e-mail, character sets, encodings (was Re: George Keremedjiev)
On 11/25/18 4:32 PM, Bill Degnan via cctalk wrote: Most mail servers sending inbound messages to the list include the encoding scheme in the header. The mailer program should process and translate the email message body accordingly...in theory anyway. Most email handling programs don't need to bother with what the data is, as they just move the data. This largely includes email list managers. This really only becomes a concern if something is modifying part of the message (data) as it moves through the system. The set up and testing of a sampling of encoding variations would reveal which interpreters were missing in our particular list's relay process. cctalk is using Mailman, and I'm fairly sure that Mailman does handle this properly. Or if there is a bug, it has likely been found & resolved. In the event that a bug is found, I think that it would be best to report it upstream to Mailman so they can fix it, and then install the updates when they are released. Someone could create tests with the most common 20 or so encoding schemes and a character set dump and document the results etc. Anyone have the time for that? I doubt that this is necessary. Based on what I've seen, Mailman is handling the message (data) just fine. It's passing Ed's messages with the UTF-8 =C2=A0 (quoted-printable) encoded parts just fine. I don't really think asking persons to fix their email program is the solution I think that asking an end user to fix their email client is the most viable solution. it's a mailing list fix/enhancement. I disagree. I'm not convinced that this is a problem in email. I question how many people are seeing the symptoms -and- what email client they are using. If someone knowingly chooses to use an email client that doesn't support UTF-8, then ¯\_(ツ)_/¯ That's their choice. I just hope that they are informed in their choice. I bet there is documentation on such a procedure I can't imagine we are the first to encounter this problem. 
It's fixable If you really do think that this is a problem with the mailing list, I'd suggest bringing the problem up on the Mailman mailing list. Mark S. is very responsive and can help people fix problems / configurations in short order. -- Grant. . . . unix || die
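The `=C2=A0` sequence Grant mentions is the quoted-printable encoding of the two UTF-8 bytes of a no-break space (U+00A0), which is exactly the "extraneous spaces" some readers were seeing. Decoding it is a one-liner with Python's standard library:

```python
import quopri

# "=C2=A0" is quoted-printable for the UTF-8 bytes 0xC2 0xA0,
# i.e. U+00A0 NO-BREAK SPACE
raw = b"hello=C2=A0world"
text = quopri.decodestring(raw).decode("utf-8")
print(repr(text))  # 'hello\xa0world'
```

A client that stops after the quoted-printable step, without the UTF-8 decode, is the one that shows stray characters.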
Re: Text encoding Babel. Was Re: George Keremedjiev
It was thus said that the Great Bill Gunshannon via cctalk once stated: > > On 11/25/18 5:42 PM, Grant Taylor via cctalk wrote: > > On 11/23/18 5:52 AM, Peter Corlett via cctalk wrote: > >> Worse than that, it's *American* ignorance and cultural snobbery > >> which also affects various English-speaking countries. > > > > Please do not ascribe such ignorance with such a broad brush, at least > > not without qualifiers that account for people that do try to respect > > other people's cultures. > > > > > Q. What do you call someone who speaks three languages? > > A. Trilingual. > > Q. What do you call someone who speaks two languages? > > A. Bilingual. > > Q. What do you call someone who speaks one language? > > A. American. As an American, a friend of mine from Sweden (who himself speaks at least three languages) considered me multilingual. Of course, my other languages are BASIC, Assembly, C, Forth ... I even heard of a high school in Tennessee who said computer languages fulfill the "foreign language requirements" ... who'da thunk? > OK, it's a joke. (I'm American and speak 4 languages.) -spc (Who speaks English and perhaps a dozen words in German, but plenty of computer languages ... )
Re: e-mail, character sets, encodings (was Re: George Keremedjiev)
Most mail servers sending inbound messages to the list include the encoding scheme in the header. The mailer program should process and translate the email message body accordingly...in theory anyway. The set up and testing of a sampling of encoding variations would reveal which interpreters were missing in our particular list's relay process. Someone could create tests with the most common 20 or so encoding schemes and a character set dump and document the results etc. Anyone have the time for that? I don't really think asking persons to fix their email program is the solution, it's a mailing list fix/enhancement. I bet there is documentation on such a procedure; I can't imagine we are the first to encounter this problem. It's fixable B On Sun, Nov 25, 2018, 3:24 PM Frank McConnell via cctalk < cctalk@classiccmp.org wrote: > Very old mail programs indeed have no understanding whatsoever of > character sets or encoding. They simply display data from the e-mail file > on stdout or equivalent. If you are lucky, the character set and encoding > in the e-mail match the character set and encoding used by your terminal. > > The early-to-mid-1990s MIME work was in some part about allowing e-mail to > indicate its character set and encoding, because at that point in time > there were many character sets and multiple encodings. Before that, you > had to figure them out from your correspondent's e-mail address and the > mess on your screen or printout. > > And really it's not just about the mail program, it's about the host > operating system and the hardware on which it runs and which you are using > to view e-mail. Heavy-metal characters are likely to look funny on a > terminal built to display US-ASCII like an HP 2645. Your chances get > better if the software has enough understanding of various Roman-language > text encodings and you are using an HP 2622 with HP-ROMAN8 character > support and the connection between your host and terminal is > eight-bit-clean. 
But then you get something that uses Cyrillic and now > you're looking at having another HP 2645 set up to do Russian. And hoping > your host software knows how to deal with those character sets and > encodings too! > > -Frank McConnell > > On Nov 25, 2018, at 9:55, ED SHARPE via cctalk wrote: > > > > seems only the very old mail programs do not adapt to all character > sets? > > > > > > In a message dated 11/25/2018 6:19:52 AM US Mountain Standard Time, > cctalk@classiccmp.org writes: > > > > > > > > > >> On Nov 21, 2018, at 4:46 PM, Bill Gunshannon via cctalk < > cctalk@classiccmp.org> wrote: > >> > >> > >>> On 11/21/18 5:19 PM, Fred Cisin via cctalk wrote: > >>> Ed, > >>> It is YOUR mail program that is doing the extraneous insertions, and > >>> then not showing them to you when you view your own messages. > >>> > >>> ALL of us see either extraneous characters, or extraneous spaces in > >>> everything that you send! > >>> I use PINE in a shell account, and they show up as a whole bunch of > >>> inappropriate spaces. > >>> > >>> Seriously, YOUR mail program is inserting extraneous stuff. > >>> Everybody? but you sees it. > >>> > >> > >> I don't. I didn't see it until someone replied with a > >> > >> copy of the offending text included. > >> > >> > >> bill > >> > > same here. i didnt see them until some replies included the text. > > > > kelly > > > >
Re: Text encoding Babel. Was Re: George Keremedjiev
On 11/25/18 3:51 PM, Bill Gunshannon via cctalk wrote: Q. What do you call someone who speaks three languages? A. Trilingual. Q. What do you call someone who speaks two languages? A. Bilingual. Q. What do you call someone who speaks one language? A. American. Monolingual. OK, it's a joke. (I'm American and speak 4 languages.) I've heard it before. I know there are a LOT of monolingual people in the world that don't live in the U.S.A. But I'll guess that percentage wise, the U.S.A. is probably up there for monolingual people. -- Grant. . . . unix || die
Re: Text encoding Babel. Was Re: George Keremedjiev
On 11/25/18 6:06 PM, Chuck Guzis via cctalk wrote: > On 11/25/18 2:53 PM, Liam Proven via cctalk wrote: >> On Sun, 25 Nov 2018 at 23:42, Grant Taylor via cctalk >> wrote: >> >>> I bet you see all sorts of things that I'm ignorant of. >> It's been enlightening! > I routinely get Turkish and Greek spam in my mailbox--and I've gotten > Cyrillic-alphabet stuff as well. > > Shrug. We all live on the same planet. > I live in the US and while I see less of it now than I used to, at the University I used to get SPAM in Korean, Chinese, Japanese, Cyrillic, Arabic, Hebrew and a couple of times even Amharic. Thus the reason ASCII is no longer the "standard". bill
Re: Text encoding Babel. Was Re: George Keremedjiev
On 11/25/18 2:53 PM, Liam Proven via cctalk wrote: > On Sun, 25 Nov 2018 at 23:42, Grant Taylor via cctalk > wrote: > >> I bet you see all sorts of things that I'm ignorant of. > > It's been enlightening! I routinely get Turkish and Greek spam in my mailbox--and I've gotten Cyrillic-alphabet stuff as well. Shrug. We all live on the same planet. --Chuck
Re: Text encoding Babel. Was Re: George Keremedjiev
On Sun, 25 Nov 2018 at 23:42, Grant Taylor via cctalk wrote: > I bet you see all sorts of things that I'm ignorant of. It's been enlightening! Some I was ready for. E.g. In French or Spanish, both of which I can speak to some extent, letters like á or ó are not seen as separate letters: French would call them a-acute, an a with an acute accent. Ç is a c with a cedilla. Etc. But in Swedish/Norwegian/Danish -- I speak basic Norwegian and rudimentary Swedish -- ø and å and ä and so on are not a or o with accents on: they are _different letters_ that come at the end of the alphabet. Czech is like that. Š and Č and Ž and many more that my Mac can't readily type are _extra letters_ which come after the unmodified form in the alphabet. Without them, you can't write correct Czech. It's worse than writing English without the letter E. Usually you can guess but not always. Byt means flat, apartment; b y-acute t means the verb "to be". You can probably work that out, but you can't always. A restaurant menu would be hopelessly corrupted as both "raw" and "with cheese" are quite likely. > > For example, right now, I am in my office in Křižíkova. I can't > > type that name correctly without Unicode characters, because the ANSI > > character set doesn't contain enough letters for Czech. > > Intriguing. Is there an old MS-DOS Code Page (or comparable technique) > that does encompass the necessary characters? Don't know. But I suspect there weren't many PCs here before the Velvet Revolution in 1989. Democracy came around the time of Windows 3.0 so there may not have been much of a commercial drive. > Would you please provide an example? Sure, my office street name: Křižíkova > (I'm curious if my email client > will display things properly.) K, r haček, i, z haček, i acute, k o v a. A hacek is like an upside down circumflex: ^ Also known as a caron. > Oh my. I had no idea that accent characters made such a difference. 
But > I consider that to be my personal ignorance living in the U.S.A. I do > NOT think it's anybody's fault but my own. I'll defend others if someone > tries to say that their native / local regional norm is the problem. Oh yes. It's quite a minefield. Czech keyboards have so many extra letters, the *numbers* are on shift combinations! > I will say that I think everybody has their own individual prerogative > to filter email as they see fit. They just need to know what they are > doing and own the fact that they might be causing unintentional harm. > > P.S. Resending from the correct email address. — A recent Thunderbird > update broke the Correct-Identity add-on. :-( Well yes. I believe Mr Corlett here rejects all mail from gmail.com -- except mine... ;-) -- Liam Proven - Profile: https://about.me/liamproven Email: lpro...@cix.co.uk - Google Mail/Hangouts/Plus: lpro...@gmail.com Twitter/Facebook/Flickr: lproven - Skype/LinkedIn: liamproven UK: +44 7939-087884 - ČR (+ WhatsApp/Telegram/Signal): +420 702 829 053
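Liam's description of the haček (caron) maps directly onto Unicode's naming and normalisation: precomposed letters like ř decompose under NFD into a base letter plus a combining caron. A small standard-library check:

```python
import unicodedata

# The Czech letters from "Křižíkova" and their official Unicode names
for ch in "řží":
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# NFD splits the precomposed letter into base letter + combining mark
assert unicodedata.normalize("NFD", "ř") == "r\u030C"  # r + COMBINING CARON
assert unicodedata.name("\u030C") == "COMBINING CARON"
```

This is why "Krizikova" is "usually close enough": stripping the combining marks leaves the base letters behind, at the cost of the distinctions Liam describes.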
Re: Text encoding Babel. Was Re: George Keremedjiev
On 11/25/18 5:42 PM, Grant Taylor via cctalk wrote: > On 11/23/18 5:52 AM, Peter Corlett via cctalk wrote: >> Worse than that, it's *American* ignorance and cultural snobbery >> which also affects various English-speaking countries. > > Please do not ascribe such ignorance with such a broad brush, at least > not without qualifiers that account for people that do try to respect > other people's cultures. > > Q. What do you call someone who speaks three languages? A. Trilingual. Q. What do you call someone who speaks two languages? A. Bilingual. Q. What do you call someone who speaks one language? A. American. OK, it's a joke. (I'm American and speak 4 languages.) bill
Re: Text encoding Babel. Was Re: George Keremedjiev
On 11/23/18 11:27 AM, Tomasz Rola via cctalk wrote: Well, that was low hanging fruit. But if he indeed turns it off and the problem is not gone, that will be a bit of puzzle. Will require some way to compare mailboxes in search of pattern in missing emails... Which may or may not be obvious... which will lead to more puzzles... oy maybe I should have stayed muted and let others do the job... I'd question modern anti-spam techniques like DMARC and DKIM. I'd suggest checking the mailing list to see if there is any information about bounces. You can probably see crumbs of missing messages in message flow (likely already happening), the References: & In-Reply-To: headers, and the list archive. -- Grant. . . . unix || die
Re: Text encoding Babel. Was Re: George Keremedjiev
On 11/23/18 4:12 AM, Liam Proven via cctalk wrote: That's English-language cultural snobbery. I don't think I'd go that far. I'd suspect it's an unfortunate false positive of a spam filtering technique that Guy uses. Does the technique have some negative side effects? Sure. Are said side effects intentional? I doubt it. I'm a native Anglophone but I live in a non-English speaking country, Czechia. I bet you see all sorts of things that I'm ignorant of. For example, right now, I am in my office in Křižíkova. I can't type that name correctly without Unicode characters, because the ANSI character set doesn't contain enough letters for Czech. Intriguing. Is there an old MS-DOS Code Page (or comparable technique) that does encompass the necessary characters? It can cope with some Western European letters needed for Spanish, French etc., but not even enough for the Norwegian letter ``ø''. So I can type the name of the district of Prague I'm in -- Karlín -- and you'll probably see that, but the street name, I am guessing not. Would you please provide an example? (I'm curious if my email client will display things properly.) Feel free to pick any example that you like so that you don't have to reveal information you might want to keep private. "Krizikova" is usually close enough but it's not correct. Those letters are important. E.g. "sýrové" means cheesy, but "syrové" means raw. That's a significant difference. Oh my. I had no idea that accent characters made such a difference. But I consider that to be my personal ignorance living in the U.S.A. I do NOT think it's anybody's fault but my own. I'll defend others if someone tries to say that their native / local regional norm is the problem. It matters to me and I'm not even Czech and don't speak it particularly well... Fair enough. 
So if you tried to mail me something at work -- the address I normally use, for instance for the Alphasmart Dana Wireless on the way to me from Baltimore right now -- and you get a reply saying "package for [streetname] undeliverable" in the subject -- you'd just reject it. That's basically discriminating against people who don't speak your language, and in my book, that's not OK. I will say that I think everybody has their own individual prerogative to filter email as they see fit. They just need to know what they are doing and own the fact that they might be causing unintentional harm. P.S. Resending from the correct email address. — A recent Thunderbird update broke the Correct-Identity add-on. :-( -- Grant. . . . unix || die
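To follow up on Grant's code-page question above: yes, the MS-DOS code page for Central Europe was CP852 (Latin-2), and it does cover Czech, while the Western "ANSI"/Latin-1 family Liam mentions does not. A sketch using Python's built-in codecs:

```python
street = "Křižíkova"

# CP852 (DOS Latin-2) was the usual code page for Czech, Polish, etc.
encoded = street.encode("cp852")
assert encoded.decode("cp852") == street

# Latin-1 (ISO-8859-1, Western European) has no ř or ž at all
try:
    street.encode("latin-1")
except UnicodeEncodeError:
    print("latin-1 cannot encode", street)
```

The catch, of course, is that a CP852 byte stream only displays correctly on a machine that also assumes CP852, which is exactly the pre-MIME guessing game Frank described earlier in the thread.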
Re: Text encoding Babel. Was Re: George Keremedjiev
On 11/23/18 5:52 AM, Peter Corlett via cctalk wrote: Worse than that, it's *American* ignorance and cultural snobbery which also affects various English-speaking countries. Please do not ascribe such ignorance with such a broad brush, at least not without qualifiers that account for people that do try to respect other people's cultures. The pound sign is not in US-ASCII, and the euro sign is not in ISO-8859-1, for example. Well, seeing as how ASCII, the /American/ Standard Code for Information Interchange, is inherently /American/, I don't personally fault it for not having currency symbols for other languages / regions. Instead, I consider ASCII to be a limited standard. Hence why so much effort has gone into other standards to overcome this, and other, limitation(s). I do not know for sure, but I'm confident that other regional character sets similarly lack characters / glyphs from languages outside their own region. I'm sure that there is room for a discussion of why ASCII is used as the underlying character set for network services and the constraints that imposes on international friends and colleagues. Amusingly, peering through my inbox in which I have mail in both Dutch and English, the only one with a UTF-8 subject line is in English. It was probably composed on a Windows box which "helpfully" turned a hyphen into an en-dash. I'm trying to NOT search my mailbox. I'd be more curious about the number of bodies that contain UTF-8 or UTF-16, which can encode more characters / glyphs. It's my understanding that without some quite modern extensions, non-ASCII is shunned in headers, including the Subject: header. P.S. Resending from the correct email address. — A recent Thunderbird update broke the Correct-Identity add-on. :-( -- Grant. . . . unix || die
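The "quite modern extensions" for non-ASCII headers that Grant alludes to are the MIME encoded-words of RFC 2047, which wrap UTF-8 text in an ASCII-safe `=?charset?encoding?...?=` envelope so headers stay 7-bit clean. Python's `email.header` module implements both directions; Liam's sýrové/syrové example makes a handy test subject:

```python
from email.header import Header, decode_header

# Sending side: an RFC 2047 encoded-word is pure ASCII on the wire
subject = Header("sýrové vs. syrové", charset="utf-8").encode()
print(subject)  # an ASCII-only =?utf-8?...?= token

# Receiving side: reverse the envelope
decoded = "".join(
    part.decode(charset) if isinstance(part, bytes) else part
    for part, charset in decode_header(subject)
)
assert decoded == "sýrové vs. syrové"
```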
e-mail, character sets, encodings (was Re: George Keremedjiev)
Very old mail programs indeed have no understanding whatsoever of character sets or encoding. They simply display data from the e-mail file on stdout or equivalent. If you are lucky, the character set and encoding in the e-mail match the character set and encoding used by your terminal.

The early-to-mid-1990s MIME work was in some part about allowing e-mail to indicate its character set and encoding, because at that point in time there were many character sets and multiple encodings. Before that, you had to figure them out from your correspondent's e-mail address and the mess on your screen or printout.

And really it's not just about the mail program, it's about the host operating system and the hardware on which it runs and which you are using to view e-mail. Heavy-metal characters are likely to look funny on a terminal built to display US-ASCII like an HP 2645. Your chances get better if the software has enough understanding of various Roman-language text encodings and you are using an HP 2622 with HP-ROMAN8 character support and the connection between your host and terminal is eight-bit-clean. But then you get something that uses Cyrillic and now you're looking at having another HP 2645 set up to do Russian. And hoping your host software knows how to deal with those character sets and encodings too!

-Frank McConnell

On Nov 25, 2018, at 9:55, ED SHARPE via cctalk wrote:
> seems only the very old mail programs do not adapt to all character
> sets?
>
> In a message dated 11/25/2018 6:19:52 AM US Mountain Standard Time,
> cctalk@classiccmp.org writes:
>
>> On Nov 21, 2018, at 4:46 PM, Bill Gunshannon via cctalk wrote:
>>
>>> On 11/21/18 5:19 PM, Fred Cisin via cctalk wrote:
>>> Ed,
>>> It is YOUR mail program that is doing the extraneous insertions, and
>>> then not showing them to you when you view your own messages.
>>>
>>> ALL of us see either extraneous characters, or extraneous spaces in
>>> everything that you send!
>>> I use PINE in a shell account, and they show up as a whole bunch of
>>> inappropriate spaces.
>>>
>>> Seriously, YOUR mail program is inserting extraneous stuff.
>>> Everybody? but you sees it.
>>
>> I don't. I didn't see it until someone replied with a
>> copy of the offending text included.
>>
>> bill
>
> same here. i didnt see them until some replies included the text.
>
> kelly
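Frank's point that the same raw bytes mean entirely different things under different character sets can be sketched in a few lines of Python (my illustration, not anything from the original mail; the sample bytes are the Russian word "privet" as a KOI8-R correspondent would send it):

```python
# The same six raw bytes, read under two different character sets.
# b"\xd0\xd2\xc9\xd7\xc5\xd4" is "привет" ("hello") encoded in KOI8-R.
raw = b"\xd0\xd2\xc9\xd7\xc5\xd4"

as_koi8 = raw.decode("koi8_r")     # what the sender meant: привет
as_latin1 = raw.decode("latin-1")  # what a Latin-1 terminal shows: ÐÒÉ×ÅÔ

print(as_koi8, "/", as_latin1)
```

Nothing in the bytes themselves says which reading is right; that is exactly the out-of-band declaration the MIME charset parameter later supplied.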
Re: George Keremedjiev
seems only the very old mail programs do not adapt to all character sets? In a message dated 11/25/2018 6:19:52 AM US Mountain Standard Time, cctalk@classiccmp.org writes: > On Nov 21, 2018, at 4:46 PM, Bill Gunshannon via cctalk > wrote: > > >> On 11/21/18 5:19 PM, Fred Cisin via cctalk wrote: >> Ed, >> It is YOUR mail program that is doing the extraneous insertions, and >> then not showing them to you when you view your own messages. >> >> ALL of us see either extraneous characters, or extraneous spaces in >> everything that you send! >> I use PINE in a shell account, and they show up as a whole bunch of >> inappropriate spaces. >> >> Seriously, YOUR mail program is inserting extraneous stuff. >> Everybody? but you sees it. >> > > I don't. I didn't see it until someone replied with a > > copy of the offending text included. > > > bill > same here. i didnt see them until some replies included the text. kelly
Re: Text encoding Babel. Was Re: George Keremedjiev
At 07:27 PM 23/11/2018 +0100, you wrote: >On Fri, Nov 23, 2018 at 07:01:17PM +0100, Liam Proven wrote: >> On Fri, 23 Nov 2018 at 18:54, Tomasz Rola via cctalk >> wrote: >> > >> > Turn off trashing mails with Unicode in Subject and see if this solves >> > a problem? >> >> *Loud laughter in the office* >> >> Well _played_, sir! > >Well, that was low hanging fruit. Yes, I should have pre-empted that one. But glad it gave someone a laugh. >But if he indeed turns it off and >the problem is not gone, that will be a bit of puzzle. It's not related. My cctalk filter runs before the UTF-8 trash filter, and I check the trashbin regularly. >Will require >some way to compare mailboxes in search of pattern in missing >emails... Which may or may not be obvious... which will lead to more >puzzles... oy maybe I should have stayed muted and let others do the >job... Here's one check. See attached screen-cap of cctalk emails. Usually many per day, but only one per day on the 15th & 16th Nov, none at all on the 17th. Did the list actually go silent then? It's possible by random ebb and flow, or maybe everyone was in shock over the awful Paradise fire death toll. Which may be over 1000, unless a lot of people listed as missing do turn up. Guy
Re: George Keremedjiev
> On Nov 21, 2018, at 4:46 PM, Bill Gunshannon via cctalk > wrote: > > >> On 11/21/18 5:19 PM, Fred Cisin via cctalk wrote: >> Ed, >> It is YOUR mail program that is doing the extraneous insertions, and >> then not showing them to you when you view your own messages. >> >> ALL of us see either extraneous characters, or extraneous spaces in >> everything that you send! >> I use PINE in a shell account, and they show up as a whole bunch of >> inappropriate spaces. >> >> Seriously, YOUR mail program is inserting extraneous stuff. >> Everybody? but you sees it. >> > > I don't. I didn't see it until someone replied with a > > copy of the offending text included. > > > bill > same here. i didnt see them until some replies included the text. kelly
Re: Text encoding Babel. Was Re: George Keremedjiev
On Fri, Nov 23, 2018 at 11:44:23PM +0100, Tomasz Rola wrote:
[...]
> Just my wet phantasies about how such things work or might work. It
> only requires one lousy admin to make it true, or a good one fired and
> never to be heard from again.
>
> Perhaps asking your ISP could give you some clues. Perhaps this is
> even more horrific (micro black holes? aliens tuning in?) and wetter
> than my wettest dreams.

The huge problem with wet phantasies is that they take over and distract the dreamer. The first thing I should have asked: is this problem limited only to mails from cctalk? If yes, then the most probable culprit would be the list's server.

-- Regards, Tomasz Rola -- ** A C programmer asked whether computer had Buddha's nature. ** ** As the answer, master did "rm -rif" on the programmer's home** ** directory. And then the C programmer became enlightened... ** ** ** ** Tomasz Rola mailto:tomasz_r...@bigfoot.com **
Re: Text encoding Babel. Was Re: George Keremedjiev
On Sat, Nov 24, 2018 at 08:56:09AM +1100, Guy Dunphy wrote:
> Resend, just in case that screen-cap image attachment fails. It is also here:
> http://everist.org/6F2a/cctalk_rcvd.png
>
> >Will require
> >some way to compare mailboxes in search of pattern in missing
> >emails... Which may or may not be obvious... which will lead to more
> >puzzles... oy maybe I should have stayed muted and let others do the
> >job...
>
> Here's one check. See attached screen-cap of cctalk emails. Usually many per
> day, but only one per day on the 15th & 16th Nov, none at all on the 17th.
> Did the list actually go silent then? It's possible by random ebb and flow,
> or maybe everyone was in shock over the awful Paradise fire death toll.
> Which may be over 1000, unless a lot of people listed as missing do turn up.

Ok, here is a copy-pasted fragment from my mutt's index view, limited to messages from cctalk & cctech (which hopefully shows what I expect). The first column is the message number in my mailbox (they are not consecutive because in between I got messages from other mailing lists and spammers):

3091 O Nov 13 Jon Elson via c ( 10) Re: Font for DEC indicator panels
3092 Nov 13 systems_glitch ( 60) Re: Looking for optical grid mouse pad
3106 O Nov 13 Jason Howe via ( 22) Re: Swap clarification (Was: bill was my
3166 O Nov 14 systems_glitch ( 40) Re: desoldering (was Re: VAX 9440)
3173 O Nov 14 Bill Degnan via ( 48) Re: desoldering (was Re: VAX 9440)
3192 O Nov 14 Ethan Dicks via ( 28) Re: TU58 tape formatter (was Re: rebuildi
3196 O Nov 14 William Sudbrin ( 15) RE: desoldering (was Re: VAX 9440)
3208 O Nov 14 Eric Smith via ( 17) Re: TU58 tape formatter (was Re: rebuildi
3216 O Nov 14 allison via cct ( 70) Re: TU58 tape formatter (was Re: rebuildi
3227 Nov 14 ED SHARPE via c ( 5) The fundamental building block of modern
3229 O Nov 14 Ethan Dicks via ( 17) Re: TU58 tape formatter (was Re: rebuildi
3277 Nov 14 Kevin Bowling v ( 10) HP 88780B density
3388 O Nov 15 Noel Chiappa vi ( 19) Re: Font for DEC indicator panels
3473 O Nov 16 Andrew Luke Nes ( 75) Re: early ANSI C drafts, pre-1989 standar
3816 O Nov 18 Toby Thain via ( 39) Re: Font for DEC indicator panels
3835 O Nov 18 Jerome H. Fine ( 137) Re: RT-11 DY install
3845 O Nov 18 Michael Brutman ( 40) VCF PNW 2019: Exhibitors needed!
3887 O Nov 19 Patrick Finnega ( 6) IBM 3270 Emulation Adapter (ISA)
3889 O Nov 18 jim stephens vi ( 26) Re: IBM 3270 Emulation Adapter (ISA)
3940 O Nov 19 Jim Brain via c ( 10) IND
3944 O Nov 19 Al Kossow via c ( 20) Re: IBM 3270 Emulation Adapter (ISA)
3953 Nov 19 dwight via ccta ( 9) What is windoes doing?
3954 Nov 19 Ethan via cctal ( 11) Re: What is windoes doing?
3965 Nov 19 geneb via cctal ( 27) Re: What is windoes doing?
3989 Nov 19 Bill Degnan via ( 40) Re: What is windoes doing?
3997 Nov 19 Alan Perry via ( 25) Removing PVA from a CRT
3999 Nov 19 Peter Coghlan v ( 17) Re: What is windoes doing?
4041 Nov 19 Alan Perry via ( 50) Re: Removing PVA from a CRT
4046 O Nov 19 jim stephens vi ( 38) Re: IND
4052 Nov 19 Sean Conner via ( 19) IEFBR14 (was Re: IND)
4053 O Nov 19 Sven Schnelle v ( 17) Re: HP-Apollo 9000/425t RAM
4054 O Nov 19 Dennis Boone vi ( 14) Re: IND
4066 Nov 19 dwight via ccta ( 25) Re: What is windoes doing?
4071 O Nov 19 dwight via ccta ( 45) Re: What is windoes doing?
4083 O Nov 19 Al Kossow via c ( 12) Battery warning in Falco terminals
4088 O Nov 19 Al Kossow via c ( 16) Re: Battery warning in Falco terminals
4095 O Nov 19 Eric Smith via ( 15) Re: IEFBR14 (was Re: IND)
4100 Nov 19 Alan Perry via ( 32) Re: Removing PVA from a CRT
4102 Nov 19 Alan Perry via ( 83) Re: Removing PVA from a CRT
4103 O Nov 19 ben via cctalk ( 19) Re: IEFBR14 (was Re: IND)
4113 O Nov 19 Douglas Taylor ( 11) Missing FORRTL
4118 O Nov 19 Jon Elson via c ( 10) Re: IND
4122 O Nov 19 Kevin McQuiggin ( 16) Re: IND

A quick comparison by eye: you seem to miss, for example, msg no 3277 and 4083.

no 3277:

-- From: Kevin Bowling via cctalk
-- To: "General Discussion: On-Topic and Off-Topic Posts"
-- Subject: HP 88780B density

I have a dual density 88780B. Is it possible to upgrade to quad density by acquiring/swapping boards? Or does someone have an 800bpi 9-track on SCSI I can borrow or buy? I have a pair of 1984 pdp11/70 UNIX SysV (R0, R1?) tapes that need to be archived.

Regards, Kevin

and no 4083:

-- From: Al Kossow via cctalk
-- To: "General Discussion: On-Topic and Off-Topic Posts"
-- Subject: Battery warning in Falco terminals

I've been helping the MAME guys simulate a TS-2624, which is a block mode HP emulating terminal. I had bought this a while ago, and never dumped the firmware. Unfortunately there is a large NiCd battery right in the middle of the board that leaked all over. I've taken some pictures which are up under falco on
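The comparison Tomasz is doing by eye can be automated. Here is a rough sketch (mine, with hypothetical file names, not his actual tooling) using Python's standard mailbox module: collect Message-ID headers from a local mbox and from a reference copy of the archive, then diff the sets.

```python
import mailbox

def message_ids(mbox_path):
    """Collect the Message-ID header of every mail in an mbox file."""
    return {m.get("Message-ID") for m in mailbox.mbox(mbox_path)}

def missing_from(local_ids, reference_ids):
    """Return the IDs present in the reference archive but absent locally."""
    return reference_ids - local_ids

# Hypothetical usage (file names are assumptions):
# gone = missing_from(message_ids("my-cctalk.mbox"),
#                     message_ids("list-archive.mbox"))
# print(len(gone), "messages never arrived")
```

Message-IDs are assigned by the sender and survive list processing, so they make a more reliable key than subject lines or dates.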
Re: Text encoding Babel. Was Re: George Keremedjiev
Resend, just in case that screen-cap image attachment fails. It is also here: http://everist.org/6F2a/cctalk_rcvd.png >Will require >some way to compare mailboxes in search of pattern in missing >emails... Which may or may not be obvious... which will lead to more >puzzles... oy maybe I should have stayed muted and let others do the >job... Here's one check. See attached screen-cap of cctalk emails. Usually many per day, but only one per day on the 15th & 16th Nov, none at all on the 17th. Did the list actually go silent then? It's possible by random ebb and flow, or maybe everyone was in shock over the awful Paradise fire death toll. Which may be over 1000, unless a lot of people listed as missing do turn up. Guy
Re: Text encoding Babel. Was Re: George Keremedjiev
At 06:54 PM 23/11/2018 +0100, you wrote: >On Fri, Nov 23, 2018 at 11:55:18AM +1100, Guy Dunphy via cctalk wrote: >[...] >> >> I see them because I'm using an old email client - Eudora 3 (1997.) >> I stick with this specifically _because_ it doesn't understand UTF-8 >> or any other non-ASCII coding, especially in the header, and hence >> simply ignores any executables in the headers or email body. Which >> makes it totally virus proof, unlike Microsoft's intentionally > >Totally say totally. Except it turns out some feel that rejecting UTF-8 is culturally insensitive. I agree they have a point. But for my practical purposes, all the 'UTF-8 in header' messages that end up in my trash folder are all, always, spam. I do check. (And now someone's going to start posting cctalk messages with UTF-8 in Subject, just watch.) >> open-backdoor junk like Outlook. And most other email 'modern >> wonders.' Eudora barely even understands html in emails, and I'm >> fine with that. Also I have it configured to dust-bin any incomimg >> mail containing UTF-8 chars in the Subject header. Avoids a lot of >> time-wasting. >[...] >> >> But first, I'm having a problem with some portion of cctalk posts >> going missing, ie I don't receive all messages. The ratio seems to >> vary day to day. Sometimes no obvious missing, sometimes a lot. >> Still don't know why, or how to fix this. Any suggestions? > >Turn off trashing mails with Unicode in Subject and see if this solves >a problem? Ha, I knew someone would say that. But no, I do check the email trash bin regularly (before emptying it) and so far no cctalk or cctech emails are being diverted to there. My filter for them runs before the UTF-filter (last.) I'm guessing it's an overly picky spam filter somewhere in the network routes into Australia. Guy
Re: Text encoding Babel. Was Re: George Keremedjiev
On Fri, Nov 23, 2018 at 07:01:17PM +0100, Liam Proven wrote: > On Fri, 23 Nov 2018 at 18:54, Tomasz Rola via cctalk > wrote: > > > > Turn off trashing mails with Unicode in Subject and see if this solves > > a problem? > > *Loud laughter in the office* > > Well _played_, sir! Well, that was low hanging fruit. But if he indeed turns it off and the problem is not gone, that will be a bit of puzzle. Will require some way to compare mailboxes in search of pattern in missing emails... Which may or may not be obvious... which will lead to more puzzles... oy maybe I should have stayed muted and let others do the job... -- Regards, Tomasz Rola -- ** A C programmer asked whether computer had Buddha's nature. ** ** As the answer, master did "rm -rif" on the programmer's home** ** directory. And then the C programmer became enlightened... ** ** ** ** Tomasz Rola mailto:tomasz_r...@bigfoot.com **
Re: Text encoding Babel. Was Re: George Keremedjiev
On Fri, 23 Nov 2018 at 18:54, Tomasz Rola via cctalk wrote: > > Turn off trashing mails with Unicode in Subject and see if this solves > a problem? *Loud laughter in the office* Well _played_, sir! -- Liam Proven - Profile: https://about.me/liamproven Email: lpro...@cix.co.uk - Google Mail/Hangouts/Plus: lpro...@gmail.com Twitter/Facebook/Flickr: lproven - Skype/LinkedIn: liamproven UK: +44 7939-087884 - ČR (+ WhatsApp/Telegram/Signal): +420 702 829 053
Re: Text encoding Babel. Was Re: George Keremedjiev
On Fri, Nov 23, 2018 at 11:55:18AM +1100, Guy Dunphy via cctalk wrote: [...] > > I see them because I'm using an old email client - Eudora 3 (1997.) > I stick with this specifically _because_ it doesn't understand UTF-8 > or any other non-ASCII coding, especially in the header, and hence > simply ignores any executables in the headers or email body. Which > makes it totally virus proof, unlike Microsoft's intentionally Totally say totally. > open-backdoor junk like Outlook. And most other email 'modern > wonders.' Eudora barely even understands html in emails, and I'm > fine with that. Also I have it configured to dust-bin any incomimg > mail containing UTF-8 chars in the Subject header. Avoids a lot of > time-wasting. [...] > > But first, I'm having a problem with some portion of cctalk posts > going missing, ie I don't receive all messages. The ratio seems to > vary day to day. Sometimes no obvious missing, sometimes a lot. > Still don't know why, or how to fix this. Any suggestions? Turn off trashing mails with Unicode in Subject and see if this solves a problem? -- Regards, Tomasz Rola -- ** A C programmer asked whether computer had Buddha's nature. ** ** As the answer, master did "rm -rif" on the programmer's home** ** directory. And then the C programmer became enlightened... ** ** ** ** Tomasz Rola mailto:tomasz_r...@bigfoot.com **
Re: George Keremedjiev
On Wed, Nov 21, 2018 at 07:20:25PM -0500, ED SHARPE via cctalk wrote: > wrong not everybody sees it this is the only list serve problems... > I suppose modern email programs either do not see or know what to do > with the characters... please consider using the delete key and not > reading things frI'm me if it bothers,you > thanks ed# > > Sent from AOL Mobile Mail To me, the problem is not with your emails (or anybody else's from this list), but the slow invasion performed by offending software. Since you pressed space once, it should be entered as single space, 0x20 in ASCII. If you pressed space twice, it should be entered into email written by you as two 0x20 bytes, and this is what should show on my side. My software receives some extra stuff from you, but not in a consistent manner, i.e. some ASCII spaces are prepended with extra two bytes and some not. I was not conscious about it - thought you had some peculiar space pressing manner or text postprocessor (like fmt) made double spaces in order to fit your lines into 130-characters width (because your lines were not folded at 79 or anywhere close). (In other words, it looks like everybody gets those extra bytes, only some programs choose to not show them, which - for me - is another problem and should be examined in due time). If what you press and what is being sent out to your recipients differs, then this is a problem, with potential security implications (as I learn with some horror, just anything in modern computer can turn against the owner, if he could be called owner at all). A software that mangles your input is not a friend. It should be terminated. Just MHO. -- Regards, Tomasz Rola -- ** A C programmer asked whether computer had Buddha's nature. ** ** As the answer, master did "rm -rif" on the programmer's home** ** directory. And then the C programmer became enlightened... ** ** ** ** Tomasz Rola mailto:tomasz_r...@bigfoot.com **
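The byte pattern Tomasz describes (an ASCII space with two extra bytes prepended, "some ASCII spaces are prepended with extra two bytes and some not") can be checked mechanically. A small sketch, mine rather than his actual method; the extra pair C2 A0 is the UTF-8 encoding of a no-break space:

```python
def find_padded_spaces(raw: bytes):
    """Offsets where the pair C2 A0 immediately precedes an ASCII 0x20 space."""
    needle = b"\xc2\xa0\x20"
    offsets, start = [], 0
    while (i := raw.find(needle, start)) != -1:
        offsets.append(i)
        start = i + 1
    return offsets

# Two padded spaces, as they would arrive on the wire:
print(find_padded_spaces(b"extra\xc2\xa0 classic\xc2\xa0 8"))  # [5, 15]
```

Running this over the raw (undecoded) message body shows exactly which typed spaces picked up the extra bytes and which did not.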
Re: George Keremedjiev
These ? characters often show up for users like me who read via the e-mailed digests. Kevin Anderson
Re: Text encoding Babel. Was Re: George Keremedjiev
On Fri, Nov 23, 2018 at 12:12:32PM +0100, Liam Proven via cctalk wrote: > On Fri, 23 Nov 2018 at 01:55, Guy Dunphy via cctalk > wrote: [...] >> Also I have it configured to dust-bin any incomimg mail containing UTF-8 >> chars in the Subject header. Avoids a lot of time-wasting. > That's English-language cultural snobbery. I'm a native Anglophone but I live > in a non-English speaking country, Czechia. Worse than that, it's *American* ignorance and cultural snobbery which also affects various English-speaking countries. The pound sign is not in US-ASCII, and the euro sign is not in ISO-8859-1, for example. Amusingly, peering through my inbox in which I have mail in both Dutch and English, the only one with a UTF-8 subject line is in English. It was probably composed on a Windows box which "helpfully" turned a hyphen into an en-dash.
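The charset gaps mentioned above (no pound sign in US-ASCII, no euro sign in ISO-8859-1) are easy to demonstrate; a minimal Python sketch of my own:

```python
def encodable(ch: str, charset: str) -> bool:
    """True if the character exists in the given character set."""
    try:
        ch.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

print(encodable("£", "ascii"))        # False: no pound sign in US-ASCII
print(encodable("£", "iso-8859-1"))   # True: 0xA3 in Latin-1
print(encodable("€", "iso-8859-1"))   # False: the euro postdates Latin-1
print(encodable("€", "iso-8859-15"))  # True: Latin-9 added it at 0xA4
```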
Re: Text encoding Babel. Was Re: George Keremedjiev
On Fri, 23 Nov 2018 at 01:55, Guy Dunphy via cctalk wrote:
> Also I have it configured to
> dust-bin any incomimg mail containing UTF-8 chars in the Subject header.
> Avoids a lot of time-wasting.

That's English-language cultural snobbery. I'm a native Anglophone but I live in a non-English speaking country, Czechia.

For example, right now, I am in my office in Křižíkova. I can't type that name correctly without Unicode characters, because the ANSI character set doesn't contain enough letters for Czech. It can cope with some Western European letters needed for Spanish, French etc., but not even enough for the Norwegian letter ``ø''. So I can type the name of the district of Prague I'm in -- Karlín -- and you'll probably see that, but the street name, I am guessing not. "Krizikova" is usually close enough but it's not correct.

Those letters are important. E.g. "sýrové" means cheesy, but "syrové" means raw. That's a significant difference. It matters to me and I'm not even Czech and don't speak it particularly well...

So if you tried to mail me something at work -- the address I normally use, for instance for the Alphasmart Dana Wireless on the way to me from Baltimore right now -- and you get a reply saying "package for [streetname] undeliverable" in the subject -- you'd just reject it. That's basically discriminating against people who don't speak your language, and in my book, that's not OK.

> Takeaway: Ed, one space is enough.

Look, we haven't even been able to get him to quote correctly, so I suspect changing his typing habits is right out!

-- Liam Proven - Profile: https://about.me/liamproven Email: lpro...@cix.co.uk - Google Mail/Hangouts/Plus: lpro...@gmail.com Twitter/Facebook/Flickr: lproven - Skype/LinkedIn: liamproven UK: +44 7939-087884 - ČR (+ WhatsApp/Telegram/Signal): +420 702 829 053
Re: George Keremedjiev
On Thu, 22 Nov 2018, Robert Feldman wrote:
> BTW, we went through this about 6 months ago. Someone pointed out the
> strange characters in Ed's posts. No change resulted from that, however,
> and I doubt this thread will cause any change.

Yup, Ed is resistant to any form of advice. He could just install a real mail client on his mobile phone instead of using the crappy AOL client. ;-)

Christian
Text encoding Babel. Was Re: George Keremedjiev
At 10:33 PM 21/11/2018 -0500, ED SHARPE wrote:
>if I type an extra space I am sure every one sees it. but the chars not
>everyone sees them.
>what I do figure us the older email programs are not accepting of all charter
>sets? ( dunno if I am using the right term)
>
>Sent from AOL Mobile Mail

Ah ha! Mystery explained. I'm another who sees funny characters where Ed's mails contain "c2 a0". This is the UTF-8 encoding of a 'no-break space' character, which is NOT in the original ASCII set. See https://apps.timwhitlock.info/unicode/inspect/hex/c2/a0

I see them because I'm using an old email client - Eudora 3 (1997.) I stick with this specifically _because_ it doesn't understand UTF-8 or any other non-ASCII coding, especially in the header, and hence simply ignores any executables in the headers or email body. Which makes it totally virus proof, unlike Microsoft's intentionally open-backdoor junk like Outlook. And most other email 'modern wonders.' Eudora barely even understands html in emails, and I'm fine with that. Also I have it configured to dust-bin any incoming mail containing UTF-8 chars in the Subject header. Avoids a lot of time-wasting.

Anyway, I was wondering how Ed's emails (and sometimes others elsewhere) acquired that odd corruption. Answer: Ed's email util (AOL Mobile Mail, and probably various other 'content enhanced' email clients) interprets the user typing space twice in succession as meaning "I really, really want there to be a space here, no matter what." So it inserts a 'no-break space' unicode character, which of course requires a 2-byte UTF-8 encoding. Then adds a plain ASCII space 0x20 just to be sure.

Personally I find it more interesting than annoying. Just another example of the gradual chaotic devolution of ASCII, into a Babel of incompatible encodings. Not that ASCII was all that great in the first place.
It's also interesting that even on cctalk, where you'd think everyone would be aware of the differences between ASCII and later 'extensions', low level coding schemes, and the desirability of sticking to common standards, some are not.

Takeaway: Ed, one space is enough. I don't know how you got the idea people might miss seeing a single space, and so you need to type two or more. But it isn't so. The normal convention in plain text is one space character between each word. And since plain ASCII is hard-formatted, extra spaces are NOT ignored and make for wider spacing between words. Which looks very odd, even if your mail utility didn't try to do something 'special' with your unusual user input.

Btw, I changed the subject line, because this is a wider topic. I've been meaning to start a conversation about the original evolution of ASCII, and various extensions. Related to a side project of mine.

But first, I'm having a problem with some portion of cctalk posts going missing, ie I don't receive all messages. The ratio seems to vary day to day. Sometimes no obvious missing, sometimes a lot. Still don't know why, or how to fix this. Any suggestions?

Guy

>On Wednesday, November 21, 2018 Fred Cisin wrote:
>Ed,
>It is YOUR mail program that is doing the extraneous insertions, and
>then not showing them to you when you view your own messages.
>
>ALL of us see either extraneous characters, or extraneous spaces in
>everything that you send!
>I use PINE in a shell account, and they show up as a whole bunch of
>inappropriate spaces.
>
>Seriously, YOUR mail program is inserting extraneous stuff.
>Everybody? but you sees it.
>
>> who knows?  what mail program are you using that  does that?
>It is YOUR mail program that is "doing that"!!
>
>On Wed, 21 Nov 2018, ED SHARPE via cctalk wrote:
>
>> who knows?  what mail program are you using that  does that?
>>
>> In a message dated 11/21/2018 1:25:08 PM US Mountain Standard Time,
>> cctalk@classiccmp.org writes:
>>
>> At 02:03 PM 11/21/2018, ED SHARPE via cctalk wrote:
>>
>>> Ià soldà him myà extra classic 8à with the plexi covers on it... sn
>>> 200à seriesà weà keptà sn #18
>>
>> Side question: What process is turning non-blanking spaces into ISO-8859-1
>> circumflex-A for you?
>>
>> I see 'Ã' all throughout your emails.
>>
>> - John
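Guy's byte arithmetic checks out, and the client behaviour he infers can be sketched. A minimal sketch: the replacement rule is his inference about AOL Mobile Mail, not confirmed behaviour, but the UTF-8 encoding of the no-break space is exact.

```python
# U+00A0 (no-break space) encodes in UTF-8 as exactly the two bytes C2 A0.
assert "\u00a0".encode("utf-8") == b"\xc2\xa0"

def aol_style_mangle(text: str) -> str:
    """Hypothetical client rule: two typed spaces become NBSP + plain space."""
    return text.replace("  ", "\u00a0 ")

# Ed's "sn 200  series" (two spaces) arrives with the extra C2 A0 bytes:
mangled = aol_style_mangle("sn 200  series")
print(mangled.encode("utf-8"))  # b'sn 200\xc2\xa0 series'
```

An 8-bit, non-UTF-8-aware client like Eudora 3 then renders the C2 A0 pair as two Latin-1 characters ("Â" plus a no-break space), which is precisely the "Ã"/"Â" litter people report seeing.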
Re: George Keremedjiev
>Message: 10 >Date: Wed, 21 Nov 2018 16:17:27 -0500 >From: ED SHARPE >To: jfo...@threedee.com, cctalk@classiccmp.org, cctalk@classiccmp.org >Subject: Re: George Keremedjiev >Message-ID: <16738228ce4-1ebf-2...@webjas-vad240.srv.aolmail.net> >Content-Type: text/plain; charset=utf-8 > >who? knows?? ?what? mail program? are? you using that? ?does that? > > >In a message dated 11/21/2018 1:25:08 PM US Mountain Standard Time, >cctalk@classiccmp.org writes: > >? >At 02:03 PM 11/21/2018, ED SHARPE via cctalk wrote: > >>I? sold? him my? extra classic 8? with the plexi covers on it... sn 200? >>series? we? kept? sn #18 > >Side question: What process is turning non-blanking spaces into ISO-8859-1 >circumflex-A for you? > >I see '?' all throughout your emails. > >- John I get CCTalk in digest form and see the "?" in Ed's posts. Almost all (but strangely not all) of his posts are like that. I might occasionally see a strange extra character in someone else's post, but only rarely and then they usually are some non-English diacritical mark. BTW, we went through this about 6 months ago. Someone pointed out the strange characters in Ed's posts. No change resulted from that, however, and I doubt this thread will cause any change. Bob
Re: George Keremedjiev
- Original Message - From: "geneb via cctalk" To: "ED SHARPE" ; "General Discussion: On-Topic and Off-Topic Posts" Sent: Thursday, November 22, 2018 11:45 AM Subject: Re: George Keremedjiev > On Wed, 21 Nov 2018, ED SHARPE via cctalk wrote: > >> not much adjustments... may be easier if you just bypass my messages? >> >> Sent from AOL Mobile Mail >> > Maybe it's because many of us don't use a point-and-drool interface that > would give the user the chance to skip the message before being forced to > read it. > > Look, I get that you've decided that hundreds of people are wrong and it's > not your fault. How about we work on getting you to stop top posting > instead? ;) > > g. > > And proofreading a bit before pressing 'send'...
Re: George Keremedjiev
On Wed, 21 Nov 2018, ED SHARPE via cctalk wrote: not much adjustments... may be easier if you just bypass my messages? Sent from AOL Mobile Mail Maybe it's because many of us don't use a point-and-drool interface that would give the user the chance to skip the message before being forced to read it. Look, I get that you've decided that hundreds of people are wrong and it's not your fault. How about we work on getting you to stop top posting instead? ;) g. -- Proud owner of F-15C 80-0007 http://www.f15sim.com - The only one of its kind. http://www.diy-cockpits.org/coll - Go Collimated or Go Home. Some people collect things for a hobby. Geeks collect hobbies. ScarletDME - The red hot Data Management Environment A Multi-Value database for the masses, not the classes. http://scarlet.deltasoft.com - Get it _today_!
Re: George Keremedjiev
On Wed, 21 Nov 2018, ED SHARPE via cctalk wrote: who knows? what mail program are you using that does that? In a message dated 11/21/2018 1:25:08 PM US Mountain Standard Time, cctalk@classiccmp.org writes: At 02:03 PM 11/21/2018, ED SHARPE via cctalk wrote: I sold him my extra classic 8 with the plexi covers on it... sn 200 series we kept sn #18 Side question: What process is turning non-blanking spaces into ISO-8859-1 circumflex-A for you? I see 'Â' all throughout your emails. It's not his email client that's the problem, it's yours. It constantly inserts weird characters between words. I see the same problem in Alpine, and I've never seen the issue from any other sender. g. -- Proud owner of F-15C 80-0007 http://www.f15sim.com - The only one of its kind. http://www.diy-cockpits.org/coll - Go Collimated or Go Home. Some people collect things for a hobby. Geeks collect hobbies. ScarletDME - The red hot Data Management Environment A Multi-Value database for the masses, not the classes. http://scarlet.deltasoft.com - Get it _today_!
Re: George Keremedjiev
not much adjustments... may be easier if you just bypass my messages?

Sent from AOL Mobile Mail

On Wednesday, November 21, 2018 Fred Cisin wrote:

On Wed, 21 Nov 2018, ED SHARPE via cctalk wrote:
> wrong not everybody sees it this is the only list serve problems... I
> suppose modern email programs either do not see or know what to do with
> the characters... please consider using the delete key and not reading
> things frI'm me if it bothers,you
> thanks ed#

That is a very good hypothesis. "Modern" (bordering on profanity in this list) email programs might insert characters that we are not intended to notice in support of "features" (also bordering on profanity). When they encounter those special characters, they know to activate that "feature", and suppress their display. But email programs from "LAST MONTH" (prior to the "10 year rule"?) do NOT recognize, respect, nor understand those "modern" "control" characters. ("Modern" companies, such as Microsoft, Apple, AOL, etc. deprecate the use of any software or hardware that is not "current")

Email seems to be being handled like word processor file formats - what happens when you try to load a document from a current version program into a copy of a previous version of the program? You would never know there was an issue if everybody that you associate is using the same current programs.

Q: is line wrap ON or OFF in the program?
Q: is "format: flowed" ON or OFF?

Either/both might insert "non-breaking spaces". These do not seem to be adequately documented in this context - (differentiation between "bug" and "feature").
Re: George Keremedjiev
if I type an extra space I am sure every one sees it. but the chars not everyone sees them. what I do figure us the older email programs are not accepting of all charter sets? ( dunno if I am using the right term) Sent from AOL Mobile Mail On Wednesday, November 21, 2018 Fred Cisin wrote: Ed, It is YOUR mail program that is doing the extraneous insertions, and then not showing them to you when you view your own messages. ALL of us see either extraneous characters, or extraneous spaces in everything that you send! I use PINE in a shell account, and they show up as a whole bunch of inappropriate spaces. Seriously, YOUR mail program is inserting extraneous stuff. Everybody? but you sees it. > who knows? what mail program are you using that does that? It is YOUR mail program that is "doing that"!! On Wed, 21 Nov 2018, ED SHARPE via cctalk wrote: > who knows? what mail program are you using that does that? > > > In a message dated 11/21/2018 1:25:08 PM US Mountain Standard Time, > cctalk@classiccmp.org writes: > > > At 02:03 PM 11/21/2018, ED SHARPE via cctalk wrote: > >> I sold him my extra classic 8 with the plexi covers on it... sn 200 >> series we kept sn #18 > > Side question: What process is turning non-blanking spaces into ISO-8859-1 > circumflex-A for you? > > I see 'Â' all throughout your emails. > > - John
Re: George Keremedjiev
some blank spaces whereas us 2 instead of one is some times bad mr. hand Sent from AOL Mobile Mail On Wednesday, November 21, 2018 Fred Cisin wrote: Ed, It is YOUR mail program that is doing the extraneous insertions, and then not showing them to you when you view your own messages. ALL of us see either extraneous characters, or extraneous spaces in everything that you send! I use PINE in a shell account, and they show up as a whole bunch of inappropriate spaces. Seriously, YOUR mail program is inserting extraneous stuff. Everybody? but you sees it. > who knows? what mail program are you using that does that? It is YOUR mail program that is "doing that"!! On Wed, 21 Nov 2018, ED SHARPE via cctalk wrote: > who knows? what mail program are you using that does that? > > > In a message dated 11/21/2018 1:25:08 PM US Mountain Standard Time, > cctalk@classiccmp.org writes: > > > At 02:03 PM 11/21/2018, ED SHARPE via cctalk wrote: > >> I sold him my extra classic 8 with the plexi covers on it... sn 200 >> series we kept sn #18 > > Side question: What process is turning non-blanking spaces into ISO-8859-1 > circumflex-A for you? > > I see 'Â' all throughout your emails. > > - John
Re: George Keremedjiev
On Wed, 21 Nov 2018, ED SHARPE via cctalk wrote:
> wrong not everybody sees it this is the only list serve problems... I
> suppose modern email programs either do not see or know what to do with
> the characters... please consider using the delete key and not reading
> things frI'm me if it bothers,you
> thanks ed#

That is a very good hypothesis. "Modern" (bordering on profanity in this list) email programs might insert characters that we are not intended to notice in support of "features" (also bordering on profanity). When they encounter those special characters, they know to activate that "feature", and suppress their display. But email programs from "LAST MONTH" (prior to the "10 year rule"?) do NOT recognize, respect, nor understand those "modern" "control" characters. ("Modern" companies, such as Microsoft, Apple, AOL, etc. deprecate the use of any software or hardware that is not "current")

Email seems to be being handled like word processor file formats - what happens when you try to load a document from a current version program into a copy of a previous version of the program? You would never know there was an issue if everybody that you associate is using the same current programs.

Q: is line wrap ON or OFF in the program?
Q: is "format: flowed" ON or OFF?

Either/both might insert "non-breaking spaces". These do not seem to be adequately documented in this context - (differentiation between "bug" and "feature").