Raymond Mercier <RaymondM at compuserve dot com> wrote:

> The problem of the size of Unihan has nothing at all to do with the
> cost of storage, and everything to do with the functioning of programs
> that might open and read it.
> Since the lines in Unihan are separated by 0x0A alone, not 0x0A0x0D,
> this means that when opened in notepad the lines are not separated...

I have to agree that an ordinary plain-text editor is probably not the right tool for browsing a 25-megabyte data file, even though I've been known to do the same with UnicodeData.txt (which is admittedly an order of magnitude smaller). Even though Unihan is packaged as plain text, one record per LF-terminated line (well, sort of), it's really more appropriate to think of it as a data file, intended to be read by software. Something like a batch file that calls grep (or other plain-text search tool) would be more appropriate.

And as John said, converting LF to CRLF is quite a simple task -- it can even be done by your FTP client, while downloading the file -- and should not be thought of as a deficiency in the current plain-text format.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/
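
The two suggestions in the message above (searching the file with a grep-like tool rather than an editor, and converting LF to CRLF) can be sketched in a few lines of Python. This is only an illustration, not anything from the original thread: the function names `grep_lines` and `lf_to_crlf` are made up here, and the sample records are shortened stand-ins for real Unihan lines. Both functions read one line at a time, so the size of the file should not matter much.

```python
import io

def grep_lines(src, needle):
    """Yield each line of binary stream `src` that contains `needle`."""
    for line in src:                      # binary streams iterate on b"\n"
        if needle in line:
            yield line.rstrip(b"\r\n")

def lf_to_crlf(src, dst):
    """Copy `src` to `dst`, turning bare LF line endings into CRLF."""
    for line in src:
        if line.endswith(b"\n") and not line.endswith(b"\r\n"):
            line = line[:-1] + b"\r\n"    # bare LF becomes CRLF
        dst.write(line)                   # existing CRLF is left alone

# In-memory demo; the two records below are illustrative, not real data.
sample = b"U+4E00\tkDefinition\tone\nU+4E8C\tkDefinition\ttwo\n"
converted = io.BytesIO()
lf_to_crlf(io.BytesIO(sample), converted)
matches = list(grep_lines(io.BytesIO(sample), b"U+4E8C"))
```

With a real file, the same functions would take objects opened in binary mode, for example `open("Unihan.txt", "rb")` as the source and a second file opened with `"wb"` as the destination (the file name is assumed here).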

