Re: Guessing the encoding of a test file...

2020-03-25 Thread Ben Rubinstein via use-livecode
On 19/03/2020 20:31, Paul Dupuis via use-livecode wrote: There is an enhancement request to support MacRoman decoding under Windows and vice versa at https://quality.livecode.com/show_bug.cgi?id=22391 if you want to CC yourself to show interest. See also

Re: Guessing the encoding of a test file...

2020-03-22 Thread Paul Dupuis via use-livecode
On 3/22/2020 8:41 AM, Mark Waddingham via use-livecode wrote: On 2020-03-21 14:09, Paul Dupuis via use-livecode wrote: So far the only person who has read my post and replied with what I was looking for was Peter - and although the routine was written in Rebol rather than LiveCode, he kindly

Re: Guessing the encoding of a test file...

2020-03-22 Thread Mark Waddingham via use-livecode
On 2020-03-21 14:09, Paul Dupuis via use-livecode wrote: So far the only person who has read my post and replied with what I was looking for was Peter - and although the routine was written in Rebol rather than LiveCode, he kindly provided a link to information about it. It might have got lost

Re: Guessing the encoding of a test file...

2020-03-21 Thread peterwawood via use-livecode
the macOS version won't run on macOS Catalina. Peter. PS: Once again, sorry for the top posting.

Re: Guessing the encoding of a test file...

2020-03-21 Thread Paul Dupuis via use-livecode
Nope. The reason I refer to the routine as "guessEncoding" is that I absolutely know that it is a "guess" based on the presence of nulls and other bytes for UTF files and by statistical sampling for various characters for MacRoman vs CP1252. We also offer an optional way for the user to pick
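The layered guess described above (byte order marks and nulls for the UTF family, then character statistics for MacRoman vs CP1252) could be sketched roughly as follows. This is not the guessEncoding handler discussed in the thread: it is written in Python rather than LiveCode, the names `guess_encoding` and `_lowercase_high_chars` are hypothetical, and the "count lowercase accented letters" score and the tie-break in favour of CP1252 are assumptions made only for illustration.

```python
import codecs

# Byte order marks, longest first: the UTF-32 LE mark begins with the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def _lowercase_high_chars(data: bytes, encoding: str) -> int:
    """Count non-ASCII characters that decode to lowercase letters.

    Assumption: in running text, a high byte is more likely to be an
    accented lowercase letter than an uppercase letter or a symbol.
    """
    text = data.decode(encoding, errors="replace")
    return sum(1 for ch in text if ord(ch) > 127 and ch.islower())


def guess_encoding(data: bytes) -> str:
    """Return a best guess, never a certainty, for the encoding of data."""
    # 1. An explicit byte order mark settles the question.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name

    # 2. Nulls without a BOM suggest UTF-16; their position hints at byte order.
    sample = data[:4096]
    if 0 in sample:
        even_nulls = sample[0::2].count(0)
        odd_nulls = sample[1::2].count(0)
        return "utf-16-be" if even_nulls > odd_nulls else "utf-16-le"

    # 3. Strict decoders: plain ASCII first, then UTF-8.
    for name in ("ascii", "utf-8"):
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            pass

    # 4. Single-byte fallback: crude statistics, ties go to CP1252 (an assumption).
    mac = _lowercase_high_chars(data, "mac_roman")
    win = _lowercase_high_chars(data, "cp1252")
    return "mac_roman" if mac > win else "cp1252"
```

Because step 4 can only ever be a guess, a manual override for the user remains a sensible complement to any routine of this kind.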

Re: Guessing the encoding of a test file...

2020-03-21 Thread Paul Dupuis via use-livecode
On 3/20/2020 8:49 PM, peterwawood via use-livecode wrote: Paul, I wrote a simple function to guess the encoding of a file but in Rebol not LiveCode. I'm not sure how it compares with your current function in terms of accuracy. It is being used by a company which does a lot of text processing.

Re: Guessing the encoding of a test file...

2020-03-21 Thread Quentin Long via use-livecode
I strongly suspect that the desired goal, to have a nice, robust algorithm which automagically identifies the encoding of *ABSOLUTELY ANY* text document with zero need for human involvement, simply isn't possible. Because text encoding is intrinsically arbitrary—see also: the many variations on

Re: Guessing the encoding of a test file...

2020-03-20 Thread peterwawood via use-livecode
use-livecode Date: 20/03/2020 23:35 (GMT+08:00) To: use-livecode@lists.runrev.com Cc: Paul Dupuis Subject: Re: Guessing the encoding of a test file... To Sean and Bob, Thank you for your replies. I may not have been clear enough in my original post: We make and sell an App for macOS

Re: Guessing the encoding of a test file... [OT]

2020-03-20 Thread doc hawk via use-livecode
On Mar 20, 2020, at 4:04 PM, Mark Wieder via use-livecode wrote: > > Even Morse code got a new character recently. But does livecode support that character? :) ___ use-livecode mailing list use-livecode@lists.runrev.com Please visit this url to

Re: Guessing the encoding of a test file... [OT]

2020-03-20 Thread Mark Wieder via use-livecode
On 3/20/20 1:47 PM, doc hawk via use-livecode wrote: They created a *new* five bit, shifted code, rather than just using Baudot Even Morse code got a new character recently. -- Mark Wieder ahsoftw...@gmail.com

RE: Guessing the encoding of a test file... [OT]

2020-03-20 Thread Ralph DiMola via use-livecode
...@evergreeninfo.net -Original Message- From: use-livecode [mailto:use-livecode-boun...@lists.runrev.com] On Behalf Of doc hawk via use-livecode Sent: Friday, March 20, 2020 4:48 PM To: How to use LiveCode Cc: doc hawk Subject: Re: Guessing the encoding of a test file... [OT] On Mar 20, 2020, at 12:51

Re: Guessing the encoding of a test file... [OT]

2020-03-20 Thread Paul Dupuis via use-livecode
On 3/20/2020 4:47 PM, doc hawk via use-livecode wrote: On Mar 20, 2020, at 12:51 PM, Ralph DiMola via use-livecode wrote: Just for a laugh... one of the more esoteric codings I used in the quasi-modern era (besides EBCDIC) was the 5 bit Quotron stock ticker system in the mid 90s. It used

Re: Guessing the encoding of a test file... [OT]

2020-03-20 Thread doc hawk via use-livecode
On Mar 20, 2020, at 12:51 PM, Ralph DiMola via use-livecode wrote: > > Just for a laugh... one of the more esoteric codings I used in the quasi-modern > era (besides EBCDIC) was the 5 bit Quotron stock ticker system in > the mid 90s. It used different codes for requesting/receiving quotes

RE: Guessing the encoding of a test file... [OT]

2020-03-20 Thread Ralph DiMola via use-livecode
On Mar 20, 2020, at 11:09 AM, Paul Dupuis via use-livecode wrote: > > Okay, now you're going for the low blow :-) What part of “lawyer” wasn’t clear? B b :_) > Next, you'll be suggesting I need to check for EBCDIC encodings! That will be a start,

Re: Guessing the encoding of a test file...

2020-03-20 Thread doc hawk via use-livecode
On Mar 20, 2020, at 11:09 AM, Paul Dupuis via use-livecode wrote: > > Okay, now you're going for the low blow :-) What part of “lawyer” wasn’t clear? B b :_) > Next, you'll be suggesting I need to check for EBCDIC encodings! That will be a start, but it’s not done until you include Baudot.

Re: Guessing the encoding of a test file...

2020-03-20 Thread Paul Dupuis via use-livecode
On 3/20/2020 1:11 PM, doc hawk via use-livecode wrote: On Mar 19, 2020, at 1:31 PM, Paul Dupuis via use-livecode wrote: “ASCII" Wait, you’re not going to distinguish between six and seven bit ASCII? :_) Okay, now you're going for the low blow :-) Next, you'll be suggesting I need to check

Re: Guessing the encoding of a test file...

2020-03-20 Thread Paul Dupuis via use-livecode
On 3/20/2020 1:44 PM, Richard Gaskin via use-livecode wrote: I would be interested to learn more about the details of the subsequent refinements over the decade since, but also the ROI proposition for today: I'll try to remember to share the current code after this current review. I'm happy

Re: Guessing the encoding of a test file...

2020-03-20 Thread Richard Gaskin via use-livecode
Paul Dupuis wrote: > There are many published algorithms for doing this and we had a past > contractor of ours take a "best practice" algorithm and create a LCS > "guessEncoding" function. This replaced a previous guessEncoding > function that we had from Richard Gaskin, which while quite good,

Re: Guessing the encoding of a test file...

2020-03-20 Thread Mark Waddingham via use-livecode
On 2020-03-20 15:34, Paul Dupuis via use-livecode wrote: Why did I ask this? Because I am interested in comparing the accuracy of our current handler to any other that may be available as, users being users, we recently had a user reveal a bug (a misnamed variable) in our current function that

Re: Guessing the encoding of a test file...

2020-03-20 Thread doc hawk via use-livecode
On Mar 19, 2020, at 1:31 PM, Paul Dupuis via use-livecode wrote: > > “ASCII" Wait, you’re not going to distinguish between six and seven bit ASCII? :_)

Re: Guessing the encoding of a test file...

2020-03-20 Thread Håkan Liljegren via use-livecode
I know that Mozilla had a library for finding text decoding. I don’t think they use it anymore though. But I know it was translated into several other languages. It was called something like “universal character detection” or something equally sexy. Just typing out of my head, so it might be

Re: Guessing the encoding of a test file...

2020-03-20 Thread Paul Dupuis via use-livecode
To Sean and Bob, Thank you for your replies. I may not have been clear enough in my original post: We make and sell an App for macOS and Windows. It's used around the world by researchers (not a lot of them as it is a niche product) on their computers. The research application allows input

Re: Guessing the encoding of a test file...

2020-03-20 Thread Bob Sneidar via use-livecode
If the files submitted to you do not need to retain their original formats for your purposes, why not just convert them all to a standard format? It's my understanding that if you open the file using low level file commands without the binfile parameter, LC will convert the data into the local

Re: Guessing the encoding of a test file...

2020-03-20 Thread Mark Waddingham via use-livecode
Rather than throwing ‘the baby out with the bath water’ so to speak... What are the precise cases in which the method you have fails? And why do you expect it to work in those cases? Warmest Regards, Mark Sent from my iPhone > On 19 Mar 2020, at 20:32, Paul Dupuis via use-livecode > wrote:

Re: Guessing the encoding of a test file...

2020-03-19 Thread Sean Cole (Pi) via use-livecode
You won't want to hear this but unfortunately for Windows you are out of luck. Text files of themselves do not have the encoding embedded in them in any form. Once it is written it is stored as a series of one or two byte characters. If you open it as a binfile or a straight file it appears the
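A small illustration of the point that the bytes alone carry no encoding label: the same byte sequence decodes without error under both MacRoman and CP1252 and yields different text. This is a Python sketch rather than LiveCode, and the function name `decode_both` is hypothetical.

```python
def decode_both(raw: bytes) -> tuple:
    """Decode the same bytes as MacRoman and as CP1252.

    errors="replace" is used for CP1252 because a few byte values
    are undefined in that code page.
    """
    as_mac = raw.decode("mac_roman")
    as_win = raw.decode("cp1252", errors="replace")
    return as_mac, as_win


# "café" as a classic Mac application would have written it.
raw = "caf\u00e9".encode("mac_roman")
as_mac, as_win = decode_both(raw)
print(raw, as_mac, as_win)  # neither decoding fails; only one is what the author typed
```

Nothing in `raw` says which reading is right, which is why any detector has to fall back on heuristics or on asking the user.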

Re: Guessing the encoding of a test file...

2020-03-19 Thread Paul Dupuis via use-livecode
Users of our application may use text files in whatever encoding their local system creates them in. We cannot tell them to only create such files with a specific encoding. So, we need to detect the encoding of the text file the user selects. As I mentioned, I have an LC script that

Re: Guessing the encoding of a test file...

2020-03-19 Thread Pi Digital via use-livecode
On a Mac it’s easy. Use file -I "MyFile.txt" as a shell script. On Windows it’s near impossible without running a whole bunch of arbitrary tests that may or may not be correct - certainly not accurate. What kind of text were you hoping to see? Were you looking for a particular encoding? If
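The macOS suggestion above could be wrapped along these lines. This is a Python sketch, not LiveCode; the helper names `parse_charset` and `charset_via_file` are hypothetical, and the parsing assumes the usual "name: type; charset=..." shape of the output. Note that `file` is itself applying heuristics, so its answer is also a guess.

```python
import re
import subprocess
from typing import Optional


def parse_charset(file_output: str) -> Optional[str]:
    """Extract the charset from a line such as
    'MyFile.txt: text/plain; charset=utf-8' (assumed output shape)."""
    match = re.search(r"charset=([\w.-]+)", file_output)
    return match.group(1) if match else None


def charset_via_file(path: str) -> Optional[str]:
    """Run `file -I <path>` (the macOS spelling of the MIME flag) and parse it.

    Raises CalledProcessError if the command fails and FileNotFoundError
    if `file` is not installed, as on a stock Windows system.
    """
    result = subprocess.run(
        ["file", "-I", path], capture_output=True, text=True, check=True
    )
    return parse_charset(result.stdout)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.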