Re: opening txt files

2013-01-17 Thread Nishok Love
This is the first time I've asked a question on use-livecode and I've been 
pleasantly amazed that people have taken the time to give so much useful advice 
- much respect, and thank you to everyone. I think I now have a solution which 
works, and I've learnt some interesting things too.

In summary
(1) Bob's idea (below) is the way to differentiate between UTF-8 and UTF-16. 
The program can react by ignoring alternate characters if it finds FF FE as the 
first two characters (thanks HTH for some tidy code but note that the first 
character after the FF FE is the valid one).

(2) The comment on the Apple discussion (also below) would seem to be right. In 
its case (1) TextWrangler reports a Unicode UTF-8 file and in its case (2) a 
Unicode UTF-16 file.

(3) TextEdit seems to resolve the encoding issue before displaying the file so 
TextWrangler is better for nit-pickers (thanks for that, Francis :) my wife 
appreciates the extra ammunition!) who want to see everything.
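
The approach in point (1) could be sketched roughly as follows. This is only a sketch, not the code Nishok actually used: the handler name dropAlternateBytes is made up for illustration, and it assumes that char and charToNum operate on single bytes, as they do in the engine versions discussed in this thread. It also only gives sensible results when every character in the file fits in one byte, because the second byte of each pair is simply thrown away.

```livecode
-- Hypothetical helper. pData is the raw file contents, for example the
-- result of: put url ("binfile:" & tPath) into tData
function dropAlternateBytes pData
   local tOutput, tKeep
   -- Only act when the data starts with the FF FE byte order mark
   if charToNum(char 1 of pData) is not 255 or charToNum(char 2 of pData) is not 254 then
      return pData
   end if
   delete char 1 to 2 of pData
   -- The first byte after the mark is a real character, so keep bytes 1, 3, 5, ...
   put true into tKeep
   repeat for each char tChar in pData
      if tKeep then put tChar after tOutput
      put not tKeep into tKeep
   end repeat
   return tOutput
end dropAlternateBytes
```

For text that goes beyond plain ASCII, the uniDecode-based script from Mark Waddingham that Klaus posted in this thread is likely the safer route, since it decodes the byte pairs instead of discarding half of them.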

Onwards and upwards,
Nishok

 
 From: Robert Sneidar slylab...@me.com
 
 I will hazard a guess, that when you open the file for reading, you can open 
 binary first and see if the first two characters amount to FE FF, yes? If so, 
 treat as UTF-16. If not, treat as UTF-8. I have not tested this strategy 
 myself, but your second point seems to give the clue to solve this mystery. 
 
 Bob
 
 
 On Jan 16, 2013, at 9:15 AM, Nishok Love wrote:
 
 Thanks, Bob. Your command works but the same results occur. Further 
 investigations here found this 
 
 When Pages is used to export as Text, the resulting file may be of two 
 kinds:
 
 (1) if the document contained only characters included in Apple MacRoman 
 charset, the file is a pure text file based on Apple MacRoman encoding.
 
 (2) if the document contained extraneous characters the created text file 
 take care of this feature and uses the UTF encoding (two bytes per 
 character) and starts with the logical BOM: FE FF.
 
 which I've copied from the discussion on  
 https://discussions.apple.com/message/9518841?messageID=9518841#9518841?messageID=9518841
 


___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


Re: opening txt files

2013-01-17 Thread Klaus on-rev
Hi friends,

On 16.01.2013 at 18:15, Nishok Love nishok.l...@virgin.net wrote:

 ...
 So I'm still looking for a way for LiveCode to spot whether it's opening a 
 file in UTF-8 or UTF-16 (or something else - aaarrgh!). Can I access the file 
 header? read from file just gives me the data...

I found an old script that Mark Waddingham supplied in the past when I had some 
problems reading vCards in 3.0 format (Unicode). I think it can be used to open 
ANY txt file.

I do not fully understand it, so I leave it uncommented ;-)
In any case it will convert any given text file to LiveCode-readable plain text.

Comments are from Mark W.


###
-- vCards are stored as a text file, however, the text encoding used varies
-- depending on the program that exported them.
--
-- We use the following heuristic to detect encoding:
--   1) If there is the byte order mark 0xFEFF then we assume UTF-16BE
--   2) If there is the byte order mark 0xFFFE then we assume UTF-16LE
--   3) If the first byte is 0x00 then we assume UTF-16BE (compatibility
--  with Tiger Address Book)
--   4) Otherwise we assume UTF-8
--
function importVCard pFilename
  -- First load the vCard as binary data - at this stage we don't know
  -- the text encoding of the file and loading as text would cause
  -- inappropriate line ending conversion.
  local tBinaryVCard
  put url ("binfile:" & pFilename) into tBinaryVCard
  
  -- This variable will hold the vCard encoded in MacRoman (the default
  -- text encoding Revolution uses on Mac OS X)
  local tNativeVCard
  
  -- We now do our checks to detect text encoding
  local tTextEncoding
  if charToNum(char 1 of tBinaryVCard) is 0 then
    put "UTF16BE" into tTextEncoding
  else if charToNum(char 1 of tBinaryVCard) is 0xFE and charToNum(char 2 of tBinaryVCard) is 0xFF then
    delete char 1 to 2 of tBinaryVCard
    put "UTF16BE" into tTextEncoding
  else if charToNum(char 1 of tBinaryVCard) is 0xFF and charToNum(char 2 of tBinaryVCard) is 0xFE then
    delete char 1 to 2 of tBinaryVCard
    put "UTF16LE" into tTextEncoding
  else
    put "UTF8" into tTextEncoding
  end if
  
  if tTextEncoding begins with "UTF16" then
    -- Work out the processor's byte order
    local tHostByteOrder
    if the processor is "x86" then
      put "LE" into tHostByteOrder
    else
      put "BE" into tHostByteOrder
    end if
    
    -- If the byte orders don't match, switch the order of pairs of bytes
    if char -2 to -1 of tTextEncoding is not tHostByteOrder then
      repeat with x = 1 to the length of tBinaryVCard step 2
        get char x of tBinaryVCard
        put char x + 1 of tBinaryVCard into char x of tBinaryVCard
        put it into char x + 1 of tBinaryVCard
      end repeat
    end if
    
    -- Decode the UTF-16 to native
    put uniDecode(tBinaryVCard) into tNativeVCard
  else
    -- Use the standard uniDecode/uniEncode pair to decode the UTF-8 encoding
    put uniDecode(uniEncode(tBinaryVCard, "UTF8")) into tNativeVCard
  end if
  
  -- We now need to normalize line endings to make sure all lines terminate
  -- in 'return' (numToChar(10)).
  local tTextVCard
  put tNativeVCard into tTextVCard
  
  -- First replace Windows CR-LF style endings
  replace numToChar(13) & numToChar(10) with return in tTextVCard
  
  -- Now replace Mac OS CR style endings
  replace numToChar(13) with return in tTextVCard
  
  return tTextVCard
end importVCard

-- The Tiger version of Apple Address Book (4.0.4) exports vCard files
-- as UTF-16 big endian without a BOM if the record contains any non-ASCII
-- characters.
-- If there are no non-ASCII characters, the record is just left as
-- ASCII with no conversion to UTF-16.
-- On Leopard, it seems that Apple Address Book exports vCard files
-- in UTF-8 regardless.
function importAppleAddressVCard pFilename
  -- First load the vCard as binary data - at this stage we don't know
  -- the text encoding of the file and loading as text would cause
  -- inappropriate line ending conversion.
  local tBinaryVCard
  put url ("binfile:" & pFilename) into tBinaryVCard
  
  -- This variable will hold the vCard encoded in MacRoman (the default
  -- text encoding Revolution uses on Mac OS X)
  local tNativeVCard
  
  -- Okay so now we have the binary data, we need to decide if it is
  -- UTF-16BE or ASCII/UTF-8. This is easy to do since the first character of
  -- a vCard has to be an ASCII character. If the record has been encoded
  -- as UTF-16BE, then this means this will translate as the first byte
  -- being the NUL (0) character.
  if charToNum(char 1 of tBinaryVCard) is 0 then
    -- We are UTF-16BE
    
    -- We now know that tBinaryVCard is big endian UTF-16 since Revolution
    -- only handles host byte 

Re: opening txt files

2013-01-17 Thread Robert Sneidar
Hey that's nice, thanks!

Bob





Re: opening txt files

2013-01-16 Thread Francis Nugent Dixon
Hi from Beautiful Brittany,

Hi Nishok,

If you are a nit-picker, and REALLY want to know
why you have this problem, then my response is
simple - I don't know!

If you want a work-around, it's simple - select
your text when you are in Word, and paste it into
a new text file (TextEdit), and save it.
You have pure text.

-Francis





Re: opening txt files

2013-01-16 Thread Nishok Love
Thanks, Bob. Your command works but the same results occur. Further 
investigation here turned up this:

When Pages is used to export as Text, the resulting file may be of two kinds:

(1) if the document contained only characters included in Apple MacRoman 
charset, the file is a pure text file based on Apple MacRoman encoding.

(2) if the document contained extraneous characters the created text file take 
care of this feature and uses the UTF encoding (two bytes per character) and 
starts with the logical BOM: FE FF.

which I've copied from the discussion on  
https://discussions.apple.com/message/9518841?messageID=9518841#9518841?messageID=9518841

Opening both files with TextEdit (which displays both of them correctly, i.e. 
without all those extra spaces), duplicating them and then watching the save 
options shows that one file (the one from Pages) is using UTF-16 whilst Word's 
Western (Mac OS Roman) export is in UTF-8. Using GetInfo I can now see that the 
UTF-16 file is twice the size of the other.

In short, text files are not as simple as they used to be!

So I'm still looking for a way for LiveCode to spot whether it's opening a file 
in UTF-8 or UTF-16 (or something else - aaarrgh!). Can I access the file 
header? read from file just gives me the data...

I could read the file, count the number of characters and how many of them are 
spaces and from that I could infer which format is being used. Probably this 
would be reliable for my purposes - just not very elegant!

Nishok
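
The counting idea could look something like the sketch below. It rests on a suggestion made elsewhere in the thread: the "spaces" are probably zero bytes, one after each ASCII character in a UTF-16 file. The handler name looksLikeUTF16, the 200-byte sample size and the 0.4 threshold are all assumptions chosen for illustration, and the sketch assumes that char and charToNum operate on single bytes, as in the engine versions discussed here.

```livecode
-- Hypothetical helper. pData is the raw file contents read via "binfile:".
-- Returns true when a large share of the first bytes are zero.
function looksLikeUTF16 pData
   local tSample, tNulls
   put char 1 to 200 of pData into tSample
   if tSample is empty then return false
   put 0 into tNulls
   repeat for each char tChar in tSample
      if charToNum(tChar) is 0 then add 1 to tNulls
   end repeat
   -- Mostly-ASCII text stored as UTF-16 has about one zero byte per character
   return (tNulls / the length of tSample) > 0.4
end looksLikeUTF16
```

A heuristic like this can be fooled by text that is mostly non-Latin, so checking for a byte order mark first, where one exists, is probably the more dependable test.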





Re: opening txt files

2013-01-16 Thread Richmond




 Why not use RTF?

Richmond.



Re: opening txt files

2013-01-16 Thread Robert Sneidar
I did not see an RTF export option in Pages. Besides, I think he is dealing 
with reading text files, the nature of which he does not control. 

Bob






Re: opening txt files

2013-01-16 Thread Robert Sneidar
I will hazard a guess, that when you open the file for reading, you can open 
binary first and see if the first two characters amount to FE FF, yes? If so, 
treat as UTF-16. If not, treat as UTF-8. I have not tested this strategy 
myself, but your second point seems to give the clue to solve this mystery. 

Bob
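
A rough sketch of this suggestion follows. It is untested in the same sense as the suggestion itself. The handler name guessEncoding is hypothetical, the sketch assumes that char and charToNum operate on single bytes as in the engine versions discussed in this thread, and it checks for FF FE as well as FE FF because both byte orders come up elsewhere in the thread.

```livecode
-- Hypothetical helper. Returns "UTF16" or "UTF8" based on the first two
-- bytes of the file, or empty if the file cannot be opened.
function guessEncoding pPath
   local tHeader
   open file pPath for binary read
   if the result is not empty then return empty
   read from file pPath for 2
   put it into tHeader
   close file pPath
   if charToNum(char 1 of tHeader) is 254 and charToNum(char 2 of tHeader) is 255 then
      return "UTF16"   -- FE FF
   else if charToNum(char 1 of tHeader) is 255 and charToNum(char 2 of tHeader) is 254 then
      return "UTF16"   -- FF FE
   end if
   return "UTF8"
end guessEncoding
```

Note that a file in MacRoman or another single-byte encoding would also fall through to the "UTF8" answer here, so the return value is a guess and not a guarantee.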






Re: opening txt files

2013-01-15 Thread Robert Sneidar
I am not sure why you are seeing this. I exported a Pages newsletter file as 
text, then ran this command on it:

on mouseUp pMouseBtnNo
   answer file "Pick a text file" with "/Users/bobsneidar/Desktop/SneidarNewsletter.txt"
   -- nothing to do if the dialog was cancelled
   if it is empty then exit mouseUp
   put it into theFile
   open file theFile for read
   read from file theFile until cr
   put it
   close file theFile
end mouseUp

I got this in the message box:

2005 Summer Edition

Seems to work.

Bob



On Jan 15, 2013, at 10:34 AM, NISHOK LOVE wrote:

 Hi All
 
 I have a problem when I open .txt files in OSX, and I don't have much (any!) 
 experience of reading files in LiveCode. 
 
 I have a file originally written in Word on Windows. When I export it as a 
 .txt from Word for Mac I just accept the default Mac OS encoding option 
 (Western (Mac OS Roman)) and it all works fine when I open the file in my 
 LiveCode.
 
 But when I open the original file in Pages and export it as Plain Text, I get 
 a different result. When I open that file in LiveCode I find a space has been 
 inserted after every character. So "Hello world" becomes "H e l l o   w o r l d". 
 
 I guess this is a problem with the encoding, but how can my LiveCode 
 understand what the incoming file's encoding is and respond accordingly? My 
 LiveCode needs to be able to deal with any kind of text file...
 
 Thanks,
 Nishok Love


Re: opening txt files

2013-01-15 Thread Peter M. Brigham
I have seen the behavior Nishok describes. It was some time ago and I don't 
remember what the issue was, but I think it was fixable.

-- Peter

Peter M. Brigham
pmb...@gmail.com
http://home.comcast.net/~pmbrig





Re: opening txt files

2013-01-15 Thread Kay C Lan
What happens when you open the Pages converted file in TextEdit, does
it have the extra spaces? If so the problem is with the conversion
process from Pages, not with LC. If so you need to look at some of the
conversion options Pages offers and see if you can create the file
without the extra spaces.

If TextEdit opens the converted file just fine, then you may need to
download the free TextWrangler:

http://www.barebones.com/products/textwrangler/download.html

You can then select 'Show invisibles' from the 'T' toolbar icon.

If that doesn't reveal anything then use 'Zap Gremlins...' option in
the Text menu. Tick the boxes, use 'Replace with' and put in your own
distinct character. This will then highlight if there are any chars
that Pages is leaving behind that TextEdit is interpreting as
non-visible but LC is interpreting as a space.

If all else fails, assuming you have a script that selects the file and puts
it in a variable, which I'll call tData, set a breakpoint immediately after
that line. When you look at the variable tData it should look like this: H
E L L O...

So in the next line of your script after the breakpoint put:

--to determine the ASCII number of the bad character
put charToNum(char 2 of tData) into tBadChar

run your script again, when it stops at the breakpoint step once so
the above line is executed then check what is in tBadChar.

Hopefully it won't be 32 - which is what a regular space is. You
should be careful though, as there may be a bad character before the H
so maybe the number you get is for the H. Basically any number between
32-126 is probably valid. Anything outside this range is likely to be
the cause of your problem. So you may need to test char 1, char 2 and
char 3 just to make sure you are looking at the Bad character.

Once you've positively identified the ASCII value of the bad character
then simply add this further line of script after the last line you
added:

--replace bad ASCII with nothing
replace numToChar(tBadChar) with "" in tData

Again, if the number is 32 this means Pages is doing the wrong thing
with its conversion and it's Pages' problem, and it would be nigh
impossible for LC to repair this.

HTH
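
To check several characters at once instead of one at a time, a small helper along these lines might save some stepping in the debugger. It is a sketch only: the name dumpCharCodes is made up, and it assumes tData was read as binary (for example from a "binfile:" URL) and that char and charToNum operate on single bytes, as in the engine versions discussed in this thread.

```livecode
-- Hypothetical helper. Returns the numeric values of the first pCount
-- chars of pData, separated by spaces.
function dumpCharCodes pData, pCount
   local tDump, tIndex
   repeat with tIndex = 1 to min(pCount, the length of pData)
      put charToNum(char tIndex of pData) & space after tDump
   end repeat
   return tDump
end dumpCharCodes
```

Calling put dumpCharCodes(tData, 8) in the message box would list the first eight values. A pair such as 255 254 or 254 255 at the start, or a 0 beside every letter, would point to a two-byte Unicode encoding and not to stray spaces.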




Re: opening txt files

2013-01-15 Thread Robert Sneidar
I did that earlier. There are no encoding options available in the export 
dialog. It is possible that based on the system and language settings at the 
time the file is exported that the encoding might be set to Unicode, but I 
couldn't find anything by googling for it. 

Bob Sneidar
IT Manager
Calvary Chapel CM
Sent from iPhone




Re: opening txt files

2013-01-15 Thread Kay C Lan
On Wed, Jan 16, 2013 at 1:19 PM, Kay C Lan lan.kc.macm...@gmail.com wrote:
 it would be nigh
 impossible for LC to repair this.

Actually I take that back. If Pages places an extra space after EVERY
char, so there are two spaces between words instead of one, there is a
space after the last word on a line but before the carriage return,
and another space after the carriage return and before the first char
of a new line, then it should be just a simple matter of:

--keeps chars 1, 3, 5, ... so this assumes the very first char is valid
put true into tOdd
repeat for each char tChar in tData
  if tOdd then
    put tChar after tOutput
    put false into tOdd
  else
    put true into tOdd
  end if
end repeat
put tOutput into fld "WhereEver"

If there is not an extra space after EVERY char, then it becomes much
more difficult.

HTH



RE: opening txt files

2013-01-15 Thread Paul D. DeRocco
 From: Kay C Lan
 
 Actually I take that back. If Pages places an extra space after EVERY
 char, so there are two spaces between words instead of one, there is a
 space after the last word on a line but before the carriage return,
 and another space after the carriage return and before the first char
 of a new line, then it should be just a simple matter of:
 
 --keeps chars 1, 3, 5, ... so this assumes the very first char is valid
 put true into tOdd
 repeat for each char tChar in tData
   if tOdd then
     put tChar after tOutput
     put false into tOdd
   else
     put true into tOdd
   end if
 end repeat
 put tOutput into fld "WhereEver"
 
 If there is not an extra space after EVERY char, then it becomes much
 more difficult.

Is it possible that the original text file is in Unicode, so that each
character in the ASCII set is followed by a null, and something is
converting the nulls into blanks? Possibly the display routine itself?
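
One way this guess could be checked is to read the file as binary and count the null bytes. This is only a rough sketch: tPath is a hypothetical variable holding the path to the exported file, and it assumes LiveCode 5.5-style strings in which one char is one byte.

```livecode
-- tPath is a hypothetical path to the file exported from Pages
put URL ("binfile:" & tPath) into tData

-- count the bytes whose value is 0
put 0 into tNulls
repeat for each char tChar in tData
  if charToNum(tChar) = 0 then add 1 to tNulls
end repeat

-- show the result in the message box
put tNulls && "null bytes out of" && the number of chars in tData
```

If roughly half of the bytes turn out to be null, the file is probably two-byte Unicode holding plain ASCII text, and the "spaces" are really nulls.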

-- 

Ciao,   Paul D. DeRocco
Paul    mailto:pdero...@ix.netcom.com 




Re: opening txt files

2013-01-15 Thread J. Landman Gay

On 1/15/13 11:39 PM, Kay C Lan wrote:


If there is not an extra space after EVERY char, then it becomes much
more difficult.


If the original was Unicode, which was then converted to plain text, 
Pages might be retaining the second byte and inserting a null. In other 
words, it's keeping both of the original bytes but using only the first.


That's what LiveCode does too when you uniEncode text. Maybe uniDecode 
would fix it.
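
A minimal sketch of the uniDecode idea, combined with a check for a leading FF FE byte order mark. Several things here are assumptions: tPath is a hypothetical path variable, fld "WhereEver" is borrowed from the earlier script, one char is taken to be one byte (as in LiveCode 5.5), and uniDecode is taken to read the data in the machine's own byte order, which matches FF FE on an Intel Mac.

```livecode
-- tPath is a hypothetical path to the exported file
put URL ("binfile:" & tPath) into tData

-- look at the first two bytes for the FF FE marker (255, 254)
put charToNum(char 1 of tData) into tFirst
put charToNum(char 2 of tData) into tSecond

if tFirst = 255 and tSecond = 254 then
  -- two-byte Unicode: drop the marker, then convert to single-byte text
  delete char 1 to 2 of tData
  put uniDecode(tData) into tText
else
  -- no FF FE marker found: use the bytes as they are
  put tData into tText
end if

put tText into fld "WhereEver"
```

A file that starts with FE FF instead would hold its bytes in the opposite order and would need swapping before decoding; this sketch does not handle that case.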


--
Jacqueline Landman Gay | jac...@hyperactivesw.com
HyperActive Software   | http://www.hyperactivesw.com



Re: opening txt files

2013-01-15 Thread J. Landman Gay

On 1/15/13 11:49 PM, Paul D. DeRocco wrote:


Is it possible that the original text file is in Unicode, so that each
character in the ASCII set is followed by a null, and something is
converting the nulls into blanks? Possibly the display routine itself?



GMTA. :)

--
Jacqueline Landman Gay | jac...@hyperactivesw.com
HyperActive Software   | http://www.hyperactivesw.com



Re: opening txt files

2013-01-15 Thread Kay C Lan
On Wed, Jan 16, 2013 at 1:54 PM, J. Landman Gay
jac...@hyperactivesw.com wrote:
 If the original was unicode,

Again, TextWrangler can help: at the bottom left of the document window
it shows the document's encoding, which is also a button that lets you
change it. On my system a new TW document is created as UTF-8; when I
open a .doc in Pages, convert it to plain text and open it in TW, it
shows Western (Mac OS Roman).

If I take that same converted document and feed it into LC, it's fine.

If I take that Mac OS Roman document in TW and change the encoding to
UTF-16, then when I import that into LC it includes all sorts of extra
characters between all the valid ones, not just extra spaces. UTF-8
has no extra space but there are the odd incorrectly transposed chars.

If I then take that UTF-16 document, open it in Pages and export it
as Plain Text, TW again shows its encoding is Western (Mac OS Roman)
and LC has no problems with it.

My suggestion, if you open the converted document in TW and the
encoding isn't Western (Mac OS Roman), then change it, save it, and
see what you get.

At least that's how it works on my system.



Re: opening txt files

2013-01-15 Thread Kay C Lan
Bob wrote:
No one has talked about what version of Pages.

[on my gmail it got connected to a different thread]

I'm on OS X 10.8.2, LC 5.5.3, Pages 4.3, TextWrangler 4.0.1
