I did a similar test with a few German letters, "ößüäÖÜÄ":

  require'files'

  a=:fread'c:\temp\ger.txt'

  a. i. a
246 223 252 228 214 220 196

  'ößüäÖÜÄ'fwrite'c:\temp\geruni.txt'
14

  b=:fread'c:\temp\geruni.txt'

  a. i. b
195 182 195 159 195 188 195 164 195 150 195 156 195 132

  c=:fread'c:\temp\geruninote.txt'

  a. i. c
255 254 246 0 223 0 252 0 228 0 214 0 220 0 196 0

  d=:fread'c:\temp\gerunibignote.txt'

  a. i. d
254 255 0 246 0 223 0 252 0 228 0 214 0 220 0 196

  e=:fread'c:\temp\gerutfnote.txt'

  a. i. e
239 187 191 195 182 195 159 195 188 195 164 195 150 195 156 195 132

  b
ößüäÖÜÄ

  e
ößüäÖÜÄ

  a


  b
ößüäÖÜÄ

  c


  d


  e
ößüäÖÜÄ

Same results: a, c and d do not display in J.
Even though all the files look the same in Notepad, and e and b look the
same in J, e and b are not the same:

  e=b
|length error
|   e    =b
|[-0]

  e-:b
0
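
A possible explanation for the mismatch, going only by the byte lists shown
above: e is b with three extra bytes (239 187 191) in front, the mark that
Notepad writes at the start of a UTF-8 file. A minimal check, assuming e and
b still hold the values read above:

```j
   NB. the three extra leading bytes of e
   3 {. a. i. e
239 187 191
   NB. dropping those three bytes leaves exactly the bytes of b
   (3 }. e) -: b
1
```

This also accounts for the length error: e has 17 bytes and b has 14.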

2006/10/12, Björn Helgason <[EMAIL PROTECTED]>:

I have exactly the same problem as you have, with my Icelandic letters.

As an example, I create a file in Notepad containing "þæö".

The file can be stored using a codepage or in Unicode.

As it happens, Notepad can store the file as Unicode, Unicode Big Endian
and UTF-8, as well as ANSI (using codepages).
With ANSI and codepages, "þæö" and the rest of the Icelandic characters can
sit at a number of different byte values, depending on the codepage.

Let me create a file with only "þæö" in it and save it with each of the
encodings Notepad allows:

   require'files'

   a=:fread'c:\temp\thorn.txt'

   a. i. a
254 230 246

   'þæö'fwrite'c:\temp\thornuni.txt'
6

   b=:fread'c:\temp\thornuni.txt'

   a. i. b
195 190 195 166 195 182

   c=:fread'c:\temp\thornuninote.txt'

   a. i. c
255 254 254 0 230 0 246 0

   d=:fread'c:\temp\thornunibignote.txt'

   a. i. d
254 255 0 254 0 230 0 246

   e=:fread'c:\temp\thornutfnote.txt'

   a. i. e
239 187 191 195 190 195 166 195 182

In Notepad they all look the same.
In J only b and e look the same, but their a. i. displays are not the same.
b was created from J, as you saw above; the rest were saved from Notepad
in the different encodings.

   a


   b
þæö

   c


   d


   e
þæö

I am sure you may be even more confused by the above

How this issue is to be resolved - wellllll.....
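
One possible way forward, offered as a sketch rather than a settled answer:
bring every variant to plain UTF-8 bytes before displaying or comparing it.
The names a8, c8, d8 and e8 are made up for this illustration. The sketch
assumes that the ANSI codepage puts these letters at the same values as
their Unicode code points (true for the bytes listed above), and that
6 u: reads each byte pair low byte first, as on a little-endian Windows
machine.

```j
   NB. a: ANSI bytes. 2 u: widens each byte to a 2-byte character,
   NB. then utf8 encodes the result as UTF-8 bytes.
   a8 =: utf8 2 u: a

   NB. c: Unicode (little endian). Drop the 2-byte mark (255 254),
   NB. then let 6 u: read each byte pair as one 2-byte character.
   c8 =: utf8 6 u: 2 }. c

   NB. d: Unicode big endian. Drop the mark (254 255) and reverse
   NB. each byte pair before handing the bytes to 6 u:
   d8 =: utf8 6 u: , _2 |.\ 2 }. d

   NB. e: UTF-8 with a 3-byte mark (239 187 191). Just drop the mark.
   e8 =: 3 }. e
```

If the assumptions hold, each of a8, c8, d8 and e8 should match b and
display as þæö in the session.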


2006/10/12, Ronan Reilly <[EMAIL PROTECTED]>:
>
> Thanks Chris, Björn and Bill.
>
> That clarifies things.  However, I'm finding that the text strings I read
> from the file are not getting converted by applying utf8.  It seems that
> they have to be datatype unicode for this to work, but they are read in and
> stored as literals.  Is there any way of coercing datatypes in J to get
> around this?
>
>
> Thanks again,
>
> Ronan
>
> PS:  I'm running J on a Mac if that's relevant
>
>
>
> On 11/10/2006 23:59, "Chris Burke" < [EMAIL PROTECTED]> wrote:
>
> > Ronan Reilly wrote:
> >> I'm trying to display some German text using plot (J601).  The text has
> >> been read in from a file using fread, decoded using ". , and stored in a
> >> table.
> >>
> >> When I extract a word with an umlaut from the table (e.g., Aktualität),
> >> assign it to W1, and plot it using
> >>
> >> pd 'reset;text 0 _1x ',W1,';show'
> >>
> >> the letter with an umlaut does not display correctly.
> >>
> >> However, if I evaluate W1 in the jwd, like so:
> >>
> >>     W1
> >> Aktualität
> >>
> >> and then edit and evaluate it like so
> >>
> >> W2 =: 'Aktualität'
> >>
> >> and then display W2 using
> >>
> >> pd 'reset;text 0 _1x ',W2,';show'
> >>
> >> W2 displays correctly.
> >>
> >> Also
> >>
> >>     $W1
> >> 10
> >>     $W2
> >> 11
> >>
> >> What is going on here?  How can I display the unicode characters using
> >> plot?
> >
> > In general, J assumes incoming text is in utf8 format. J also supports a
> > "unicode" data type, which is 2-byte unicode, see the help for u: .
> >
> > Text as either utf8 or unicode will display correctly in the session,
> > but only utf8 will work in plot.
> >
> > In this example, W2 is in utf8 format, and W1 in 2-byte unicode. You
> > need to convert W1 to utf8, using the utf8 verb.
> >
> > Here is what is happening:
> >
> >    #W2=: 'Aktualität'
> > 11
> >    #W1=: ucp W2
> > 10
> >
> >    W2 -: utf8 W1
> > 1
> >
> >    datatype W2
> > literal
> >    datatype W1
> > unicode
> >
> >    a.i.W2
> > 65 107 116 117 97 108 105 116 195 164 116
> >    a.i.W1
> > 65 107 116 117 97 108 105 116 228 116
> >
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
> >
>
> --
> Professor Ronan Reilly
> Head of Department
> Department of Computer Science
> NUI Maynooth
> Maynooth
> Co. Kildare
> IRELAND
>
> t: +353-1-7083847
> e: [EMAIL PROTECTED]
> w: http://www.cs.nuim.ie; http://cortex.cs.nuim.ie
>
>
>
>



--
Björn Helgason, Verkfræðingur
Fugl&Fiskur ehf, Þerneyjarsund 23, Box 127
801 Grímsnes ,t-póst: [EMAIL PROTECTED]
Skype: gosiminn, gsm: +3546985532
Landslags og skrúðgarðagerð, gröfuþjónusta
http://groups.google.com/group/J-Programming


Tæknikunnátta höndlar hið flókna, sköpunargáfa er meistari einfaldleikans

góður kennari getur stigið á tær án þess að glansinn fari af skónum
          /|_      .-----------------------------------.
         ,'  .\  /  | Með léttri lund verður        |
     ,--'    _,'   | Dagurinn í dag                     |
    /       /       | Enn betri en gærdagurinn  |
   (   -.  |        `-----------------------------------'
   |     ) |        (\_ _/)
  (`-.  '--.)       (='.'=)
   `. )----'        (")_(")



