RE: [Jprogramming] Beginner--more questions about textual data handling

Sherlock, Ric Wed, 06 Feb 2008 02:29:58 -0800

---PackRat wrote:
> Ah, if life were only so simple!  In subsequently working with the 
> actual data I need to import, I discovered there were several 
> variations in the format of the data, which creates some 
> complications 
> in creating a more general solution.  My general question is 
> whether I 
> need to have multiple scripts to handle each variation, or is 
> there a J 
> way to accommodate the variations so that the J vector/array will end 
> up being the same, regardless of the input variation?


The short answer to this is "Yes", but how to do it will depend on how
the variations differ. Ideally you will be able to come up with one
method that works for all of them, but in the worst case you should be
able to recognise the type of file and use a select. case. construct to
handle each type separately.
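
As a rough sketch of that worst case (the names kindof and readany are
made up for illustration, and the only "variation" assumed here is
whether the values are wrapped in double quotes):

```j
NB. Hypothetical helper: classify a file by the look of its first line.
kindof=: 3 : 0
  if. '"' = {. y do. 'quoted' else. 'plain' end.
)

NB. Hypothetical reader: y is a file name that is assumed to exist.
NB. The result is a boxed list of lines in one common form.
readany=: 3 : 0
  lines=. 'b' fread y
  select. kindof 0 {:: lines
  case. 'quoted' do. }.@}: each lines   NB. strip the surrounding quotes
  case. do. lines                       NB. default: already in that form
  end.
)
```

Each extra variation would get its own case., so the caller always
receives the same kind of noun whatever the input looked like.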

> 
> Frankly, I don't remember why I added ". to that statement: 
>    list1 =: ". 'm' fread < 'C:\rfile1.txt'
> Maybe seeing an example or something??  As I recall, it had 
> to do with converting characters to numeric values

Yes ". will convert a literal number to a numeric one, but the dyadic
version is faster and more specific. See
http://www.jsoftware.com/jwiki/Guides/General_FAQ/Numbers_and_Character_Representations
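
A small illustration of the dyadic form, with made-up input: the left
argument is the value to use wherever the text is not a valid number.

```j
   _1 ". '42 foo 7'      NB. 'foo' is not a number, so it becomes _1
42 _1 7
```

Monadic ". would instead try to execute the whole string as a J
sentence.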


> I *do* know that I had to add x: because the numbers being 
> read in were 
> long: they were 14-digit library barcodes.  What's 
> interesting is that 
> *ALL* the documentation says J will handle up to about 16 digits 
> without flipping over to exponential notation, yet it failed already 
> with 14 digits.  As I said, interesting.

I think you can get around this by increasing the print precision
(Edit|Configure|Parameters), but for your situation you may be better
off working with the numbers as text.
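
If you would rather set the print precision from a script than from the
menu, the foreign conjunctions below should do the same job (the value
16 is just an example):

```j
   9!:10 ''        NB. query the current print precision (6 by default)
   9!:11 ] 16      NB. display up to 16 significant digits
```

This only changes how numbers are displayed, not how they are stored.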
 
> Well, *this* presented several challenges!  Since the earlier 
> file also 
> contained characters, I thought J would handle this data if I 
> went back 
> to the "non-numeric" reading of data:
>    list1 =: 'm' fread < 'C:\rfile1.txt'
> 
> However, apparently 'm' requires *numeric* data??  

No, 'm' fread 'c:\rfile1.txt' will work fine with literal data.

> I can't seem to 
> find verbs in J that are equivalent to the following Visual 
> Basic-like 
> commands: StringLeft  StringRight  StringMid 

See Bill Lam's reply
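
In case it helps (this is my own sketch, not necessarily what Bill
posted), take {. and drop }. cover most of what those functions do:

```j
   3 {. 'abcdef'             NB. leftmost 3 characters
abc
   _3 {. 'abcdef'            NB. rightmost 3 characters
def
   3 {. 2 }. 'abcdef'        NB. 3 characters starting at index 2 (0-based)
cde
```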

> Another question I see coming up shortly is how do I get J to accept 
> the fact that a terminating "x" or "X" (in the above numbers, for 
> example, or in book ISBNs) is a valid "numeric" character, being the 
> result of a base-11 check-digit algorithmic calculation?  Or 
> do I have 
> to consider these "numbers" as *strings* (of characters) instead?

It is probably possible to convert this to a base-10 number if required,
but depending on your needs it could just be left as text.
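
If you do need the digit values, one possible approach is to look each
character up in a list of valid digits, so that X maps to 10. The sketch
below uses a sample ISBN-10 and the standard ISBN-10 weights; your
barcodes may well use a different check-digit scheme.

```j
   digits=: '0123456789X' i. toupper '080442957X'
   digits
0 8 0 4 4 2 9 5 7 10
   11 | +/ digits * 10 - i. 10     NB. 0 means the check digit is valid
0
```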
 
> And, if I need to think/program in terms of strings (I presume this 
> means boxed data?), will the set operations above work on boxed data, 
> too, or are other definitions needed for boxed textual data?  

Boxing strings is most useful for strings of unequal lengths.

>(My earlier experiments with this script seemed to indicated 
> that you couldn't sort boxed data or perform set operations on the 
> boxed data.

You can sort boxed data.
   /:~ '"b39928282"';'"b29392209"';'"b52343345"'
+-----------+-----------+-----------+
|"b29392209"|"b39928282"|"b52343345"|
+-----------+-----------+-----------+
or the rows of a text array
   ]tmp=. >'"b3992828x"';'"b29392203"';'"b52343343"'
"b3992828x"
"b29392203"
"b52343343"
   /:~tmp
"b29392203"
"b3992828x"
"b52343343"
 
If you want to drop the double quotes in the first and last columns you
could do

   }.@:}:"1 tmp
b3992828x
b29392203
b52343343
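
On the set operations part of your question: the usual primitives work
on boxed lists too. A small sketch with made-up values (a and b are just
illustrative names):

```j
   a=. 'b1';'b2';'b3'
   b=. 'b2';'b4'
   a -. b               NB. difference: items of a that are not in b
+--+--+
|b1|b3|
+--+--+
   a ([ -. -.) b        NB. intersection
+--+
|b2|
+--+
   ~. a , b             NB. union, duplicates removed
+--+--+--+--+
|b1|b2|b3|b4|
+--+--+--+--+
```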

If your values are equal length I'd read the file into a text array
(matrix) using
  tmp=. 'm' fread <filename>
If they are unequal length then a better option would be to read the
file into a boxed list using
  tmp=. 'b' fread <filename>

> And one more question (for now!) about data massaging in J: it turns 
> out that another variation in the data is that the first data item in 
> many of the files I wish to read is a column header such as the 
> following:
>    "RECORD #(BIBLIO)"
>  How can I program J to read a file, *omitting* the first data value?

After reading into a noun using either of the above methods, you can
drop the first item using }.
You could also test whether the first record is a column header and
drop it only if it is, for example:
   tmp=. (+./'"RECORD' E. {.tmp)}.tmp  NB. use with array
 or
   tmp=. (+./'"RECORD' E. 0{::tmp)}.tmp NB. use with boxed list

One of the things that is nice about J is that many primitives will work
with arrays whether they are numeric or literal.

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
