[The original subject line was: Beginner--how to change space separator to LF?]
Ric Sherlock wrote:
> If you give a simple example of the initial CRLF-separated lists and the
> format of a vector post-processing it may be possible to simplify
> further.

Again, thanks to everyone for the several responses!

I previously responded:
> OK, here's a simple example:

Ah, if life were only so simple! While working with the actual data I need to import, I discovered several variations in its format, which complicates finding a more general solution. My general question is whether I need a separate script for each variation, or whether there is a J way to accommodate the variations so that the resulting J vector/array is the same regardless of the input format.

First, here's my starting script, which reads a file of data in the format:

number<CR><LF>number<CR><LF>etc....

===================================================
require 'stdlib files'

NB. set operations:
sort =: /:~
sortdown =: \:~
dedupe =: ~.
setor =: ,
setand =: e. # [
setnot =: -.

list1 =: x: ". 'm' fread < 'C:\rfile1.txt'
list2 =: x: ". 'm' fread < 'C:\rfile2.txt'
NB. list1 =: 's' fread < 'C:\rfile1.txt'
NB. list2 =: 's' fread < 'C:\rfile2.txt'

list1 =: dedupe sort list1
list2 =: dedupe sort list2
list3 =: list1 setnot list2
list3 =: dedupe sort list3

NB. courtesy of Henry Rich (J Programming Forum):
list3 =: ; (LF ,~ ":)&.> list3
(toHOST list3) fwrite < 'C:\rfile3.txt'
===================================================

My original "read" command was:

list1 =: 'm' fread < 'C:\rfile1.txt'

Frankly, I don't remember why I added ". to that statement:

list1 =: ". 'm' fread < 'C:\rfile1.txt'

Maybe I saw it in an example? As I recall, it had to do with converting characters to numeric values, but when I look at the Dictionary again, neither the monadic nor the dyadic definition seems to fit. Dyadic would seem to be what I was looking for, but there's no left argument before the ". verb.
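For reference, a short J sketch of the two forms of ". (an illustration added in editing; the names and sample values are made up). Monadic ". (Do) executes its argument as a J sentence, so a line of digits evaluates to the number it spells; it has rank 1, so on a character matrix such as 'm' fread returns it works row by row. Dyadic ". (Numbers) converts text to numbers and substitutes its left argument for any field that is not a valid number.

```j
NB. Illustration only: names and data are hypothetical.

NB. Monadic ". executes a string as a J sentence.
num =: ". '15649131'

NB. On a character matrix, ". applies to each row (it has rank 1),
NB. giving one number per line.
mat =: 2 8 $ '1564913115649192'
nums =: ". mat

NB. Dyadic ". converts text to numbers; any field that is not a
NB. valid number is replaced by the left argument (here _1).
mixed =: _1 ". '15649131 b15649192'
```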
I *do* know that I had to add x: because the numbers being read in were long: they were 14-digit library barcodes. Interestingly, *all* the documentation says J will handle up to about 16 digits without flipping over to exponential notation, yet it failed with only 14 digits.

That verb sequence worked for that particular data variation. However, other data (which *appears* similar, but apparently isn't) failed to read in correctly. (As I recall, a "domain error" was generated.) This data had the following format:

<">b<number><"><CR><LF><">b<number><"><CR><LF>etc....

In other words, the file data looked like this (the file includes the quotes!):

"b15649131"
"b15649192"
"b1564926x"

Well, *this* presented several challenges! Since the earlier file also contained characters, I thought J would handle this data if I went back to the "non-numeric" read:

list1 =: 'm' fread < 'C:\rfile1.txt'

However, apparently 'm' requires *numeric* data?? I switched the flag to 's' (as in the NB. lines) and the read worked (that is, the J data looked like the three examples above). But then I was faced with: "How do I get rid of the extraneous quotation marks?" Or, eventually, perhaps the letter "b" as well? That's where I'm stuck now.

I know that J works wonderfully with numeric data, but any programming language ought to work just as well with textual data. (There are an awful lot of textual files and databases out there for manipulation and data mining.) As a beginner, I found it extremely challenging to find J help for handling and manipulating "real" *variable-length* textual data (in vectors and arrays) rather than numeric data.
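A minimal J sketch of one way to strip the quotes and the "b" prefix (an editor's illustration, not taken from the thread). It assumes the file contents are already in a single string; the literal below stands in for the file, so nothing here depends on how the 's' flag of fread behaves. CR and LF are the standard-library names for those characters; all other names are hypothetical.

```j
NB. Hypothetical stand-in for the file contents (CRLF-terminated lines).
raw =: '"b15649131"',CR,LF,'"b15649192"',CR,LF,'"b1564926x"',CR,LF

NB. Remove the CRs, then cut on LF: <;._2 boxes each LF-terminated
NB. line, leaving the LF itself out.
lines =: <;._2 raw -. CR

NB. Remove every double-quote character from each boxed line.
noquotes =: (-.&'"')&.> lines

NB. Drop the first character (the "b" record-type prefix) of each line.
ids =: }.&.> noquotes
```

Each line stays a separate box, so the identifiers can have different lengths, and the trailing "x" survives because nothing is converted to numbers.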
I can't seem to find verbs in J that are equivalent to the following Visual Basic-like commands:

StringLeft(stringID,number) : return the leftmost <number> characters of <stringID>

StringRight(stringID,number) : return the rightmost <number> characters of <stringID>

StringMid(stringID,startpos,number) : return <number> characters of <stringID>, starting at character <startpos>; if <number> is omitted, return the remainder of <stringID> starting at character <startpos>

syntax #2: StringMid(stringID,startpos,number) = <stringID2> : starting at character <startpos> of <stringID>, replace <number> characters of <stringID> with the first <number> characters of <stringID2>; if <number> is omitted, replace the remaining characters of <stringID> with the characters of <stringID2>. <stringID2> can be a string identifier or a literal string. Important: the replacement can never extend beyond the length of <stringID>!

"Left", "Right", and "Mid" (both forms) are *extremely* important for textual manipulation. Are there J equivalents for these?

Another question I see coming up shortly: how do I get J to accept that a terminating "x" or "X" (in the numbers above, for example, or in book ISBNs) is a valid "numeric" character, being the result of a base-11 check-digit calculation? Or do I have to treat these "numbers" as *strings* of characters instead?

And if I need to think and program in terms of strings (I presume this means boxed data?), will the set operations above work on boxed data too, or are other definitions needed for boxed textual data? These set operations are extremely important for what I want to use J for at the moment. (My earlier experiments with this script seemed to indicate that you couldn't sort boxed data or perform set operations on boxed data.
On the other hand, my J knowledge is so meager at the moment that there might be ways; I just don't know about them yet.)

And one more question (for now!) about data massaging in J: another variation in the data is that the first data item in many of the files I want to read is a column header such as:

"RECORD #(BIBLIO)"

Our local library automation system exports database data with column headers so that it's easy to import into MS Excel. However, I want to import it into J. How can I program J to read a file while *omitting* the first data value (that is, without throwing an error because the column header differs from the rest of the data in the file)? Or do I have to write such preliminary "data cleanup" routines in another programming language first, because J can't handle it? (I really would rather do everything in J, if possible.)

I should note that the data written back out at the end needs neither a column header nor quotation marks nor a record-type prefix character (the "b" in the sample data above), although it would be nice to know whether adding those back on export is possible.

Again, any help, guidance, and insights would be very much appreciated! Thanks in advance!

Harvey

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
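For reference, a minimal J sketch touching the remaining questions (an editor's illustration, not something from the thread). It assumes a standard J session with the standard library loaded (for LF); every name and sample value is hypothetical. It shows take ({.) and drop (}.) as rough counterparts of Left, Right and Mid, amend (}) for Mid assignment, why a trailing check-digit "x" argues for keeping identifiers as text, dropping a header line with }., and the script's set operations applied to boxed strings.

```j
NB. Illustration only: names and data are hypothetical.

NB. --- Left / Right / Mid (J counts from 0; VB's Mid counts from 1)
s =: 'b1564926x'
leftpart  =: 3 {. s          NB. StringLeft(s,3)   -> 'b15'
rightpart =: _2 {. s         NB. StringRight(s,2)  -> '6x'
midpart   =: 4 {. 1 }. s     NB. StringMid(s,2,4)  -> '1564'
restpart  =: 1 }. s          NB. StringMid(s,2)    -> '1564926x'

NB. Mid assignment: amend (}) overwrites the characters at the given
NB. indices. The length of s never changes; an index past the end is
NB. an index error rather than an extension.
s2 =: 'XYZ' (1 + i. 3) } s   NB. -> 'bXYZ4926x'

NB. --- Check digits: keep such identifiers as text. Monadic ". reads
NB. '1564926x' as the extended-precision constant 1564926x and so
NB. silently drops the check digit.
lost =: ". '1564926x'        NB. -> 1564926

NB. --- Skipping a column header: }. drops the first item (line)
withhdr =: '"RECORD #(BIBLIO)"';'"b15649131"';'"b15649192"'
body =: }. withhdr

NB. --- The set operations from the script work on boxed strings
ids1 =: '15649192';'15649131';'1564926x';'15649131'
ids2 =: '15649131';'99999999'
sorted =: ~. /:~ ids1        NB. sorted, duplicates removed
only1  =: sorted -. ids2     NB. in ids1 but not in ids2
both   =: (sorted e. ids2) # sorted

NB. --- Writing out: one identifier per line, no quotes or prefix
out =: ; (,&LF)&.> only1
```

One difference from VB worth noting: amend requires the replacement and the index list to have matching lengths, so "use the first <number> characters of <stringID2>" becomes an explicit take on <stringID2> before amending.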
