Bill, Rob, and Ric:
THANK YOU! THANK YOU! THANK YOU! The information you provided was
great! It'll keep me busy for a while, learning how it all works (and
why) and applying it to my current needs.
Bill Lam wrote:
> PackRat wrote:
> > "Left", "Right", and "Mid" (both forms) are *extremely* important
> > for textual manipulation. Are there J equivalents for these?
>
> This is untrue, J works extremely well for data processing.
My comments seem not to have come across as clearly as I had thought.
I was *NOT* knocking J by any means; it's an extremely powerful
language, and I am constantly amazed (from the other J forums I belong
to, but especially this one) at what it can do. I was noting that
every programming language (including J) is normally expected to be
able to handle both numeric and textual data. *My* problem was that I
was having difficulty finding descriptions and examples of textual
handling because the documentation seems so overwhelmingly numerically
oriented. (And I don't object to that--it's great, but I'd also like
to see more documentation on how to work with variable-length textual
data.) I presumed that J, of course, could do these kinds of things--
it's just that I was having a hard time trying to find the information,
or to put together the "2" and "2" that are out there to come up with
the "4" that I was looking for. As I said, "On the other hand, my J
knowledge is so meager at the moment that there might be ways, but I
just don't know about them yet." Thanks for your examples of how to do
the Left/Right/Mid string thing!
> The reason why you can not find any StringRight, StringMid is that
> it is too trivial to define cover verbs for them.
But wouldn't it be helpful to have references/pointers to these kinds
of things in the documentation for the sake of people coming from other
programming languages? Having worked with non-mathematically-oriented
adult beginners in programming, I'm very sensitive to the needs of
beginning learners. The cardinal rule of teaching is to go from the
known to the unknown; you don't drop a learner in the middle of the
ocean and say, "Sink or swim!" To say in the documentation for "Take",
for example, "For positive values of x, this is the equivalent of Left
or StringLeft commands in other programming languages" would be both
searchable and helpful for those new to J.
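For readers arriving from BASIC, the mapping can be sketched roughly as follows. This is a minimal sketch, not official documentation: the names left, right, and mid are hypothetical cover verbs (they are not part of J), and mid assumes a VB-style 1-based start position.

```j
NB. Hypothetical cover verbs; these names are not part of J itself.
left=: {.                               NB. n left s  : first n characters
right=: -@[ {. ]                        NB. n right s : last n characters
mid=: 4 : '({: x) {. (<: {. x) }. y'    NB. (start,len) mid s, start is 1-based as in VB

s=: 'abcdefgh'
3 left s        NB. abc
3 right s       NB. fgh
3 3 mid s       NB. cde
2 }. s          NB. cdefgh, like Mid$(s,3) with no length given
```

One difference from VB worth noting: when n is larger than the string, Take pads with blanks, so 5 {. 'abc' has length 5. Something like (5 <. # s) {. s avoids the padding.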
> Please read documentation on verb { {. }. and conjunction } for the
> details.
Very helpful--thanks! I just wish the "typical" uses were more
obviously demonstrated. The textual examples (particularly with
"Take") showed exceptional cases (for example, reversed direction
overtake) rather than the norm. Beginners need examples of the norm.
> J is more powerful than VB. how would you do this in VB?
> 5 3 2 1{'abcdef' => 'fdcb'
I'd do it with a series of concatenated Mid$ functions (the whole
statement would be rather verbose and lengthy)--J obviously is far more
concise and flexible. To create a "general case" in VB (where the "5 3
2 1" could be variable in the number of values present) would be rather
more complex, whereas it's really quite simple in J--once I know what
I'm doing! ;-)
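The "general case" mentioned above needs nothing extra in J, because the left argument of From is just a list of 0-based positions of any length. A small sketch, where idx is only an illustrative name:

```j
idx=: 5 3 2 1            NB. any number of 0-based positions
idx { 'abcdef'           NB. fdcb
(i. 3) { 'abcdef'        NB. abc
_1 _2 { 'abcdef'         NB. fe  (negative positions count from the end)
```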
Rob Hodgkinson wrote:
> When data is displayed it is according to the Print Precision (see
> 9!:10 '' to view the Print Precision, or see menu Edit/Configure...
> Then the Parameters tab, to set the Print Precision). This changes
> the point at which integers are displayed in exponential notation.
Thanks--I didn't know that tidbit! Boy, it sure would have been nice
if there had been a "see also" cross reference to "Print Precision" in
the x: (Extended Precision) monadic verb (and vice versa). As noted in
the current "j docs" thread in the Chat forum, current internal cross
referencing leaves a lot to be desired. (As a cataloging librarian, I
deal with such cross referencing in a library catalog on a daily basis;
it's my "bread and butter," so it's very frustrating at times to be
dealing with information where such linkages are lacking.)
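As a small illustration of the Print Precision setting, here is a sketch using the 9!:10 and 9!:11 foreign conjunctions; old is just an illustrative name, and 2^40 is used because it yields a floating-point result.

```j
old=: 9!:10 ''     NB. query the current print precision (6 by default)
2^40               NB. displays 1.09951e12 at precision 6
9!:11 ] 16         NB. set the print precision to 16 digits
2^40               NB. now displays 1099511627776
9!:11 old          NB. restore the previous setting
```

The ] in 9!:11 ] 16 only keeps 11 and 16 from being read as one numeric list.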
> Since you now indicate the data is alphanumeric, then probably best
> to keep it as character. You can still sort the characters.
> Here is an example... [omitted]
> The key here is that it is not clear from your 'instances' of data
> what the 'general' rules are for all your data.
Well, let me describe the various data I'm currently working with in
this way (these are all vectors at this point; in the future I hope to
move on to textual arrays):
(1) Sometimes (as with my initial examples), I've already pre-massaged
the data, creating files that have purely numeric values in them (no
quotation marks, no recordtype prefixes).
(2) Sometimes, the "raw" exported data in the files might be the exact
same data, except that the numeric values are enclosed within quotation
marks (to make it easy to import into MS Excel, I presume).
(3) Sometimes, the exported data is alphanumeric (a recordtype
alphabetic character followed by completely numeric data), enclosed
within quotation marks (most likely for easy Excel import) unless I
have already pre-massaged the data. This data is a bit special, because
the numeric portions are the unique key identifiers of records in a
database, the prefix indicating which database: b=bibliographic,
i=item, p=patron, o=order, etc. Since every item in the data file has
the same prefix, it's not really necessary for work with J, and, for
file writing purposes, it's not needed in the output file either--
that's why I asked about the possibility of getting rid of it, too.
(4) Sometimes, the exported data is purely textual, enclosed within
quotation marks as textual delimiters (again, most likely for easy
Excel import).
NOTE1: The numeric portions of data classes 1, 2, and 3 above can end
with "x" (or "X") as a check digit for the accuracy of the remainder of
the number. This is most common for the record identifiers and for 10-
digit ISBN values on books. ("X" stands for a remainder of 10 when
using a MOD 11 algorithm, equivalent to "Residue" in J.)
NOTE2: The "raw" data files (that is, the ones I have not pre-massaged)
also have a first item that is a textual column header, even if the
remaining items are purely numeric (though enclosed within quotation
marks). This is for ease of import into Excel.
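The MOD 11 check in NOTE1 can be sketched for 10-digit ISBNs as below. This is only a sketch under stated assumptions: isbn10ok is a hypothetical name, the argument must be exactly 10 characters, and the weights 10 down to 1 are the standard ISBN-10 ones. The record-identifier check digit may use different weights and is not covered here.

```j
NB. isbn10ok is a hypothetical name. Assumes y is exactly 10 characters.
isbn10ok=: 3 : 0
 v=. '0123456789X' i. toupper y    NB. digit values, with X or x counted as 10
 0 = 11 | +/ v * 10 - i. 10        NB. weights 10 9 ... 1; valid when the residue is 0
)

isbn10ok '1895721156'    NB. 1
isbn10ok '047126847X'    NB. 1
isbn10ok '0471268470'    NB. 0
```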
Do I understand you correctly when you say I should be able to sort,
dedupe, and do set union, set intersection, and set exclusion if I were
to just use character vectors/arrays all the way through? I thought I
had tried those set operations early on with no success. (On the other
hand, maybe my verb sequences weren't correct back then.)
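For reference, here is a minimal sketch of those operations on a boxed list of character strings. The names a and b and the sample values are only illustrative, and the intersection idiom shown is one common way to write it, not the only one.

```j
a=: '15649131';'1564926x';'15649131'    NB. boxed list with one duplicate
b=: '1564926x';'047126847X'

/:~ a               NB. sort
~. a                NB. dedupe (nub)
~. a , b            NB. set union
~. a ([ -. -.) b    NB. set intersection
~. a -. b           NB. set exclusion: items of a that are not in b
```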
> This whole process will be less painful if you could do the following...
> * Supply sample input file with a subset of rows that fully describe
> the data (ie 15 rows, each one an instance of all the different
> types?)
I believe this is a list of all possible variations of the data I'm
currently dealing with:
31184017063376 [14-digit barcode, pre-massaged]
"31184017063376" [same as above, but with quotation marks]
1895721156 [10-digit book ISBN]
"1895721156" [same as above, but with quotation marks]
15649131 [8-digit record identifier, pre-massaged]
b15649131 [same as above, but with recordtype prefix]
"b15649131" [same as above, but with quotation marks]
047126847X [10-digit book ISBN with check digit "X"]
"047126847X" [same as above, but with quotation marks]
1564926x [pre-massaged 8-digit record ID with check digit "x"]
b1564926x [same as above, but with recordtype prefix]
"b1564926x" [same as above, but with quotation marks]
"AUTHOR: Iverson, Kenneth E." [a single item of pure textual data]
The data above that is enclosed with quotation marks also has a "non-
data" textual column header as the first datum in its file. Here's an
example of that datum:
"RECORD #(BIBLIO)"
> * Specify what you want to do to that data, how to handle 'b123x' etc
> * Specify how you want it written out
> Perhaps a precise solution could then be offered and you could query the
> different ways a solution is achieved.
Essentially, the output should look like the examples above, but
without quotation marks and without a recordtype prefix, while keeping
the "x" check digit where one exists. This is the data that would be
manipulated with set-related operations and then exported (written to
disk).
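That cleanup could be sketched along these lines. The verb name clean is hypothetical, and the prefix set 'bipo' covers only the recordtypes named earlier, so it would need extending for the others. It also assumes no purely textual item begins with one of those lowercase letters and that no item contains a quotation mark it needs to keep.

```j
NB. clean is a hypothetical name; 'bipo' is an assumed, incomplete prefix set.
clean=: 3 : 0
 s=. y -. '"'                             NB. remove every quotation mark
 if. ({. s) e. 'bipo' do. s=. }. s end.   NB. drop one leading recordtype letter
 s
)

clean '"b1564926x"'       NB. 1564926x
clean each '"31184017063376"';'b15649131';'"047126847X"'
```

Applied with each, the same verb works item by item on a boxed list.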
Ric Sherlock wrote:
> Yes ". will convert a literal number to a numeric one, but the dyadic
> version is faster and more specific. See
> http://www.jsoftware.com/jwiki/Guides/General_FAQ/Numbers_and_Character_Representations
Boy, information sure is scattered all over the place, isn't it?
Again, here is where it would be useful to have "see also" cross
references between these various locations in the documentation.
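A small sketch of the dyadic form, for comparison: the left argument is the value substituted for anything that cannot be read as a number.

```j
0 ". '15649131'       NB. 15649131 as a number
_1 ". '12 abc 7'      NB. 12 _1 7 : _1 stands in for the field that is not a number
```

An identifier that ends in a check-digit X is not a plain number, so values like that are probably best left as characters.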
I wrote:
> > However, apparently 'm' requires *numeric* data??
>
> No, 'm' fread 'c:\rfile1.txt' will work fine with literal data.
That's good to know. For whatever reason, I was getting errors that
made me think that it might work only with numbers.
> Boxing strings is most useful for strings of unequal lengths.
That's what I thought, and it's good to know for some future data
endeavors I have in mind (textual arrays).
> You can sort boxed data. ...[examples omitted]
> If you want to drop the double quotes in the first and last columns you
> could do }.@:}:"1 tmp ...[example omitted]
> If your values are equal length I'd read the file into a text array
> (matrix) using
> tmp=. 'm' fread <filename>
> If they are unequal length then a better option would be to read the
> file into a boxed list using
> tmp=. 'b' fread <filename>
> After reading into a noun using either of the above methods, you can
> drop the first one using }.
> You could test to see if there is a column header as the first record
> and only drop it if it is, for example:
> tmp=. (+./'"RECORD' E. {.tmp)}.tmp NB. use with array
> or
> tmp=. (+./'"RECORD' E. 0{::tmp)}.tmp NB. use with boxed list
Wow! Great information!! Thanks!
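To try the header test without touching a file, the boxed-list version can be run on an in-memory stand-in. In this sketch the sample values are only illustrative and tmp stands in for the result of 'b' fread on a raw export file.

```j
NB. In-memory stand-in for the result of 'b' fread on a raw export file.
tmp=: '"RECORD #(BIBLIO)"';'"b15649131"';'"b1564926x"'
tmp=: (+./ '"RECORD' E. 0 {:: tmp) }. tmp    NB. drops the first item only if it is the header
tmp=: }.@:}: each tmp                        NB. strip the first and last character (the quotes)
```

After these lines tmp holds b15649131 and b1564926x as two boxed strings.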
> One of the things that is nice about J is that many primitives will
> work with arrays whether they are numeric or literal.
That's what I figured, but I just was having a darned hard time trying
to find information about textual vectors/arrays.
Again, thanks to you all for giving me so many leads to work with! All
of this information has been *SO* helpful!!
Harvey