For the collective wisdom:

While trying to figure out why some newer data I'm working with makes
Oleg's (13 Apr 2008 forum message) and Henry Rich's (18 Dec 2008 forum
message) tacit/explicit scripts fail, I've come to the (perhaps
incorrect) conclusion that the problem lies in the scripts rather than
in the data.

Both scripts seem to fail when there is a "key" column plus more than
one column of values.  What I need the script(s) to do is to
successively append new columns of data, using the first ("key") column
to determine which rows get values in the new column.

Of course, the central, all-important command is "y=. x m} y", but I
can't figure out how to change only PART of a row.  (I realize that one
could change part of a row and then amend the entire row; "how?" is the
question.)  Essentially, I need to use the LAST (_1) column of one array
to alter the LAST column of another array, but only in rows whose "key"
values in the FIRST columns match (those row numbers are what "m" above
lists).
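A minimal sketch of changing only part of a row, assuming a small numeric table `t` and row numbers `1 3` standing in for "m" (both hypothetical).  When the index given to `}` is a single box holding a row list and a column, as in `<rows;_1`, it selects those rows crossed with the last column, so only those cells change and the rest of each row is kept:

```j
NB. t is a hypothetical 4-by-3 table; rows 1 3 play the role of "m"
t=: i. 4 3
NB. (<1 3;_1) selects rows 1 and 3 crossed with column _1 (the last one).
NB. The outer < matters: without it, each box would be read as a separate index.
]u=: 100 200 (< 1 3 ; _1) } t
NB. result:
NB. 0  1   2
NB. 3  4 100
NB. 6  7   8
NB. 9 10 200
```

With key-matched row numbers from `i.` in place of `1 3` and the new month's values in place of `100 200`, this amends a single column without touching the others.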

To describe my scenario exactly: a new column of zeros is appended to an
existing sorted "master" array of one or more columns (say the new
column is the 5th), whose first column contains all possible expected
"key" values.  Then the values in the last (2nd) column of a new, sorted
2-column array fill that 5th (last) column of the master wherever the
keys in the two first columns match.  (Note: both arrays contain unique
key values, and the second, usually smaller, array contains only SOME,
or possibly all, of the keys in the first.)  That's my need, and it was
the intent of my original request long ago, but unless I'm doing
something wrong, Oleg's and Henry's scripts don't do it.

To put this in context, think of the existing array as cumulative
monthly data (each column is one month's data) and the new array as a
new month's data to be appended.  The catch is that the new month's data
is zero-suppressed: if a row's value is zero that month, the row is
omitted from the data file.  That's why the amend has to match keys in
the first column of each array; it's not simply an append operation.
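A minimal sketch of the whole monthly step, assuming boxed string tables like those read with `readcsv`, unique keys, and a boxed `'0'` for suppressed values (as in `fillitin`).  The verb name `addmonth` and the inline tables `master`, `new1`, `new2` (built from the sample files) are hypothetical.  It takes the master on the left, like `fillitin`, appends a `'0'` column, finds each new key's master row with `i.`, and amends only the last column of those rows, so the master may have any number of columns:

```j
NB. addmonth (hypothetical): append one month of key/value data to a master table
NB. x: master table of boxed strings, unique keys in column 0
NB. y: new month's table, 2 columns (key, value), unique keys
addmonth=: 4 : 0
tbl=. x ,. <'0'                   NB. append a new last column filled with '0'
rows=. (0 {"1 tbl) i. 0 {"1 y     NB. master row number for each new key
ok=. rows < # tbl                 NB. skip keys not found in the master
(ok # 1 {"1 y) (< (ok # rows) ; _1) } tbl   NB. amend only the last column
)

NB. hypothetical inline copies of the sample files
master=: ,. 'b18934225';'b18934286';'b18934304';'b18934468';'b18935618';'b1893741x'
new1=: _2 ]\ 'b18934225';'08-28-2008';'b18934286';'08-28-2008';'b18934468';'09-01-2008';'b1893741x';'10-14-2008'
new2=: _2 ]\ 'b18934225';'12-17-2008';'b18934304';'10-21-2008';'b1893741x';'11-25-2008'
(master addmonth new1) addmonth new2
```

Chaining it month by month, e.g. `(tblall addmonth tbl1) addmonth tbl2`, would build the cumulative table; because whole rows are never replaced, the width of the master no longer has to match the width of the new data.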

As previously requested in this forum for problem-solving purposes,
here are some sample data files and scripts that demonstrate the
problem:

========================================================================

File "~user\data\testmaster.csv":
"b18934225"
"b18934286"
"b18934304"
"b18934468"
"b18935618"
"b1893741x"

File "~user\data\testnew1.csv":
"b18934225","08-28-2008"
"b18934286","08-28-2008"
"b18934468","09-01-2008"
"b1893741x","10-14-2008"

File "~user\data\testnew2.csv":
"b18934225","12-17-2008"
"b18934304","10-21-2008"
"b1893741x","11-25-2008"

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

NB.  I have the following verbs in a script
NB.  named "~user\temp\test.ijs":


require 'files strings csv'

fillin=: 4 : 0
NB. source: Oleg Kobchenko (J Programming Forum, 13 Apr 2008)
x [`(i.&:(0&{"1)~)`] } y
)

update=: 4 : 0
NB. source: Henry Rich (J Programming Forum, 18 Dec 2008)
NB. (as modified according to Oleg's debugging suggestions)
yy=. (0 {"1 y)
xx=. (0 {"1 x)
updrows=. yy i. xx
y=. x updrows } y
)

fillitin=: 4 : 0
NB. this is my script that calls the above as part
NB. of what I'm trying to accomplish
NB. (some commands are expanded for data tracking purposes)
d1=. y
NB. n is the number of data columns AFTER the first:
n=. (1{$d1)-1
d2=. x
d2=. |: d2
NB. add as many columns of literal zeros to d2 as d1 has data columns:
d3=. |: d2, ( n # ,: ((1{$d2) $ <'0') )
NB. d1 fillin d3    NB. Oleg's tacit code
d1 update d3    NB. Henry's explicit code
)

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

NB. direct commands in the J interpreter:


load '~user\temp\test.ijs'

NB. read/load "master" file for subsequent processing:
tblall=: readcsv (jpath '~user\data\testmaster.csv')

NB. read/load component files:
tbl1=: readcsv (jpath '~user\data\testnew1.csv')
tbl2=: readcsv (jpath '~user\data\testnew2.csv')

NB. merge component files into a single file using "master" (tblall):
]tbla=. tblall fillitin tbl1   NB. works fine
]tblb=. tbla fillitin tbl2     NB. this fails because >2 columns?

========================================================================

Based on my reading of the Dictionary, I tried variations such as
   y=. (_1{"1 x) updrows } (_1{"1 y)
or
   y=. x (updrows _1)} y
to work on the LAST columns only, but with no luck.  At best, I got back
ONLY the last column, with nothing else.
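The first variation above amends an extracted copy of the last column, which is why only that column comes back.  (The second, `(updrows _1)`, puts two nouns side by side, which is a syntax error in J.)  A sketch of finishing the first approach, assuming a hypothetical numeric table `t` and target rows `1 3`, is to stitch the amended column back onto all the other columns with `,.`:

```j
t=: i. 4 3                        NB. hypothetical table
rows=: 1 3                        NB. hypothetical matched row numbers
col=: 100 200 rows } _1 {"1 t     NB. amended copy of the last column only
]z=: (}:"1 t) ,. col              NB. all but the last column, stitched to the amended one
```

This assumes the column being filled is the last one, which matches the monthly-append scenario.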

I'm stumped at this point, and I really need to be able to do what I
described earlier in this message.  Can anyone suggest some avenues
toward a solution?  Thanks in advance!

Harvey

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
