I'm trying to learn how to use J to do the kinds of data manipulation
that I currently do in R. As an example, I tried to read in a file of
quarterly home price index (HPI) data, linearly interpolate it to
monthly, and write it out.
I got a partial solution to work. The data contains series for all
states, plus the US as a while, and I'm currently just interpolating the
series for the US. Ideally, I'd generate output for all states as well,
but I couldn't figure out how to do that.
Also, this just doesn't feel "fluent." Picking the columns out into
variables seems clumsy. I suspect there's a better way to do the
interpolation. Etc.
I'd appreciate any style hints, suggestions, or pointers to examples to
read.
Here's what I came up with, with the actual data load replaced with a
literal subset of the data:
load 'tables/dsv'
load 'tables/csv'
load 'files'
NB. Load the data
NB. htStr=. readcsv '~/hpi_fiserv.csv'
htData=. 0 : 0
area,source,year,qtr,hpi
CA,Fiserv,2009,1, 128.170000
CA,Fiserv,2009,2, 128.650000
CA,Fiserv,2009,3, 135.700000
CA,Fiserv,2009,4, 138.870000
CA,Fiserv,2010,1, 138.580000
CA,Fiserv,2010,2, 143.650000
CA,Fiserv,2010,3, 142.620000
CA,Fiserv,2010,4, 138.020000
NY,Fiserv,2009,1, 176.120000
NY,Fiserv,2009,2, 176.700000
NY,Fiserv,2009,3, 179.810000
NY,Fiserv,2009,4, 178.210000
NY,Fiserv,2010,1, 174.640000
NY,Fiserv,2010,2, 178.780000
NY,Fiserv,2010,3, 179.750000
NY,Fiserv,2010,4, 174.400000
US,Fiserv,2009,1, 129.180000
US,Fiserv,2009,2, 133.180000
US,Fiserv,2009,3, 137.510000
US,Fiserv,2009,4, 136.000000
US,Fiserv,2010,1, 132.100000
US,Fiserv,2010,2, 138.230000
US,Fiserv,2010,3, 135.640000
US,Fiserv,2010,4, 130.380000
)
htStr=. fixcsv htData
ht=. makenum htStr
NB. Extract the columns, so I can manipulate them.
area=. >}.0{"1 ht
year=. >}.2{"1 ht
qtr=. >}.3{"1 ht
hpi=. >}.4{"1 ht
NB. month index (numbers of months since 2000-01)
NB. quarters are centered, 2000-Q1=2000-02=1, etc.
month=. (1+3*<:qtr)+12*year-2000
NB. Pick out the series for the US.
isUS=. area-:"1 'US'
hpiUS=. isUS#hpi
monthUS=. isUS#month
yearUS=. isUS#year
NB. For testing, pick out the values from 2010
h=. (yearUS=2010)#hpiUS
m=. (yearUS=2010)#monthUS
NB. h=. 132.1 138.23 135.64 130.38
NB. m=. 121 124 127 130
NB. Linearly interpolate to monthly.
hm=. (+/ % #)0 1 2|.!.''"0 _ ]3#h
mm=. ({.m)+i.3*#m
NB. Last two values are junk, drop them.
hm=. _2}.hm
mm=. _2}.mm
NB. Output the interpolated version to a file.
('month,hpi',LF,makecsv mm,.hm) fwrites 'test.csv'
In particular, I'm not sure how to represent a table of data. For
example, given the definitions above, would it be better to use
something like
(;:'month hpi'),:m;h
or the transpose of that? Are there any utilities around for
manipulating tables of data like that?
Thanks,
Johann
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm