After reading this, I realized I've been lucky so far. I have production
systems, running since 2004, that get data from SQL Server using VB and
convert it to a two-dimensional object array (rows by cols), then push
the data into a J session. The J session then copies the same data into
a boxed array (an inverted table). I usually make another copy in which
I convert every "text" column into indexes into a HUGE vector of unique
strings, like so:
NB. 0 { LOOKUP_z_ is always expected to be the empty string
LOOKUP_z_=: '';'0';'1';'2';'3';'4';'5';'6';'7';'8';'9';' '

NB. =========================================================
NB.*searchLOOKUP v Converts a string or a boxed string into a number
NB.
NB. y is: <string>|<boxed string>
NB. return: numeric list
NB. NOTE: uses LOOKUP_z_ as string storage
searchLOOKUP=: 3 : 0
data=. boxopen y
LOOKUP_z_=: ~. LOOKUP_z_ , ~. data
LOOKUP_z_ i. data
)

For example, I may have the following data:
   ['DEPT USAGE DIVISION'=. i. 3
0 1 2
   [data=. (;: 'HR IT HR ACCT HR');100 100 300 100 50;<;: 'DIV1 DIV1 DIV2 DIV2 DIV2'
+------------------+------------------+--------------------------+
|+--+--+--+----+--+|100 100 300 100 50|+----+----+----+----+----+|
||HR|IT|HR|ACCT|HR||                  ||DIV1|DIV1|DIV2|DIV2|DIV2||
|+--+--+--+----+--+|                  |+----+----+----+----+----+|
+------------------+------------------+--------------------------+

NB. To convert the text columns to numeric values:
   searchLOOKUP each data {~ DEPT,DIVISION
+--------------+--------------+
|12 13 12 14 12|15 15 16 16 16|
+--------------+--------------+

NB. Now the global variable has the following values
   LOOKUP_z_
++-+-+-+-+-+-+-+-+-+-+-+--+--+----+----+----+
||0|1|2|3|4|5|6|7|8|9| |HR|IT|ACCT|DIV1|DIV2|
++-+-+-+-+-+-+-+-+-+-+-+--+--+----+----+----+

NB. Now create a numeric copy of data
NB. My personal preference is to convert it to a two-dimensional matrix
NB. (rows X cols)
   [datan=. |:>(searchLOOKUP each data {~ keys) (keys=. DEPT,DIVISION) } data
12 100 15
13 100 15
12 300 16
14 100 16
12  50 16

NB. Since all the text is already in the LOOKUP variable, I can just
NB. erase the original "data" variable and save some memory space
   erase 'data'
1
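Because every code is simply an index into LOOKUP_z_, erasing data loses nothing: the text comes back with a plain From ({). This is a minimal sketch. The verb name fromLOOKUP is made up for this example.

```j
NB. fromLOOKUP (hypothetical name): inverse of searchLOOKUP
NB. y is: numeric list of codes produced by searchLOOKUP
NB. return: the matching boxed strings taken from LOOKUP_z_
fromLOOKUP=: 3 : 'LOOKUP_z_ {~ y'
```

In the session above, fromLOOKUP 0 {"1 datan should give back the DEPT column as boxed strings (HR IT HR ACCT HR).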

By processing all the data as numbers and keeping only the unique text
values in a global vector, I have never HIT J's 1GB memory limit on a
32-bit machine. Just yesterday, I was informed of a problem where the VB
system slowed down because it was trying to bring down 75,000 records
with 28 columns from a server in Cebu, Philippines (the client is here
in DG, China) and hand them to J. Then J turned around and gave the VB
object 120,000 rows to insert. Hehehehe. It brought the VB.NET client
down (I had to apply a patch so that VB.NET loops through J's result
and inserts 5,000 records at a time).
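The same 5,000-row slicing could also be done on the J side before the result is handed back. This is a sketch that assumes the result is a rows-by-cols matrix like datan. The name chunkRows is hypothetical, and this is not the patch described above, which loops in VB.NET.

```j
NB. chunkRows (hypothetical name): split a table into blocks of rows
NB. x is: maximum number of rows per block, e.g. 5000
NB. y is: rows-by-cols matrix
NB. return: boxed list of blocks; a negative left argument to
NB. infix (\) makes the pieces non-overlapping, so the last block
NB. may hold fewer than x rows
chunkRows=: 4 : '(-x) <\ y'
```

For example, 5000 chunkRows i. 120000 28 gives 24 boxes of 5,000 rows each.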

In retrospect, one of the factors contributing to my luck is that I've
designed the J application server to be used atomically. That is, I
don't keep a single J session running. Instead, I create an instance of
J for each client request, and as soon as the J process is done, I
release it from memory. So my front end is a Windows Service written in
C++. Of course, during peak times (like the week before the 15th and
30th of the month), the server runs multiple J sessions at once.

Actually, since last year, with the factory's hardware upgrades, I've
been slowly moving the processing to the client machines, effectively
off-loading work from the server. There are drawbacks, since the clients
have lower specs than the server; the biggest one is that the operation
is now a bit slower. However, the overhead of transporting data between
the client machine, web server, database server and J application server
was cut. This works well for me since my application runs on a
WAN/intranet where bandwidth is expensive.

Final note: my clients usually run the cheapest 32-bit clone machines
with 512MB RAM on Windows XP Pro. There are still some Windows 98 and
Windows Me (eeewwwww) machines, plus some Mac OS X G4s running Windows
98 and XP on Microsoft Virtual PC.

r/Alex

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Robert Raschke
Sent: Wednesday, February 27, 2008 6:24 PM
To: [email protected]
Subject: Re: [Jprogramming] hardware limits

Here's how I pull data from a DB, stash it in column-based binary J
files, and map them using jmf. After that I can just use the inverted
table mechanisms explained so nicely on the J wiki.

I would assume you can do something similar with your CSV problem.


require 'dates dd files jmf printf'

DSN =: 'dsn=something;uid=user;pwd=password'

STMT =: 'SELECT date_field, float_field, int_field FROM a_table'

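NB. 2 (3!:4) converts integers to 4-byte binary; note that this
NB. matches JINT only in 32-bit J, where integers are 4 bytes
i2b =: 2 & (3!:4)
NB. ints writeints file: append ints to file as 4-byte binary
writeints =: (fappend~ i2b)~

NB. 2 (3!:5) converts floats to 8-byte binary
f2b =: 2 & (3!:5)
NB. floats writefloats file: append floats to file as 8-byte binary
writefloats =: (fappend~ f2b)~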

getdata =: 4 : 0
        stmt =. x
        dsn =. y
        ch =. ddcon dsn
        'Executing SQL:\n%s' printf <stmt
        sh =. stmt ddsel ch
        ddbind sh
        total =. 0
        while. 0 <: ddfetch sh do.
                rcnt =. 0 { ddrow sh            NB. ddrow appears to return an array of one element!
                'Saving %d rows.' printf <rcnt
                (tsrep dts_jdd_ rcnt {. dddata sh,1) writefloats 'DATE.ijn'
                (rcnt {. , dddata sh,2) writeints 'INT.ijn'
                (rcnt {. , dddata sh,3) writefloats 'FLOAT.ijn'
                total =. total + rcnt
        end.
        ddend sh
        dddis ch
        'Saved a total of %d rows.' printf <total
        total
)

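mapdata =: 3 : 0
        NB. map each column file to a global noun; JFL and JINT are
        NB. the jmf type codes for floating-point and integer data
        JFL map_jmf_ 'DATE';'DATE.ijn'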
        JINT map_jmf_ 'INT';'INT.ijn'
        JFL map_jmf_ 'FLOAT';'FLOAT.ijn'
)

STMT getdata DSN
mapdata ''

+/ INT
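Once mapped, each column is an ordinary J noun with one item per row, so the usual inverted-table idioms apply directly. For example, Key (/.) can total one column grouped by another. This is a minimal sketch, with grp and amt as hypothetical stand-ins for mapped columns such as INT and FLOAT.

```j
NB. grp and amt are small stand-ins for two mapped columns
grp=: 1 2 1 3 2
amt=: 1.5 2 2.5 4 1
NB. x u/. y applies u to the items of y grouped by the values of x,
NB. in order of first appearance; ~. grp lists those group keys
(~. grp) ,. grp +//. amt
```

This pairs the keys 1 2 3 with the totals 4 3 4.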

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm