On Fri, Aug 1, 2008 at 9:50 AM, Matthew Brand <[EMAIL PROTECTED]> wrote:
> I was wondering whether it is possible to write an adverb that automatically
> splits whatever is coming into it on the basis of available memory, or to
> make some kind of chunkification happen automatically... but I don't think it
> would be easy to do. One might input a set of asserts to guide the splitting
> process and use info about the RAM size to determine whether it is necessary
> and how many splits to do.
The concept of available memory seems rather ambiguous, given that the
problem has to do with J using available virtual memory.
However, using a pre-determined block size is not all that bad. The only
issue, really, is that you want to work on complete records (complete
lines) rather than on arbitrarily sized blocks.
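
For readers who do not read J, the "complete lines" requirement can be sketched in Python. This is only an illustration of the idea, not a translation of any particular J phrase, and the name split_at_last_newline is hypothetical: cut each raw block just after its last line feed and carry the remainder over to the next block.

```python
from typing import Tuple


def split_at_last_newline(raw: bytes) -> Tuple[bytes, bytes]:
    """Split one raw block into (complete_lines, leftover).

    complete_lines ends with the last line feed found in raw;
    leftover is the partial line that follows it and should be
    prepended to the next block read from the file.
    """
    # rfind returns -1 when there is no line feed, so cut becomes 0
    # and the whole block is treated as leftover.
    cut = raw.rfind(b"\n") + 1
    return raw[:cut], raw[cut:]


if __name__ == "__main__":
    print(split_at_last_newline(b"ab\ncd\nef"))
```

A block that contains no line feed at all yields an empty first part, so nothing is processed until a later block completes the line.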
Anyway, I would suggest using something like this:
blocksOfLines=: 2 :0
:
NB. u: monad
NB. n: base block size
NB. y: src file name
NB. x: dest file name
blks=. (,.~ [EMAIL PROTECTED]) -.&0 n (| ,~ [ #~ <[EMAIL PROTECTED]) 1!:4<y
pfx=. ''
'' 1!:2<x
for_blk. blks do.
raw=. pfx,1!:11 y;blk
end=. raw ([EMAIL PROTECTED] 0:^:< >:@i:) LF
pfx=. end }. raw
(u end {. raw) 1!:3<x
end.
if.#pfx do.(u pfx) 1!:3<x end.
1!:4<x
)
chun=: blocksOfLines 1e6
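
As a rough cross-check of the approach, here is a minimal Python sketch of the same block-by-block loop. It is an assumption-laden restatement, not the J definition itself: the function name process_in_line_blocks and its argument order are hypothetical, and unlike the J version it skips the transform when a block contains no complete line.

```python
import os
from typing import Callable


def process_in_line_blocks(
    transform: Callable[[bytes], bytes],
    src: str,
    dest: str,
    block_size: int = 1_000_000,
) -> int:
    """Apply transform to src in blocks of whole lines, writing to dest.

    Returns the size of dest in bytes.
    """
    carry = b""  # partial line left over from the previous block
    with open(src, "rb") as fin, open(dest, "wb") as fout:
        while True:
            block = fin.read(block_size)
            if not block:
                break
            raw = carry + block
            # Cut just after the last line feed (0 if there is none).
            cut = raw.rfind(b"\n") + 1
            carry = raw[cut:]
            if cut:
                fout.write(transform(raw[:cut]))
        # A file that does not end with a line feed leaves a final
        # partial line, which is processed on its own.
        if carry:
            fout.write(transform(carry))
    return os.path.getsize(dest)
```

Memory use is bounded by the block size plus the longest line, since only one block and its carried-over prefix are held at a time.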
For example:
'test.out' ([: ; <@('> ' , ]);.2) chun 'test.in'
would create a file test.out containing every line of test.in with
the character sequence '> ' prepended. It would do this on blocks
of approximately 1 million characters each. (Note that I have
assumed, for this example, that the file is well formed: that its
final character is a line feed.)
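
The per-block transform in that example can also be sketched in Python, for comparison. The name prefix_lines is hypothetical. This version treats a final line without a line feed as a line of its own, so it does not depend on the well-formedness assumption.

```python
def prefix_lines(chunk: bytes, prefix: bytes = b"> ") -> bytes:
    """Prepend prefix to every line in chunk, keeping line endings."""
    # splitlines(keepends=True) keeps each line terminator attached,
    # and returns a trailing unterminated line as its own item.
    # Note that for bytes it also splits on a lone carriage return.
    return b"".join(prefix + line for line in chunk.splitlines(keepends=True))


if __name__ == "__main__":
    print(prefix_lines(b"first\nsecond\n"))
```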
FYI,
--
Raul