Thanks Brian,

I don't need random access to the data, only sequential access. That is why the "repeat for each" operator works so fast: less than a microsecond per data item. I'm not going to match that with anything other than RAM.
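
A minimal sketch of that kind of sequential pass, assuming tData holds one comma-delimited record per line with a price in item 2. The record layout and the names tTotal, tCount and tAverage are made up for illustration; they are not from my actual scripts.

## hypothetical example: average item 2 over every line of tData
put 0 into tTotal
put 0 into tCount
repeat for each line tLine in tData
    ## each record is visited once, in order; nothing is looked up by line number
    add item 2 of tLine to tTotal
    add 1 to tCount
end repeat
if (tCount > 0) then put tTotal / tCount into tAverage

A loop of the form "repeat with i = 1 to the number of lines of tData" that reads line i of tData has to count lines from the start of the data on every pass, which is the cost the sequential form avoids.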

Dennis

On Apr 12, 2005, at 10:06 PM, Brian Yennie wrote:

Dennis,

I have to agree with Pierre here. If you are looking for random access to many thousands of records taking up gigabytes of memory, a database engine is, IMO, the only logical choice.

A simple MySQL/PostgreSQL/Valentina/etc database indexed by line number (or stock symbol) would be very fast.

Without indexing your data or fitting all of it into random-access in-memory data structures, you're fighting a painful battle. If your algorithm scales linearly with the data, it will simply run too slowly, and if your memory use scales the same way, you'll run out of memory. On the other hand, database engines can potentially handle _terabytes_ of data and give you random access in milliseconds. You simply won't beat that in Transcript.

One thing you could consider, if you don't want to deal with a whole database engine, is indexing the data yourself, which will give you some of the algorithmic benefits of a database engine. That is, make one pass where you store the offsets of each line in an index, and then use that to grab lines. Something like (untested):

## index the line starts and ends
put 1 into lineNumber
put 1 into charNum
put 1 into lineStarts[1]
repeat for each char c in tData
    if (c = return) then
       put (charNum - 1) into lineEnds[lineNumber]
       put (charNum + 1) into lineStarts[lineNumber + 1]
       add 1 to lineNumber
    end if
    add 1 to charNum
end repeat
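## charNum is now one past the last char, so the final line ends at charNum - 1
if (the last char of tData <> return) then put (charNum - 1) into lineEnds[lineNumber]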

## get line x via random char access
put char lineStarts[x] to lineEnds[x] of tData into lineX

- Brian

Thanks Pierre,

I considered that also. A database application would certainly handle the amount of data, but databases are really meant for finding and sorting on various fields, not for the kind of processing I am doing. The disk access would slow down the process.

Dennis

On Apr 12, 2005, at 5:27 PM, Pierre Sahores wrote:


_______________________________________________
use-revolution mailing list
http://lists.runrev.com/mailman/listinfo/use-revolution
