Dennis,

I threw together a really rough test - just some tables with random values 0-1000.

SELECT (v10000 + v2500) FROM values10000,values2500 LIMIT 500000;

This ran for me in about 10-15 seconds on a G4 XServe with 1GB of memory. Note that I only had enough free memory to do chunks of 500,000, so it looks like around a minute for all 2,500,000 combinations.

This is with the entire database on disk and no optimizations / tweaking... so it should be an upper limit.
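For anyone who wants to reproduce the shape of this test without setting up a server, here is a small sketch using SQLite from Python (the table and column names follow the query above, but the row counts are scaled way down, and this is my approximation of the test, not the original):

```python
import random
import sqlite3

# In-memory database standing in for the on-disk test above.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE values10000 (v10000 INTEGER)")
cur.execute("CREATE TABLE values2500 (v2500 INTEGER)")

# Much smaller tables than the original 10000 x 2500, same idea:
# random values 0-1000 in each.
cur.executemany("INSERT INTO values10000 VALUES (?)",
                [(random.randint(0, 1000),) for _ in range(100)])
cur.executemany("INSERT INTO values2500 VALUES (?)",
                [(random.randint(0, 1000),) for _ in range(25)])

# Cross join: every pairwise sum, capped with LIMIT as in the test.
rows = cur.execute(
    "SELECT (v10000 + v2500) FROM values10000, values2500 LIMIT 500"
).fetchall()
print(len(rows))  # 500
```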

What times are you getting from the most recent scripted solution?

- Brian

Brian,

Can you create a 10000 by 2500 by 100 array of random numbers (sparsely populated, i.e. some values are null) in SQL?

Can you put the sum of two randomly accessed numbers (except that if either is null, the sum is null) into a third random location?

How fast can you do this last operation 25,000,000 times in SQL?
This is the simplest of the things I am doing.
If you can do that in SQL in less than a minute, you've got my attention :-)
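For concreteness, the null-propagating sum over a sparse 3D array could be modeled like this (a Python sketch of the semantics only; the dictionary-as-sparse-array representation and all names here are my own assumptions, not anything from the thread):

```python
import random

DIMS = (10000, 2500, 100)

def rand_key():
    # A random (x, y, z) coordinate into the 10000 x 2500 x 100 array.
    return tuple(random.randrange(d) for d in DIMS)

def null_sum(a, b):
    # If either operand is null (None), the sum is null.
    return None if (a is None or b is None) else a + b

# Sparse array: only populated cells are stored; a missing key reads as null.
arr = {rand_key(): random.randint(0, 1000) for _ in range(1000)}

def sum_step(arr):
    """Read two randomly accessed cells and put their (possibly null)
    sum into a third random location."""
    result = null_sum(arr.get(rand_key()), arr.get(rand_key()))
    dest = rand_key()
    if result is None:
        arr.pop(dest, None)  # storing null empties the destination cell
    else:
        arr[dest] = result

for _ in range(10000):  # the benchmark above would run this 25,000,000 times
    sum_step(arr)
```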


Dennis

On Apr 13, 2005, at 12:36 PM, Brian Yennie wrote:

Dennis,

I guess it's half of each, as I see it (and I misread some of this I think).
You only need sequential access to the lines and items, but random access would solve your problems: it can do everything sequential access can, and with proper indexing it can be even faster.


From your syntax example:

access each line X in arrayX -- initial setup of pointers and X value
access each item Y in arrayY -- initial setup of pointers and Y value
repeat for number of lines of arrayX times -- same as a repeat for each
put X & comma & Y & return after ArrayXY -- merged array
next line X -- puts the next line value in X
next item Y -- if arrayY has fewer elements than arrayX, then empty is supplied; could also put "End of String" in the result
end repeat

In SQL:

SELECT CONCAT(lineValue, ',' , itemValue) FROM lines, items ORDER BY lineNumber, itemNumber;

Give your DB enough RAM to work with and this will fly. The fields will be indexed once - that's basically what you are saying with "access" ... "index". Then they'll be retrieved super fast, and the sort should be trivial if the records are already in order in your database. You could work with a dozen tables at once if you wanted.
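As a quick sanity check of the join-and-concatenate idea, here is the same query run against SQLite from Python (SQLite spells CONCAT as the || operator; the sample data is mine, just enough rows to show the ordering):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE lines (lineNumber INTEGER, lineValue TEXT)")
cur.execute("CREATE TABLE items (itemNumber INTEGER, itemValue TEXT)")
cur.executemany("INSERT INTO lines VALUES (?, ?)",
                [(i, "L%d" % i) for i in range(1, 4)])
cur.executemany("INSERT INTO items VALUES (?, ?)",
                [(i, "I%d" % i) for i in range(1, 3)])

# Cross join pairs every line with every item, ordered by position,
# exactly like the CONCAT query above.
rows = cur.execute(
    "SELECT lineValue || ',' || itemValue FROM lines, items "
    "ORDER BY lineNumber, itemNumber"
).fetchall()
print(rows[0][0])  # L1,I1
```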

If you're still committed to a scripted solution, could you post some of your actual code - even if the calculations are hidden? Are you using nested repeat for each loops? This group (myself included) has been known to tweak these sorts of things pretty well in the past when there is real code to work with...


- Brian


Thanks Brian,

I don't require random access to the data. I only need sequential access. That is why the repeat for each operator works so fast --less than a microsecond per data item. I'm not going to match that with anything other than RAM.

Dennis

On Apr 12, 2005, at 10:06 PM, Brian Yennie wrote:

Dennis,

I have to agree with Pierre here. If you are looking for random access to many thousands of records taking up gigabytes of memory, a database engine is, IMO, the only logical choice.

A simple MySQL/PostgreSQL/Valentina/etc database indexed by line number (or stock symbol) would be very fast.

Without indexing your data or fitting all of it into random-access in-memory data structures, you're fighting a painful battle. If your algorithm is scaling linearly, you'll just run too slow, and if your data size is doing the same, you'll run out of memory. On the other hand, database engines can potentially handle _terabytes_ of data and give you random access in milliseconds. You simply won't beat that in Transcript.

One thing you could consider, if you don't want a whole database engine to deal with, is indexing the data yourself - which will give you some of the algorithmic benefits of a database engine. That is, make one pass where you store the offsets of each line in an index, and then use that index to grab lines. Something like (untested):

## index the line starts and ends
put 1 into lineNumber
put 1 into charNum
put 1 into lineStarts[1]
repeat for each char c in tData
    if (c = return) then
       put (charNum - 1) into lineEnds[lineNumber]
       put (charNum + 1) into lineStarts[lineNumber + 1]
       add 1 to lineNumber
    end if
    add 1 to charNum
end repeat
if (c <> return) then put (charNum - 1) into lineEnds[lineNumber] -- last line has no trailing return; charNum is one past the end here

## get line x via random char access
put char lineStarts[x] to lineEnds[x] of tData into lineX
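Here is the same offset-indexing trick transliterated to Python, for anyone who wants to check the logic outside Revolution (1-based offsets are kept so the arrays line up with the `char lineStarts[x] to lineEnds[x]` form above; this is my sketch, not tested in Transcript):

```python
def index_lines(t_data):
    """One pass over the text, recording 1-based start/end offsets of
    each line, mirroring the Transcript loop above ("\n" plays return)."""
    line_starts = {1: 1}
    line_ends = {}
    line_number = 1
    char_num = 1
    c = ""
    for c in t_data:
        if c == "\n":
            line_ends[line_number] = char_num - 1
            line_starts[line_number + 1] = char_num + 1
            line_number += 1
        char_num += 1
    if c != "\n":  # final line with no trailing return
        line_ends[line_number] = char_num - 1
    return line_starts, line_ends

def get_line(t_data, line_starts, line_ends, x):
    # Random access: slice chars lineStarts[x] to lineEnds[x]
    # (converting the 1-based offsets to Python's 0-based slicing).
    return t_data[line_starts[x] - 1:line_ends[x]]

starts, ends = index_lines("alpha\nbeta\ngamma")
print(get_line("alpha\nbeta\ngamma", starts, ends, 2))  # beta
```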

- Brian

Thanks Pierre,

I considered that also. A database application would certainly handle the amount of data, but databases are really meant for finding and sorting various fields, not for the kind of processing I am doing. The disk access would slow down the process.

Dennis

On Apr 12, 2005, at 5:27 PM, Pierre Sahores wrote:


_______________________________________________ use-revolution mailing list [email protected] http://lists.runrev.com/mailman/listinfo/use-revolution








