Re: which is faster for searching?

Alex Tweedly Sun, 31 Aug 2014 10:11:58 -0700


Hi Hugh,

Sorry - I'll be more assertive and simplistic in my description - at aslight risk of over-stating the case.


No, the second variable method does definitely uses extra memory.

What I was saying was that that can be unimportant, because extra memory"simply" means that your OS has to do more paging, and that is OK solong as we satisfy two conditions

1. your max virtual memory is big enough (and it's been a decade sincethat *should* be a problem, other than maybe on phones, RPi, etc).

2. your access to that memory is primarily serial (i.e. not much pagingaround) which it will be in this case.

(and of course 3. you remember to either "delete variable xxx" or "putempty into xxx" so the memory can be removed/garbage collected soonafter you've done with it).

And I then claimed/explained that this memory/paging overhead wasinsignificant because either of the other ways can cause tremendousamounts of data copying - and the time to do that would *far* outweighthe memory overhead.

I'll try to create the speed test code to demonstrate that and send itlater tonight.


-- Alex

On 31/08/2014 09:37, FlexibleLearning.com wrote:

Hi Alex

Am struggling to follow your comments and conclusions. Are you saying it
would be more memory efficient to store the output in a second variable...

repeat for each line L in tVar
     if <data condition on  line x of tVar> then
       put L &cr after stdout
     end if
end repeat
if last char of stdout=cr then delete last char of stdout
return stdout

If there are no matching lines (worst case), then this would require a 100%
memory overhead.

I am obviously mis-reading you.

Hugh Senior
FLCo


From: Alex Tweedly <a...@tweedly.net>
To: use-livecode@lists.runrev.com
Subject: Re: which is faster for searching?

Two comments ....

1. The two methods below produce different results - the latter one
removes lines that were empty in the original input data, the other
doesn't (unless that happens to match "data condition on line x").

2. In the case of large (or huge) datasets, Hugh's concern about
avoiding memory overhead may be valid - but those cases are exactly the
ones where you most need to be concerned about efficiency.

If, for example, the input data consists of 10,000,000 lines and perhaps
each line was 100 chars, then we have 1Gb of data. Let's say we retain
50% of the lines (say every alternate one) - then the "repeat for each"
method adds 1/2 Gb of virtual memory, and requires copying only 1/2Gb of
data during this operation.

However, although both the methods below don't add the virtual memory
requirement, they copy a LOT of data - each time we delete (or empty) a
line, that causes all the subsequent data to be copied 'in place', so it
will require (approx) 2,500,000 Gb of data copying (5M * 1/2Gb average
remaining data size). So we copy 2.5Pb of data - that's going to cost us
a whole lot more time than any paging needed for 1/2Gb of extra virtual
memory.

-- Alex.


On 30/08/2014 08:45, FlexibleLearning.com wrote:

Peter Haworth <p...@lcsql.com> wrote

There's another situation where I use repeat with even though it's a

little

slower than repeat for and I also alter the contents of the data I'm
repeating through without any problems.

repeat with x=the number of lines in tVar down to to 1
     if <data condition on  line x of tVar> then
        delete line x of tVar
     end if
end repeat

This is an insightful observation. Nice one, Pete!

My stock method (and presumably the method you allude to above) is...

repeat for each line L in tVar
      add 1 to x
      if <data condition on  L> then put "" into line x of tVar
end repeat
filter tVar without empty

Both methods operate on a single data set and avoid putting the output

into

a second variable which, for large datasets, involve an unnecessary memory
overhead..

Hugh Senior
FLCo



_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode



_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Re: which is faster for searching?

Reply via email to