Marty Knapp wrote:


I've been tinkering with this a bit and wanted to ask a few more questions. Again, each record in my data set has 8 items, all but one of them a number. I need to be able to select a subset by analysing 1 or more of these items. My current data set is approx 128,000 records. When I filter the data on the item that contains words, it's pretty fast - about 1 second (I have an old Mac G4, single processor at 867 MHz, running Rev 2.2.1). The speed is exactly the same whether I use the above method or just 'filter theData with "*word*"'.

That seems terribly slow to me :-)

I have a file with 120,000 records which look much like

2,908,597,451700,398,340,zxcv,3.5

and filtering that with "*zxcv*" takes about 80 - 85 msecs (i.e. less than a tenth of a second) on my 2-year old laptop.

(Note - only 1 in 1000 of the records match, so 120 total matches)

Doing the same thing with a 'repeat for each' loop takes slightly *less* time - avg 77 ms. Here's the code:

on mouseUp
   put URL ("file:D:/Our Documents/Alex/RunRev/asdf.txt") into t
   put the millisecs into tStart
   filter t with "*" & "zxcv" & "*"
   put t into fld "Field"
   put the number of lines in fld "Field" && the millisecs - tStart & cr after msg
   put t into fld "Field"
   put URL ("file:D:/Our Documents/Alex/RunRev/asdf.txt") into t
   put the millisecs into tStart
   repeat for each line L in t
      if "zxcv" is in L then
         put L & cr after t1
      end if
   end repeat
   put "got" & cr & t1 into fld "Field"
   put the number of lines in fld "Field" && the millisecs - tStart & cr after msg
end mouseUp


Changing that to test one specific item for "zxcv" makes it even faster - 66 msec. The code was:

   put the millisecs into tStart
   repeat for each line L in t
       if "zxcv" = item 7 of L then
           put L & cr after t1
       end if
   end repeat

I then tried it with a number (and using a variable instead of a constant), and it slowed down slightly, to 92 msec:

on tryit pNum
   put URL ("file:D:/Our Documents/Alex/RunRev/asdf.txt") into t
   put the millisecs into tStart
   repeat for each line L in t
      if item 4 of L = pNum then
         put L & cr after t1
      end if
   end repeat
   put "got" & cr & t1 into fld "Field"
   put the number of lines in fld "Field" && the millisecs - tStart & cr after msg
end tryit

So then I tried making the item number and the value vary

Changing it to a more common match (>= 451700, giving 11881 matches) still only took 278 msec.
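The handler for that was essentially the same loop with a >= comparison; something along these lines (the handler name and the choice of item 4 are just from my test data):

on tryRange pBound
   put URL ("file:D:/Our Documents/Alex/RunRev/asdf.txt") into t
   put the millisecs into tStart
   repeat for each line L in t
      if item 4 of L >= pBound then
         put L & cr after t1
      end if
   end repeat
   put t1 into fld "Field"
   put the number of lines in fld "Field" && the millisecs - tStart & cr after msg
end tryRange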

When I use the 'repeat for each' to evaluate for 'word' it takes 1.5 minutes. Where it really gets slow is evaluating the numbers. Most often what I need is a range of numbers, so would use greater than, less than, or both. Typically I would evaluate 2 of the numbers, but need to be able to evaluate all 8 items if needed. The numbers range from 0 to 8 digits, some whole numbers some fractional. I just did a test evaluating one number and it took 3.5 minutes. When I evaluated 2 numbers it took 5.8 minutes.

There's something odd going on.
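A two-range test of the kind you describe should look essentially the same as the loops above, and should take a comparable time - something like this sketch (item numbers, variable names and bounds are placeholders, since I haven't seen your actual code):

   repeat for each line L in t
      if item 2 of L >= tLow1 and item 2 of L <= tHigh1 \
            and item 5 of L >= tLow2 and item 5 of L <= tHigh2 then
         put L & cr after t1
      end if
   end repeat

If your version differs much from that (in particular, if it uses 'line x of t' or 'item y of line x of t' inside the loop rather than 'repeat for each'), that would explain minutes rather than milliseconds.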


Do these numbers sound right? Or am I being a bozo somehow? I've been thinking that I should consider using either Valentina or altSQLite. Any input there? Is one more suited, easier, or ???

Those numbers don't sound right to me. Can you send a sample of the data (just 2 or 3 records) to make sure I've not misinterpreted what the data looks like ? And then maybe send the code snippet that is taking so long ....

Thanks

--
Alex Tweedly       http://www.tweedly.net



--

_______________________________________________
use-revolution mailing list
[email protected]
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
