Hehe, probably a combination of a rubbish grep (I used the regex function
in a text editor) and vacuuming a 4GB table at the same time.

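@echo off
setlocal
rem Record the wall-clock time before and after the grep run.
set starttime=%time%
rem --count prints the number of matching lines rather than the lines themselves.
egrep --count "(....W[CEF][SZ]|..W[CEF]S|...W[CEF]S|W[3CEF]S[25]..|W3S..|.11[CEF]S.)," my-30-million-rows-of-data.txt
set stoptime=%time%
echo Started: %starttime%
echo Ended: %stoptime%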

results in:
24561
Started:  9:00:58.82
Ended:  9:01:34.29
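
For anyone who wants the exact figure, the elapsed time can be worked
out from the two %time% stamps. This is only a minimal Python sketch: it
assumes the H:MM:SS.cc layout shown above (the decimal separator depends
on locale), elapsed_seconds is just an illustrative name, and the row
count is taken from the file name rather than measured.

```python
from datetime import datetime


def elapsed_seconds(start: str, stop: str) -> float:
    """Seconds between two cmd.exe %time% stamps such as ' 9:00:58.82'."""
    # Assumption: '.' is the decimal separator; some locales use ',' instead.
    fmt = "%H:%M:%S.%f"
    t0 = datetime.strptime(start.strip(), fmt)
    t1 = datetime.strptime(stop.strip(), fmt)
    delta = (t1 - t0).total_seconds()
    if delta < 0:
        # The run crossed midnight, so add one day.
        delta += 24 * 60 * 60
    return delta


secs = elapsed_seconds(" 9:00:58.82", " 9:01:34.29")
rows = 30_000_000  # assumption: taken from the file name, not measured
print(f"{secs:.2f} s elapsed, roughly {rows / secs:,.0f} rows/s")
```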

About 35 seconds. Obviously the regex needs a bit of work, as there are
supposed to be around 200,000 matches.
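
One way to see where the missing matches might be is to count matching
lines per alternative instead of for the pattern as a whole. The sketch
below is a rough Python equivalent, not what grep does internally: it
assumes Python's re module treats these simple patterns the same way
egrep does, and count_by_alternative is a made-up helper name.

```python
import re

# The six alternatives from the egrep pattern above. In the original
# pattern the whole group is followed by a comma, so each alternative
# gets the comma appended here.
ALTERNATIVES = [
    r"....W[CEF][SZ]",
    r"..W[CEF]S",
    r"...W[CEF]S",
    r"W[3CEF]S[25]..",
    r"W3S..",
    r".11[CEF]S.",
]


def count_by_alternative(lines):
    """Return (matching_line_count, {alternative: matching_line_count})."""
    compiled = [(alt, re.compile(alt + ",")) for alt in ALTERNATIVES]
    counts = {alt: 0 for alt in ALTERNATIVES}
    total = 0
    for line in lines:
        hit = False
        for alt, rx in compiled:
            # Unanchored search, like grep: a match anywhere in the line counts.
            if rx.search(line):
                counts[alt] += 1
                hit = True
        if hit:
            # Count each line once, which is what grep --count reports.
            total += 1
    return total, counts


# Usage (the file name is the one from the batch script above):
# with open("my-30-million-rows-of-data.txt", errors="replace") as f:
#     total, counts = count_by_alternative(f)
#     print(total, counts)
```

A line can match more than one alternative, so the per-alternative
counts may add up to more than the total.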

Interesting nonetheless. I'd never used grep before... useful.

k

John Machin wrote:
> On 23/02/2009 5:14 PM, Kim Boulton wrote:
>   
>> Hello,
>>
>> Thanks
>>
>> The grep regex on the text file found around 10,000 lines over 5 minutes 
>> (out of a total possible 200,000 rows), at which time I stopped it, 
>> interesting experiment anyway :-)
>>     
>
> Uh-huh ... so you'd estimate that it would take 5 minutes * (200K rows / 
> 10k rows) = 100 minutes to get through the lot, correct?
>
> I tried an experiment on a 161Mb CSV file with about 1.1M 
> name-and-address-etc rows in it. Because none of the patterns in your 
> query are likely to match my data, I added an extra pattern that would 
> select about 22% of the records (ended up with 225K output rows), 
> putting it at the end to ensure it got no unfair advantage from a regex 
> engine that tested each pattern sequentially.
>
> BTW, I had to use egrep (or grep -E) to get it to work.
>
> Anyway, it took about 6 seconds. Scaling up by number of input records: 
> 6 * 30M / 1M = 180 seconds = 3 minutes. Scaling up by file size: 6 * 500 
> / 161 = 19 seconds. By number of output rows: 6 * 200 / 225 ... forget 
> it. By size of output rows: ... triple forget it.
>
> Conclusion: something went drastically wrong with your experiment. 
> Swapping? Other processes hogging the disk or the CPU? A really duff grep??
>
> Anyway, here's my environment: 2.0 GHz single-core AMD Turion (64 bit 
> but running 32-bit Windows XP SP3), using GNU grep 2.5.3 from the 
> GnuWin32 project; 1 GB memory.
>
> Cheers,
> John
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
>   
