On 23/02/2009 8:14 PM, Kim Boulton wrote:
> Hehe, probably a combination of rubbish grep (I used the regex function 
> in a text editor) and vacuuming a 4GB table at the same time.

google("scientific method") :-)

> 
> @echo off
> setlocal
> set starttime=%time%
> egrep --count "(....W[CEF][SZ]|..W[CEF]S|...W[CEF]S|W[3CEF]S[25]..|W3S..|.11[CEF]S.)," my-30-million-rows-of-data.txt
> set stoptime=%time%
> echo Started: %starttime%
> echo Ended: %stoptime%
> 
> results in:
> 24561
> Started:  9:00:58.82
> Ended:  9:01:34.29
> 
> 36-ish seconds. Obviously the regex needs a bit of work, as there are 
> supposed to be around 200,000 matches.

Probably a big contributing factor is that my regex assumes you have 
got rid of the commas in the part number. If the above input file is 
still in your original format, you need to sprinkle commas about madly; 
the first subpattern would become:
.,.,.,.,W,[CEF],[SZ]
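
Applying the same comma-sprinkling to every alternative (a sketch only; 
I'm guessing here that the whole part number is comma-separated 
character by character, so check this against your actual layout), the 
command would become something like:

egrep --count "(.,.,.,.,W,[CEF],[SZ]|.,.,W,[CEF],S|.,.,.,W,[CEF],S|W,[3CEF],S,[25],.,.|W,3,S,.,.|.,1,1,[CEF],S,.)," my-30-million-rows-of-data.txt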

Note that your average record size is 16 to 17 bytes. If you lose 6 
commas, that drops to 10 to 11 bytes per record, i.e. the file can be 
reduced from about 500MB to about 320MB ... quite a useful saving in 
processing time as well as disk space.
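
For example, taking the midpoints: 30,000,000 records x 16.5 bytes is 
roughly 495MB, versus 30,000,000 x 10.5 bytes at roughly 315MB, so 
about 180MB saved.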

> 
> Interesting nonetheless; never used grep before ... useful.

Sure is.


Cheers,
John