Re: search pattern on sed or grep

Tim Chase Fri, 04 May 2007 08:42:47 -0700

> I'm very sorry to bother the list with this problem, but I've spent
> the past couple of hours searching the web for an answer and still
> haven't found one.


It's a common sed problem, and it's covered in the sed one-liners file

http://sed.sourceforge.net/sed1line.txt

(scan for the "emulates grep" bit)

Fortunately, the vim list has a lot of friendly regexp wonks like
myself :)

> The problem is that I have a txt file of 3.5GB containing all the info 
> of Human chromosome 6. I want to save into another file all lines 
> that have the pattern rs10946398 (occurring only ones). I know that vi 
> cannot handle files so big. I used ed in Fedora5 but this too cannot 
> stream it. I hope that grep or sed can do this but I can't figure out how. 
> I tried the following for sed but it doesn't work:
> 
> sed '/rs10946398/p' chr6.txt
> 
>  Can someone help?

You should be able to use

  grep 'rs10946398' chr6.txt > out.txt

or either of the sed variants:
  sed -n '/rs10946398/p' chr6.txt > out.txt
  sed '/rs10946398/!d' chr6.txt > out.txt
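If you want to convince yourself the three commands are equivalent before pointing them at the real 3.5GB file, here's a rough sanity check. It assumes a POSIX shell with mktemp, and the file name sample.txt and its three lines are made up for illustration (they stand in for chr6.txt).

```shell
#!/bin/sh
# Work in a scratch directory so nothing real gets touched.
work=$(mktemp -d)
cd "$work" || exit 1

# Hypothetical stand-in for chr6.txt; these lines are invented.
printf '%s\n' 'id=rs1000 x' 'id=rs10946398 y' 'id=rs2000 z' > sample.txt

# grep: print lines containing the pattern.
grep 'rs10946398' sample.txt > out_grep.txt
# sed -n: turn off automatic printing; p prints only matching lines.
sed -n '/rs10946398/p' sample.txt > out_sed_p.txt
# sed !d: delete every line that does NOT match; survivors print normally.
sed '/rs10946398/!d' sample.txt > out_sed_d.txt

# All three outputs should be byte-for-byte identical.
if cmp -s out_grep.txt out_sed_p.txt && cmp -s out_grep.txt out_sed_d.txt; then
  echo "all three match"
fi
cat out_grep.txt
```

All three read the input one line at a time, so the same commands should work on the big file without loading it into memory.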

If you misspelled and meant "occurring only once", you can also
filter out lines that contain the pattern more than once with

  grep 'rs10946398' chr6.txt | grep -v 'rs10946398.*rs10946398' > out.txt
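A quick way to see the "only once" pipeline in action is a tiny invented sample where one line holds the ID once and another holds it twice. The file name sample.txt and its contents are hypothetical; only the pipeline itself comes from above.

```shell
#!/bin/sh
# Scratch directory for the demo.
work=$(mktemp -d)
cd "$work" || exit 1

# Invented sample: line 2 has the ID once, line 3 has it twice.
printf '%s\n' 'id=rs1000' 'id=rs10946398' 'id=rs10946398 dup=rs10946398' > sample.txt

# First grep keeps lines with at least one occurrence;
# grep -v then drops lines where the ID appears two or more times.
grep 'rs10946398' sample.txt | grep -v 'rs10946398.*rs10946398' > out.txt

cat out.txt
```

Only the second line should survive.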

Sed can do it in a single pass with

  sed -e '/rs10946398/!d' -e '/rs10946398.*rs10946398/d' chr6.txt > out.txt
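Here's the single-pass sed version run against the same kind of made-up sample (sample.txt and its lines are hypothetical). Each -e expression is applied to every line in turn, and whatever isn't deleted by either one gets printed.

```shell
#!/bin/sh
# Scratch directory for the demo.
work=$(mktemp -d)
cd "$work" || exit 1

# Invented sample: no match, one match, two matches.
printf '%s\n' 'id=rs1000' 'id=rs10946398' 'id=rs10946398 dup=rs10946398' > sample.txt

# -e '/rs10946398/!d'             deletes lines with no occurrence
# -e '/rs10946398.*rs10946398/d'  deletes lines with two or more
# What survives both is printed: lines with exactly one occurrence.
sed -e '/rs10946398/!d' -e '/rs10946398.*rs10946398/d' sample.txt > out.txt

cat out.txt
```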

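The first sed variant doesn't print anything ('-n') except what
the 'p' command explicitly prints, i.e. the lines matching the
pattern.  That's also why your original command didn't seem to
work: without '-n', sed prints every line of the file anyway, and
'p' just prints each matching line a second time.  The 2nd sed
variant deletes lines that don't match the given pattern ("!d"),
akin to a ":v" command in vim.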
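To see the difference '-n' makes, here's a small side-by-side run on an invented three-line file (sample.txt and the output file names are hypothetical). Without '-n' you get the whole file back with the matching line doubled; with '-n' you get just the match.

```shell
#!/bin/sh
# Scratch directory for the demo.
work=$(mktemp -d)
cd "$work" || exit 1

# Invented sample with one matching line.
printf '%s\n' 'id=rs1000' 'id=rs10946398' 'id=rs2000' > sample.txt

# Without -n: sed auto-prints every line, and p prints the match again.
sed '/rs10946398/p' sample.txt > without_n.txt
# With -n: auto-printing is off, so only the explicit p output remains.
sed -n '/rs10946398/p' sample.txt > with_n.txt

echo "--- without -n ---"; cat without_n.txt
echo "--- with -n ---";    cat with_n.txt
```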

I suspect that neither vi/vim nor ed streams the file, and that
both suffer similar problems in that they try to map the whole
3.5GB file into memory, which is a cruel thing to do to an OS. :)

If you just want lines containing the pattern, I'd use the grep
command.  If you want lines that contain the pattern once and only
once, I'd go with the last sed command.
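One possible refinement, not part of the commands above: if you're sure the ID sits on just one line, you can stop reading as soon as it's found instead of scanning the rest of the 3.5GB file. This sketch assumes a grep that supports -m (GNU and BSD grep do; it isn't POSIX) and uses sed's 'q' command. It won't notice later duplicates, so skip it if you need the "once and only once" check. The sample file and its lines are invented.

```shell
#!/bin/sh
# Scratch directory for the demo.
work=$(mktemp -d)
cd "$work" || exit 1

# Invented sample where the ID appears twice, to show only the first is kept.
printf '%s\n' 'id=rs1000' 'id=rs10946398 first' 'id=rs2000' 'id=rs10946398 later' > sample.txt

# grep -m 1: stop after the first matching line.
grep -m 1 'rs10946398' sample.txt > out_grep.txt
# sed: on the first match, print it (p) and quit (q) without reading further.
sed -n '/rs10946398/{p;q;}' sample.txt > out_sed.txt

cat out_grep.txt out_sed.txt
```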

My $0.02

-tim
