I am struggling with sed and gawk but I guess that it'd be possible to
employ vim in the command line (it's to make a script that will be
automatically launched every 24 hours) but I don't have any idea of
how to do it...

How could I select the blocks (see file ahead) of a text file (say
SSSS.txt) in which some particular words appear?
Imagine that I want to keep the blocks containing words like "black",
"supermassive", "red", "intermediate", "relativistic"...
and delete the rest of the blocks (and also the header and footer of the file).

Well, my first thought would be to work on a disposable copy of the text:

   cat file | vim -

Then, clean up the stuff we don't want

   :1,/received/d
   :$?^\s*For subscribe options?,$d

to strip off the header and footer.
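
If you'd rather do that step outside vim, here's a rough Python sketch of the same two deletions. It assumes, as the commands above do, that the header ends at the first line containing "received" and that the footer starts at the last line beginning with "For subscribe options". The function name is made up for illustration.

```python
import re

def strip_header_footer(lines):
    """Drop the header (through the first 'received' line) and the footer
    (from the last 'For subscribe options' line to the end)."""
    start = 0
    for i, line in enumerate(lines):
        if "received" in line:
            start = i + 1          # keep everything after this line
            break
    end = len(lines)
    for i in range(len(lines) - 1, -1, -1):
        # Search backward, like the "?...?" address in the ex command.
        if re.match(r"\s*For subscribe options", lines[i]):
            end = i                # drop this line and everything below it
            break
    return lines[start:end]
```

If either marker is missing, that end of the file is simply left alone.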

My first-pass solution will end up with duplicate results if more than one of your keywords appears in the same "block" but on different lines:

   :let @a=''
   :g/red\|relativistic/?^\s*astro-ph?,/^\s*astro-ph/-y A
   :%d
   :put a
   :1d
   :wq name_of_output.txt


You can alter that 2nd line for whatever keywords you want:

   red\|relativistic\|black\|supermassive\|intermediate

If case doesn't matter, you can tack "\c" onto your search pattern to ignore case:

   red\|black\|supermassive\c

Vim applies "\c" to the whole pattern no matter where it appears, so this already covers every branch. Wrapping the whole thing in parens first is harmless but not needed:

   \(red\|black\|supermassive\)\c

If you want to highlight your hits as well, you can tweak it like

   :g/red\|relativistic/s!!<b>&</b>!g|?^\s*astro-ph...

which, given that you seem to want to HTMLize your results (as hinted at below), will bold each hit.

What would be the command line with vim? (or are there other possibilities?)

While you could hack all that into a command line, it might be easier to put those lines in a script, say "foo.vim", and then just source that script on the command line:

   cat input.txt | vim -s foo.vim -

I would also like to know how to replace the

astro-ph/0604565 with <a href=" http://xxx.lanl.gov/pdf/astro-ph/0604565</a>

for all numbers, not only for 0604565 ...

After the ":1d" (that's "one dee", not "ell dee") line, you could put something like

   :%s!^\s*astro-ph/\(\d\+\)!<a href="http://xxx.lanl.gov/pdf/astro-ph/\1">&</a>

(all on one line in case my mailer bungs it). Your HTML was a little funky there, so I made some assumptions and cleaned it up a little. The "\1" in the replacement is the number, and the "&" in the replacement is the whole matched text (the "astro-ph/#######" bit, plus any leading whitespace), so you'll have an HTML link with the original text as the clickable bit.
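
For comparison, the same substitution as a small Python sketch. The function name is hypothetical, and unlike the vim version it leaves any leading whitespace outside the link rather than inside it.

```python
import re

# An "astro-ph/<digits>" identifier at the start of a line,
# optionally preceded by spaces or tabs.
ID_AT_LINE_START = re.compile(r"^([ \t]*)astro-ph/(\d+)", re.MULTILINE)

def linkify(text):
    """Wrap each leading astro-ph identifier in a link to its PDF."""
    return ID_AT_LINE_START.sub(
        r'\1<a href="http://xxx.lanl.gov/pdf/astro-ph/\2">astro-ph/\2</a>',
        text,
    )
```

Lines that don't begin with an identifier pass through unchanged.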

I'm sorry I couldn't come up with a clean way to snag just the unique paragraphs easily without having an instance show up as its own result-block.
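
As for other possibilities: outside vim, the duplicate problem goes away if you split the text into blocks first and test each block once. Here's a minimal Python sketch, assuming (as the patterns above do) that every block starts on a line beginning with "astro-ph". The names are made up for illustration, and the case-insensitive matching is my assumption.

```python
import re

KEYWORDS = ["black", "supermassive", "red", "intermediate", "relativistic"]

def select_blocks(lines, keywords=KEYWORDS):
    """Return each block (a list of lines) that mentions any keyword, once."""
    # Plain substring match, like the vim pattern: "red" also hits "redshift".
    wanted = re.compile("|".join(map(re.escape, keywords)), re.IGNORECASE)
    blocks, current = [], None
    for line in lines:
        if re.match(r"\s*astro-ph", line):
            current = []           # a new block starts on this line
            blocks.append(current)
        if current is not None:    # lines before the first block are dropped
            current.append(line)
    return [b for b in blocks if any(wanted.search(l) for l in b)]
```

Feed it the lines of SSSS.txt (header and footer already stripped) and write the surviving blocks back out in order.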

Anyways, it's at least one sorta-solution to what you describe.

-tim



