I am struggling with sed and gawk but I guess that it'd be possible to
employ vim in the command line (it's to make a script that will be
automatically launched every 24 hours) but I don't have any idea of
how to do it...
How could I select the blocks (see file ahead) of a text file (say
SSSS.txt) in which some particular words appear?
Imagine that I want to keep the blocks containing words like "black",
"supermassive", "red", "intermediate", "relativistic"...
and delete the rest of the blocks (and also the header and bottom of the file)
Well, my first thought would be to have a destroyable copy
of the text:
cat file | vim -
Then, clean up the stuff we don't want
:1,/received/d
:$?^\s*For subscribe options?,$d
to strip off the header and footer.
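
As for "other possibilities": if vim turns out to be awkward in a job that runs unattended, the same trimming step can be sketched in Python. This is only a rough equivalent, not the vim method above. It assumes, like the two ex commands, that the header ends at the first line containing "received" and that the footer starts at the last line beginning with "For subscribe options". The function name strip_header_footer is made up for the example.

```python
import re

# Same footer marker as in the vim command: $?^\s*For subscribe options?,$d
FOOTER = re.compile(r"^\s*For subscribe options")

def strip_header_footer(lines):
    """Return the lines between the header and the footer (assumed layout)."""
    # Header: everything through the first line containing "received".
    start = 0
    for i, line in enumerate(lines):
        if "received" in line:
            start = i + 1
            break
    # Footer: everything from the last line matching FOOTER onward.
    # If no footer line is found, keep everything to the end.
    end = len(lines)
    for i in range(len(lines) - 1, start - 1, -1):
        if FOOTER.match(lines[i]):
            end = i
            break
    return lines[start:end]
```

If neither marker is present, the input comes back unchanged, so the function is safe to run on a file that has already been trimmed.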
My first-pass solution will end up with duplicate results if
more than one of your keywords appears in the same "block"
but on different lines:
:let @a=''
:g/red\|relativistic/?^\s*astro-ph?,/^\s*astro-ph/-y A
:%d
:put a
:1d
:wq name_of_output.txt
You can alter that 2nd line for whatever keywords you want:
red\|relativistic\|black\|supermassive\|intermediate
If case doesn't matter, you can tack "\c" onto your search
pattern to ignore case:
red\|black\|supermassive\c
The "\c" applies to the whole pattern wherever it appears,
branches included, so there's no need to wrap the whole thing
in parens first, though doing so does no harm:
\(red\|black\|supermassive\)\c
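
For comparison, here is a minimal Python sketch of the same block selection. It is an alternative technique, not a translation of the :g command. It assumes every block starts at a line matching ^\s*astro-ph (the same marker used in the vim range), and the names split_blocks and select_blocks are invented for the example. Because each block is tested as a whole, a block with several hits is kept only once.

```python
import re

# A block is assumed to start at a line like "astro-ph/0604565".
BLOCK_START = re.compile(r"^\s*astro-ph")

def split_blocks(lines):
    """Group lines into blocks, opening a new block at each astro-ph line."""
    blocks = []
    for line in lines:
        # Lines before the first marker end up in a leading block of their own.
        if BLOCK_START.match(line) or not blocks:
            blocks.append([])
        blocks[-1].append(line)
    return blocks

def select_blocks(lines, keywords):
    """Keep the blocks in which at least one keyword appears, ignoring case."""
    # re.IGNORECASE plays the role of "\c" in the vim pattern.
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [
        block
        for block in split_blocks(lines)
        if any(pattern.search(line) for line in block)
    ]
```

As with the vim pattern, "red" also matches inside longer words such as "hundred"; wrapping the alternatives in \b...\b would restrict the match to whole words.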
If you want to highlight your hits as well, you can tweak it
like
:g/red\|relativistic/s!!<b>&</b>!g|?^\s*astro-ph...
which, given that you seem to want to HTMLize your results
(as hinted at below), will bold each hit.
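
Outside vim, the same highlighting amounts to a regex substitution. Below is a small Python sketch of it, offered as an illustration only; bold_hits is a hypothetical helper name, and the keyword list is whatever you pass in.

```python
import re

def bold_hits(text, keywords):
    """Wrap every keyword hit in <b>...</b>, ignoring case."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    # m.group(0) is the matched text, like "&" in the vim replacement.
    return pattern.sub(lambda m: f"<b>{m.group(0)}</b>", text)
```

The original capitalization of each hit is preserved, since the replacement reuses the matched text rather than the keyword.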
What would be the command line with vim? (or are there other possibilities?)
While you could hack all that into a command line, it might
be easier to put those lines in a script, say "foo.vim", and
then just source that script on the command line:
cat input.txt | vim -s foo.vim -
I would also like to know how to replace the
astro-ph/0604565 with <a href=" http://xxx.lanl.gov/pdf/astro-ph/0604565</a>
for all numbers, not only for 0604565 ...
After the ":1d" (that's "one dee", not "ell dee") line, you
could put something like
:%s!^\s*astro-ph/\(\d\+\)!<a href="http://xxx.lanl.gov/pdf/astro-ph/\1">&</a>
(all on one line in case my mailer bungs it). Your HTML was
a little funky there, so I made some assumptions and cleaned
it up a little. The "\1" in the replacement is the number,
and the "&" in the replacement is the whole original text
(the "astro-ph/#######" bit), so you'll have an HTML link
with the original text as the clickable bit.
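
The same link substitution can be sketched in Python with re.sub. This is an assumed equivalent, not the vim command itself: link_ids is an invented name, and unlike "&" in the vim version, the sketch leaves any leading whitespace outside the link.

```python
import re

# Leading blanks, then "astro-ph/" followed by the paper number.
ID_PATTERN = re.compile(r"^([ \t]*)(astro-ph/(\d+))", re.MULTILINE)

def link_ids(text):
    """Turn each astro-ph/NNNNNNN at the start of a line into an HTML link."""
    # \1 = leading whitespace, \2 = the visible text, \3 = the number.
    return ID_PATTERN.sub(
        r'\1<a href="http://xxx.lanl.gov/pdf/astro-ph/\3">\2</a>', text
    )
```

Only identifiers at the start of a line are linked, matching the "^" anchor in the vim pattern; identifiers mentioned mid-line are left alone.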
I'm sorry I couldn't come up with a clean way to snag just
the unique paragraphs easily without having an instance show
up as its own result-block.
Anyways, it's at least one sorta-solution to what you describe.
-tim