On Friday, 29 January 2016, Salman Halim <salmanha...@gmail.com> wrote:

>
> On Jan 28, 2016 11:56 PM, "Chris Collision" <cfcollis...@gmail.com> wrote:
> >
> >
> >
> > On Thu, Jan 28, 2016 at 8:04 PM, Rik <amphib...@gmail.com> wrote:
> >>
> >> On Thursday, January 28, 2016 at 10:56:59 PM UTC-5, Chris Collision wrote:
> >> > Well-formatted text isn't the same thing as natural language; I do not think this is doomed to fail.  Is there a way to use paragraph / sentence motions to do this "exploding"?  I have played around for a few minutes but have not had much luck.  Perhaps an expert can take this farther.
> >> >
> >> >
> >> > On Thu, Jan 28, 2016 at 7:40 PM, Rik <amph...@gmail.com> wrote:
> >> > On Wednesday, January 27, 2016 at 12:49:39 PM UTC-5, Chris Lott wrote:
> >> > > I'd like to take a paragraph like the following:
> >> > >
> >> > > This is a paragraph. Wow! What do I do now?
> >> > >
> >> > > And break it into individual lines, ala:
> >> > >
> >> > > This is a paragraph.
> >> > > Wow!
> >> > > What do I do now?
> >> > >
> >> > > StackExchange revealed this regex that seems to work well matching the proper lines:
> >> > > [.!?][])"']*\($\|[ ]\)
> >> > >
> >> > > So I can do this:
> >> > > :%s/[.!?][])"']*\($\|[ ]\)/XXX\r\r/g
> >> > >
> >> > > But obviously need something where XXX is!
> >> > >
> >> > > c
> >> >
> >> >
> >> >
> >> > Regex cannot handle the complexity of natural language, and thus you
> are doomed to fail, Mr.
> >> >
> >> >
> >> >
> >> > Lott.
> >> >
> >> >
> >> >
> >> > --
> >> >
> >> > Rik
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> >
> >> > --
> >> >
> >> > You received this message from the "vim_use" maillist.
> >> >
> >> > Do not top-post! Type your reply below the text you are replying to.
> >> >
> >> > For more information, visit http://www.vim.org/maillist.php
> >> >
> >> >
> >> >
> >> > ---
> >> >
> >> > You received this message because you are subscribed to the Google
> Groups "vim_use" group.
> >> >
> >> > To unsubscribe from this group and stop receiving emails from it, send an email to vim_use+u...@googlegroups.com.
> >> >
> >> > For more options, visit https://groups.google.com/d/optout.
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > -Collision
> >> > @cfcollision | http://idontevenownatelevision.com/ |
> http://tinyletter.com/collision | 503.997.1907
> >>
> >> 1. Please do not top-post.
> >>
> >> That would be parsed by the proposed regex as
> >>   1.
> >>   Please do not top-post.
> >>
> >> Doomed to fail, Mr. Lott.
> >>
> >> That would be parsed as:
> >>   Doomed to fail, Mr.
> >>   Lott.
> >>
> >> Do you see any problem here?
> >>
> >> --
> >> rik
> >>
> >
> >
> >
> > Please forgive me for top-posting: I forgot that gmail does that by default.  Won't happen again.  Would this incredibly difficult counterexample of yours be solved by the venerable convention of using two spaces after sentence-ending punctuation?
> >
> >
> >
> > --
> > -Collision
> > @cfcollision | http://idontevenownatelevision.com/ |
> http://tinyletter.com/collision | 503.997.1907
>
> The convention of two spaces after sentences isn't something you can hold
> most people to these days. It might be easier to just rejoin lines that end
> in common abbreviations such as Mr., Dr., etc.
>
> Of course, the case of the "etc." is more interesting because it could be
> the end of a sentence or it may not. Typically, if it's not the end of a
> sentence, it's followed by a comma. And, if it is the end of a sentence, it
> is followed by whitespace and a capital letter.
>
> You may also have to contend with Ave., Blvd., St. (Saint or Street). I
> think it will require more logic than is afforded by a single substitute
> operation, as suggested before.
>
> Suddenly, the two-space option looks pretty good. :)
>

Most of those abbreviations begin with a capital letter and are at most
four letters long. A sentence may certainly end in something like "Bob.",
but failing to split there may be acceptable overkill. Then you need to
look out for "e.g.", "i.e.", "viz." and a few more which I can't remember
off the top of my head. That should be possible with a lookbehind,
something like this (untested, since I'm away from my computer):

s#\v%(%(%(\u\l{1,3}|i\.e|e\.g|viz)\.?)@<![.?!]\)?)@<=#\r\r#g
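
For anyone who wants to try the heuristic outside Vim, here is a rough Python sketch of the same idea. It is not a translation of the substitute command above: it works word by word instead of using lookbehind, it applies the abbreviation check only to words ending in a period (so "Wow!" still splits), and the names explode, looks_like_abbreviation and LOWERCASE_ABBREVIATIONS are made up for illustration.

```python
import re

# Hypothetical list of lowercase abbreviations; extend it for your own text.
LOWERCASE_ABBREVIATIONS = {"i.e.", "e.g.", "viz."}

# A word that may end a sentence: terminal punctuation plus optional closers.
SENTENCE_END = re.compile(r"[.?!][\])\"']*$")

# A capital letter, one to three lowercase letters, then a period ("Mr.").
SHORT_CAPITALISED = re.compile(r"^[A-Z][a-z]{1,3}\.$")


def looks_like_abbreviation(word):
    """Return True for words the heuristic refuses to split after."""
    return word in LOWERCASE_ABBREVIATIONS or bool(SHORT_CAPITALISED.match(word))


def explode(paragraph):
    """Split a paragraph into a list of sentences, one string per sentence."""
    sentences = []
    current = []
    for word in paragraph.split():
        current.append(word)
        if SENTENCE_END.search(word) and not looks_like_abbreviation(word):
            sentences.append(" ".join(current))
            current = []
    if current:
        # Keep trailing words that never reached a sentence end.
        sentences.append(" ".join(current))
    return sentences


if __name__ == "__main__":
    for sentence in explode("This is a paragraph. Wow! What do I do now?"):
        print(sentence)
```

As with the capital-letter rule described above, a sentence that really does end in a short capitalised word such as "Bob." is not split from the one that follows it, and a list marker such as "1." is still treated as a sentence end.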

