On Friday, 29 January 2016, Salman Halim <salmanha...@gmail.com> wrote:
>
> On Jan 28, 2016 11:56 PM, "Chris Collision" <cfcollis...@gmail.com> wrote:
> >
> > On Thu, Jan 28, 2016 at 8:04 PM, Rik <amphib...@gmail.com> wrote:
> >>
> >> On Thursday, January 28, 2016 at 10:56:59 PM UTC-5, Chris Collision wrote:
> >> > Well-formatted text isn't the same thing as natural language; I do not think this is doomed to fail. Is there a way to use paragraph / sentence motions to do this "exploding"? I have played around for a few minutes but have not had much luck. Perhaps an expert can take this farther.
> >> >
> >> > On Thu, Jan 28, 2016 at 7:40 PM, Rik <amph...@gmail.com> wrote:
> >> > On Wednesday, January 27, 2016 at 12:49:39 PM UTC-5, Chris Lott wrote:
> >> > > I'd like to take a paragraph like the following:
> >> > >
> >> > > This is a paragraph. Wow! What do I do now?
> >> > >
> >> > > And break it into individual lines, ala:
> >> > >
> >> > > This is a paragraph.
> >> > > Wow!
> >> > > What do I do now?
> >> > >
> >> > > StackExchange revealed this regex that seems to work well matching the proper lines:
> >> > >
> >> > > [.!?][])"']*\($\|[ ]\)
> >> > >
> >> > > So I can do this:
> >> > >
> >> > > :%s/[.!?][])"']*\($\|[ ]\)/XXX\r\r/g
> >> > >
> >> > > But obviously need something where XXX is!
> >> > >
> >> > > c
> >> >
> >> > Regex cannot handle the complexity of natural language, and thus you are doomed to fail, Mr.
> >> >
> >> > Lott.
> >> >
> >> > --
> >> > Rik
> >> >
> >> > --
> >> > --
> >> > You received this message from the "vim_use" maillist.
> >> > Do not top-post! Type your reply below the text you are replying to.
> >> > For more information, visit http://www.vim.org/maillist.php
> >> >
> >> > ---
> >> > You received this message because you are subscribed to the Google Groups "vim_use" group.
> >> > To unsubscribe from this group and stop receiving emails from it, send an email to vim_use+u...@googlegroups.com.
> >> > For more options, visit https://groups.google.com/d/optout.
> >> >
> >> > --
> >> > -Collision
> >> > @cfcollision | http://idontevenownatelevision.com/ | http://tinyletter.com/collision | 503.997.1907
> >>
> >> 1. Please do not top-post.
> >>
> >> That would be parsed by the proposed regex as
> >> 1.
> >> Please do not top-post.
> >>
> >> Doomed to fail, Mr. Lott.
> >>
> >> That would be parsed as:
> >> Doomed to fail, Mr.
> >> Lott.
> >>
> >> Do you see any problem here?
> >>
> >> --
> >> rik
> >
> > Please forgive me for top-posting: I forgot that gmail does that by default. Won't happen again. Would this incredibly difficult counterexample of yours be solved by the venerable convention of using two spaces after sentence-ending punctuation?
> >
> > --
> > -Collision
>
> The convention of two spaces after sentences isn't something you can hold most people to these days. It might be easier to just rejoin lines that end in common abbreviations such as Mr., Dr., etc.
>
> Of course, the case of the "etc." is more interesting because it could be the end of a sentence or it may not. Typically, if it's not the end of a sentence, it's followed by a comma. And, if it is the end of a sentence, it is followed by whitespace and a capital letter.
>
> You may also have to contend with Ave., Blvd., St. (Saint or Street). I think it will require more logic than is afforded by a single substitute operation, as suggested before.
>
> Suddenly, the two-space option looks pretty good. :)
>
Most of those abbreviations begin with a capital letter and are at most four letters long. A sentence may certainly end in something like "Bob.", but failing to split there may be an acceptable cost. You then need to look out for "e.g.", "i.e.", "viz." and a few more that I can't remember off the top of my head. That should be possible with a lookbehind, something like this (untested, since I'm away from my computer):

s#\v%(%(%(\u\l{1,3}|i\.e|e\.g|viz)\.?)@<![.?!]\)?)@<=#\r\r#g
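
For anyone who wants to try the heuristic outside Vim, here is a minimal Python sketch of the same idea. It is not a translation of the substitute command above. The names explode, BOUNDARY and ABBREVIATION are made up for illustration, and it makes two assumptions of its own: a split needs whitespace (or end of text) after the punctuation, as in the StackExchange pattern, and the abbreviation check is applied only to periods, so "Wow!" still ends a sentence.

```python
import re

# Candidate sentence end: a terminator, optional closing quotes/brackets,
# then whitespace or end of text (mirrors the StackExchange pattern).
BOUNDARY = re.compile(r"""[.!?][\])"']*(?:\s+|$)""")

# Heuristic from the post: a capitalised word of at most four letters,
# or one of a few known Latin abbreviations, directly before the period.
ABBREVIATION = re.compile(r"(?:\b[A-Z][a-z]{1,3}|\b(?:i\.e|e\.g|viz))$")


def explode(paragraph):
    """Split a paragraph into sentences, skipping likely abbreviations."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(paragraph):
        before = paragraph[start:match.start()]
        # Only a period can close an abbreviation; "!" and "?" always split.
        if paragraph[match.start()] == "." and ABBREVIATION.search(before):
            continue
        sentences.append(paragraph[start:match.end()].strip())
        start = match.end()
    # Whatever is left over (e.g. text ending in "Lott.") stays one sentence.
    tail = paragraph[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    # Blank line between sentences, like the \r\r in the substitute command.
    print("\n\n".join(explode("This is a paragraph. Wow! What do I do now?")))
```

With this sketch "Doomed to fail, Mr. Lott." stays on one line, and "I met Bob. He left." is not split either, which is the cost mentioned above. It does nothing for the "1. Please do not top-post." case.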