On Jul 14, 9:05 am, Chris <[EMAIL PROTECTED]> wrote:

Misleading subject.
[] brackets or "square brackets"
{} braces or "curly brackets"
() parentheses or "round brackets"

> I'm trying to delimit sentences in a block of text by defining the
> end-of-sentence marker as a period followed by a space followed by an
> uppercase letter or end-of-string.

... which has at least two problems:

(1) You are insisting on at least one space between the period and the
end-of-string (this can be overcome, see later).
(2) Periods are often dropped in after abbreviations and contractions
e.g. "Mr. Geo. Smith". You will get three "sentences" out of that.

>
> I'd imagine the regex for that would look something like:
> [^(?:[A-Z]|$)]\.\s+(?=[A-Z]|$)
>
> However, Python keeps giving me an "unbalanced parenthesis" error for
> the [^] part.

It's nice to know that Python is consistent with its error messages.
The character class ends at the first "]", which leaves the ")" after
the "$" without an opening partner.

> If this isn't valid regex syntax,

If? It definitely isn't valid syntax. The brackets should delimit a
character class. Either you are trying to cram a somewhat complicated
expression into a character class, or you should be using parentheses.
However, it's a bit hard to determine what you really meant that part
of the pattern to achieve.

> how else would I match
> a block of text that doesn't the delimiter pattern?

Start from the top down:

A sentence is:
    anything (with some qualifications)
followed by (but not including):
    a period
    followed by
    either
        1 or more whitespace characters then a capital letter
    or
        0 or more whitespace characters then end-of-string

So something like this might do the trick:

>>> import re
>>> sep = re.compile(r'\.(?:\s+(?=[A-Z])|\s*(?=\Z))')
>>> sep.split('Hello. Mr. Chris X\nis here.\nIP addr 1.2.3.4. ')
['Hello', 'Mr', 'Chris X\nis here', 'IP addr 1.2.3.4', '']
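
If you want that wrapped up for reuse, here is a minimal sketch. The names split_sentences and is_valid_pattern are made up for illustration, and dropping the empty trailing piece is only my assumption about what you would want. It does nothing about the abbreviation problem in (2).

```python
import re

# Same separator as above: a period, then either whitespace followed by a
# capital letter (lookahead, so the letter is not consumed), or optional
# whitespace followed by end-of-string.
SEP = re.compile(r'\.(?:\s+(?=[A-Z])|\s*(?=\Z))')


def split_sentences(text):
    """Split text on SEP and drop empty pieces (assumed to be unwanted)."""
    return [piece for piece in SEP.split(text) if piece]


def is_valid_pattern(pattern):
    """Return True if the re module can compile pattern, else False."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


sample = 'Hello. Mr. Chris X\nis here.\nIP addr 1.2.3.4. '
print(split_sentences(sample))
print(split_sentences('No trailing space.'))
print(split_sentences('Mr. Geo. Smith'))
print(is_valid_pattern(r'[^(?:[A-Z]|$)]\.\s+(?=[A-Z]|$)'))
```

The first call gives the same pieces as the session above minus the empty string. The second shows that a period directly at end-of-string is handled. The third returns ['Mr', 'Geo', 'Smith'], which is the abbreviation problem in action. The last prints False for the pattern from the original question.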