http://bugs.exim.org/show_bug.cgi?id=1315

Summary: \r, \n and $ matching seems to be illogical or not fully documented.
Product: PCRE
Version: N/A
Platform: All
OS/Version: All
Status: NEW
Severity: bug
Priority: high
Component: Code
AssignedTo: [email protected]
ReportedBy: [email protected]
CC: [email protected]

Hi.

I'm a bit confused by the following, which I also can't explain from what's documented in pcrepattern(3)'s "NEWLINE CONVENTIONS" section.

I'm under UNIX, so $ == LF

$ file=cr_at_file_end
$ hd $file
00000000  41 0d  |A.|
00000002
$ pcregrep '\r[^\n]' $file ; echo $?
1 ====> WHY? There is no \n after the 0d
$ pcregrep '\r[^$]' $file ; echo $?
1 ====> WHY? I guess a line terminated not by the end-of-line character(s) but rather by the end-of-file is also considered to match $ there. (*)
$ pcregrep '\n' $file ; echo $?
1 => CLEAR
$ pcregrep '\r' $file ; echo $?
0 => CLEAR
$ pcregrep '$' $file ; echo $?
1 ====> WHY? If the above (*) is true, then this should also match, right?

$ file=cr_not_at_file_end
$ hd $file
00000000  41 0d 41  |A.A|
00000003
$ pcregrep '\r[^\n]' $file ; echo $?
A0 => CLEAR
$ pcregrep '\r[^$]' $file ; echo $?
A0 => CLEAR
$ pcregrep '\n' $file ; echo $?
1 => CLEAR
$ pcregrep '\r' $file ; echo $?
A0 => CLEAR
$ pcregrep '$' $file ; echo $?
1 ====> WHY? If the above (*) is true, then this should also match, right?

$ file=lf_at_file_end
$ hd $file
00000000  41 0a  |A.|
00000002
$ pcregrep '\r[^\n]' $file ; echo $?
1 => CLEAR
$ pcregrep '\r[^$]' $file ; echo $?
1 => CLEAR
$ pcregrep '\n' $file ; echo $?
1 ====> WHY? There is a 0a.
$ pcregrep '\r' $file ; echo $?
1 => CLEAR
$ pcregrep '$' $file ; echo $?
1 ====> WHY? There even IS the end-of-line character, not to mention (*)

$ file=lf_not_at_file_end
$ hd $file
00000000  41 0a 41  |A.A|
00000003
$ pcregrep '\r[^\n]' $file ; echo $?
1 => CLEAR
$ pcregrep '\r[^$]' $file ; echo $?
1 => CLEAR
$ pcregrep '\n' $file ; echo $?
1 ====> WHY? There is a 0a.
$ pcregrep '\r' $file ; echo $?
1 => CLEAR
$ pcregrep '$' $file ; echo $?
1 ====> WHY? If the above (*) is true, then this should also match, right?

$ file=crlf_at_file_end
$ hd $file
00000000  41 0d 0a  |A..|
00000003
$ pcregrep '\r[^\n]' $file ; echo $?
1 => CLEAR
$ pcregrep '\r[^$]' $file ; echo $?
1 => CLEAR
$ pcregrep '\n' $file ; echo $?
1 ====> WHY? There is a 0a.
$ pcregrep '\r' $file ; echo $?
A
0 => CLEAR
$ pcregrep '$' $file ; echo $?
1 ====> WHY? There even IS the end-of-line character, not to mention (*)

$ file=crlf_not_at_file_end
$ hd $file
00000000  41 0d 0a 41  |A..A|
00000004
$ pcregrep '\r[^\n]' $file ; echo $?
1 => CLEAR
$ pcregrep '\r[^$]' $file ; echo $?
1 => CLEAR
$ pcregrep '\n' $file ; echo $?
1 ====> WHY? There is a 0a.
$ pcregrep '\r' $file ; echo $?
A
0 => CLEAR
$ pcregrep '$' $file ; echo $?
1 ====> WHY? There even IS the end-of-line character, not to mention (*)

Now as you can see, the behaviour is a bit strange:
- (*) It seems that the last line of the file implicitly gets an end-of-line character appended (which is fine of course) when there is none.
- sometimes, \n behaves as $, sometimes not; IMHO... WTF?!
- a single "$" is not matched, regardless of whether there is an end-of-line character or not

So either this is wrong (or, better said, illogical) behaviour, or something is missing from the documentation (I hope I haven't overlooked anything).

In the latter case, I'd guess at least the following is missing:
I. That the _current_ end-of-line character(s) are implicitly added to the file's end when there are none
II. Why a single '$' doesn't match (which is especially weird if (I) is true)
III. Why \n sometimes behaves like $, sometimes not... (of course the same for the other end-of-line sequences)

Thanks,
Chris.

--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email

--
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
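
For anyone who wants to reproduce the report: the six test files are not attached, but their bytes are fully given by the hd dumps, so they can be recreated. The following is only a sketch, not part of the original report. It assumes a POSIX printf that interprets \r and \n in its format string and a mktemp that supports -d; the scratch directory variable "dir" is a name chosen here for illustration.

```shell
# Recreate the six test files from the hex dumps in the report.
# Assumptions: printf understands \r and \n; mktemp -d is available.
dir=$(mktemp -d)   # scratch directory (illustrative; any writable dir works)

printf 'A\r'    > "$dir/cr_at_file_end"        # 41 0d
printf 'A\rA'   > "$dir/cr_not_at_file_end"    # 41 0d 41
printf 'A\n'    > "$dir/lf_at_file_end"        # 41 0a
printf 'A\nA'   > "$dir/lf_not_at_file_end"    # 41 0a 41
printf 'A\r\n'  > "$dir/crlf_at_file_end"      # 41 0d 0a
printf 'A\r\nA' > "$dir/crlf_not_at_file_end"  # 41 0d 0a 41

# Dump each file's bytes so they can be compared with the hd output above.
for f in "$dir"/*; do
    printf '%s:' "${f##*/}"
    od -An -tx1 "$f"
done
```

With the files in place, setting file="$dir/cr_at_file_end" (and so on) lets the pcregrep commands from the report be rerun unchanged.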
