Re: [pcre-dev] PCRE compilation error on HPUX

2020-08-06 Thread ph10--- via Pcre-dev
On Mon, 3 Aug 2020, Gaurav Mittal11 wrote: > I am compiling PCRE 8.44 on HP-UX B.11.31 U ia64 with below options. > > > export CC=/opt/aCC/bin/aCC > export CFLAGS="+DD64 -mt" > export CPPFLAGS="+DD64 -mt" > export LDFLAGS="-L/usr/lib/hpux64/" > > > It is compiling successfully from my own

[pcre-dev] 10.35 released

2020-05-09 Thread ph10
I have just put 10.35 tarballs in the usual place: https://ftp.pcre.org/pub/pcre/pcre2-10.35.tar.gz https://ftp.pcre.org/pub/pcre/pcre2-10.35.tar.bz2 https://ftp.pcre.org/pub/pcre/pcre2-10.35.tar.zip Since the release candidate, there has only been one change to the library code (adding support

Re: [pcre-dev] Inconsistent behavior of some quantified groups

2020-05-04 Thread ph10
On Mon, 4 May 2020, ND via Pcre-dev wrote: > /\A(?:\1b|(?=(a)))*\z/ > ab > No match > > > Both patterns must successfully match after second iteration. Perl does exactly the same as PCRE. The problem is that analysing the pattern to discover that matching nothing in one branch might make a

Re: [pcre-dev] PCRE2 10.35-RC1 testing release is available

2020-04-24 Thread ph10
On Fri, 24 Apr 2020, Petr Pisar via Pcre-dev wrote: > I think it's a mistake in PCRE2 code. Oh, I was looking at something completely different. The PCHARS and PCHARSV macros print character strings in different bit-widths by calling appropriate width-specific functions. In many cases they are

Re: [pcre-dev] PCRE2 10.35-RC1 testing release is available

2020-04-24 Thread ph10
On Fri, 24 Apr 2020, Petr Pisar via Pcre-dev wrote: > > I have committed some revised code. Does this solve the issue? > > > It does not. The compiler is too smart (or dumb). The warning has changed > into: Oh, how annoying. There's another way of solving this, but it's more complicated, which

Re: [pcre-dev] 3x-4x slowdown in pcre_match

2020-04-23 Thread ph10
On Thu, 16 Apr 2020, enh via Pcre-dev wrote: > done. i've attached a patch that includes both the configure and cmake > bits. i tested both with CC=gcc and CC=clang (for representative > failure and success cases respectively). Patch now applied and committed. However, I did have to make a

Re: [pcre-dev] PCRE2 10.35-RC1 testing release is available

2020-04-23 Thread ph10
On Thu, 16 Apr 2020, Petr Pisar via Pcre-dev wrote: > I noticed a new warning with GCC 10: > > gcc -DHAVE_CONFIG_H -I. -I./src "-I./src" -pthread -O2 -g -pipe -Wall > -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS > -fexceptions -fstack-protector-strong

Re: [pcre-dev] 3x-4x slowdown in pcre_match

2020-04-22 Thread ph10
On Tue, 21 Apr 2020, enh via Pcre-dev wrote: > it seems to be a fairly random mix of monospaced and proportional text > for me. for example, the first line "Index:" is proportional, but then > the --- and +++ lines are monospaced, and it goes back and forth a > lot. (in both Chrome and Firefox.)

Re: [pcre-dev] 3x-4x slowdown in pcre_match

2020-04-21 Thread ph10
On Mon, 20 Apr 2020, enh via Pcre-dev wrote: > > Thank you. I will deal with this in a day or two (diverted elsewhere at > > the moment) along with several other minor tweaks that have just > > arrived. > > thanks! (i was worried that the patch got mangled by the mailing list > because it looks

Re: [pcre-dev] 3x-4x slowdown in pcre_match

2020-04-16 Thread ph10
On Wed, 15 Apr 2020, enh via Pcre-dev wrote: > -PCRE2_SPTR stack_frames_vector[START_FRAMES_SIZE/sizeof(PCRE2_SPTR)]; > +PCRE2_SPTR stack_frames_vector[START_FRAMES_SIZE/sizeof(PCRE2_SPTR)] > __attribute__((uninitialized)); > mb->stack_frames = (heapframe *)stack_frames_vector; > > i'm happy to

Re: [pcre-dev] PCRE2 10.35-RC1 testing release is available

2020-04-16 Thread ph10
On Thu, 16 Apr 2020, Petr Pisar via Pcre-dev wrote: > All tests pass with JIT enabled where available on GNU/Linux on these > platforms: Thank you. > I noticed a new warning with GCC 10: I'm still on gcc 9.3.0 (Arch Linux), which doesn't show that. I'll do something about it. Maybe Arch will

[pcre-dev] PCRE2 10.35-RC1 testing release is available

2020-04-15 Thread ph10
I've just put the 10.35-RC1 testing release here: https://ftp.pcre.org/pub/pcre/Testing/pcre2-10.35-RC1.tar.gz https://ftp.pcre.org/pub/pcre/Testing/pcre2-10.35-RC1.tar.bz2 https://ftp.pcre.org/pub/pcre/Testing/pcre2-10.35-RC1.tar.zip Bugs are fixed and there are a few new features: see NEWS and

Re: [pcre-dev] Problems and questions about locales

2020-03-20 Thread ph10
On Thu, 5 Mar 2020, I wrote: > > In dftables, add a -b option to save the table buffer after computation in > > binary format instead of the C format ; > > the file argument is unchanged. > > That is a useful idea; I will consider it. OK, I have now done that and committed the patch. While I

Re: [pcre-dev] Problems and questions about locales

2020-03-05 Thread ph10
On Wed, 4 Mar 2020, Patrice Guérin wrote: > To be consistent with the pcre2_maketables_free() function in terms of > alloc/free usage, > provide a pcre2_maketables_reserve() function : > > PCRE2_EXP_DEFN uint8_t * PCRE2_CALL_CONVENTION > pcre2_maketables_reserve(pcre2_general_context *gcontext,

Re: [pcre-dev] signal 7 (code 1) (Invalid address alignment) on 32bit device

2020-03-03 Thread ph10
On Thu, 27 Feb 2020, Dvir L via Pcre-dev wrote: > I've tried to upgrade to 8.44, and got the same result. > I didn't include the full pattern in the previous e-mail. It's something > like - [A-Za-z]{1}[A-Za-z\d_]*\. Needless to say, that works fine on my 64-bit Linux box. Sorry I can't offer any

Re: [pcre-dev] signal 7 (code 1) (Invalid address alignment) on 32bit device

2020-02-26 Thread ph10
On Wed, 26 Feb 2020, Dvir L via Pcre-dev wrote: > I'm using pcre 8.34 on a 32 bit Android device. PCRE1 (the 8.xx series) is obsolete and will probably never have another release (8.44 is recently out). All development happens in PCRE2 (the 10.xx series). As it is now 5 years since PCRE2 came

Re: [pcre-dev] Problems and questions about locales

2020-02-25 Thread ph10
On Mon, 17 Feb 2020, I wrote: > On Fri, 14 Feb 2020, Patrice Guérin wrote: > > > In my opinion, pcre2_maketables() is independent of 8/16/32 bits since it's > > defined as uint8_t (i.e. bytes). > > For the same reason, I think there is no endianness issue in the computation > > of the table. > >

Re: [pcre-dev] Question regarding regex complexity, catastrophic backtrack and jit/no_jit

2020-02-21 Thread ph10
On Fri, 21 Feb 2020, Kilian Kilger via Pcre-dev wrote: > Matching with jit, it was very easy to produce an example which > exceeds the available resources: We take the pattern > "(*LIMIT_MATCH=10)(x+x+x+x+)+y" and as subject we take a string of > length 10 containing only the letter "x". > >
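The pattern quoted above is a standard case of nested quantifiers: with no "y" in the subject, a backtracking matcher has to work through the ways of dividing the run of "x" characters among the four x+ items and the outer repeat before it can report failure. The following is only a rough sketch of that growth. The function name count_splits is hypothetical, it counts complete decompositions of the subject rather than the steps PCRE2 (interpreter or JIT) actually takes, and Python's re module is used purely as a stand-in backtracking engine.

```python
import re
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_splits(n):
    """Count the ways (x+x+x+x+)+ can consume exactly n 'x' characters."""
    total = 0
    for k in range(4, n + 1):
        # One iteration of the group takes k characters, split into
        # four non-empty runs: C(k-1, 3) possibilities.
        ways_here = comb(k - 1, 3)
        rest = n - k
        # Either this was the last iteration, or further iterations follow.
        total += ways_here * (1 if rest == 0 else count_splits(rest))
    return total


subject = "x" * 10
# No 'y' in the subject, so the overall match must fail.
no_match = re.search(r"(x+x+x+x+)+y", subject) is None
print(no_match, count_splits(10), count_splits(20))
```

Even at length 10 there are already many distinct ways to consume the whole subject, and the count rises quickly with length, which is why a small match limit is reached so easily with this kind of pattern.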

Re: [pcre-dev] Problems and questions about locales

2020-02-17 Thread ph10
On Fri, 14 Feb 2020, Patrice Guérin wrote: > In my opinion, pcre2_maketables() is independent of 8/16/32 bits since it's > defined as uint8_t (i.e. bytes). > For the same reason, I think there is no endianness issue in the computation > of the table. > Saving and loading in binary should be ok. I
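The point made in the quoted text, that a table of single bytes has no byte-order problem when saved and reloaded, can be sketched as follows. This is an illustration under assumptions: the 256-byte lower-casing table and the file name tables.bin are hypothetical stand-ins, not the real layout produced by pcre2_maketables() or dftables.

```python
import os
import struct
import tempfile

# Hypothetical stand-in for a character table: one byte per code 0..255,
# mapping ASCII upper-case letters to lower case and leaving the rest alone.
table = bytes(c + 32 if 65 <= c <= 90 else c for c in range(256))

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "tables.bin")
    with open(path, "wb") as f:
        f.write(table)          # save in binary form
    with open(path, "rb") as f:
        loaded = f.read()       # load it back unchanged

# A byte sequence reads back identically whatever the host byte order is.
round_trip_ok = loaded == table

# By contrast, a multi-byte integer has two different byte layouts.
byte_order_matters = struct.pack("<H", 0x1234) != struct.pack(">H", 0x1234)

print(round_trip_ok, byte_order_matters)
```

Byte order only becomes a concern once values wider than one byte are written, which is not the case for a table of uint8_t entries.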

Re: [pcre-dev] Problems and questions about locales

2020-02-14 Thread ph10
On Thu, 13 Feb 2020, Patrice Guérin wrote: > I'm facing some problems with the locale character table definitions. Locales are a nightmare. We will all be able to rejoice when Unicode is everywhere. I'm afraid I know very little about locales, and as I'm a Linux user, I know nothing about

Re: [pcre-dev] Question regarding matching invalid unicode

2020-02-14 Thread ph10
On Fri, 14 Feb 2020, Kilian Kilger via Pcre-dev wrote: > we try to use PCRE2 to match UCS-2 encoding, i.e. UTF-16 without any > check for "broken" surrogates or any other invalid unicode. In UCS-2 > encoding every character is 2 bytes and every 2-byte sequence is > accepted as a valid character.
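The difference between the two readings described in the question can be seen with an unpaired surrogate. The sketch below uses only Python's standard library, not PCRE2, and the sample bytes are made up for illustration: read as UCS-2, each 2-byte unit is simply a value, whereas a strict UTF-16 decoder rejects the same bytes.

```python
import struct

# A high surrogate (0xD800) followed by 'A' (0x0041), little-endian.
data = b"\x00\xd8\x41\x00"

# UCS-2 view: every 2-byte unit is accepted as a character value.
ucs2_units = list(struct.unpack("<2H", data))

# UTF-16 view: a strict decoder refuses the unpaired surrogate.
try:
    data.decode("utf-16-le")
    utf16_valid = True
except UnicodeDecodeError:
    utf16_valid = False

print([hex(u) for u in ucs2_units], utf16_valid)
```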

[pcre-dev] PCRE1 release 8.44

2020-02-12 Thread ph10
I have just put the PCRE1 8.44 release here: https://ftp.pcre.org/pub/pcre/pcre-8.44.tar.gz https://ftp.pcre.org/pub/pcre/pcre-8.44.tar.bz2 https://ftp.pcre.org/pub/pcre/pcre-8.44.tar.zip It is a year since the last PCRE1 release. There are only 7 logged changes, and only two of them fix a real

Re: [pcre-dev] pcre2_jit_match man page documentation lacking critical info

2020-02-11 Thread ph10
On Mon, 10 Feb 2020, Rob Harrison wrote: > Please can you add the information about not supporting Null Terminated > Strings to the man page for pcre2_jit_match to avoid others also hitting the > same brick wall? Done. Thank you for pointing out this omission; sorry that you had to spend so

Re: [pcre-dev] PCRE2 SVN 1193 requires updates to remaining testoutput8 files

2019-12-29 Thread ph10
On Tue, 24 Dec 2019, Ralf Junker wrote: > With PCRE2 SVN revision 1193, pcre2test changes the output of testinput8. > > For LINK_SIZE=2, the corresponding result files have been adjusted > accordingly. > > For LINK_SIZE=3 and LINK_SIZE=4 these files must still be updated: > >

Re: [pcre-dev] PCRE2 SVN 1193 requires updates to remaining testoutput8 files

2019-12-24 Thread ph10
On Tue, 24 Dec 2019, Ralf Junker wrote: > For LINK_SIZE=2, the corresponding result files have been adjusted > accordingly. > > For LINK_SIZE=3 and LINK_SIZE=4 these files must still be updated: Thanks for noticing that. I didn't test with those link sizes (not realizing it mattered), but this

Re: [pcre-dev] New header file location

2019-12-02 Thread ph10
On Mon, 2 Dec 2019, Ze'ev Atlas wrote: > When I started, Philip had assured me that pcre2_jit_compile.c is > indeed in src but does nothing in my context. He had also assured me > that most of the rest of jit related code would be in src/sljit. I do > not understand why the change of heart.

Re: [pcre-dev] New header file location

2019-12-02 Thread ph10
On Mon, 2 Dec 2019, Zoltán Herczeg wrote: > those files are intended to be there. They are pcre-jit specific, Alongside pcre2_jit_compile.c etc... Philip -- Philip Hazel -- ## List details at https://lists.exim.org/mailman/listinfo/pcre-dev

Re: [pcre-dev] Win32 JIT Access Violation

2019-11-19 Thread ph10
On Tue, 19 Nov 2019, Zoltán Herczeg wrote: > Anyway I suspect Philip wants to release PCRE2 as soon as possible, so > if you don't mind we could track this down after the release. I see you have fixed this. Thanks to both of you for getting that done. Shall I go ahead with 10.34 now? Actually,

Re: [pcre-dev] Compiler warnings in JIT with NEON instructions

2019-11-16 Thread ph10
On Mon, 11 Nov 2019, Petr Pisar via Pcre-dev wrote: > Frankly I don't believe there is a way of solving it and I'd just keep the > warning there. Using C99 conformant compilers is the correct way. E.g. passing > -std=c99 to GCC with glibc fixes the warning. I'd just document it somewhere > in

[pcre-dev] Advice needed: website hosting

2019-11-14 Thread ph10
I've just discovered that the University of Cambridge web hosting service, where I've had a small web site for distributing some of my non-PCRE software, has been closed down. So ... what advice can anybody give me about finding somewhere to distribute a few software packages via a web site? I

Re: [pcre-dev] Compiler warnings in JIT with NEON instructions

2019-11-12 Thread ph10
On Tue, 12 Nov 2019, Zoltán Herczeg wrote: > Patch landed. Thank you for fixing all issues. Yes, many thanks to everybody. Do we need another RC, or should I just go ahead with a full release in a few days' time?

Re: [pcre-dev] Compiler warnings in JIT with NEON instructions

2019-11-08 Thread ph10
On Thu, 7 Nov 2019, Petr Pisar via Pcre-dev wrote: > I can see GCC 4.8.5 prints these warnings on 32-bit PowerPC: > > gcc -DHAVE_CONFIG_H -I. -I./src "-I./src" -pthread -O2 -g -pipe -Wall > -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong > --param=ssp-buffer-size=4

Re: [pcre-dev] JIT fails with NEON instructions

2019-11-08 Thread ph10
On Wed, 6 Nov 2019, Sebastian Pop via Pcre-dev wrote: > Maybe we could add this to the ChangeLog? Done, and committed.

Re: [pcre-dev] JIT fails with NEON instructions

2019-11-06 Thread ph10
On Wed, 6 Nov 2019, Zoltán Herczeg wrote: > Philip, I think you can create another RC. There is now a new RC here: https://ftp.pcre.org/pub/pcre/Testing/pcre2-10.34-RC2.tar.gz https://ftp.pcre.org/pub/pcre/Testing/pcre2-10.34-RC2.tar.bz2

[pcre-dev] Testing Release 10.34-RC1

2019-10-17 Thread ph10
I have just made available a Release Candidate for PCRE2 10.34 here: https://ftp.pcre.org/pub/pcre/Testing/pcre2-10.34-RC1.tar.gz https://ftp.pcre.org/pub/pcre/Testing/pcre2-10.34-RC1.tar.bz2 https://ftp.pcre.org/pub/pcre/Testing/pcre2-10.34-RC1.tar.zip NOTE: this is a different FTP site than

Re: [pcre-dev] Result of pcre2_get_startchar() undefined as of SVN 1176

2019-10-16 Thread ph10
On Tue, 15 Oct 2019, Ralf Junker wrote: > As of SVN revision 1176, pcre2_get_startchar() may return an arbitrary, > undefined result. ... > Without further testing, the same problem seems to be present for JIT > matching at around line 6215. Thank you for picking this up. I have done the

Re: [pcre-dev] Unable to build PCRE 8.43 using Visual studio 2013.

2019-10-05 Thread ph10
On Thu, 3 Oct 2019, Sathish Kumar Subramani via Pcre-dev wrote: > I am trying to build PCRE 8.43 version in my windows platform using Visual > studio 2013. I am facing error with 'snprintf': identifier not found in > pcregrep.c file. Could you please provide if any patch available to build >

Re: [pcre-dev] Calculate minimum subject length

2019-08-27 Thread ph10
On Mon, 26 Aug 2019, I wrote: > On Sun, 18 Aug 2019, ND via Pcre-dev wrote: > > > May be when meet (*ACCEPT) find_minlength must simply drop further > > calculations for current branch. So the current value of "branchlength" will > > be immediately considered as a minimum length of whole branch.

Re: [pcre-dev] Calculate minimum subject length

2019-08-26 Thread ph10
On Sun, 18 Aug 2019, ND via Pcre-dev wrote: > May be when meet (*ACCEPT) find_minlength must simply drop further > calculations for current branch. So the current value of "branchlength" will > be immediately considered as a minimum length of whole branch. That was not easily possible in the

Re: [pcre-dev] Max_lookbehind issues

2019-08-20 Thread ph10
On Sat, 10 Aug 2019, ND via Pcre-dev wrote: > I would appreciate if at first we reach a consensus on these suggestions > before make any rewrite of partial matching that you gonna do. I do not intend to make any more changes to the partial matching code. I may do further updates to the

[pcre-dev] PCRE release site

2019-08-20 Thread ph10
The ftp.csx.cam.ac.uk site, from which PCRE has been distributed, has been discontinued. For various reasons this has happened rather suddenly, which is why no notice was given here. However, the site that holds all PCRE releases from 8.00 onwards remains active, and that is where new releases

Re: [pcre-dev] pcre2test allusedtext issue

2019-08-10 Thread ph10
On Sat, 27 Jul 2019, ND via Pcre-dev wrote: > /b(? abc > 0: ab >< > > Why "a" showed as text that was consulted during a successful pattern match, > but "c" not? There was a bug. I have fixed it. Thanks for noticing.

Re: [pcre-dev] Max_lookbehind issues

2019-08-10 Thread ph10
On Sat, 27 Jul 2019, ND via Pcre-dev wrote: > There are some kinds of problems that exist with max_lookbehind: It was always a hack to try to make it possible to do multi-segment matching using the normal matching function, something for which it was not designed. > - bugs > - performance

Re: [pcre-dev] Remove some restrictions of lookbehind assertions

2019-08-08 Thread ph10
On Sat, 3 Aug 2019, ND via Pcre-dev wrote: > May be it can be useful to have ability to set a limits of lookbehind search > for performance reasons. > I can imagine a rule: If nonfixedlength lookbehind immediately preceded by > capture group, then it is restricted to start position of this group.

Re: [pcre-dev] pcre2test allusedtext issue

2019-08-01 Thread ph10
On Mon, 29 Jul 2019, 虚空幻影 via Pcre-dev wrote: > As follows is my test. > > ./pcre2test > > PCRE2 version 10.33 > > 2019-04-16 > > re> /b(? > data> abc > > No match > > data> > > > > I tried to test your case, but the result is different from yours, why? You are using 10.33.

Re: [pcre-dev] Remove some restrictions of lookbehind assertions

2019-08-01 Thread ph10
On Wed, 31 Jul 2019, Zoltán Herczeg wrote: > You have already convinced me to drop MOVE :) > The question is whether we keep the other construct. Or "rematching" a > capturing block in an assertion like fashion would solve this problem better. I don't think that would solve the original

Re: [pcre-dev] Remove some restrictions of lookbehind assertions

2019-07-31 Thread ph10
On Wed, 31 Jul 2019, Zoltán Herczeg wrote: > If we consider the following pattern: > /(*napla:a|a)+/ > > is the same as: > /(?:(*napla:a|a))+/ > > Then we have an empty match if I understand  correctly the behavior of > this new construct. Oh, sorry, I was thinking of

Re: [pcre-dev] Remove some restrictions of lookbehind assertions

2019-07-31 Thread ph10
On Wed, 31 Jul 2019, Zoltán Herczeg wrote: > You are right. Since you can put it into a group, it is not possible > to prevent repetitions. However the rule that empty matches break > (non-fixed) loops may solve this problem. ... but it's not an empty match. > I start to understand why perl

Re: [pcre-dev] Remove some restrictions of lookbehind assertions

2019-07-30 Thread ph10
On Tue, 30 Jul 2019, Zoltán Herczeg wrote: > > (*MOVE) is a small addition and solves ND's non-atomic assertion > > requirement. Perhaps we can just start with (*MOVE). > > Yes, if we choose this option to implement. It occurs to me that (*MOVE) gives scope for infinite loops:

Re: [pcre-dev] Remove some restrictions of lookbehind assertions

2019-07-30 Thread ph10
On Tue, 30 Jul 2019, Zoltán Herczeg wrote: > Thinking about practical use cases. With the proposed changes, doing a > submatch is quite overcomplicated: > > (*:A)submatch(*:B)(*MOVE:A)(*SETEND:B)match-submatch-again(*MOVE:B)(*SETEND) > > Perhaps the other idea, use capturing brackets for this

Re: [pcre-dev] Remove some restrictions of lookbehind assertions

2019-07-29 Thread ph10
On Mon, 29 Jul 2019, Zoltán Herczeg wrote: > > May be it not quite effective and still have restrictions but is useful. > > Is it simple to add such functionality? > > Definitely not easy in JIT. Not easy in the interpreter either. > I have an alternative solution which might be able to solve

Re: [pcre-dev] Detecting starting code units

2019-07-27 Thread ph10
On Sat, 27 Jul 2019, ND via Pcre-dev wrote: > It seems last code unit "c" is not detected and so start optimization don't > work: > > > PCRE2 version 10.34-RC1 2019-04-22 > /\Aabc/info,auto_callout > Capture group count = 0 > Max lookbehind = 1 > Compile options: auto_callout > Overall options:

Re: [pcre-dev] Partial match at end of subject

2019-07-24 Thread ph10
On Wed, 24 Jul 2019, ND via Pcre-dev wrote: > In terms of multisegment matching this may be say: partial hard match occurs > when current segment is not last and it's content not enough to exactly > determine, what match (or nomatch) would have WHOLE subject from this start > position. Yes, more

Re: [pcre-dev] Partial match at end of subject

2019-07-22 Thread ph10
On Sun, 21 Jul 2019, ND via Pcre-dev wrote: > /(?![ab]).*/ > ab\=ph > 0: > > /c*+/ > ab\=ph,offset=2 > 0: The characteristic of these is that the pattern can match an empty string. I have now added this condition (which was easily done with no repeated test) and those patterns now give

Re: [pcre-dev] Partial match at end of subject

2019-07-22 Thread ph10
On Sun, 21 Jul 2019, ND via Pcre-dev wrote: > New algorithm still have another parts of discussed oversight. For example it > returns full match instead of partial in following cases: > > /(?![ab]).*/ > ab\=ph > 0: > > /c*+/ > ab\=ph,offset=2 > 0: The answer to that may lie in thinking about

Re: [pcre-dev] Partial match at end of subject

2019-07-21 Thread ph10
I have just committed a patch that makes some small changes to the way partial matches are handled in the interpreter. I hope Zoltán will in due course pick these up for the JIT. (There are new tests at the end of testinput2 which have no_jit set at the moment.) The changes are really quite

Re: [pcre-dev] Partial match at end of subject

2019-07-18 Thread ph10
On Wed, 17 Jul 2019, ND via Pcre-dev wrote: Let us ignore for the moment whether there should be a new option or not, and try to figure out what new logic might be needed. I am going to experiment with the suggestion I made earlier: If a hard partial match is possible, return PCRE2_PARTIAL if

Re: [pcre-dev] Partial match at end of subject

2019-07-17 Thread ph10
On Mon, 15 Jul 2019, ND via Pcre-dev wrote: > This option is added ten years ago EXACTLY for multisegment matching. > Please read a very first proposal post and thread about it. Thats how > partial_hard is born: > https://lists.exim.org/lurker/message/20090524.142622.cb850f3a.en.html Your memory

Re: [pcre-dev] Detecting starting code units

2019-07-17 Thread ph10
On Sat, 13 Jul 2019, I wrote: > > May be "[^a]" can use the same algorithm as "[^ab]"? > > [^a] is optimized into a different (faster) opcode; I will see if this > can easily produce the same starting code units as [^ab] for tidyness. I > do not expect it will do much for performance. Having

Re: [pcre-dev] Subject length lower bound calculation

2019-07-16 Thread ph10
On Tue, 16 Jul 2019, ND via Pcre-dev wrote: > /(*napla:^x|^y)/I > Capture group count = 0 > May match empty string > Compile options: > Overall options: anchored > Starting code units: x y > Subject length lower bound = 0 > > We have starting code unit. Isn't Subject length lower bound must be

Re: [pcre-dev] Partial match at end of subject

2019-07-15 Thread ph10
On Mon, 15 Jul 2019, I wrote: > However, there does exist the PCRE2_NOTEOL option. At the moment, this > is applied only to the $ meta character, not \z or \Z. Perhaps it > should. Or perhaps an entirely new option PCRE2_NOTEOS (not end of subject) should be invented, to stop \z ever

Re: [pcre-dev] Partial match at end of subject

2019-07-15 Thread ph10
On Sun, 14 Jul 2019, I wrote: > I am still not entirely convinced this change should be made. And thinking about it overnight has not changed my mind. Requesting a partial match was never intended to have the implication "this is not the end segment". However, there does exist the

Re: [pcre-dev] Partial match at end of subject

2019-07-14 Thread ph10
On Sat, 13 Jul 2019, ND via Pcre-dev wrote: > At its core \z is positive lookahead assertion that want to inspect next > character of subject. I must admit I had not thought of it like that. I considered it just to be "are we at the end of the subject?". > I propose following algorithm (for

Re: [pcre-dev] Some words about assertion docs

2019-07-14 Thread ph10
On Sat, 13 Jul 2019, ND via Pcre-dev wrote: > Is there people that successfully compile PCRE2 under MSVC 2019 to tell with > them? > Is there detailed doc how compile PCRE2 with MSVC 2019? I am sorry that I cannot help, but I don't even use Windows, let alone MSVC. All the information I put in

Re: [pcre-dev] Some words about assertion docs

2019-07-14 Thread ph10
On Sat, 13 Jul 2019, Zoltán Herczeg wrote: > Somehow it doesn't feel right to call this new construct as an > "assertion", which normally checks whether a condition is true. I > think the nature of this new construct is closer to "script run" which > adds an extra task after a bracket is matched.

Re: [pcre-dev] Some words about assertion docs

2019-07-13 Thread ph10
On Sat, 13 Jul 2019, ND via Pcre-dev wrote: > Unfortunately PCRE2 svn version is not compiled for me with Microsoft Visual > Studio 2019 on Windows 7x64. Can you compile the released source versions? (There shouldn't be any difference, but I just wondered.)

Re: [pcre-dev] Partial match at end of subject

2019-07-13 Thread ph10
On Sat, 13 Jul 2019, ND via Pcre-dev wrote: > PCRE2_PARTIAL_HARD is intended for multisegment matching. I think when this > option is set it means: this subject IS incomplete, it's only a non-last part > of a certain "entire" subject. It was never intended to mean "this subject is incomplete",

Re: [pcre-dev] Detecting starting code units

2019-07-13 Thread ph10
On Sat, 13 Jul 2019, ND via Pcre-dev wrote: > PCRE try to detect starting code units in attempt to apply a start > optimization. > As we can see from next two examples, it detects starting code units for > "[^ab]", but don't doing this for "[^a]". I think it looks a bit curiously. > May be "[^a]"

Re: [pcre-dev] Partial match at end of subject

2019-07-13 Thread ph10
On Sat, 13 Jul 2019, ND via Pcre-dev wrote: > It seems this example introduce not partial matching but "regular" matching > bug. > > PCRE2 version 10.33 2019-04-16 > /(?<=(?=(?<=a)))b/ > ab > No match > > > While Perl is correctly match "b". Probably the same bug as in your previous message.

Re: [pcre-dev] Partial match at end of subject

2019-07-13 Thread ph10
On Fri, 12 Jul 2019, ND via Pcre-dev wrote: > > > > >PCRE2 version 10.33 2019-04-16 > > > >/(?<=(?=.(?<=x)))/ > > > >ab\=ph > > > >Partial match: b > > > > Why it matched b? > >Again, it has inspected at least one character, and if you add "x" it matches. > > But I not try to add x. I inspect

Re: [pcre-dev] Some words about assertion docs

2019-07-13 Thread ph10
On Tue, 9 Jul 2019, I wrote: > I have put this on the wish list, but until I look at the code, I have > no idea whether it will be easy or straightforward to implement in the > interpreter. I will try to investigate soon. If it turns out to be > possible in the interpreter, it will up to

Re: [pcre-dev] Partial match at end of subject

2019-07-12 Thread ph10
On Fri, 12 Jul 2019, ND via Pcre-dev wrote: > This is about my second example. > But it seems first example have another issue: > > >PCRE2 version 10.33 2019-04-16 > >/(?<=(?=.(?<=x)))/ > >ab\=ph > >Partial match: b > > Why it matched b? Again, it has inspected at least one character, and if

Re: [pcre-dev] Partial match at end of subject

2019-07-12 Thread ph10
On Thu, 11 Jul 2019, ND via Pcre-dev wrote: > I guess you told about second example (in first example "x" don't adds). I > believed empty match at the end of string is not counted as partial. This is a documentation issue. Instead of "empty match" read "match in which no characters are

Re: [pcre-dev] Partial match at end of subject

2019-07-11 Thread ph10
On Wed, 10 Jul 2019, ND via Pcre-dev wrote: > PCRE2 version 10.33 2019-04-16 > /(?<=(?=.(?<=x)))/ > ab\=ph > Partial match: b > > > /(?<=.(?=x))/ > ab\=ph > Partial match: b > < > > Isn't both results should be "no match" instead of "partial match"? Why? "Partial match" means

Re: [pcre-dev] JIT don't detect endless subroutine recursion

2019-07-10 Thread ph10
On Wed, 10 Jul 2019, Zoltán Herczeg wrote: > > /(?0)/ > As far as I remember, these are detected by the parser. Some of them are detected by the parser in PCRE1, but not all of them, so there is a runtime check. Looks like I decided to leave it all to runtime in PCRE2. The error message "nested

Re: [pcre-dev] (*THEN) works differently in Perl

2019-07-09 Thread ph10
On Mon, 8 Jul 2019, ND via Pcre-dev wrote: > And if we disregards Perl's bugs then it seems (*COMMIT) in Perl works in a > following manner: > > 1. Backtracking can't move to the left of COMMIT (this is PCRE behaviour too) > 2. If COMMIT occurs then no advance match to any other position of

Re: [pcre-dev] Some words about assertion docs

2019-07-09 Thread ph10
On Sun, 7 Jul 2019, ND via Pcre-dev wrote: > If it's simple to add a Non-atomic positive lookaheads then how are you about > put it to PCRE wishlist please. > It can be looks like > (*non_atomic_positive_lookahead:...) > (*napla:...) I have put this on the wish list, but until I look at the

Re: [pcre-dev] (*THEN) works differently in Perl

2019-07-03 Thread ph10
On Tue, 2 Jul 2019, ND via Pcre-dev wrote: > It seems Perl is so buggy or has a really different conception of (*COMMIT) > than PCRE. I am waiting for further information from the Perl developers, but I suspect that I won't want to change PCRE2, except perhaps to add more detail to the

Re: [pcre-dev] (*THEN) works differently in Perl

2019-07-02 Thread ph10
On Tue, 2 Jul 2019, I wrote: > > PCRE2 version 10.33 2019-04-16 > > /\A(?:.(*COMMIT))*c/ > > abcd > > No match > > > > But Perl reports that this is successful match "abc". > > I think this is also a Perl bug and I will report it. A Perl developer has admitted there is some ambiguity, but

Re: [pcre-dev] (*THEN) works differently in Perl

2019-07-02 Thread ph10
On Tue, 2 Jul 2019, Zoltán Herczeg wrote: > Perhaps the misunderstanding comes from the fact that we are talking > about the pattern and they talk about the matching process. So (*THEN) > simply starts a backtrack, and when an alternation is encountered, it > switches to the next alternative.

Re: [pcre-dev] (*THEN) works differently in Perl

2019-07-02 Thread ph10
On Tue, 2 Jul 2019, Zoltán Herczeg wrote: > If you are right about the internal working of (*THEN), then this verb > has a very unclear and inconsistent behavior, which is very hard to > track for a user. And it totally contradicts the Perl documentation, in particular, this sentence: Note

Re: [pcre-dev] (*THEN) works differently in Perl

2019-07-02 Thread ph10
On Mon, 1 Jul 2019, ND via Pcre-dev wrote: > As you participate in Perl regex development can you take a look at another > Perl bug please: I do not participate in Perl regex development. I just report bugs when I find them, using the perlbug command. You could do this yourself. (And you seem

Re: [pcre-dev] (*THEN) works differently in Perl

2019-07-01 Thread ph10
On Sun, 30 Jun 2019, ND via Pcre-dev wrote: > PCRE2 version 10.33 2019-04-16 > /\A(?:.|..)(*THEN)c/ > abc > No match > > > Perl is match "abc". > I suppose "next innermost alternative" is interpreted differently by PCRE and > Perl. > > If so, may be PCRE should go Perl way in this matter? I

Re: [pcre-dev] Max lookbehind calculation

2019-06-25 Thread ph10
On Sun, 23 Jun 2019, I wrote: > I woke up in the middle of last night with an idea as to how it could > easily be made better, but I haven't looked at the code yet. I am busy > with other things today and tomorrow, but then I will see if my midnight > bright idea actually works. I have

Re: [pcre-dev] Start optimizations with partial match

2019-06-23 Thread ph10
On Sun, 23 Jun 2019, ND via Pcre-dev wrote: > On 2019-06-23 04:33, ND wrote: > >Or this calculations occurs at compile time while partial matching flag is > >set at matchtime? That is correct. > Oh! Now I read docs about it. > It seems that PARTIAL are compiletime option only for JIT. So it

Re: [pcre-dev] Max lookbehind calculation

2019-06-23 Thread ph10
On Sat, 22 Jun 2019, ND via Pcre-dev wrote: > It may be less unusual if we use a simple assertions: That is probably what most people do. > I agree that max lookbehind value corresponds to docs. > But this is not an end in itself. We keep in mind that max lookbehind value > calculation intended

Re: [pcre-dev] Max lookbehind calculation

2019-06-22 Thread ph10
On Sat, 22 Jun 2019, ND via Pcre-dev wrote: > PCRE2 version 10.33 2019-04-16 > /(?<=(?<=a)b)c.*/info > Capture group count = 0 > Max lookbehind = 1 > First code unit = 'c' > Subject length lower bound = 1 > abc\=ph > Partial match: bc > < > > Why max lookbehind=1, but not 2?

Re: [pcre-dev] Some words about assertion docs

2019-06-22 Thread ph10
On Sat, 22 Jun 2019, ND via Pcre-dev wrote: > Sorry for my bad English. > I need to find word that is closest to the end of text and occurs at least 10 > times in that text. Yes, I understand that now. I will think about it.

Re: [pcre-dev] Document SKIP position before or equal start_offset

2019-06-22 Thread ph10
On Sat, 22 Jun 2019, ND via Pcre-dev wrote: > >If (*SKIP) is used inside a lookbehind to specify a new starting > >position... > > I suggest to remove "inside a lookbehind". > A new starting position that is not later than the starting point of the > current match may occur without lookbehind:

Re: [pcre-dev] Some words about assertion docs

2019-06-22 Thread ph10
On Sat, 22 Jun 2019, ND via Pcre-dev wrote: > Your example is not working right (let's change 10 to 3 for simplicity): > > /\A.*\b(\w++)(?>.*?\b\1\b){2}/ > word1 word1 word2 word2 word2 word1 > 0: word1 word1 word2 word2 word2 > 1: word2 > > We want to capture "word1" as most closer to the end

Re: [pcre-dev] several messages

2019-06-22 Thread ph10
On Sat, 22 Jun 2019, ND via Pcre-dev wrote: > /\A(?:a|(?=b)|.){50}\z/ > abc > 0: abc > > when engine in a strange way decides that it was exactly 50 repetitions. That is not an unlimited repeat, so there is no special action for matching an empty string. Therefore, (?=b) matches 47 times. A
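The count of 47 can be checked with a small hand-written backtracking model. This is a sketch of ordered alternation under a fixed repeat count, not PCRE2's actual matching code; the function name `match_fixed_repeat` and the trace format are hypothetical and exist only for this illustration.

```python
def match_fixed_repeat(subject, count):
    # Models \A(?:a|(?=b)|.){count}\z with plain ordered backtracking.
    # Returns the alternative taken on each iteration, or None on failure.
    def step(pos, remaining):
        if remaining == 0:
            # All iterations used; the \z part requires end of subject.
            return [] if pos == len(subject) else None
        candidates = []
        if subject.startswith("a", pos):
            candidates.append(("a", pos + 1))
        if subject.startswith("b", pos):
            candidates.append(("(?=b)", pos))  # zero-width: position unchanged
        if pos < len(subject) and subject[pos] != "\n":
            candidates.append((".", pos + 1))
        # Alternatives are tried in pattern order, backtracking on failure.
        for name, next_pos in candidates:
            rest = step(next_pos, remaining - 1)
            if rest is not None:
                return [name] + rest
        return None

    return step(0, count)


trace = match_fixed_repeat("abc", 50)
print(trace.count("a"), trace.count("(?=b)"), trace.count("."))
```

In this model the 50 iterations split into one `a`, 47 empty matches of `(?=b)`, and two `.` matches for `b` and `c`, because a fixed count has no rule that stops the loop after an empty iteration.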

Re: [pcre-dev] Some words about assertion docs

2019-06-22 Thread ph10
On Fri, 21 Jun 2019, ND via Pcre-dev wrote: > Imagine that we have a text. There are some words in this text that occurs at > least 10 times. We want to find from they a word that is most closer to the > end of text. > > If lookahead assertion is non-possessive then we can use this pattern: > >

Re: [pcre-dev] several messages

2019-06-22 Thread ph10
On Sat, 22 Jun 2019, ND via Pcre-dev wrote: > Successful match of "X*\z" means that PCRE says: X CAN be successfully > repeated until the very end of subject (let's the match is "abc" for example). > When we use "X*" we want to say: repeat X as much as it can. Yes, but there is special

Re: [pcre-dev] several messages

2019-06-21 Thread ph10
On Mon, 17 Jun 2019, ND via Pcre-dev wrote: > Chapter ISSUES WITH MULTI-SEGMENT MATCHING of pcre2partial.html includes item > 2 with description how to process with lookbehind assertions. > > I think it's important to add to this algorithm a some words about "no match": > If result of partial

Re: [pcre-dev] Quantifying backtracking verbs

2019-06-20 Thread ph10
On Wed, 19 Jun 2019, ND via Pcre-dev wrote: > >At present, lookarounds do not take part in minimum length calculations, > > I see lookarounds takes part: first and last code units are searched in > lookarounds too. I wasn't quite precise. Lookarounds are not scanned when computing a minimum

Re: [pcre-dev] Typo in pcre2test docs about partial match

2019-06-20 Thread ph10
On Mon, 17 Jun 2019, ND via Pcre-dev wrote: > In pcre2test docs in chapter RESTARTING AFTER A PARTIAL MATCH there is > example: > > data> 23ja\=P,dfa > > What matching option "P" is? May be it should be corrected to "ph" or "ps"? Thank you. Yes, that should be "ps". This is a hangover from

Re: [pcre-dev] Document SKIP position before or equal start_offset

2019-06-20 Thread ph10
On Mon, 17 Jun 2019, ND via Pcre-dev wrote: > I don't find in docs behaviour of SKIP when corresponding position is before > or equal start_offset. > It seems that in this case a "bumpalong" advance is 1, not SKIP or associated > MARK position. Yes, that is true. The code contains this comment:

Re: [pcre-dev] Clearing documentation about infinite loops

2019-06-20 Thread ph10
On Sun, 16 Jun 2019, ND via Pcre-dev wrote: > PCRE2 version 10.33 2019-04-16 > /(?:a|(?=b)|.)*\z/ > abc > 0: abc > > May be docs need some clarification about what happened at that point. > After lookahead assertion (?=b) matches, loop is not broken. It seems a > backtracking occurs as if group

Re: [pcre-dev] Quantifying backtracking verbs

2019-06-20 Thread ph10
On Sun, 16 Jun 2019, ND via Pcre-dev wrote: > A following example was included in docs (pcre2pattern.html) : > > A(*ACCEPT)??BC > > But this example does not show what we can do with (*ACCEPT)?? that can't > doing well with another PCRE facilities. > I suggest to show in docs another example

Re: [pcre-dev] Some words about assertion docs

2019-06-20 Thread ph10
On Thu, 20 Jun 2019, Zoltán Herczeg wrote: > > (?=x|y) looks much more ergonomical than (?:(?=x)|(?=y)) > > They behave the same way, so pick whatever you prefer. (?:(?=X)|(?=Y))Z means "if X matches, try to match Z; if that fails, if Y matches try to match Z". In the simple case the second
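The equivalence of the two spellings in the simple case can be illustrated with a short check. This sketch uses Python's `re` module as a stand-in, on the assumption that it handles plain lookaheads like PCRE2 does; the sub-patterns `ab`, `cd` and the tail `\w\w\d` are made up for the example and play the roles of X, Y and Z.

```python
import re

# One lookahead containing an alternation.
combined = re.compile(r"(?=ab|cd)\w\w\d")
# An alternation of two separate lookaheads.
separate = re.compile(r"(?:(?=ab)|(?=cd))\w\w\d")

subjects = ["ab1", "cd2", "ef3", "abx"]
results = [
    (s, bool(combined.match(s)), bool(separate.match(s)))
    for s in subjects
]

for row in results:
    print(row)
```

Both forms accept and reject the same subjects here; the difference described in the message only matters in how the second branch is retried when the tail fails.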
