Re: [pcre-dev] Capture not reset inside recursion

2021-06-06 Thread Zoltán Herczeg
a pretty nice Perl bug, maybe you could report it to them. Regards, Zoltan Original message From: Zoltán Herczeg <hzmes...@freemail.hu> Date: 2021 June 6 07:21:30 Subject: Re: [pcre-dev] Capture not reset inside recursion To:

Re: [pcre-dev] Capture not reset inside recursion

2021-06-05 Thread Zoltán Herczeg
The title is misleading; that feature is a JavaScript thing: /(?:(a)b|\1)+/ matches aba in Perl, but not in JavaScript. Anyway, it looks like the problem here is that ()? clears the capturing bracket in Perl when the empty case is selected, while PCRE2 restores its previous value. Matching

Re: [pcre-dev] Question about .*

2021-05-11 Thread Zoltán Herczeg
I have two comments: The .* can match an empty string, so it can match at the position of "\r" in a "\r" string. The length of the match is 0, so it does not match the "\r" itself. The PCRE2_DOTALL option allows dot to match a newline. Regards, Zoltan Original message

Re: [pcre-dev] Need help with implementing a parser (binary operators) - pcre2

2020-12-25 Thread Zoltán Herczeg
Hi, in my experience this is the easiest way to parse expressions: main-expression = pre-primary-expression primary-expression post-primary-expression pre-primary-expression - mostly unary operators, new operator, etc. primary-expression - identifiers, keywords (e.g. this/true/null),
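The decomposition above can be sketched as a small evaluator. The rest of the message is cut off in this archive, so everything beyond the pre-primary / primary / post-primary split is an assumption made for illustration: integer operands, unary minus as the only pre-primary operator, postfix `!` (factorial) as the only post-primary operator, precedence climbing for the binary operators, and all function names.

```c
#include <assert.h>
#include <ctype.h>

static void skip_spaces(const char **p)
{
    while (isspace((unsigned char)**p)) (*p)++;
}

long parse_binary(const char **p, int min_prec);

/* main-expression = pre-primary* primary post-primary* */
long parse_main(const char **p)
{
    long value = 0;

    skip_spaces(p);
    if (**p == '-') {                 /* pre-primary: unary minus */
        (*p)++;
        return -parse_main(p);
    }
    if (**p == '(') {                 /* primary: parenthesised expression */
        (*p)++;
        value = parse_binary(p, 1);
        skip_spaces(p);
        if (**p == ')') (*p)++;
    } else {                          /* primary: decimal literal */
        while (isdigit((unsigned char)**p))
            value = value * 10 + (*(*p)++ - '0');
    }
    skip_spaces(p);
    while (**p == '!') {              /* post-primary: factorial */
        long result = 1, i;
        for (i = 2; i <= value; i++) result *= i;
        value = result;
        (*p)++;
        skip_spaces(p);
    }
    return value;
}

/* Returns 0 for anything that is not a binary operator. */
static int precedence(char op)
{
    switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    default:            return 0;
    }
}

/* Binary operators are layered on top of main-expression. */
long parse_binary(const char **p, int min_prec)
{
    long left = parse_main(p);

    for (;;) {
        char op;
        int prec;
        long right;

        skip_spaces(p);
        op = **p;
        prec = precedence(op);
        if (prec == 0 || prec < min_prec) return left;
        (*p)++;
        right = parse_binary(p, prec + 1);   /* prec + 1: left-associative */
        switch (op) {
        case '+': left += right; break;
        case '-': left -= right; break;
        case '*': left *= right; break;
        default:  left = (right != 0) ? left / right : 0; break; /* sketch: x/0 gives 0 */
        }
    }
}

long evaluate(const char *text)
{
    return parse_binary(&text, 1);
}
```

For example, `evaluate("1 + 2 * 3")` yields 7 and `evaluate("-(2 + 3) * 4")` yields -20. Error reporting is left out to keep the sketch short.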

Re: [pcre-dev] PCRE2 10.36 Released

2020-12-05 Thread Zoltán Herczeg
I remember a pretty serious bug that was fixed in JIT, so I recommend an update to everybody who uses JIT. Regards, Zoltan Original message From: Philip Hazel via Pcre-dev <pcre-dev@exim.org> Date: 2020 December 4 15:42:30 Subject:

Re: [pcre-dev] Strangely long matching times. Could anyone help to explain?

2020-11-30 Thread Zoltán Herczeg
> However, it is surprising that JIT times grow linearly with subject > length, whereas the interpreter's grow exponentially. I posted a link which explains the reason for the difference. The pattern /aa.*?bba/ is converted to /aa.*?(*SKIP)bba/ internally, and the latter can be executed in linear time.

Re: [pcre-dev] Strangely long matching times. Could anyone help to explain?

2020-11-29 Thread Zoltán Herczeg
Hi, is this measured with JIT enabled? I wrote an introduction about the JIT compiler before: https://zherczeg.github.io/sljit/pcre2_jit.html The single character optimization described in the paragraph containing the (*SKIP) verb should handle it. Regards, Zoltan Original message

Re: [pcre-dev] Preparation for using JIT under S390

2020-11-14 Thread Zoltán Herczeg
Hi Ze'ev, you can follow the discussion here: https://github.com/zherczeg/sljit/issues/89 The current aim of the project is Linux only, and there are no plans for mainframe support. Regards, Zoltan Original message From: Ze'ev Atlas via Pcre-dev <pcre-dev@exim.org

Re: [pcre-dev] Release Candidate 10.36-RC1

2020-11-09 Thread Zoltán Herczeg
Hi Petr, yes, it is way too experimental to enable it in production. Regards, Zoltan Original message From: Petr Pisar via Pcre-dev <pcre-dev@exim.org> Date: 2020 November 9 13:42:03 Subject: Re: [pcre-dev] Release Candidate 10.36-RC1

Re: [pcre-dev] Getting crash when searching binary data with case-insensitive option

2020-09-15 Thread Zoltán Herczeg
Hi, besides the input, could you upload a minimized single .c file with which we can reproduce exactly what you are doing? I don't have a Mac, but a single-file program should work the same way. Regards, Zoltan Original message From: Thomas Tempelmann via Pcre-dev <

Re: [pcre-dev] [Bug 2635] Port PCRE2 JIT to Linux on IBMz (s390x)

2020-08-26 Thread Zoltán Herczeg
Hi Ze'ev, the only OS-dependent part of JIT is the executable code allocator. Now the compiler has 3 of them, one might work on your system. Regards, Zoltan Original message From: Ze'ev Atlas via Pcre-dev <pcre-dev@exim.org> Date: 2020

[pcre-dev] Executable allocator question: temporary files

2020-04-22 Thread Zoltán Herczeg
Hi, PCRE2 has an optional executable allocator, which allocates temporary files on the disk, where it stores machine executable code. The tmp directory can be (optionally) set using the TMPDIR environment variable. The temporary file is created by open(... O_TMPFILE  ...) or mkostemp(). There

Re: [pcre-dev] Question regarding regex complexity, catastrophic backtrack and jit/no_jit

2020-02-23 Thread Zoltán Herczeg
> Matching with jit, it was very easy to produce an example which > exceeds the available resources: We take the pattern > "(*LIMIT_MATCH=10)(x+x+x+x+)+y" and as subject we take a string of > length 10 containing only the letter "x". Philip summarizes this well. In case of your example, 5 x-s

Re: [pcre-dev] New header file location

2019-12-01 Thread Zoltán Herczeg
> Could they be safely moved to the correct location please. Hi, those files are intended to be there. They are pcre-jit specific. Regards, Zoltan -- ## List details at https://lists.exim.org/mailman/listinfo/pcre-dev

Re: [pcre-dev] Win32 JIT Access Violation

2019-11-19 Thread Zoltán Herczeg
Ok for me. If anybody has time, please test the latest code. Thank you, Zoltan Original message From: p...@hermes.cam.ac.uk Date: 2019 November 19 17:26:12 Subject: Re: [pcre-dev] Win32 JIT Access Violation To: Zoltán Herczeg <

Re: [pcre-dev] Win32 JIT Access Violation - bisect results

2019-11-19 Thread Zoltán Herczeg
hink it is related. It feels like uninitialized memory access or buffer overrun. I hope you can make use of this information. Feel free to ask for more. Many thanks, Ralf PS: Please release whenever you and Philip feel is the right time. On 19.11.2019 08:48, Zoltán Herczeg wrote: > I suspect someth

Re: [pcre-dev] Win32 JIT Access Violation

2019-11-19 Thread Zoltán Herczeg
ev] Win32 JIT Access Violation To: pcre-dev@exim.org <pcre-dev@exim.org> On 15.11.2019 09:01, Zoltán Herczeg wrote: > thank you for the report. I don't have C++Builder, so I would need > some help. The pattern is quite big, is it possibl

Re: [pcre-dev] Win32 JIT Access Violation

2019-11-15 Thread Zoltán Herczeg
Hi Ralf, thank you for the report. I don't have C++Builder, so I would need some help. The pattern is quite big, is it possible to simplify it? Bisecting the change which broke it would be a great help as well. Regards, Zoltan Original message From: Ralf Junker <

Re: [pcre-dev] Compiler warnings in JIT with NEON instructions

2019-11-12 Thread Zoltán Herczeg
> > Please find attached the patch with recommended fixes from Petr. > > > Thanks. It fixes all the ARM JIT warnings. Patch landed. Thank you for fixing all issues. Regards, Zoltan

Re: [pcre-dev] JIT fails with NEON instructions

2019-11-06 Thread Zoltán Herczeg
tum: 2019 November 6 06:25:22 Subject: Re: [pcre-dev] JIT fails with NEON instructions To: Zoltán Herczeg <hzmes...@freemail.hu> Hi Zoltán, please find attached a patch that fixes the problem. For utf-16 and utf-32 the vectorization factor is a

Re: [pcre-dev] JIT fails with NEON instructions

2019-11-04 Thread Zoltán Herczeg
with NEON instructions To: Zoltán Herczeg <hzmes...@freemail.hu> The problem occurs on Ubuntu 18.04 and A-72 as well when configuring with --enable-pcre2-16 and --enable-pcre2-32; each exposes one failure. I will send a patch to fix these i

Re: [pcre-dev] JIT fails with NEON instructions

2019-10-30 Thread Zoltán Herczeg
Hi Sebastian, could you check this failure? Regards, Zoltan Original message From: Petr Pisar via Pcre-dev <pcre-dev@exim.org> Date: 2019 October 30 18:25:42 Subject: Re: [pcre-dev] JIT fails with NEON instructions To: pcre-dev@exim.org

Re: [pcre-dev] Remove some restrictions of lookbehind assertions

2019-08-02 Thread Zoltán Herczeg
Hi, > I was faced with a need of nonfixed length lookbehind two times: > 1. when data came by stream of 24kB blocks and I need to find a last >numeric in each of it > /.{24000}(?<=(\d++)\D*+)/g Even if this worked, the result would always be the last position of the subject, and

[pcre-dev] Apple + JIT

2019-08-02 Thread Zoltán Herczeg
Hi, recently I have heard from Mac users that the JIT does not work on newer OS versions. I just found the following link, where PCRE is explicitly mentioned: https://developer.apple.com/documentation/bundleresources/entitlements/com_apple_security_cs_allow-jit Does this help? If yes, it would be

Re: [pcre-dev] Remove some restrictions of lookbehind assertions

2019-08-01 Thread Zoltán Herczeg
> I think MOVE verb like a goto operator in programming languages impacts > the clarity of pattern structure and make it error-prone. It is > undesirable in my opinion. Yes. The idea has already been dropped. > I think it will be better to use standard "(?<=)" for lookbehinds, not The problem is

Re: [pcre-dev] Remove some restrictions of lookbehind assertions

2019-07-31 Thread Zoltán Herczeg
> > You are right. Since you can put it into a group, it is not possible > > to prevent repetitions. However the rule that empty matches break > > (non-fixed) loops may solve this problem. > ... but it's not an empty match. If we consider the following pattern: /(*napla:a|a)+/ is the same as:

Re: [pcre-dev] Remove some restrictions of lookbehind assertions

2019-07-31 Thread Zoltán Herczeg
> as normal groups, not as assertion groups. What happens when they are > repeated must be defined - or maybe they should not be allowed to > repeat, because once again that might be an easy way to infinite loops. You are right. Since you can put it into a group, it is not possible to prevent

Re: [pcre-dev] Remove some restrictions of lookbehind assertions

2019-07-30 Thread Zoltán Herczeg
> Let me see if I understand that: does (*match:{1}pattern) mean "apply > the pattern to the string that is currently captured by group 1"? > Without looking at the interpreter code, I'm not sure if this is easy or > hard to implement. I am just throwing out some ideas. We could also decide that

Re: [pcre-dev] Remove some restrictions of lookbehind assertions

2019-07-30 Thread Zoltán Herczeg
this: (submatch)(*match:{1}pattern) is easier. Inside the {}, a name can be given as well. The (*MOVE) could be kept for moving the string pointer around. Let me know your opinion. Regards, Zoltan Original message From: Zoltán Herczeg <hzmes...@freemail.hu

Re: [pcre-dev] Remove some restrictions of lookbehind assertions

2019-07-29 Thread Zoltán Herczeg
> > (*SETEND:mark_name) > >   - This verb changes the end position to the position recorded by the last > > mark which name is > mark_name. If the position is smaller than the current string position, it is > set to the current string > position. > By "end position" do you mean "end of subject"?

Re: [pcre-dev] Remove some restrictions of lookbehind assertions

2019-07-29 Thread Zoltán Herczeg
> May be it not quite effective and still have restrictions but is useful. > Is it simple to add such functionality? Definitely not easy in JIT. I have an alternative solution which might be able to solve many of the issues raised here. We already have (*SKIP:name), so we need to record

Re: [pcre-dev] Partial match at end of subject

2019-07-15 Thread Zoltán Herczeg
> I am still not entirely convinced this change should be made. Zoltán, > what do you think? It would involve making changes to JIT, of course. I don't really see where we are heading now, and the situation feels chaotic. I wouldn't increase the maintenance burden without strong reasons. Regards,

Re: [pcre-dev] Some words about assertion docs

2019-07-13 Thread Zoltán Herczeg
> It turned out to be very easy to implement in the interpreter, but there > was quite a lot of necessary but straightforward work to add new opcodes > and process them in the various scans of compiled patterns. Also, the > documentation took some time. Somehow it doesn't feel right to call this

Re: [pcre-dev] JIT don't detect endless subroutine recursion

2019-07-10 Thread Zoltán Herczeg
> /(?0)/ As far as I remember, these are detected by the parser. Regards, Zoltan

Re: [pcre-dev] (*THEN) works differently in Perl

2019-07-03 Thread Zoltán Herczeg
> A Perl developer has admitted there is some ambiguity, but suggests that > (*COMMIT) just means "never advance the starting point". That pattern > can find a match without advancing the starting point. The documentation states two rules: 1) It's a zero-width pattern similar to (*SKIP), except

Re: [pcre-dev] (*THEN) works differently in Perl

2019-07-02 Thread Zoltán Herczeg
> Note that if this operator is used and NOT inside of an alternation > then it acts exactly like the "(*PRUNE)" operator. > But it doesn't. Perhaps the misunderstanding comes from the fact that we are talking about the pattern and they talk about the matching process. So (*THEN) simply starts a

Re: [pcre-dev] (*THEN) works differently in Perl

2019-07-02 Thread Zoltán Herczeg
If you are right about the internal working of (*THEN), then this verb has a very unclear and inconsistent behavior, which is very hard for a user to track. I think it should be made obsolete and removed eventually. Regards, Zoltan

Re: [pcre-dev] JIT regression

2019-06-28 Thread Zoltán Herczeg
> Sorry but I can't build pcre2test for Windows with Visual Studio directly > from svn. There are no simple "press and play" possibility to build. Docs > about building by VS are miserly and obscure. Some optimizations are not enabled on Windows compared to Linux which affects your test case.

Re: [pcre-dev] JIT regression

2019-06-25 Thread Zoltán Herczeg
Hi, > It seems JIT is 16 times!! slower than interpreter for such simple pattern. I made some improvements to the SSE2 accelerated search and the /(?s).*/ search. You can try them now. However, I have never seen such big differences in my measurements. The memchr can use AVX2, so it is still faster,

Re: [pcre-dev] Some words about assertion docs

2019-06-20 Thread Zoltán Herczeg
> (?=x|y) looks much more ergonomical than (?:(?=x)|(?=y)) They behave the same way, so pick whatever you prefer. Regards, Zoltan

Re: [pcre-dev] Some words about assertion docs

2019-06-19 Thread Zoltán Herczeg
> But why assertions are atomic? I guess answer is: "Because it does in > Perl". But why? Assertions are like "if" statements in structured languages. The condition part of an "if" is never retried. Regards, Zoltan

Re: [pcre-dev] Query Regarding GNUWin32 License used in PCRE-6.3

2019-06-05 Thread Zoltán Herczeg
> We found that PCRE internally uses GNUWin32. Where? Regards, Zoltan

Re: [pcre-dev] JIT regression

2019-05-27 Thread Zoltán Herczeg
Hi, that is a strategic difference. You don't know the input from the pattern, and your input has no a-d characters. The interpreter only searches for 'a', while JIT searches for two characters, 'a' and 'd', whose distance is two. The latter is more complicated, but works better for random input. You

Re: [pcre-dev] PCRE2 10.33-RC1 available for testing

2019-03-05 Thread Zoltán Herczeg
ink -> mailto:airw...@gmail.com) > Date: 2019 March 5 10:55:27 Subject: Re: [pcre-dev] PCRE2 10.33-RC1 available for testing To: Zoltán Herczeg <hzmes...@freemail.hu> Hi Zoli, thanks for the clarification. On Tue, Mar 05, 2019 at 10:44:36AM +0100, Zo

Re: [pcre-dev] PCRE2 10.33-RC1 available for testing

2019-03-05 Thread Zoltán Herczeg
Hi Ervin, sparc64 is not supported yet, and I have no plans to do it in the foreseeable future. However, if somebody is interested in doing it, just submit a patch to the JIT compiler: https://github.com/zherczeg/sljit I will migrate the code to pcre/pcre2 after it has landed. If somebody knows

[pcre-dev] Announcing the repan project

2019-02-20 Thread Zoltán Herczeg
Hi, there have been many requests on this list which were regular expression related but not exactly pcre related. We have worked on some of them, e.g. glob matching, but we usually haven't fully finished them and honestly never felt like they should be part of pcre. I always suggested

Re: [pcre-dev] PCRE_STUDY_JIT_COMPILE option bug?

2019-01-26 Thread Zoltán Herczeg
Hi Ervin, the result value is misinterpreted by showresult(). PCRE_ERROR_NOMATCH is -1, not 0. A return value of 0 means that a match was found, but the ovector is too small to store all capturing bracket positions. So increasing #define OVCOUNT from 30 to 40 "solves" the JIT case. For some
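The return-value convention described above can be sketched without linking against PCRE. The helper names and the enum are hypothetical, and the constant is a local stand-in for PCRE_ERROR_NOMATCH (-1) so that the fragment compiles without pcre.h; it assumes the PCRE1 convention that the ovector length is a multiple of 3 and that only the first two thirds hold offset pairs.

```c
#include <assert.h>

/* Local stand-in for PCRE_ERROR_NOMATCH from pcre.h. */
#define SKETCH_ERROR_NOMATCH (-1)

enum exec_outcome {
    EXEC_NO_MATCH,          /* rc == -1 */
    EXEC_ERROR,             /* any other negative rc */
    EXEC_MATCH_TRUNCATED,   /* rc == 0: matched, ovector too small */
    EXEC_MATCH              /* rc > 0: matched, rc pairs stored */
};

/* Hypothetical helper: classify the value returned by pcre_exec(). */
enum exec_outcome classify_exec_result(int rc)
{
    if (rc == SKETCH_ERROR_NOMATCH) return EXEC_NO_MATCH;
    if (rc < 0) return EXEC_ERROR;
    if (rc == 0) return EXEC_MATCH_TRUNCATED;
    return EXEC_MATCH;
}

/* Number of offset pairs that are valid in the ovector after the call. */
int stored_pairs(int rc, int ovecsize)
{
    assert(ovecsize % 3 == 0);
    if (rc < 0) return 0;
    if (rc == 0) return ovecsize / 3;   /* filled as far as possible */
    return rc;                          /* pair 0 is the whole match */
}
```

With OVCOUNT at 30, a return value of 0 therefore still leaves 10 usable pairs; treating it as "no match" is the misinterpretation described in the message.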

Re: [pcre-dev] Extracting trigrams from patterns: opinions wanted

2019-01-03 Thread Zoltán Herczeg
Hi, my opinion is that this is kind of beyond the scope of pcre. For some time (years actually, but I had no time to start it) I have been thinking that the world would need a regex optimizer framework. It could do a lot of things except pattern matching. It would build an AST from the

[pcre-dev] Support invalid UTF subject strings by PCRE2-JIT

2018-09-17 Thread Zoltán Herczeg
Dear PCRE2 users, since PCRE2 10.32 has been released, it is time to announce a new major feature for PCRE2-JIT: supporting invalid UTF subject strings. This feature can be enabled by passing the PCRE2_JIT_INVALID_UTF option to pcre2_jit_compile(). It is recommended to use pcre2_jit_match()

Re: [pcre-dev] Serialization format versioning

2018-06-25 Thread Zoltán Herczeg
> I can understand that. But I would point out that PCRE2's current notion > of serialization is quite limited compared to what that word usually > implies (cf. Java, .NET object serialization), so this is not likely to > be the only time that an application developer finds the functionality >

Re: [pcre-dev] Serialization format versioning

2018-06-22 Thread Zoltán Herczeg
Hi, > In my use case, however, the application has binary data files > [containing serialized regexes] under /usr/share/foo/, and no provision > is available to cache under /var/, nor any other writable disk location. > PCRE2 can be updated at any time due to security vulnerabilities, but > the

Re: [pcre-dev] Serialization format versioning

2018-06-21 Thread Zoltán Herczeg
Hi, to tell the truth, when the serialization was created, the use case we were discussing was different from the use case below. I consider serialized forms inherently insecure. I would never recommend accepting regexes in binary form in any application. Instead, I would recommend to

Re: [pcre-dev] Support for invalid UTF-8 strings?

2018-04-14 Thread Zoltán Herczeg
Hi, there are many problems with invalid utf8 strings. Here are some thoughts from previous discussions: - What is the type of an invalid character? Is 0xe9 a latin é letter, or something else? - What is the lowercase/uppercase pair of an invalid character? - Moving around: if you start a match from

Re: [pcre-dev] Non-capturing group overhead (100x)

2017-12-04 Thread Zoltán Herczeg
Hi, do you use JIT? The engine has special single character optimizations to make /.*/ fast. The generic /(?:anything)*/ is much slower. The numbers below are reasonable. E.g. using a capturing bracket is the slowest: /(.)*/, since you need to store extra data. Yes, you can make pattern

Re: [pcre-dev] Extracting trigrams from PCRE syntax

2017-11-28 Thread Zoltán Herczeg
Hi, for some time I have been thinking about creating a regex optimizer which does everything except matching. It could analyze and optimize patterns. Unfortunately I don't have time to work on it, so it is unlikely it will be available in the foreseeable future :( Regards, Zoltan

Re: [pcre-dev] How am I supposed to use PCRE2 JIT in the face of (*NO_JIT) ?

2017-11-23 Thread Zoltán Herczeg
Hi, > This was a patch to Git as you might have guessed: > https://public-inbox.org/git/20171122133630.18931-2-ava...@gmail.com/T/#u > Do you think I need to go back and rework that so I do #1 as well? A > pedantic reading of the manpage would probably suggest so, but I can't > see how the JIT

Re: [pcre-dev] How am I supposed to use PCRE2 JIT in the face of (*NO_JIT) ?

2017-11-22 Thread Zoltán Herczeg
Hi, hm, interesting problem. I don't remember how this should work. Anyway, I checked the code: pcre2_jit_compile: if ((re->flags & PCRE2_NOJIT) != 0) return 0; So it returns with success if NOJIT is present. That is perhaps misleading. I wouldn't mind changing this. pcre2_jit_match: if

Re: [pcre-dev] Git 2.14.0 released with PCRE 2 support

2017-08-09 Thread Zoltán Herczeg
Hi, this is great news! Thank you very much for the hard work on the integration! Regards, Zoltan Original message From: Ævar Arnfjörð Bjarmason via Pcre-dev <pcre-dev@exim.org> Date: 2017 August 9 01:47:52 Subject: [pcre-dev] Git 2.14.0

Re: [pcre-dev] PCRE2 10.30-RC1 test release

2017-07-23 Thread Zoltán Herczeg
Hi, I fixed the PPC warning. Thank you for the report. I will not be available this week, and can fix other issues next week. Regards, Zoltan

Re: [pcre-dev] How do I support pcre1 JIT on all versions?

2017-06-01 Thread Zoltán Herczeg
I would simply use the PCRE version number to detect jit_exec at compile time: #if (PCRE_MAJOR > 8 || (PCRE_MAJOR == 8 && PCRE_MINOR >= 32)) #define PCRE_JIT_EXEC_AVAILABLE #endif That'll only work if pcre isn't compiled with --disable-jit, if it is even on the latest svn trunk linking

Re: [pcre-dev] How do I support pcre1 JIT on all versions?

2017-05-31 Thread Zoltán Herczeg
Hi, I would simply use the PCRE version number to detect jit_exec at compile time: #if (PCRE_MAJOR > 8 || (PCRE_MAJOR == 8 && PCRE_MINOR >= 32)) #define PCRE_JIT_EXEC_AVAILABLE #endif Regards, Zoltan Original message From: Ævar Arnfjörð Bjarmason avarabgmail.com

Re: [pcre-dev] I'm adding PCRE v2 support to Git. It's a bit slower than v1

2017-04-18 Thread Zoltán Herczeg
>>>I.e. no difference in v1 & v2 anymore. The log case though shows pcre1 >>>being 2% faster than pcre2: I don't know the reason then, needs more investigation. Btw I have started to improve the first character search optimization in JIT. It is still in progress (corner cases), although you can

Re: [pcre-dev] I'm adding PCRE v2 support to Git. It's a bit slower than v1

2017-04-13 Thread Zoltán Herczeg
Hi, >JIT is enabled in Debian on pcre builds for some architectures (roughly, >those where it's believed to work ;-) ). It was true in old-pcre (Debian >calls this pcre3) at the point I took over looking after pcre for >Debian, and so I made it so for pcre2 also. This is great news indeed! The

Re: [pcre-dev] I'm adding PCRE v2 support to Git. It's a bit slower than v1

2017-04-11 Thread Zoltán Herczeg
Hi, >I couldn't find out how to get the compile flags for those, but >presumably it's some comparable middle-of-the-road value, probably >-O2. I'll try compiling from svn & report back. Likely -O2. I am surprised that JIT is enabled in default builds. I am sure it wasn't in the past. >> This

Re: [pcre-dev] I'm adding PCRE v2 support to Git. It's a bit slower than v1

2017-04-10 Thread Zoltán Herczeg
Hi Ævar, this is really awesome news! I am happy that you chose pcre for git. >I did some basic performance benchmarks between v1 and v2 of PCRE. >Depending on whether we use git-grep or git-log v2 is 1% to 10% slower >than v1 when both use JIT. I would like to see the compilation flags for

Re: [pcre-dev] [Bug 2030] Exact match mode

2017-02-06 Thread Zoltán Herczeg
In case of anchored matches, there is no need to scan the input for possible starting characters. Several other simplifications (optimizations) are possible as well. So the generated code is different. You can compile the pattern twice though and use the appropriate one. Regards, Zoltan

Re: [pcre-dev] PCRE2 on Coverity Scan

2017-01-17 Thread Zoltán Herczeg
...@gmail.com> wrote: >Hello, > >On Tue, Dec 20, 2016 at 4:03 PM, Zoltán Herczeg <hzmes...@freemail.hu> wrote: >> Personally I wouldn't mind transferring the ownership of the project, so if >> you wish to be the owner, I can give it to you. And you can decide how &

Re: [pcre-dev] PCRE2 on Coverity Scan

2016-12-20 Thread Zoltán Herczeg
>Forgot to reply to this part -- would you prefer to keep using the >existing "pcre" project, and upload PCRE2 builds there (given PCRE1 is >reaching EOL anyhow), or keep using "pcre" for PCRE1 and request a new >project for PCRE2? Personally I wouldn't mind transferring the ownership of the

Re: [pcre-dev] PCRE2 on Coverity Scan

2016-12-19 Thread Zoltán Herczeg
Hi, I tried this tool a long time ago, and uploaded pcre perhaps twice, but I wasn't satisfied with its output since it hasn't reported any relevant issue. It produced a huge report though, and took a lot of time to check everything. >*** CID 11125: Null pointer dereferences (FORWARD_NULL)

Re: [pcre-dev] Fwd: Bug#840354: src:pcre3: FTBFS on powerpc (G4 CPU)

2016-10-21 Thread Zoltán Herczeg
, Zoltan Christoph Biedl <debian.a...@manchmal.in-ulm.de> wrote: >Zoltán Herczeg wrote... > >> Another idea just came to my mind. This issue could be a cache flush issue, >> since the CPU executes instructions from the instruction cache, while gdb >> prints instructions

Re: [pcre-dev] Fwd: Bug#840354: src:pcre3: FTBFS on powerpc (G4 CPU)

2016-10-20 Thread Zoltán Herczeg
/20160406.071510.a067ef1f.en.html Might be the cause of the issue. Regards, Zoltan "Zoltán Herczeg" <hzmes...@freemail.hu> wrote: >Hi Christoph, > >>Very likely not. I saw SIGILL from other code, gdb pointed right to >>the place. So, just as another example: &g

Re: [pcre-dev] Fwd: Bug#840354: src:pcre3: FTBFS on powerpc (G4 CPU)

2016-10-20 Thread Zoltán Herczeg
Hi Christoph, >Very likely not. I saw SIGILL from other code, gdb pointed right to >the place. So, just as another example: Just asking :) Yes, SIGILL should be precise on all cpus. >| (gdb) disassemble 0xb7fe40a8,0xb7fe40c8 >| Dump of assembler code from 0xb7fe40a8 to 0xb7fe40c8: >|

Re: [pcre-dev] PCRE2 alpha testers wanted

2016-10-03 Thread Zoltán Herczeg
Hi Philip, great news! This is really a great step ahead. Thank you for the hard work! I will do some JIT testing in the coming weeks. Regards, Zoltan p...@hermes.cam.ac.uk wrote: >To PCRE2 users: > >Over the last few months I have been refactoring the way pcre2_compile() >works. The new code,

Re: [pcre-dev] PCRE2 String Concatenation regex

2016-07-25 Thread Zoltán Herczeg
Hi, this mailing list is about pcre library development, not general regular expression questions. Reddit is the right place for those tricky questions: https://www.reddit.com/r/regex/ I will try to answer your question though. Since the number of "string" tags is unknown, I would convert this in

Re: [pcre-dev] pcre latest source pls

2016-04-19 Thread Zoltán Herczeg
Hi, svn co svn://vcs.exim.org/pcre/code/trunk svn co svn://vcs.exim.org/pcre2/code/trunk Regards, Zoltan Lokesh Ubuntu wrote: >Hi, > >Am looking for pcre latest(later than 8.38) source, if possible could you >pls help on? It would be really appreciated!! > >Thanks in

Re: [pcre-dev] Fw: Microprocessor Optimization Primer

2016-04-01 Thread Zoltán Herczeg
Hi, thank you. I was curious whether System Z has a power CPU, but it looks like it is an exotic system. Regards, Zoltan "Ze'ev Atlas" wrote: >Zoltan Few years back, when I was working on porting PCRE to z/OS , you thought >about developing JIT for that stuff. Both Philip

Re: [pcre-dev] What is the expected behavior of /(?=..(*MARK:a))(*SKIP:a)(*FAIL)|./g

2016-03-21 Thread Zoltán Herczeg
>Does it mean that (*SKIP:label) looks for the (*MARK:label) in the regex >execution stack to figure out where to bump along to? Exactly. It searches for the last MARK on the regex stack whose name matches and restarts the match from there. E.g.: when /x(*:a)x(*:a)(*SKIP:a)(*FAIL)|./ matches to

Re: [pcre-dev] What is the expected behavior of /(?=..(*MARK:a))(*SKIP:a)(*FAIL)|./g

2016-03-19 Thread Zoltán Herczeg
Hi Thanh, I think these questions are better suited to https://www.reddit.com/r/regex. Anyway, I think the /g causes the regex to match all characters. Without it, the regex probably matches only one character, as you can see in regex101. The first half of your second assumption is correct, the

Re: [pcre-dev] PCRE2 and thread safety of jit compilation?

2016-01-05 Thread Zoltán Herczeg
>The biggest problem is having this working (and claim as supported) >amongst the huge number of compilers and platforms that PCRE2 runs on. >That's not an easy task. this is exactly my problem :) Perhaps we could start by supporting some platforms, and gradually cover more with the community

Re: [pcre-dev] PCRE2 and thread safety of jit compilation?

2016-01-04 Thread Zoltán Herczeg
>Do you mean that PCRE internally will read/save the JIT-compiled data >inside the pcre2_code using atomic operations? Exactly. The JIT compiler produces a pointer, and storing that pointer is an atomic operation. When that pointer is stored, it becomes "active". If compilation fails, the
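The "store the pointer atomically, and it becomes active" idea can be sketched with C11 atomics. This is an illustration of the general publish pattern only, not PCRE2's actual implementation: the struct and function names are hypothetical, and the choice of release/acquire ordering is an assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a compiled pattern; not the real pcre2_code. */
struct sketch_code {
    _Atomic(void *) jit_data;   /* NULL until JIT compilation has succeeded */
};

void sketch_init(struct sketch_code *code)
{
    atomic_init(&code->jit_data, NULL);
}

/* Compiling thread: publish the finished JIT data with a single atomic
   store. The release ordering makes everything written while building
   the data visible to any thread that later loads the pointer. */
void sketch_publish(struct sketch_code *code, void *compiled)
{
    assert(compiled != NULL);
    atomic_store_explicit(&code->jit_data, compiled, memory_order_release);
}

/* Matching thread: returns the JIT data if it is already active, or NULL,
   in which case the caller would fall back to the interpreter. */
void *sketch_active_jit(struct sketch_code *code)
{
    return atomic_load_explicit(&code->jit_data, memory_order_acquire);
}
```

A matching thread sees either NULL or a fully built result, never a half-written one, which is what allows a pattern to be used while another thread is still compiling it.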

Re: [pcre-dev] PCRE2 and thread safety of jit compilation?

2016-01-04 Thread Zoltán Herczeg
Hi, the JIT compilation itself should be thread safe as before (no global variables are used for compilation). You can compile multiple patterns on multiple threads at the same time, and you can even use a pattern while JIT compilation is in progress on another thread. In theory at least, let

Re: [pcre-dev] PCRE2 and thread safety of jit compilation?

2016-01-04 Thread Zoltán Herczeg
>lock(mutex); >extra = pcre_study(code, ...); >unlock(mutex); >/* then proceed to match as usual */ You can do this now as well. Just replace pcre_study with pcre2_jit_compile. There is no need to check JIT availability, the call will return with an error in that case (and you can silently ignore

Re: [pcre-dev] PCRE2 and thread safety of jit compilation?

2016-01-04 Thread Zoltán Herczeg
) of those changes. It would be better to discuss proposed patches and land the best solution. Regards, Zoltan "Giuseppe D'Angelo" <dange...@gmail.com> wrote: >On Mon, Jan 4, 2016 at 8:58 PM, Zoltán Herczeg <hzmes...@freemail.hu> wrote: >> Probably a write barrier wo

Re: [pcre-dev] JIT on Solaris 10 x86 faling in PCRE2

2015-11-05 Thread Zoltán Herczeg
Hi, it seems SSE2 caused problems there. The log does not reveal too much about the problem other than some segmentation fault. Regards, Zoltan Dagobert Michelsen wrote: >Hi, > >I noticed that JIT is failing for some time now on Solaris 10 x86 for >both x86 and amd64: > >

Re: [pcre-dev] JIT is silently off

2015-10-26 Thread Zoltán Herczeg
Hi, I am sorry I forgot to reply. Philip's guess was right, JIT can only use 64K of machine stack, and it cannot compile the pattern if it runs out of it. The exact rules are not documented since they may change any time when new optimizations are introduced (e.g. a capturing bracket may use 2 or 3

Re: [pcre-dev] JIT is silently off

2015-10-26 Thread Zoltán Herczeg
>It's very bad news for me. >Yesterday I test my patterns (each have around 1,5-2M size) and find that >JIT don't work with all of they. >Unfortunately this patterns can't be splitted due to it's automatically >construction. >Is there way to grow this 64K stack size or another way to use JIT?

Re: [pcre-dev] Adding SSE2 support to PCRE2-JIT

2015-08-26 Thread Zoltán Herczeg
> May I have a question about pcre executable portability? Not all x86 machines support SSE2. Is there a run-time check for SSE2 support with a fall-back code, or is it a compile time option? The SSE2 code path is optional. It is checked at runtime using the CPUID instruction on x86-32 (it is always

[pcre-dev] Adding SSE2 support to PCRE2-JIT

2015-08-24 Thread Zoltán Herczeg
Hi, this is just a notification that I recently added x86 SSE2 support to PCRE2-JIT. This improves the performance of first character search. The code is mostly ready, but testing may reveal some issues. I also expect some false positive valgrind reports, since the code performs 16 byte

Re: [pcre-dev] Powerpc optimisation

2015-08-20 Thread Zoltán Herczeg
don't know how to code that in pcre. Fred On Sat, 6 Jun 2015 18:33:28 +0200 (CEST), Zoltán Herczeg hzmes...@freemail.hu wrote: Hi Frederic, I just realized that results on that page are two years old. So I updated the engines to their most recent versions and uploaded new results

Re: [pcre-dev] Using JIT stacks for multiple patterns

2015-07-25 Thread Zoltán Herczeg
Hi Christoph, I'm having a single threaded application. How could it happen that patterns are not matched sequentially? AIUI, nested calls to pcre_exec() or pcre_jit_exec() would have to be made, which could only happen if this is done from a callback, such as the pcre_jit_callback passed to

Re: [pcre-dev] New 8.3x PCRE update for CVE issues?

2015-06-06 Thread Zoltán Herczeg
Hi Jacob, Philip Hazel decides when the next PCRE release will come out. He seems very busy now, perhaps with real-life things. You know this project is developed entirely by volunteers, which has advantages and disadvantages. He may prefer to fix the remaining known issues first. Anyway, you

Re: [pcre-dev] Powerpc optimisation

2015-06-06 Thread Zoltán Herczeg
from the last pattern was decreased to 27 ms from 190 ms. Regards, Zoltan Zoltán Herczeg hzmes...@freemail.hu wrote: Hi Frederic, thank you for measuring PCRE on PPC. The results are quite interesting. It seems to me that the slower patterns are the ones that require heavy backtracking. I mean where

Re: [pcre-dev] Powerpc optimisation

2015-06-05 Thread Zoltán Herczeg
Hi Frederic, thank you for measuring PCRE on PPC. The results are quite interesting. It seems to me that the slower patterns are the ones that require heavy backtracking. I mean where fast-forward (skipping) algorithms cannot be used (or they match too frequently). The /[a-zA-Z]+ing/ is a good

Re: [pcre-dev] Powerpc optimisation

2015-05-24 Thread Zoltán Herczeg
Hi Frederic, the sljit compiler uses SSE2 only for floating point computation, and not for SIMD computation. PCRE-JIT only uses integer computation, so it does not even use the SSE2 part. We had discussions about using SIMD for regex matching, but nothing has been implemented so far. The

Re: [pcre-dev] question usage of pcre_exec

2015-05-22 Thread Zoltán Herczeg
Hi, you need to use capturing brackets or MARK control verbs to get info about the match. I think MARK is more convenient, since you don't need to iterate over the ovector. Instead you can simply read the mark pointer in the extra data. Regards, Zoltan t t russ_9...@hotmail.com wrote: Hello,

Re: [pcre-dev] Release candidate for 10.10

2015-04-23 Thread Zoltán Herczeg
Thank you for the help. The fix landed in r1551. Regards, Zoltan Petr Pisar ppi...@redhat.com wrote: On Thu, Apr 23, 2015 at 11:51:36AM +0200, Zoltán Herczeg wrote: I think the following instruction is the problem: 0x03ffb7cc0010: stp x29, x30, [sp,#-56]! sp must be 16 byte

Re: [pcre-dev] Release candidate for 10.10

2015-04-23 Thread Zoltán Herczeg
I think the following instruction is the problem: 0x03ffb7cc0010: stp x29, x30, [sp,#-56]! sp must be 16-byte aligned at all times. Petr, I will send you a patch. Let me know if it fixes the problem. Regards, Zoltan
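The arithmetic behind this diagnosis can be sketched as follows: the pre-indexed store moves sp down by 56 bytes, and 56 is not a multiple of 16, so sp stops being 16-byte aligned. This is only an illustration of the rounding involved, not the patch that landed in r1551; `STACK_ALIGNMENT`, `align_frame_size` and `is_stack_aligned` are hypothetical names.

```c
#include <stddef.h>

/* Required stack pointer alignment in bytes, as stated in the message. */
#define STACK_ALIGNMENT 16u

/* Round a frame size up to the next multiple of STACK_ALIGNMENT.
   Relies on the alignment being a power of two. */
size_t align_frame_size(size_t size)
{
    return (size + (STACK_ALIGNMENT - 1)) & ~(size_t)(STACK_ALIGNMENT - 1);
}

/* Non-zero when moving sp by this many bytes keeps it aligned. */
int is_stack_aligned(size_t offset)
{
    return (offset % STACK_ALIGNMENT) == 0;
}
```

With these helpers, `is_stack_aligned(56)` is false, while `align_frame_size(56)` gives 64, the smallest frame size that keeps sp aligned after the adjustment.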

Re: [pcre-dev] Release candidate for 10.10

2015-04-22 Thread Zoltán Herczeg
instruction might help me to understand the issue. Regards, Zoltan Petr Pisar ppi...@redhat.com wrote: On Thu, Feb 26, 2015 at 07:13:39PM +0100, Zoltán Herczeg wrote: I don't think these are related. The problem is in test2, not in test19 (the serialization test). Sparc64 JIT has not implemented

Re: [pcre-dev] size of pcre_uint32 and pcre_int32

2015-03-26 Thread Zoltán Herczeg
Hi, yes, pcre_int32 and pcre_uint32 are always 32 bits wide. However, there are other types, int_fast32_t / int_least32_t, which might be wider than 32 bits and provide faster 32-bit computation. PCRE does not use them (at the moment). See here:
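As a side illustration of the difference between exact-width and "fast"/"least" types, the sketch below reports their storage sizes using the standard C99 `<stdint.h>` types. It uses `int32_t` and `uint32_t` as stand-ins for `pcre_int32` and `pcre_uint32`; that substitution, and the helper names, are assumptions made for the example and say nothing about how PCRE defines its own types.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Storage size in bits of the exact-width type: always 32. */
size_t exact32_bits(void) { return sizeof(int32_t) * CHAR_BIT; }

/* Storage size in bits of the "fast" type: at least 32, and wider on
   platforms where a larger register type is quicker to work with. */
size_t fast32_bits(void) { return sizeof(int_fast32_t) * CHAR_BIT; }

/* Storage size in bits of the "least" type: the smallest type that
   holds at least 32 bits. */
size_t least32_bits(void) { return sizeof(int_least32_t) * CHAR_BIT; }

/* Unsigned exact-width arithmetic wraps modulo 2^32, which code that
   depends on a 32-bit width can rely on. */
uint32_t wrap_add(uint32_t a, uint32_t b) { return (uint32_t)(a + b); }
```

Code that needs a value to wrap or truncate at exactly 32 bits should use the exact-width types; the fast variants only promise a minimum width, so their overflow behaviour differs between platforms.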
