Re: Hyperscan regexp engine
On Thu, 27 Jun 2019 17:39:01 +0200, Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum <10...@lyskom.lysator.liu.se> wrote:

> You can still want things which do not exist. :-) But given the attitude of the author I don't expect I'll be wanting either that or hyperscan.
>
> If _you_ want something which is only of interest to "network [companies] looking to scan 5,000 complex regexes in streaming mode" (who does that?), feel free to integrate it, but keep it on Monger please.

OK, I see where you stand. Two clarifications are needed, though. First, "the author" of Hyperscan is not only Geoff Langdale but quite a big team. Second, the library is useful for more than network security hardware. It's used by GitHub (https://github.blog/2018-10-17-behind-the-scenes-of-github-token-scanning/) and in general is simply faster than the current state of the art in regexp engines. A high-level view of its constraints and areas for improvement is given at https://www.hyperscan.io/2015/10/20/match-regular-expressions/.

br, tj.
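For readers unfamiliar with the "many patterns, one input" use case mentioned above, here is a minimal sketch of what multi-pattern scanning reports. It is not Hyperscan and does not use its API: it uses Python's standard `re` module and simply runs each pattern in turn, which is the slow baseline that a multi-pattern engine tries to avoid. The pattern set, the ids and the helper names (`compile_patterns`, `scan`) are made up for illustration; the "token" pattern in particular is not GitHub's real token format.

```python
import re

def compile_patterns(patterns):
    """Compile each (id, regex) pair once, up front."""
    return [(pid, re.compile(rx)) for pid, rx in patterns.items()]

def scan(compiled, data):
    """Return (pattern_id, start, end) for every match, ordered by end offset."""
    hits = []
    for pid, rx in compiled:
        # Naive approach: one full pass over the data per pattern.
        for m in rx.finditer(data):
            hits.append((pid, m.start(), m.end()))
    hits.sort(key=lambda h: (h[2], h[0]))
    return hits

# Hypothetical pattern set, keyed by an arbitrary numeric id.
PATTERNS = {
    1: r"token_[0-9a-f]{8}",
    2: r"password\s*=",
    3: r"\d{4}-\d{2}-\d{2}",
}

compiled = compile_patterns(PATTERNS)
hits = scan(compiled, "date 2019-06-27 password = x token_deadbeef")
for pid, start, end in hits:
    print(pid, start, end)
```

With three patterns the per-pattern loop is harmless; with thousands of patterns the cost grows with the number of patterns, which is the case the quoted "5,000 complex regexes" remark is about.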
Re: Hyperscan regexp engine
You can still want things which do not exist. :-) But given the attitude of the author I don't expect I'll be wanting either that or hyperscan. If _you_ want something which is only of interest to "network [companies] looking to scan 5,000 complex regexes in streaming mode" (who does that?), feel free to integrate it, but keep it on Monger please.
Re: Hyperscan regexp engine
On Thu, 27 Jun 2019 15:59:01 +0200, Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum <10...@lyskom.lysator.liu.se> wrote:

> So I interpret this as "ure3" (whatever that is) being the thing we actually want, not this garbage fire?

On https://branchfree.org/2019/02/28/paper-hyperscan-a-fast-multi-pattern-regex-matcher-for-modern-cpus/ the author writes in the comments on March 27, 2019:

"""
I would like “ure3” to be open. I also will need help. I designed Hyperscan but did not build it all by myself; [...] If I build a new regex matching system it will be a lot smaller and simpler, but it will still be a huge effort.
"""

And on April 15, 2019:

"""
if I get to it.
"""

So, given that "ure3" doesn't exist, I doubt it is something we should, or even could, want.

br, tj.
Re: Hyperscan regexp engine
Looking at the ycombinator page, it sounds like even ure3 would not support 32-bit arches. Sounds like a non-starter to me.
Re: Hyperscan regexp engine
So I interpret this as "ure3" (whatever that is) being the thing we actually want, not this garbage fire?
Re: Hyperscan regexp engine
On Thu, 27 Jun 2019 13:58:02 +0200, Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum <10...@lyskom.lysator.liu.se> wrote:

/paper-hyperscan-a-fast-multi-pattern-regex-matcher-for-modern-cpus/

For modern CPU:s? Looks more like they are targeting a certain 1970:s architecture:

cpuid(1, 0, , , , );

Did you test it on RISC-V?

From https://news.ycombinator.com/item?id=19270199:

"""
I suppose "Some Modern CPUs" was too long-winded a title? As I said, it doesn't take a genius to understand Intel's motivations. They bought the project, after all. Not being @ Intel anymore, I don't have access to the older stuff, and even if I did, the codebase has diverged significantly since.

Without going on too much of a tirade - the experience of developing for all those platforms really sucked. Almost all the non-x86 platforms had significant bugs in their toolchains. One of the MIPS variants (particular architecture elided to spare the guilty) had bugs in their gcc intrinsics in a way that suggested that no-one had ever done any significant third-party dev on the platform. Big-endian was also a huge PITA. It was a ton of work to keep all those systems alive, and our machine rack looked like a zoo of dev boards and weirdo devices.

In the "ure3" system I mention, I would make retargetability/portability to other systems a first-class goal. One way of achieving this is not having such a huge profusion of methods and complexity. Hyperscan is over-engineered for many use cases if you aren't a network company looking to scan 5,000 complex regexes in streaming mode at hopefully maximal performance.
"""

Also: 114 kLOC for a regexp matcher. Srsly?

For a 40x speed boost? Worth at least trying.

BR, tj
Re: Hyperscan regexp engine
Tomasz Jamroszczak wrote:
>On Thu, 27 Jun 2019 11:35:04 +0200, Stephen R. van den Berg
> wrote:
>>> There's the https://github.com/intel/hyperscan regexp library
>>>created over 10 years by algorithm start-up
>>As a matter of fact, I have looked at it, and if nobody beats me to it,
>>I might integrate support for it (should not be hard, given the fact that
>>it is API-compatible with PCRE).
> That's good news. Do you have any timeframe?

A bit hard to say, because other work sometimes drags on a lot longer than anticipated (e.g. debugging the Shuffler took 14 days longer than expected). But let's say that before the end of July I will most likely have had a look at it.
--
Stephen.
Re: Hyperscan regexp engine
On Thu, 27 Jun 2019 11:35:04 +0200, Stephen R. van den Berg wrote:

> Tomasz Jamroszczak wrote:
>> There's the https://github.com/intel/hyperscan regexp library created over 10 years by algorithm start-up https://branchfree.org/2019/02/28/paper-hyperscan-a-fast-multi-pattern-regex-matcher-for-modern-cpus/ then bought by Intel and further developed. The development did everything the right way and the lib boasts a lot of optimizations and speedup. Is there a plan to import the lib to Pike? Would it be easy and straight-forward to do so?
>
> As a matter of fact, I have looked at it, and if nobody beats me to it, I might integrate support for it (should not be hard, given the fact that it is API-compatible with PCRE).

That's good news. Do you have any timeframe?

BR, tj.
Re: Hyperscan regexp engine
Tomasz Jamroszczak wrote:
> There's the https://github.com/intel/hyperscan regexp library
>created over 10 years by algorithm start-up
>https://branchfree.org/2019/02/28/paper-hyperscan-a-fast-multi-pattern-regex-matcher-for-modern-cpus/
>then bought by Intel and further developed. The development did
>everything the right way and the lib boasts a lot of optimizations
>and speedup. Is there a plan to import the lib to Pike? Would it be
>easy and straight-forward to do so?

As a matter of fact, I have looked at it, and if nobody beats me to it, I might integrate support for it (should not be hard, given the fact that it is API-compatible with PCRE).
--
Stephen.