Re: Hyperscan regexp engine

2019-06-28 Thread Tomasz Jamroszczak
On Thu, 27 Jun 2019 17:39:01 +0200, Marcus Comstedt (ACROSS) (Hail  
Ilpalazzo!) @ Pike (-) developers forum <10...@lyskom.lysator.liu.se>  
wrote:



> You can still want things which do not exist.  :-)  But given the
> attitude of the author I don't expect I'll be wanting either that or
> hyperscan.  If _you_ want something which is only of interest to
> "network [companies] looking to scan 5,000 complex regexes in
> streaming mode" (who does that?), feel free to integrate it, but keep
> it on Monger please.



OK, I see your position.  Two clarifications are needed, though.
First, "the author" of Hyperscan is not Geoff Langdale alone but quite a
big team.  Second, the library is useful beyond network security
hardware.  It's used by GitHub
(https://github.blog/2018-10-17-behind-the-scenes-of-github-token-scanning/)
and in general is simply faster than the current state of the art in
regexp engines.  A high-level overview of the constraints and areas for
improvement is given at
https://www.hyperscan.io/2015/10/20/match-regular-expressions/.


br,
tj.


Re: Hyperscan regexp engine

2019-06-27 Thread Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum
You can still want things which do not exist.  :-)  But given the
attitude of the author I don't expect I'll be wanting either that or
hyperscan.  If _you_ want something which is only of interest to
"network [companies] looking to scan 5,000 complex regexes in
streaming mode" (who does that?), feel free to integrate it, but keep
it on Monger please.


Re: Hyperscan regexp engine

2019-06-27 Thread Tomasz Jamroszczak
On Thu, 27 Jun 2019 15:59:01 +0200, Marcus Comstedt (ACROSS) (Hail  
Ilpalazzo!) @ Pike (-) developers forum <10...@lyskom.lysator.liu.se>  
wrote:



> So I interpret this as "ure3" (whatever that is) being the thing we
> actually want, not this garbage fire?


On
https://branchfree.org/2019/02/28/paper-hyperscan-a-fast-multi-pattern-regex-matcher-for-modern-cpus/
the author writes in the comments on March 27, 2019:


"""
I would like “ure3” to be open. I also will need help. I designed  
Hyperscan but did not build it all by myself; [...]


If I build a new regex matching system it will be a lot smaller and  
simpler, but it will still be a huge effort.

"""

And on April 15, 2019:
"""
if I get to it.
"""

So, given that "ure3" does not exist, I doubt it is something we
should, or even could, want.


br,
tj.


Re: Hyperscan regexp engine

2019-06-27 Thread Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum
Looking at the ycombinator page, it sounds like even ure3 would not
support 32-bit arches.  Sounds like a non-starter to me.


Re: Hyperscan regexp engine

2019-06-27 Thread Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum
So I interpret this as "ure3" (whatever that is) being the thing we
actually want, not this garbage fire?


Re: Hyperscan regexp engine

2019-06-27 Thread Tomasz Jamroszczak
On Thu, 27 Jun 2019 13:58:02 +0200, Marcus Comstedt (ACROSS) (Hail  
Ilpalazzo!) @ Pike (-) developers forum <10...@lyskom.lysator.liu.se>  
wrote:



>> /paper-hyperscan-a-fast-multi-pattern-regex-matcher-for-modern-cpus/
>
> For modern CPU:s?  Looks more like they are targeting a certain 1970:s
> architecture:
>
>    cpuid(1, 0, , , , );
>
> Did you test it on RISC-V?


From https://news.ycombinator.com/item?id=19270199:

"""  
I suppose "Some Modern CPUs" was too long-winded a title?
As I said, it doesn't take a genius to understand Intel's motivations.  
They bought the project, after all.


Not being @ Intel anymore, I don't have access to the older stuff, and  
even if I did, the codebase has diverged significantly since.


Without going on too much of a tirade - the experience of developing for  
all those platforms really sucked. Almost all the non-x86 platforms had  
significant bugs in their toolchains. One of the MIPS variants (particular  
architecture elided to spare the guilty) had bugs in their gcc intrinsics  
in a way that suggested that no-one had ever done any significant  
third-party dev on the platform.


Big-endian was also a huge PITA.

It was a ton of work to keep all those systems alive, and our machine rack  
looked like a zoo of dev boards and weirdo devices.


In the "ure3" system I mention, I would make retargetability/portability  
to other systems a first-class goal.


One way of achieving this is not having such a huge profusion of methods  
and complexity. Hyperscan is over-engineered for many use cases if you  
aren't a network company looking to scan 5,000 complex regexes in  
streaming mode at hopefully maximal performance.

"""


> Also:  114 kLOC for a regexp matcher.  Srsly?


For a 40x speed boost?  Worth at least trying.

BR,
tj
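
For readers unsure what "streaming mode" means in the quote above: the
input arrives in chunks, and a match that straddles a chunk boundary must
still be reported exactly once.  The following is only a toy sketch of
that idea under simplifying assumptions.  It handles plain literals
rather than regexes, it uses Python's str.find rather than anything from
Hyperscan, and the class and method names are made up for illustration.

```python
class StreamingLiteralScanner:
    """Toy chunk-by-chunk scanner for literal patterns (hypothetical name)."""

    def __init__(self, literals):
        if not literals or any(not s for s in literals):
            raise ValueError("need at least one non-empty literal")
        self.literals = list(literals)
        # Keep enough trailing text to complete any literal split across chunks.
        self.keep = max(len(s) for s in self.literals) - 1
        self.tail = ""
        self.tail_offset = 0  # absolute stream offset of tail[0]

    def feed(self, chunk):
        """Return (pattern_id, end_offset) pairs for matches ending in this chunk."""
        buf = self.tail + chunk
        old = len(self.tail)
        hits = []
        for pid, lit in enumerate(self.literals):
            pos = buf.find(lit)
            while pos != -1:
                end = pos + len(lit)
                # Matches ending inside the old tail were reported by an earlier feed().
                if end > old:
                    hits.append((pid, self.tail_offset + end))
                pos = buf.find(lit, pos + 1)
        # Slide the window: remember only the last `keep` characters.
        cut = max(0, len(buf) - self.keep)
        self.tail_offset += cut
        self.tail = buf[cut:]
        return sorted(hits, key=lambda h: (h[1], h[0]))


scanner = StreamingLiteralScanner(["token", "ken"])
print(scanner.feed("my to"))  # [] (nothing is complete yet)
print(scanner.feed("ken!"))   # [(0, 8), (1, 8)] (both literals end at offset 8)
```

The scanner never buffers more than the longest literal minus one
character, so memory stays bounded however long the stream is.  Scanning
each literal separately is what makes this a toy: it does one pass per
pattern, which is the cost a multi-pattern engine is built to avoid.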


Re: Hyperscan regexp engine

2019-06-27 Thread Stephen R. van den Berg
Tomasz Jamroszczak wrote:
>On Thu, 27 Jun 2019 11:35:04 +0200, Stephen R. van den Berg
> wrote:

>>> There's the https://github.com/intel/hyperscan regexp library
>>>created over 10 years by an algorithm start-up

>>As a matter of fact, I have looked at it, and if nobody beats me to it,
>>I might integrate support for it (should not be hard, given the fact that
>>it is API-compatible with PCRE).

>   That's good news.  Do you have any timeframe?

A bit hard to say, because other work sometimes drags on a lot longer
than anticipated (e.g. debugging the Shuffler took 14 days longer than
expected).  But let's say that before the end of July I will most likely
have had a look at it.
-- 
Stephen.


Re: Hyperscan regexp engine

2019-06-27 Thread Tomasz Jamroszczak
On Thu, 27 Jun 2019 11:35:04 +0200, Stephen R. van den Berg   
wrote:



> Tomasz Jamroszczak wrote:
>> There's the https://github.com/intel/hyperscan regexp library
>> created over 10 years by an algorithm start-up
>> https://branchfree.org/2019/02/28/paper-hyperscan-a-fast-multi-pattern-regex-matcher-for-modern-cpus/
>> then bought by Intel and further developed.  The development did
>> everything the right way and the lib boasts a lot of optimizations
>> and speedup.  Is there a plan to import the lib to Pike?  Would it be
>> easy and straightforward to do so?
>
> As a matter of fact, I have looked at it, and if nobody beats me to it,
> I might integrate support for it (should not be hard, given the fact that
> it is API-compatible with PCRE).

That's good news.  Do you have any timeframe?

BR,
tj.


Re: Hyperscan regexp engine

2019-06-27 Thread Stephen R. van den Berg
Tomasz Jamroszczak wrote:
>   There's the https://github.com/intel/hyperscan regexp library
>created over 10 years by an algorithm start-up
>https://branchfree.org/2019/02/28/paper-hyperscan-a-fast-multi-pattern-regex-matcher-for-modern-cpus/
>then bought by Intel and further developed.  The development did
>everything the right way and the lib boasts a lot of optimizations
>and speedup.  Is there a plan to import the lib to Pike?  Would it be
>easy and straightforward to do so?

As a matter of fact, I have looked at it, and if nobody beats me to it,
I might integrate support for it (should not be hard, given the fact that
it is API-compatible with PCRE).
-- 
Stephen.