Re: [Lldb-commits] [lldb] r282090 - Fix failing regex tests.

2016-09-21 Thread Jim Ingham via lldb-commits
I imagine Greg also thought the llvm classes were just a wrapper around the 
system reg* functions.  I wouldn't have had a problem with that, though it 
wouldn't have made any noticeable difference to lldb's performance.  But I 
actually had to deal with a bug in the OS X copy of the regex engine that 
crashed gdb back in the day, so I know such things are possible, and are much 
more likely to be found and fixed in the system implementation which is quite 
heavily used than in a private copy.  And I really don't want to be in the 
business of maintaining one of these beasts.  So let's hold off on this till 
llvm gets around to using an implementation that is battle tested, like the 
std::regex one.

Jim


> On Sep 21, 2016, at 6:47 PM, Zachary Turner  wrote:
> 
> Fair enough, greg seemed to be rather disturbed by the copy, so this was 
> really just to appease him, I'm also not too concerned . I do think it's 
> worth sharing code with llvm here, but I'm happy to put it off unless someone 
> insists (I wasn't planning to do it anyway until greg expressed concern)
> On Wed, Sep 21, 2016 at 6:42 PM Jim Ingham  wrote:
> First off, I bet we could decide for giggles to make 5 or 6 random copies of 
> strings before we pass them to regcomp in lldb and none of our users would 
> notice the difference.  It's just not something we do as a hot code path.  So 
> avoiding a copy or two is not worth even one new bug in regex handling IMO.
> 
> Secondly, maybe people who work on compilers find the notion of maintaining a 
> regular expression engine something they can get excited about (though I 
> doubt it) but that is certainly not something I want to take on unless I'm 
> getting a very strong benefit from it.
> 
> Jim
> 
> > On Sep 21, 2016, at 6:36 PM, Zachary Turner  wrote:
> >
> > I suppose, but at the same time if it's good enough for llvm it seems good 
> > enough for us.
> >
> > By using it we contribute to its testing, and if we find something it helps 
> > a much larger group of people than just us.
> >
> > Anyway it's you guys' call, but this string copy isn't going to go away 
> > otherwise and I don't really think reverting this change is in the projects 
> > best interest
> > On Wed, Sep 21, 2016 at 6:31 PM Jim Ingham  wrote:
> > I was being facetious about the enhancements.  I am more serious about 
> > bugs.  I would really rather use a maintained rather than a "I got this 
> > source at some point and use it for some things, but don't rigorously test 
> > it" version of regcomp & friends.  Can we hold off on that change till we 
> > have nothing else better to do?
> >
> > Jim
> > > On Sep 21, 2016, at 6:27 PM, Zachary Turner  wrote:
> > >
> > > I don't believe so. For the same reason we don't want enhanced on Apple 
> > > and extended everywhere else. Better if all platforms do the same thing.
> > >
> > > There may be a case to be made for standardizing on std::regex though, it 
> > > has many different flavors and has been standardized for some time.
> > >
> > > Maybe llvm::Regex could be rewritten in terms of std::regex, that would 
> > > enable all the cool flavors for everyone (but I imagine it would tickle 
> > > some strange compatibility bugs and not be entirely smooth)
> > >
> > >
> > > On Wed, Sep 21, 2016 at 6:20 PM Jim Ingham  wrote:
> > > So does the build machinery use the one that is supported on that 
> > > platform if it is available?  They are going to get much wider testing, 
> > > fixes and even "ENHANCE"ments...
> > >
> > > Jim
> > >
> > > > On Sep 21, 2016, at 6:07 PM, Zachary Turner  wrote:
> > > >
> > > > Windows :)
> > > >
> > > > But that was before std::regex. In theory std::regex would work 
> > > > everywhere (although idk how it performs)
> > > > On Wed, Sep 21, 2016 at 6:04 PM Jim Ingham  wrote:
> > > > It seems a little odd that llvm has its own forked copy of Henry 
> > > > Spencer's old regular expression engine?  Are there platforms we care 
> > > > about that don't have a maintained version of exactly the same code?
> > > >
> > > > Jim
> > > >
> > > >
> > > > > On Sep 21, 2016, at 5:42 PM, Zachary Turner  
> > > > > wrote:
> > > > >
> > > > > I'll try to address the Regex issue this week (by making everything 
> > > > > use llvm extended mode regexes).  If someone feels up to the 
> > > > > challenge, adding support for \d \s etc etc to llvm's regex 
> > > > > implementation would make a lot of people very happy.
> > > > >
> > > > > On Wed, Sep 21, 2016 at 5:38 PM Jim Ingham  wrote:
> > > > >
> > > > > > On Sep 21, 2016, at 5:25 PM, Greg Clayton via lldb-commits 
> > > > > >  wrote:
> > > > > >
> > > > > > Yep, and we can't have any regex objects in LLDB using those 
> > > > > > because they will only work on Apple and we don't want code like:
> > > > > >
> > > > > > #if defined(__APPLE__)
> > > > > >   RegularExpression e("\s+");
> > > > > > #else
> > > > > >   RegularExpression e("[ \t]+");
> > > > > > #endif
> > > > > >
> > > > > > I know "\s" is probably extende

Re: [Lldb-commits] [lldb] r282090 - Fix failing regex tests.

2016-09-21 Thread Zachary Turner via lldb-commits
Fair enough, greg seemed to be rather disturbed by the copy, so this was
really just to appease him, I'm also not too concerned . I do think it's
worth sharing code with llvm here, but I'm happy to put it off unless
someone insists (I wasn't planning to do it anyway until greg expressed
concern)
On Wed, Sep 21, 2016 at 6:42 PM Jim Ingham  wrote:

> First off, I bet we could decide for giggles to make 5 or 6 random copies
> of strings before we pass them to regcomp in lldb and none of our users
> would notice the difference.  It's just not something we do as a hot code
> path.  So avoiding a copy or two is not worth even one new bug in regex
> handling IMO.
>
> Secondly, maybe people who work on compilers find the notion of
> maintaining a regular expression engine something they can get excited
> about (though I doubt it) but that is certainly not something I want to
> take on unless I'm getting a very strong benefit from it.
>
> Jim
>
> > On Sep 21, 2016, at 6:36 PM, Zachary Turner  wrote:
> >
> > I suppose, but at the same time if it's good enough for llvm it seems
> good enough for us.
> >
> > By using it we contribute to its testing, and if we find something it
> helps a much larger group of people than just us.
> >
> > Anyway it's you guys' call, but this string copy isn't going to go away
> otherwise and I don't really think reverting this change is in the projects
> best interest
> > On Wed, Sep 21, 2016 at 6:31 PM Jim Ingham  wrote:
> > I was being facetious about the enhancements.  I am more serious about
> bugs.  I would really rather use a maintained rather than a "I got this
> source at some point and use it for some things, but don't rigorously test
> it" version of regcomp & friends.  Can we hold off on that change till we
> have nothing else better to do?
> >
> > Jim
> > > On Sep 21, 2016, at 6:27 PM, Zachary Turner 
> wrote:
> > >
> > > I don't believe so. For the same reason we don't want enhanced on
> Apple and extended everywhere else. Better if all platforms do the same
> thing.
> > >
> > > There may be a case to be made for standardizing on std::regex though,
> it has many different flavors and has been standardized for some time.
> > >
> > > Maybe llvm::Regex could be rewritten in terms of std::regex, that
> would enable all the cool flavors for everyone (but I imagine it would
> tickle some strange compatibility bugs and not be entirely smooth)
> > >
> > >
> > > On Wed, Sep 21, 2016 at 6:20 PM Jim Ingham  wrote:
> > > So does the build machinery use the one that is supported on that
> platform if it is available?  They are going to get much wider testing,
> fixes and even "ENHANCE"ments...
> > >
> > > Jim
> > >
> > > > On Sep 21, 2016, at 6:07 PM, Zachary Turner 
> wrote:
> > > >
> > > > Windows :)
> > > >
> > > > But that was before std::regex. In theory std::regex would work
> everywhere (although idk how it performs)
> > > > On Wed, Sep 21, 2016 at 6:04 PM Jim Ingham 
> wrote:
> > > > It seems a little odd that llvm has its own forked copy of Henry
> Spencer's old regular expression engine?  Are there platforms we care about
> that don't have a maintained version of exactly the same code?
> > > >
> > > > Jim
> > > >
> > > >
> > > > > On Sep 21, 2016, at 5:42 PM, Zachary Turner 
> wrote:
> > > > >
> > > > > I'll try to address the Regex issue this week (by making
> everything use llvm extended mode regexes).  If someone feels up to the
> challenge, adding support for \d \s etc etc to llvm's regex implementation
> would make a lot of people very happy.
> > > > >
> > > > > On Wed, Sep 21, 2016 at 5:38 PM Jim Ingham 
> wrote:
> > > > >
> > > > > > On Sep 21, 2016, at 5:25 PM, Greg Clayton via lldb-commits <
> lldb-commits@lists.llvm.org> wrote:
> > > > > >
> > > > > > Yep, and we can't have any regex objects in LLDB using those
> because they will only work on Apple and we don't want code like:
> > > > > >
> > > > > > #if defined(__APPLE__)
> > > > > >   RegularExpression e("\s+");
> > > > > > #else
> > > > > >   RegularExpression e("[ \t]+");
> > > > > > #endif
> > > > > >
> > > > > > I know "\s" is probably extended, so this was a bad example, but
> you get my drift.
> > > > >
> > > > > Nope, sadly for extended you have to say [[:space:]].  To avoid
> the #define problem, we could require only extended regular expressions in
> lldb code, but let users type the more convenient enhanced one wherever the
> command line uses them.  But beyond these convenient shortcuts there
> probably aren't that many additions that would be useful for the kind of
> regular expressions our users are likely to write.  If we ever provide "-r"
> option to "memory find" the byte literal extension might come in handy.
> But I don't think that justifies making MacOS builds different.
> > > > >
> > > > > If it really bothered us we could go get a more modern regex
> engine from somewhere (rip it out of Tcl or something like that...)
> > > > >
> > > > > Jim
> > > > >
> > > > >
> > > > > >
> > > > > > Greg
> > 

Re: [Lldb-commits] [lldb] r282090 - Fix failing regex tests.

2016-09-21 Thread Jim Ingham via lldb-commits
First off, I bet we could decide for giggles to make 5 or 6 random copies of 
strings before we pass them to regcomp in lldb and none of our users would 
notice the difference.  It's just not something we do as a hot code path.  So 
avoiding a copy or two is not worth even one new bug in regex handling IMO.

Secondly, maybe people who work on compilers find the notion of maintaining a 
regular expression engine something they can get excited about (though I doubt 
it) but that is certainly not something I want to take on unless I'm getting a 
very strong benefit from it.

Jim

> On Sep 21, 2016, at 6:36 PM, Zachary Turner  wrote:
> 
> I suppose, but at the same time if it's good enough for llvm it seems good 
> enough for us. 
> 
> By using it we contribute to its testing, and if we find something it helps a 
> much larger group of people than just us.
> 
> Anyway it's you guys' call, but this string copy isn't going to go away 
> otherwise and I don't really think reverting this change is in the projects 
> best interest 
> On Wed, Sep 21, 2016 at 6:31 PM Jim Ingham  wrote:
> I was being facetious about the enhancements.  I am more serious about bugs.  
> I would really rather use a maintained rather than a "I got this source at 
> some point and use it for some things, but don't rigorously test it" version 
> of regcomp & friends.  Can we hold off on that change till we have nothing 
> else better to do?
> 
> Jim
> > On Sep 21, 2016, at 6:27 PM, Zachary Turner  wrote:
> >
> > I don't believe so. For the same reason we don't want enhanced on Apple and 
> > extended everywhere else. Better if all platforms do the same thing.
> >
> > There may be a case to be made for standardizing on std::regex though, it 
> > has many different flavors and has been standardized for some time.
> >
> > Maybe llvm::Regex could be rewritten in terms of std::regex, that would 
> > enable all the cool flavors for everyone (but I imagine it would tickle 
> > some strange compatibility bugs and not be entirely smooth)
> >
> >
> > On Wed, Sep 21, 2016 at 6:20 PM Jim Ingham  wrote:
> > So does the build machinery use the one that is supported on that platform 
> > if it is available?  They are going to get much wider testing, fixes and 
> > even "ENHANCE"ments...
> >
> > Jim
> >
> > > On Sep 21, 2016, at 6:07 PM, Zachary Turner  wrote:
> > >
> > > Windows :)
> > >
> > > But that was before std::regex. In theory std::regex would work 
> > > everywhere (although idk how it performs)
> > > On Wed, Sep 21, 2016 at 6:04 PM Jim Ingham  wrote:
> > > It seems a little odd that llvm has its own forked copy of Henry 
> > > Spencer's old regular expression engine?  Are there platforms we care 
> > > about that don't have a maintained version of exactly the same code?
> > >
> > > Jim
> > >
> > >
> > > > On Sep 21, 2016, at 5:42 PM, Zachary Turner  wrote:
> > > >
> > > > I'll try to address the Regex issue this week (by making everything use 
> > > > llvm extended mode regexes).  If someone feels up to the challenge, 
> > > > adding support for \d \s etc etc to llvm's regex implementation would 
> > > > make a lot of people very happy.
> > > >
> > > > On Wed, Sep 21, 2016 at 5:38 PM Jim Ingham  wrote:
> > > >
> > > > > On Sep 21, 2016, at 5:25 PM, Greg Clayton via lldb-commits 
> > > > >  wrote:
> > > > >
> > > > > Yep, and we can't have any regex objects in LLDB using those because 
> > > > > they will only work on Apple and we don't want code like:
> > > > >
> > > > > #if defined(__APPLE__)
> > > > >   RegularExpression e("\s+");
> > > > > #else
> > > > >   RegularExpression e("[ \t]+");
> > > > > #endif
> > > > >
> > > > > I know "\s" is probably extended, so this was a bad example, but you 
> > > > > get my drift.
> > > >
> > > > Nope, sadly for extended you have to say [[:space:]].  To avoid the 
> > > > #define problem, we could require only extended regular expressions in 
> > > > lldb code, but let users type the more convenient enhanced one wherever 
> > > > the command line uses them.  But beyond these convenient shortcuts 
> > > > there probably aren't that many additions that would be useful for the 
> > > > kind of regular expressions our users are likely to write.  If we ever 
> > > > provide "-r" option to "memory find" the byte literal extension might 
> > > > come in handy.  But I don't think that justifies making MacOS builds 
> > > > different.
> > > >
> > > > If it really bothered us we could go get a more modern regex engine 
> > > > from somewhere (rip it out of Tcl or something like that...)
> > > >
> > > > Jim
> > > >
> > > >
> > > > >
> > > > > Greg
> > > > >
> > > > >> On Sep 21, 2016, at 5:19 PM, Zachary Turner  
> > > > >> wrote:
> > > > >>
> > > > >> That sounds like a plan.  BTW, extended is the one that everyone 
> > > > >> supports, enhanced is the one that only apple supports.
> > > > >>
> > > > >> On Wed, Sep 21, 2016 at 5:18 PM Greg Clayton  
> > > > >> wrote:
> > > > >> And we should check for any "extended" 

Re: [Lldb-commits] [lldb] r282090 - Fix failing regex tests.

2016-09-21 Thread Zachary Turner via lldb-commits
I suppose, but at the same time if it's good enough for llvm it seems good
enough for us.

By using it we contribute to its testing, and if we find something it helps
a much larger group of people than just us.

Anyway it's you guys' call, but this string copy isn't going to go away
otherwise and I don't really think reverting this change is in the projects
best interest
On Wed, Sep 21, 2016 at 6:31 PM Jim Ingham  wrote:

> I was being facetious about the enhancements.  I am more serious about
> bugs.  I would really rather use a maintained rather than a "I got this
> source at some point and use it for some things, but don't rigorously test
> it" version of regcomp & friends.  Can we hold off on that change till we
> have nothing else better to do?
>
> Jim
> > On Sep 21, 2016, at 6:27 PM, Zachary Turner  wrote:
> >
> > I don't believe so. For the same reason we don't want enhanced on Apple
> and extended everywhere else. Better if all platforms do the same thing.
> >
> > There may be a case to be made for standardizing on std::regex though,
> it has many different flavors and has been standardized for some time.
> >
> > Maybe llvm::Regex could be rewritten in terms of std::regex, that would
> enable all the cool flavors for everyone (but I imagine it would tickle
> some strange compatibility bugs and not be entirely smooth)
> >
> >
> > On Wed, Sep 21, 2016 at 6:20 PM Jim Ingham  wrote:
> > So does the build machinery use the one that is supported on that
> platform if it is available?  They are going to get much wider testing,
> fixes and even "ENHANCE"ments...
> >
> > Jim
> >
> > > On Sep 21, 2016, at 6:07 PM, Zachary Turner 
> wrote:
> > >
> > > Windows :)
> > >
> > > But that was before std::regex. In theory std::regex would work
> everywhere (although idk how it performs)
> > > On Wed, Sep 21, 2016 at 6:04 PM Jim Ingham  wrote:
> > > It seems a little odd that llvm has its own forked copy of Henry
> Spencer's old regular expression engine?  Are there platforms we care about
> that don't have a maintained version of exactly the same code?
> > >
> > > Jim
> > >
> > >
> > > > On Sep 21, 2016, at 5:42 PM, Zachary Turner 
> wrote:
> > > >
> > > > I'll try to address the Regex issue this week (by making everything
> use llvm extended mode regexes).  If someone feels up to the challenge,
> adding support for \d \s etc etc to llvm's regex implementation would make
> a lot of people very happy.
> > > >
> > > > On Wed, Sep 21, 2016 at 5:38 PM Jim Ingham 
> wrote:
> > > >
> > > > > On Sep 21, 2016, at 5:25 PM, Greg Clayton via lldb-commits <
> lldb-commits@lists.llvm.org> wrote:
> > > > >
> > > > > Yep, and we can't have any regex objects in LLDB using those
> because they will only work on Apple and we don't want code like:
> > > > >
> > > > > #if defined(__APPLE__)
> > > > >   RegularExpression e("\s+");
> > > > > #else
> > > > >   RegularExpression e("[ \t]+");
> > > > > #endif
> > > > >
> > > > > I know "\s" is probably extended, so this was a bad example, but
> you get my drift.
> > > >
> > > > Nope, sadly for extended you have to say [[:space:]].  To avoid the
> #define problem, we could require only extended regular expressions in lldb
> code, but let users type the more convenient enhanced one wherever the
> command line uses them.  But beyond these convenient shortcuts there
> probably aren't that many additions that would be useful for the kind of
> regular expressions our users are likely to write.  If we ever provide "-r"
> option to "memory find" the byte literal extension might come in handy.
> But I don't think that justifies making MacOS builds different.
> > > >
> > > > If it really bothered us we could go get a more modern regex engine
> from somewhere (rip it out of Tcl or something like that...)
> > > >
> > > > Jim
> > > >
> > > >
> > > > >
> > > > > Greg
> > > > >
> > > > >> On Sep 21, 2016, at 5:19 PM, Zachary Turner 
> wrote:
> > > > >>
> > > > >> That sounds like a plan.  BTW, extended is the one that everyone
> supports, enhanced is the one that only apple supports.
> > > > >>
> > > > >> On Wed, Sep 21, 2016 at 5:18 PM Greg Clayton 
> wrote:
> > > > >> And we should check for any "extended" mode regex stuff and get
> rid of it since as you said they are not portable. They tend to be
> shortcuts for classes of objects and we can just fix the regex to be more
> explicit about it. For now we would keep the
> lldb_private::RegularExpression class, have it have a llvm::Regex member
> and then lldbassert if the regex fails to compile so we can catch any
> extended cruft that we might miss so we can fix it...
> > > > >>
> > > > >>> On Sep 21, 2016, at 5:15 PM, Greg Clayton 
> wrote:
> > > > >>>
> > > > >>> To be clear: if we can make StringRef work efficiently, I am
> fine with that. We can just cut over to using llvm::Regex where it uses the
> start and end pointer. I would just like to avoid making string copies for
> no reason. I don't have anything against using StringRef as long as we do
>

Re: [Lldb-commits] [lldb] r282090 - Fix failing regex tests.

2016-09-21 Thread Jim Ingham via lldb-commits
I was being facetious about the enhancements.  I am more serious about bugs.  I 
would really rather use a maintained rather than a "I got this source at some 
point and use it for some things, but don't rigorously test it" version of 
regcomp & friends.  Can we hold off on that change till we have nothing else 
better to do?

Jim
> On Sep 21, 2016, at 6:27 PM, Zachary Turner  wrote:
> 
> I don't believe so. For the same reason we don't want enhanced on Apple and 
> extended everywhere else. Better if all platforms do the same thing.
> 
> There may be a case to be made for standardizing on std::regex though, it has 
> many different flavors and has been standardized for some time.
> 
> Maybe llvm::Regex could be rewritten in terms of std::regex, that would 
> enable all the cool flavors for everyone (but I imagine it would tickle some 
> strange compatibility bugs and not be entirely smooth)
> 
> 
> On Wed, Sep 21, 2016 at 6:20 PM Jim Ingham  wrote:
> So does the build machinery use the one that is supported on that platform if 
> it is available?  They are going to get much wider testing, fixes and even 
> "ENHANCE"ments...
> 
> Jim
> 
> > On Sep 21, 2016, at 6:07 PM, Zachary Turner  wrote:
> >
> > Windows :)
> >
> > But that was before std::regex. In theory std::regex would work everywhere 
> > (although idk how it performs)
> > On Wed, Sep 21, 2016 at 6:04 PM Jim Ingham  wrote:
> > It seems a little odd that llvm has its own forked copy of Henry Spencer's 
> > old regular expression engine?  Are there platforms we care about that 
> > don't have a maintained version of exactly the same code?
> >
> > Jim
> >
> >
> > > On Sep 21, 2016, at 5:42 PM, Zachary Turner  wrote:
> > >
> > > I'll try to address the Regex issue this week (by making everything use 
> > > llvm extended mode regexes).  If someone feels up to the challenge, 
> > > adding support for \d \s etc etc to llvm's regex implementation would 
> > > make a lot of people very happy.
> > >
> > > On Wed, Sep 21, 2016 at 5:38 PM Jim Ingham  wrote:
> > >
> > > > On Sep 21, 2016, at 5:25 PM, Greg Clayton via lldb-commits 
> > > >  wrote:
> > > >
> > > > Yep, and we can't have any regex objects in LLDB using those because 
> > > > they will only work on Apple and we don't want code like:
> > > >
> > > > #if defined(__APPLE__)
> > > >   RegularExpression e("\s+");
> > > > #else
> > > >   RegularExpression e("[ \t]+");
> > > > #endif
> > > >
> > > > I know "\s" is probably extended, so this was a bad example, but you 
> > > > get my drift.
> > >
> > > Nope, sadly for extended you have to say [[:space:]].  To avoid the 
> > > #define problem, we could require only extended regular expressions in 
> > > lldb code, but let users type the more convenient enhanced one wherever 
> > > the command line uses them.  But beyond these convenient shortcuts there 
> > > probably aren't that many additions that would be useful for the kind of 
> > > regular expressions our users are likely to write.  If we ever provide 
> > > "-r" option to "memory find" the byte literal extension might come in 
> > > handy.  But I don't think that justifies making MacOS builds different.
> > >
> > > If it really bothered us we could go get a more modern regex engine from 
> > > somewhere (rip it out of Tcl or something like that...)
> > >
> > > Jim
> > >
> > >
> > > >
> > > > Greg
> > > >
> > > >> On Sep 21, 2016, at 5:19 PM, Zachary Turner  wrote:
> > > >>
> > > >> That sounds like a plan.  BTW, extended is the one that everyone 
> > > >> supports, enhanced is the one that only apple supports.
> > > >>
> > > >> On Wed, Sep 21, 2016 at 5:18 PM Greg Clayton  
> > > >> wrote:
> > > >> And we should check for any "extended" mode regex stuff and get rid of 
> > > >> it since as you said they are not portable. They tend to be shortcuts 
> > > >> for classes of objects and we can just fix the regex to be more 
> > > >> explicit about it. For now we would keep the 
> > > >> lldb_private::RegularExpression class, have it have a llvm::Regex 
> > > >> member and then lldbassert if the regex fails to compile so we can 
> > > >> catch any extended cruft that we might miss so we can fix it...
> > > >>
> > > >>> On Sep 21, 2016, at 5:15 PM, Greg Clayton  wrote:
> > > >>>
> > > >>> To be clear: if we can make StringRef work efficiently, I am fine 
> > > >>> with that. We can just cut over to using llvm::Regex where it uses 
> > > >>> the start and end pointer. I would just like to avoid making string 
> > > >>> copies for no reason. I don't have anything against using StringRef 
> > > >>> as long as we do it efficiently.
> > > >>>
> > > >>> Greg
> > > >>>
> > > >>>
> > >  On Sep 21, 2016, at 5:13 PM, Greg Clayton via lldb-commits 
> > >   wrote:
> > > 
> > > >
> > > > On Sep 21, 2016, at 4:43 PM, Zachary Turner  
> > > > wrote:
> > > >
> > > > You need to duplicate something on the heap once when you execute 
> > > > the regex.  And in turn you save tens or hundre

Re: [Lldb-commits] [lldb] r282090 - Fix failing regex tests.

2016-09-21 Thread Zachary Turner via lldb-commits
I don't believe so. For the same reason we don't want enhanced on Apple and
extended everywhere else. Better if all platforms do the same thing.

There may be a case to be made for standardizing on std::regex though, it
has many different flavors and has been standardized for some time.

Maybe llvm::Regex could be rewritten in terms of std::regex, that would
enable all the cool flavors for everyone (but I imagine it would tickle
some strange compatibility bugs and not be entirely smooth)


On Wed, Sep 21, 2016 at 6:20 PM Jim Ingham  wrote:

> So does the build machinery use the one that is supported on that platform
> if it is available?  They are going to get much wider testing, fixes and
> even "ENHANCE"ments...
>
> Jim
>
> > On Sep 21, 2016, at 6:07 PM, Zachary Turner  wrote:
> >
> > Windows :)
> >
> > But that was before std::regex. In theory std::regex would work
> everywhere (although idk how it performs)
> > On Wed, Sep 21, 2016 at 6:04 PM Jim Ingham  wrote:
> > It seems a little odd that llvm has its own forked copy of Henry
> Spencer's old regular expression engine?  Are there platforms we care about
> that don't have a maintained version of exactly the same code?
> >
> > Jim
> >
> >
> > > On Sep 21, 2016, at 5:42 PM, Zachary Turner 
> wrote:
> > >
> > > I'll try to address the Regex issue this week (by making everything
> use llvm extended mode regexes).  If someone feels up to the challenge,
> adding support for \d \s etc etc to llvm's regex implementation would make
> a lot of people very happy.
> > >
> > > On Wed, Sep 21, 2016 at 5:38 PM Jim Ingham  wrote:
> > >
> > > > On Sep 21, 2016, at 5:25 PM, Greg Clayton via lldb-commits <
> lldb-commits@lists.llvm.org> wrote:
> > > >
> > > > Yep, and we can't have any regex objects in LLDB using those because
> they will only work on Apple and we don't want code like:
> > > >
> > > > #if defined(__APPLE__)
> > > >   RegularExpression e("\s+");
> > > > #else
> > > >   RegularExpression e("[ \t]+");
> > > > #endif
> > > >
> > > > I know "\s" is probably extended, so this was a bad example, but you
> get my drift.
> > >
> > > Nope, sadly for extended you have to say [[:space:]].  To avoid the
> #define problem, we could require only extended regular expressions in lldb
> code, but let users type the more convenient enhanced one wherever the
> command line uses them.  But beyond these convenient shortcuts there
> probably aren't that many additions that would be useful for the kind of
> regular expressions our users are likely to write.  If we ever provide "-r"
> option to "memory find" the byte literal extension might come in handy.
> But I don't think that justifies making MacOS builds different.
> > >
> > > If it really bothered us we could go get a more modern regex engine
> from somewhere (rip it out of Tcl or something like that...)
> > >
> > > Jim
> > >
> > >
> > > >
> > > > Greg
> > > >
> > > >> On Sep 21, 2016, at 5:19 PM, Zachary Turner 
> wrote:
> > > >>
> > > >> That sounds like a plan.  BTW, extended is the one that everyone
> supports, enhanced is the one that only apple supports.
> > > >>
> > > >> On Wed, Sep 21, 2016 at 5:18 PM Greg Clayton 
> wrote:
> > > >> And we should check for any "extended" mode regex stuff and get rid
> of it since as you said they are not portable. They tend to be shortcuts
> for classes of objects and we can just fix the regex to be more explicit
> about it. For now we would keep the lldb_private::RegularExpression class,
> have it have a llvm::Regex member and then lldbassert if the regex fails to
> compile so we can catch any extended cruft that we might miss so we can fix
> it...
> > > >>
> > > >>> On Sep 21, 2016, at 5:15 PM, Greg Clayton 
> wrote:
> > > >>>
> > > >>> To be clear: if we can make StringRef work efficiently, I am fine
> with that. We can just cut over to using llvm::Regex where it uses the
> start and end pointer. I would just like to avoid making string copies for
> no reason. I don't have anything against using StringRef as long as we do
> it efficiently.
> > > >>>
> > > >>> Greg
> > > >>>
> > > >>>
> > >  On Sep 21, 2016, at 5:13 PM, Greg Clayton via lldb-commits <
> lldb-commits@lists.llvm.org> wrote:
> > > 
> > > >
> > > > On Sep 21, 2016, at 4:43 PM, Zachary Turner 
> wrote:
> > > >
> > > > You need to duplicate something on the heap once when you
> execute the regex.  And in turn you save tens or hundreds or copies on the
> way there because of inefficient string usage.
> > > 
> > >  Where? please show this.
> > > 
> > >  I see the following callers:
> > > 
> > > const char *class_name =
> > > 
> iterator->second->GetClassName().AsCString("");
> > > if (regex_up && class_name &&
> > > !regex_up->Execute(llvm::StringRef(class_name)))
> > > 
> > >  You are adding a strlen() call here to construct the StringRef,
> not saving anything.
> > > 
> > > 
> > >  bool CommandObjectRegexCommand

Re: [Lldb-commits] [lldb] r282090 - Fix failing regex tests.

2016-09-21 Thread Jim Ingham via lldb-commits
So does the build machinery use the one that is supported on that platform if 
it is available?  They are going to get much wider testing, fixes and even 
"ENHANCE"ments...

Jim

> On Sep 21, 2016, at 6:07 PM, Zachary Turner  wrote:
> 
> Windows :)
> 
> But that was before std::regex. In theory std::regex would work everywhere 
> (although idk how it performs)
> On Wed, Sep 21, 2016 at 6:04 PM Jim Ingham  wrote:
> It seems a little odd that llvm has its own forked copy of Henry Spencer's 
> old regular expression engine?  Are there platforms we care about that don't 
> have a maintained version of exactly the same code?
> 
> Jim
> 
> 
> > On Sep 21, 2016, at 5:42 PM, Zachary Turner  wrote:
> >
> > I'll try to address the Regex issue this week (by making everything use 
> > llvm extended mode regexes).  If someone feels up to the challenge, adding 
> > support for \d \s etc etc to llvm's regex implementation would make a lot 
> > of people very happy.
> >
> > On Wed, Sep 21, 2016 at 5:38 PM Jim Ingham  wrote:
> >
> > > On Sep 21, 2016, at 5:25 PM, Greg Clayton via lldb-commits 
> > >  wrote:
> > >
> > > Yep, and we can't have any regex objects in LLDB using those because they 
> > > will only work on Apple and we don't want code like:
> > >
> > > #if defined(__APPLE__)
> > >   RegularExpression e("\s+");
> > > #else
> > >   RegularExpression e("[ \t]+");
> > > #endif
> > >
> > > I know "\s" is probably extended, so this was a bad example, but you get 
> > > my drift.
> >
> > Nope, sadly for extended you have to say [[:space:]].  To avoid the #define 
> > problem, we could require only extended regular expressions in lldb code, 
> > but let users type the more convenient enhanced one wherever the command 
> > line uses them.  But beyond these convenient shortcuts there probably 
> > aren't that many additions that would be useful for the kind of regular 
> > expressions our users are likely to write.  If we ever provide "-r" option 
> > to "memory find" the byte literal extension might come in handy.  But I 
> > don't think that justifies making MacOS builds different.
> >
> > If it really bothered us we could go get a more modern regex engine from 
> > somewhere (rip it out of Tcl or something like that...)
> >
> > Jim
> >
> >
> > >
> > > Greg
> > >
> > >> On Sep 21, 2016, at 5:19 PM, Zachary Turner  wrote:
> > >>
> > >> That sounds like a plan.  BTW, extended is the one that everyone 
> > >> supports, enhanced is the one that only apple supports.
> > >>
> > >> On Wed, Sep 21, 2016 at 5:18 PM Greg Clayton  wrote:
> > >> And we should check for any "extended" mode regex stuff and get rid of 
> > >> it since as you said they are not portable. They tend to be shortcuts 
> > >> for classes of objects and we can just fix the regex to be more explicit 
> > >> about it. For now we would keep the lldb_private::RegularExpression 
> > >> class, have it have a llvm::Regex member and then lldbassert if the 
> > >> regex fails to compile so we can catch any extended cruft that we might 
> > >> miss so we can fix it...
> > >>
> > >>> On Sep 21, 2016, at 5:15 PM, Greg Clayton  wrote:
> > >>>
> > >>> To be clear: if we can make StringRef work efficiently, I am fine with 
> > >>> that. We can just cut over to using llvm::Regex where it uses the start 
> > >>> and end pointer. I would just like to avoid making string copies for no 
> > >>> reason. I don't have anything against using StringRef as long as we do 
> > >>> it efficiently.
> > >>>
> > >>> Greg
> > >>>
> > >>>
> >  On Sep 21, 2016, at 5:13 PM, Greg Clayton via lldb-commits 
> >   wrote:
> > 
> > >
> > > On Sep 21, 2016, at 4:43 PM, Zachary Turner  
> > > wrote:
> > >
> > > You need to duplicate something on the heap once when you execute the 
> > > regex.  And in turn you save tens or hundreds or copies on the way 
> > > there because of inefficient string usage.
> > 
> >  Where? please show this.
> > 
> >  I see the following callers:
> > 
> > const char *class_name =
> > iterator->second->GetClassName().AsCString("");
> > if (regex_up && class_name &&
> > !regex_up->Execute(llvm::StringRef(class_name)))
> > 
> >  You are adding a strlen() call here to construct the StringRef, not 
> >  saving anything.
> > 
> > 
> >  bool CommandObjectRegexCommand::DoExecute(const char *command,
> > CommandReturnObject &result) {
> >  if (command) {
> >   EntryCollection::const_iterator pos, end = m_entries.end();
> >   for (pos = m_entries.begin(); pos != end; ++pos) {
> > RegularExpression::Match regex_match(m_max_matches);
> > 
> > if (pos->regex.Execute(command, ®ex_match)) {
> >   std::string new_command(pos->command);
> >   std::string match_str;
> > 
> >  No copy saved. Just wasted time with strlen in StringRef constructor.
> > 
> > 

Re: [Lldb-commits] [lldb] r282090 - Fix failing regex tests.

2016-09-21 Thread Zachary Turner via lldb-commits
Windows :)

But that was before std::regex. In theory std::regex would work everywhere
(although idk how it performs)
On Wed, Sep 21, 2016 at 6:04 PM Jim Ingham  wrote:

> It seems a little odd that llvm has its own forked copy of Henry Spencer's
> old regular expression engine?  Are there platforms we care about that
> don't have a maintained version of exactly the same code?
>
> Jim
>
>
> > On Sep 21, 2016, at 5:42 PM, Zachary Turner  wrote:
> >
> > I'll try to address the Regex issue this week (by making everything use
> llvm extended mode regexes).  If someone feels up to the challenge, adding
> support for \d \s etc etc to llvm's regex implementation would make a lot
> of people very happy.
> >
> > On Wed, Sep 21, 2016 at 5:38 PM Jim Ingham  wrote:
> >
> > > On Sep 21, 2016, at 5:25 PM, Greg Clayton via lldb-commits <
> lldb-commits@lists.llvm.org> wrote:
> > >
> > > Yep, and we can't have any regex objects in LLDB using those because
> they will only work on Apple and we don't want code like:
> > >
> > > #if defined(__APPLE__)
> > >   RegularExpression e("\s+");
> > > #else
> > >   RegularExpression e("[ \t]+");
> > > #endif
> > >
> > > I know "\s" is probably extended, so this was a bad example, but you
> get my drift.
> >
> > Nope, sadly for extended you have to say [[:space:]].  To avoid the
> #define problem, we could require only extended regular expressions in lldb
> code, but let users type the more convenient enhanced one wherever the
> command line uses them.  But beyond these convenient shortcuts there
> probably aren't that many additions that would be useful for the kind of
> regular expressions our users are likely to write.  If we ever provide "-r"
> option to "memory find" the byte literal extension might come in handy.
> But I don't think that justifies making MacOS builds different.
> >
> > If it really bothered us we could go get a more modern regex engine from
> somewhere (rip it out of Tcl or something like that...)
> >
> > Jim
> >
> >
> > >
> > > Greg
> > >
> > >> On Sep 21, 2016, at 5:19 PM, Zachary Turner 
> wrote:
> > >>
> > >> That sounds like a plan.  BTW, extended is the one that everyone
> supports, enhanced is the one that only apple supports.
> > >>
> > >> On Wed, Sep 21, 2016 at 5:18 PM Greg Clayton 
> wrote:
> > >> And we should check for any "extended" mode regex stuff and get rid
> of it since as you said they are not portable. They tend to be shortcuts
> for classes of objects and we can just fix the regex to be more explicit
> about it. For now we would keep the lldb_private::RegularExpression class,
> have it have a llvm::Regex member and then lldbassert if the regex fails to
> compile so we can catch any extended cruft that we might miss so we can fix
> it...
> > >>
> > >>> On Sep 21, 2016, at 5:15 PM, Greg Clayton 
> wrote:
> > >>>
> > >>> To be clear: if we can make StringRef work efficiently, I am fine
> with that. We can just cut over to using llvm::Regex where it uses the
> start and end pointer. I would just like to avoid making string copies for
> no reason. I don't have anything against using StringRef as long as we do
> it efficiently.
> > >>>
> > >>> Greg
> > >>>
> > >>>
> >  On Sep 21, 2016, at 5:13 PM, Greg Clayton via lldb-commits <
> lldb-commits@lists.llvm.org> wrote:
> > 
> > >
> > > On Sep 21, 2016, at 4:43 PM, Zachary Turner 
> wrote:
> > >
> > > You need to duplicate something on the heap once when you execute
> the regex.  And in turn you save tens or hundreds or copies on the way
> there because of inefficient string usage.
> > 
> >  Where? please show this.
> > 
> >  I see the following callers:
> > 
> > const char *class_name =
> > iterator->second->GetClassName().AsCString("");
> > if (regex_up && class_name &&
> > !regex_up->Execute(llvm::StringRef(class_name)))
> > 
> >  You are adding a strlen() call here to construct the StringRef, not
> saving anything.
> > 
> > 
> >  bool CommandObjectRegexCommand::DoExecute(const char *command,
> > CommandReturnObject &result)
> {
> >  if (command) {
> >   EntryCollection::const_iterator pos, end = m_entries.end();
> >   for (pos = m_entries.begin(); pos != end; ++pos) {
> > RegularExpression::Match regex_match(m_max_matches);
> > 
> > if (pos->regex.Execute(command, ®ex_match)) {
> >   std::string new_command(pos->command);
> >   std::string match_str;
> > 
> >  No copy saved. Just wasted time with strlen in StringRef
> constructor.
> > 
> > 
> >   DataVisualization::Categories::ForEach(
> >   [®ex, &result](const lldb::TypeCategoryImplSP &category_sp)
> -> bool {
> > if (regex) {
> >   bool escape = true;
> >   if (regex->GetText() == category_sp->GetName()) {
> > escape = false;
> >   } else if (re

Re: [Lldb-commits] [lldb] r282090 - Fix failing regex tests.

2016-09-21 Thread Jim Ingham via lldb-commits
It seems a little odd that llvm has its own forked copy of Henry Spencer's old 
regular expression engine?  Are there platforms we care about that don't have a 
maintained version of exactly the same code?  

Jim


> On Sep 21, 2016, at 5:42 PM, Zachary Turner  wrote:
> 
> I'll try to address the Regex issue this week (by making everything use llvm 
> extended mode regexes).  If someone feels up to the challenge, adding support 
> for \d \s etc etc to llvm's regex implementation would make a lot of people 
> very happy.
> 
> On Wed, Sep 21, 2016 at 5:38 PM Jim Ingham  wrote:
> 
> > On Sep 21, 2016, at 5:25 PM, Greg Clayton via lldb-commits 
> >  wrote:
> >
> > Yep, and we can't have any regex objects in LLDB using those because they 
> > will only work on Apple and we don't want code like:
> >
> > #if defined(__APPLE__)
> >   RegularExpression e("\s+");
> > #else
> >   RegularExpression e("[ \t]+");
> > #endif
> >
> > I know "\s" is probably extended, so this was a bad example, but you get my 
> > drift.
> 
> Nope, sadly for extended you have to say [[:space:]].  To avoid the #define 
> problem, we could require only extended regular expressions in lldb code, but 
> let users type the more convenient enhanced one wherever the command line 
> uses them.  But beyond these convenient shortcuts there probably aren't that 
> many additions that would be useful for the kind of regular expressions our 
> users are likely to write.  If we ever provide "-r" option to "memory find" 
> the byte literal extension might come in handy.  But I don't think that 
> justifies making MacOS builds different.
> 
> If it really bothered us we could go get a more modern regex engine from 
> somewhere (rip it out of Tcl or something like that...)
> 
> Jim
> 
> 
> >
> > Greg
> >
> >> On Sep 21, 2016, at 5:19 PM, Zachary Turner  wrote:
> >>
> >> That sounds like a plan.  BTW, extended is the one that everyone supports, 
> >> enhanced is the one that only apple supports.
> >>
> >> On Wed, Sep 21, 2016 at 5:18 PM Greg Clayton  wrote:
> >> And we should check for any "extended" mode regex stuff and get rid of it 
> >> since as you said they are not portable. They tend to be shortcuts for 
> >> classes of objects and we can just fix the regex to be more explicit about 
> >> it. For now we would keep the lldb_private::RegularExpression class, have 
> >> it have a llvm::Regex member and then lldbassert if the regex fails to 
> >> compile so we can catch any extended cruft that we might miss so we can 
> >> fix it...
> >>
> >>> On Sep 21, 2016, at 5:15 PM, Greg Clayton  wrote:
> >>>
> >>> To be clear: if we can make StringRef work efficiently, I am fine with 
> >>> that. We can just cut over to using llvm::Regex where it uses the start 
> >>> and end pointer. I would just like to avoid making string copies for no 
> >>> reason. I don't have anything against using StringRef as long as we do it 
> >>> efficiently.
> >>>
> >>> Greg
> >>>
> >>>
>  On Sep 21, 2016, at 5:13 PM, Greg Clayton via lldb-commits 
>   wrote:
> 
> >
> > On Sep 21, 2016, at 4:43 PM, Zachary Turner  wrote:
> >
> > You need to duplicate something on the heap once when you execute the 
> > regex.  And in turn you save tens or hundreds or copies on the way 
> > there because of inefficient string usage.
> 
>  Where? please show this.
> 
>  I see the following callers:
> 
> const char *class_name =
> iterator->second->GetClassName().AsCString("");
> if (regex_up && class_name &&
> !regex_up->Execute(llvm::StringRef(class_name)))
> 
>  You are adding a strlen() call here to construct the StringRef, not 
>  saving anything.
> 
> 
>  bool CommandObjectRegexCommand::DoExecute(const char *command,
> CommandReturnObject &result) {
>  if (command) {
>   EntryCollection::const_iterator pos, end = m_entries.end();
>   for (pos = m_entries.begin(); pos != end; ++pos) {
> RegularExpression::Match regex_match(m_max_matches);
> 
> if (pos->regex.Execute(command, ®ex_match)) {
>   std::string new_command(pos->command);
>   std::string match_str;
> 
>  No copy saved. Just wasted time with strlen in StringRef constructor.
> 
> 
>   DataVisualization::Categories::ForEach(
>   [®ex, &result](const lldb::TypeCategoryImplSP &category_sp) -> 
>  bool {
> if (regex) {
>   bool escape = true;
>   if (regex->GetText() == category_sp->GetName()) {
> escape = false;
>   } else if (regex->Execute(llvm::StringRef::withNullAsEmpty(
>  category_sp->GetName( {
> escape = false;
>   }
> 
>   if (escape)
> return true;
> }
> 
>  No copy saved. Just wasted time with strlen in Str

Re: [Lldb-commits] [lldb] r282090 - Fix failing regex tests.

2016-09-21 Thread Zachary Turner via lldb-commits
I'll try to address the Regex issue this week (by making everything use
llvm extended mode regexes).  If someone feels up to the challenge, adding
support for \d \s etc etc to llvm's regex implementation would make a lot
of people very happy.

On Wed, Sep 21, 2016 at 5:38 PM Jim Ingham  wrote:

>
> > On Sep 21, 2016, at 5:25 PM, Greg Clayton via lldb-commits <
> lldb-commits@lists.llvm.org> wrote:
> >
> > Yep, and we can't have any regex objects in LLDB using those because
> they will only work on Apple and we don't want code like:
> >
> > #if defined(__APPLE__)
> >   RegularExpression e("\s+");
> > #else
> >   RegularExpression e("[ \t]+");
> > #endif
> >
> > I know "\s" is probably extended, so this was a bad example, but you get
> my drift.
>
> Nope, sadly for extended you have to say [[:space:]].  To avoid the
> #define problem, we could require only extended regular expressions in lldb
> code, but let users type the more convenient enhanced one wherever the
> command line uses them.  But beyond these convenient shortcuts there
> probably aren't that many additions that would be useful for the kind of
> regular expressions our users are likely to write.  If we ever provide "-r"
> option to "memory find" the byte literal extension might come in handy.
> But I don't think that justifies making MacOS builds different.
>
> If it really bothered us we could go get a more modern regex engine from
> somewhere (rip it out of Tcl or something like that...)
>
> Jim
>
>
> >
> > Greg
> >
> >> On Sep 21, 2016, at 5:19 PM, Zachary Turner  wrote:
> >>
> >> That sounds like a plan.  BTW, extended is the one that everyone
> supports, enhanced is the one that only apple supports.
> >>
> >> On Wed, Sep 21, 2016 at 5:18 PM Greg Clayton 
> wrote:
> >> And we should check for any "extended" mode regex stuff and get rid of
> it since as you said they are not portable. They tend to be shortcuts for
> classes of objects and we can just fix the regex to be more explicit about
> it. For now we would keep the lldb_private::RegularExpression class, have
> it have a llvm::Regex member and then lldbassert if the regex fails to
> compile so we can catch any extended cruft that we might miss so we can fix
> it...
> >>
> >>> On Sep 21, 2016, at 5:15 PM, Greg Clayton  wrote:
> >>>
> >>> To be clear: if we can make StringRef work efficiently, I am fine with
> that. We can just cut over to using llvm::Regex where it uses the start and
> end pointer. I would just like to avoid making string copies for no reason.
> I don't have anything against using StringRef as long as we do it
> efficiently.
> >>>
> >>> Greg
> >>>
> >>>
>  On Sep 21, 2016, at 5:13 PM, Greg Clayton via lldb-commits <
> lldb-commits@lists.llvm.org> wrote:
> 
> >
> > On Sep 21, 2016, at 4:43 PM, Zachary Turner 
> wrote:
> >
> > You need to duplicate something on the heap once when you execute
> the regex.  And in turn you save tens or hundreds or copies on the way
> there because of inefficient string usage.
> 
>  Where? please show this.
> 
>  I see the following callers:
> 
> const char *class_name =
> iterator->second->GetClassName().AsCString("");
> if (regex_up && class_name &&
> !regex_up->Execute(llvm::StringRef(class_name)))
> 
>  You are adding a strlen() call here to construct the StringRef, not
> saving anything.
> 
> 
>  bool CommandObjectRegexCommand::DoExecute(const char *command,
> CommandReturnObject &result) {
>  if (command) {
>   EntryCollection::const_iterator pos, end = m_entries.end();
>   for (pos = m_entries.begin(); pos != end; ++pos) {
> RegularExpression::Match regex_match(m_max_matches);
> 
> if (pos->regex.Execute(command, ®ex_match)) {
>   std::string new_command(pos->command);
>   std::string match_str;
> 
>  No copy saved. Just wasted time with strlen in StringRef constructor.
> 
> 
>   DataVisualization::Categories::ForEach(
>   [®ex, &result](const lldb::TypeCategoryImplSP &category_sp)
> -> bool {
> if (regex) {
>   bool escape = true;
>   if (regex->GetText() == category_sp->GetName()) {
> escape = false;
>   } else if (regex->Execute(llvm::StringRef::withNullAsEmpty(
>  category_sp->GetName( {
> escape = false;
>   }
> 
>   if (escape)
> return true;
> }
> 
>  No copy saved. Just wasted time with strlen in StringRef constructor.
> 
> 
> TypeCategoryImpl::ForEachCallbacks foreach;
> foreach
>   .SetExact([&result, &formatter_regex, &any_printed](
> ConstString name,
> const FormatterSharedPointer &format_sp) -> bool {
> if (formatter_regex)

Re: [Lldb-commits] [lldb] r282090 - Fix failing regex tests.

2016-09-21 Thread Jim Ingham via lldb-commits

> On Sep 21, 2016, at 5:25 PM, Greg Clayton via lldb-commits 
>  wrote:
> 
> Yep, and we can't have any regex objects in LLDB using those because they 
> will only work on Apple and we don't want code like:
> 
> #if defined(__APPLE__)
>   RegularExpression e("\s+");
> #else
>   RegularExpression e("[ \t]+");
> #endif
> 
> I know "\s" is probably extended, so this was a bad example, but you get my 
> drift.

Nope, sadly for extended you have to say [[:space:]].  To avoid the #define 
problem, we could require only extended regular expressions in lldb code, but 
let users type the more convenient enhanced one wherever the command line uses 
them.  But beyond these convenient shortcuts there probably aren't that many 
additions that would be useful for the kind of regular expressions our users 
are likely to write.  If we ever provide "-r" option to "memory find" the byte 
literal extension might come in handy.  But I don't think that justifies making 
MacOS builds different.

If it really bothered us we could go get a more modern regex engine from 
somewhere (rip it out of Tcl or something like that...)

Jim


> 
> Greg
> 
>> On Sep 21, 2016, at 5:19 PM, Zachary Turner  wrote:
>> 
>> That sounds like a plan.  BTW, extended is the one that everyone supports, 
>> enhanced is the one that only apple supports.  
>> 
>> On Wed, Sep 21, 2016 at 5:18 PM Greg Clayton  wrote:
>> And we should check for any "extended" mode regex stuff and get rid of it 
>> since as you said they are not portable. They tend to be shortcuts for 
>> classes of objects and we can just fix the regex to be more explicit about 
>> it. For now we would keep the lldb_private::RegularExpression class, have it 
>> have a llvm::Regex member and then lldbassert if the regex fails to compile 
>> so we can catch any extended cruft that we might miss so we can fix it...
>> 
>>> On Sep 21, 2016, at 5:15 PM, Greg Clayton  wrote:
>>> 
>>> To be clear: if we can make StringRef work efficiently, I am fine with 
>>> that. We can just cut over to using llvm::Regex where it uses the start and 
>>> end pointer. I would just like to avoid making string copies for no reason. 
>>> I don't have anything against using StringRef as long as we do it 
>>> efficiently.
>>> 
>>> Greg
>>> 
>>> 
 On Sep 21, 2016, at 5:13 PM, Greg Clayton via lldb-commits 
  wrote:
 
> 
> On Sep 21, 2016, at 4:43 PM, Zachary Turner  wrote:
> 
> You need to duplicate something on the heap once when you execute the 
> regex.  And in turn you save tens or hundreds or copies on the way there 
> because of inefficient string usage.
 
 Where? please show this.
 
 I see the following callers:
 
const char *class_name =
iterator->second->GetClassName().AsCString("");
if (regex_up && class_name &&
!regex_up->Execute(llvm::StringRef(class_name)))
 
 You are adding a strlen() call here to construct the StringRef, not saving 
 anything.
 
 
 bool CommandObjectRegexCommand::DoExecute(const char *command,
CommandReturnObject &result) {
 if (command) {
  EntryCollection::const_iterator pos, end = m_entries.end();
  for (pos = m_entries.begin(); pos != end; ++pos) {
RegularExpression::Match regex_match(m_max_matches);
 
if (pos->regex.Execute(command, ®ex_match)) {
  std::string new_command(pos->command);
  std::string match_str;
 
 No copy saved. Just wasted time with strlen in StringRef constructor.
 
 
  DataVisualization::Categories::ForEach(
  [®ex, &result](const lldb::TypeCategoryImplSP &category_sp) -> 
 bool {
if (regex) {
  bool escape = true;
  if (regex->GetText() == category_sp->GetName()) {
escape = false;
  } else if (regex->Execute(llvm::StringRef::withNullAsEmpty(
 category_sp->GetName( {
escape = false;
  }
 
  if (escape)
return true;
}
 
 No copy saved. Just wasted time with strlen in StringRef constructor.
 
 
TypeCategoryImpl::ForEachCallbacks foreach;
foreach
  .SetExact([&result, &formatter_regex, &any_printed](
ConstString name,
const FormatterSharedPointer &format_sp) -> bool {
if (formatter_regex) {
  bool escape = true;
  if (name.GetStringRef() == formatter_regex->GetText()) {
escape = false;
  } else if (formatter_regex->Execute(name.GetStringRef())) {
escape = false;
  }
 
 No copy saved. Just wasted time with strlen in StringRef constructor.
 
  bool ParseCoordinate(const char *id_cstr) {
RegularExpression regex;
RegularExpression::Match regex_

Re: [Lldb-commits] [lldb] r282090 - Fix failing regex tests.

2016-09-21 Thread Zachary Turner via lldb-commits
On Wed, Sep 21, 2016 at 5:13 PM Greg Clayton  wrote:

>
> > On Sep 21, 2016, at 4:43 PM, Zachary Turner  wrote:
> >
> > You need to duplicate something on the heap once when you execute the
> regex.  And in turn you save tens or hundreds or copies on the way there
> because of inefficient string usage.
>
> Where? please show this.
>
> I see the following callers:
>
>   const char *class_name =
>   iterator->second->GetClassName().AsCString("");
>   if (regex_up && class_name &&
>   !regex_up->Execute(llvm::StringRef(class_name)))
>
> You are adding a strlen() call here to construct the StringRef, not saving
> anything.
>
Right, this is only this way because i deleted the const char * version of
the function.  Which as I explained in a previous email, I did because it
forces the compiler to tell you when you're using a raw string literal, so
you can manually examine it and make sure it's not null, since that would
cause it to crash.  So you're right, here it doesn't save you anything, but
it doesn't hurt anything either, and the explicit conversion helps catch
bugs in OTHER  places.  Which again, is only temporary until all the
conversions are done, and we can remove the =delete overloads.


>
>
> bool CommandObjectRegexCommand::DoExecute(const char *command,
>   CommandReturnObject &result) {
>   if (command) {
> EntryCollection::const_iterator pos, end = m_entries.end();
> for (pos = m_entries.begin(); pos != end; ++pos) {
>   RegularExpression::Match regex_match(m_max_matches);
>
>   if (pos->regex.Execute(command, ®ex_match)) {
> std::string new_command(pos->command);
> std::string match_str;
>
> No copy saved. Just wasted time with strlen in StringRef constructor.
>
Right.  But that's because DoExecute takes a const char*, not a StringRef.
And again, doing the entire codebase all at once is infeasible.  Think
about how many places along the pipeline are using this command.  This
isn't the only place where we use it.  We use it later in the function, we
use it in the function above us on the callchain.  We use it lower down on
the callchain.  And every time we do anything with it, we calculate the
length.  So yes, there's an extra one here.  But again, it's only temporary
until in a subsequent patch I change DoExecute() to take a StringRef.
Can't do it all at once.


>
>
> DataVisualization::Categories::ForEach(
> [®ex, &result](const lldb::TypeCategoryImplSP &category_sp) ->
> bool {
>   if (regex) {
> bool escape = true;
> if (regex->GetText() == category_sp->GetName()) {
>   escape = false;
> } else if (regex->Execute(llvm::StringRef::withNullAsEmpty(
>category_sp->GetName( {
>   escape = false;
> }
>
> if (escape)
>   return true;
>   }
>
> No copy saved. Just wasted time with strlen in StringRef constructor.
>
>
>   TypeCategoryImpl::ForEachCallbacks foreach;
>   foreach
> .SetExact([&result, &formatter_regex, &any_printed](
>   ConstString name,
>   const FormatterSharedPointer &format_sp) -> bool {
>   if (formatter_regex) {
> bool escape = true;
> if (name.GetStringRef() == formatter_regex->GetText()) {
>   escape = false;
> } else if (formatter_regex->Execute(name.GetStringRef())) {
>   escape = false;
> }
>
> No copy saved. Just wasted time with strlen in StringRef constructor.
>
> bool ParseCoordinate(const char *id_cstr) {
>   RegularExpression regex;
>   RegularExpression::Match regex_match(3);
>
>   llvm::StringRef id_ref = llvm::StringRef::withNullAsEmpty(id_cstr);
>   bool matched = false;
>   if (regex.Compile(llvm::StringRef("^([0-9]+),([0-9]+),([0-9]+)$")) &&
>   regex.Execute(id_ref, ®ex_match))
> matched = true;
>   else if (regex.Compile(llvm::StringRef("^([0-9]+),([0-9]+)$")) &&
>regex.Execute(id_ref, ®ex_match))
> matched = true;
>   else if (regex.Compile(llvm::StringRef("^([0-9]+)$")) &&
>regex.Execute(id_ref, ®ex_match))
>
> No copy saved. Just wasted time with strlen in StringRef constructor.
>
Again, the solution is to have ParseCoordinate take a StringRef.  Problem
solved.


>
> void DWARFCompileUnit::ParseProducerInfo() {
>   m_producer_version_major = UINT32_MAX;
>   m_producer_version_minor = UINT32_MAX;
>   m_producer_version_update = UINT32_MAX;
>
>   const DWARFDebugInfoEntry *die = GetCompileUnitDIEPtrOnly();
>   if (die) {
>
> const char *producer_cstr = die->GetAttributeValueAsString(
> m_dwarf2Data, this, DW_AT_producer, NULL);
> if (producer_cstr) {
>   RegularExpression llvm_gcc_regex(
>   llvm::StringRef("^4\\.[012]\\.[01] \\(Based on Apple "
>   

Re: [Lldb-commits] [lldb] r282090 - Fix failing regex tests.

2016-09-21 Thread Greg Clayton via lldb-commits
Yep, and we can't have any regex objects in LLDB using those because they will 
only work on Apple and we don't want code like:

#if defined(__APPLE__)
   RegularExpression e("\s+");
#else
   RegularExpression e("[ \t]+");
#endif

I know "\s" is probably extended, so this was a bad example, but you get my 
drift.

Greg

> On Sep 21, 2016, at 5:19 PM, Zachary Turner  wrote:
> 
> That sounds like a plan.  BTW, extended is the one that everyone supports, 
> enhanced is the one that only apple supports.  
> 
> On Wed, Sep 21, 2016 at 5:18 PM Greg Clayton  wrote:
> And we should check for any "extended" mode regex stuff and get rid of it 
> since as you said they are not portable. They tend to be shortcuts for 
> classes of objects and we can just fix the regex to be more explicit about 
> it. For now we would keep the lldb_private::RegularExpression class, have it 
> have a llvm::Regex member and then lldbassert if the regex fails to compile 
> so we can catch any extended cruft that we might miss so we can fix it...
> 
> > On Sep 21, 2016, at 5:15 PM, Greg Clayton  wrote:
> >
> > To be clear: if we can make StringRef work efficiently, I am fine with 
> > that. We can just cut over to using llvm::Regex where it uses the start and 
> > end pointer. I would just like to avoid making string copies for no reason. 
> > I don't have anything against using StringRef as long as we do it 
> > efficiently.
> >
> > Greg
> >
> >
> >> On Sep 21, 2016, at 5:13 PM, Greg Clayton via lldb-commits 
> >>  wrote:
> >>
> >>>
> >>> On Sep 21, 2016, at 4:43 PM, Zachary Turner  wrote:
> >>>
> >>> You need to duplicate something on the heap once when you execute the 
> >>> regex.  And in turn you save tens or hundreds or copies on the way there 
> >>> because of inefficient string usage.
> >>
> >> Where? please show this.
> >>
> >> I see the following callers:
> >>
> >> const char *class_name =
> >> iterator->second->GetClassName().AsCString("");
> >> if (regex_up && class_name &&
> >> !regex_up->Execute(llvm::StringRef(class_name)))
> >>
> >> You are adding a strlen() call here to construct the StringRef, not saving 
> >> anything.
> >>
> >>
> >> bool CommandObjectRegexCommand::DoExecute(const char *command,
> >> CommandReturnObject &result) {
> >> if (command) {
> >>   EntryCollection::const_iterator pos, end = m_entries.end();
> >>   for (pos = m_entries.begin(); pos != end; ++pos) {
> >> RegularExpression::Match regex_match(m_max_matches);
> >>
> >> if (pos->regex.Execute(command, ®ex_match)) {
> >>   std::string new_command(pos->command);
> >>   std::string match_str;
> >>
> >> No copy saved. Just wasted time with strlen in StringRef constructor.
> >>
> >>
> >>   DataVisualization::Categories::ForEach(
> >>   [®ex, &result](const lldb::TypeCategoryImplSP &category_sp) -> 
> >> bool {
> >> if (regex) {
> >>   bool escape = true;
> >>   if (regex->GetText() == category_sp->GetName()) {
> >> escape = false;
> >>   } else if (regex->Execute(llvm::StringRef::withNullAsEmpty(
> >>  category_sp->GetName( {
> >> escape = false;
> >>   }
> >>
> >>   if (escape)
> >> return true;
> >> }
> >>
> >> No copy saved. Just wasted time with strlen in StringRef constructor.
> >>
> >>
> >> TypeCategoryImpl::ForEachCallbacks foreach;
> >> foreach
> >>   .SetExact([&result, &formatter_regex, &any_printed](
> >> ConstString name,
> >> const FormatterSharedPointer &format_sp) -> bool {
> >> if (formatter_regex) {
> >>   bool escape = true;
> >>   if (name.GetStringRef() == formatter_regex->GetText()) {
> >> escape = false;
> >>   } else if (formatter_regex->Execute(name.GetStringRef())) {
> >> escape = false;
> >>   }
> >>
> >> No copy saved. Just wasted time with strlen in StringRef constructor.
> >>
> >>   bool ParseCoordinate(const char *id_cstr) {
> >> RegularExpression regex;
> >> RegularExpression::Match regex_match(3);
> >>
> >> llvm::StringRef id_ref = llvm::StringRef::withNullAsEmpty(id_cstr);
> >> bool matched = false;
> >> if (regex.Compile(llvm::StringRef("^([0-9]+),([0-9]+),([0-9]+)$")) &&
> >> regex.Execute(id_ref, ®ex_match))
> >>   matched = true;
> >> else if (regex.Compile(llvm::StringRef("^([0-9]+),([0-9]+)$")) &&
> >>  regex.Execute(id_ref, ®ex_match))
> >>   matched = true;
> >> else if (regex.Compile(llvm::StringRef("^([0-9]+)$")) &&
> >>  regex.Execute(id_ref, ®ex_match))
> >>
> >> No copy saved. Just wasted time with strlen in StringRef constructor.
> >>
> >> void DWARFCompileUnit::ParseProducerInfo() {
> >> m_producer_version_major = UINT32_MAX;
> >> m_producer_version_minor = UINT32_MAX;
> >> m_producer_version_update = UINT32_MAX;

Re: [Lldb-commits] [lldb] r282090 - Fix failing regex tests.

2016-09-21 Thread Zachary Turner via lldb-commits
That sounds like a plan.  BTW, extended is the one that everyone supports,
enhanced is the one that only apple supports.

On Wed, Sep 21, 2016 at 5:18 PM Greg Clayton  wrote:

> And we should check for any "extended" mode regex stuff and get rid of it
> since as you said they are not portable. They tend to be shortcuts for
> classes of objects and we can just fix the regex to be more explicit about
> it. For now we would keep the lldb_private::RegularExpression class, have
> it have a llvm::Regex member and then lldbassert if the regex fails to
> compile so we can catch any extended cruft that we might miss so we can fix
> it...
>
> > On Sep 21, 2016, at 5:15 PM, Greg Clayton  wrote:
> >
> > To be clear: if we can make StringRef work efficiently, I am fine with
> that. We can just cut over to using llvm::Regex where it uses the start and
> end pointer. I would just like to avoid making string copies for no reason.
> I don't have anything against using StringRef as long as we do it
> efficiently.
> >
> > Greg
> >
> >
> >> On Sep 21, 2016, at 5:13 PM, Greg Clayton via lldb-commits <
> lldb-commits@lists.llvm.org> wrote:
> >>
> >>>
> >>> On Sep 21, 2016, at 4:43 PM, Zachary Turner 
> wrote:
> >>>
> >>> You need to duplicate something on the heap once when you execute the
> regex.  And in turn you save tens or hundreds or copies on the way there
> because of inefficient string usage.
> >>
> >> Where? please show this.
> >>
> >> I see the following callers:
> >>
> >> const char *class_name =
> >> iterator->second->GetClassName().AsCString("");
> >> if (regex_up && class_name &&
> >> !regex_up->Execute(llvm::StringRef(class_name)))
> >>
> >> You are adding a strlen() call here to construct the StringRef, not
> saving anything.
> >>
> >>
> >> bool CommandObjectRegexCommand::DoExecute(const char *command,
> >> CommandReturnObject &result) {
> >> if (command) {
> >>   EntryCollection::const_iterator pos, end = m_entries.end();
> >>   for (pos = m_entries.begin(); pos != end; ++pos) {
> >> RegularExpression::Match regex_match(m_max_matches);
> >>
> >> if (pos->regex.Execute(command, ®ex_match)) {
> >>   std::string new_command(pos->command);
> >>   std::string match_str;
> >>
> >> No copy saved. Just wasted time with strlen in StringRef constructor.
> >>
> >>
> >>   DataVisualization::Categories::ForEach(
> >>   [®ex, &result](const lldb::TypeCategoryImplSP &category_sp) ->
> bool {
> >> if (regex) {
> >>   bool escape = true;
> >>   if (regex->GetText() == category_sp->GetName()) {
> >> escape = false;
> >>   } else if (regex->Execute(llvm::StringRef::withNullAsEmpty(
> >>  category_sp->GetName( {
> >> escape = false;
> >>   }
> >>
> >>   if (escape)
> >> return true;
> >> }
> >>
> >> No copy saved. Just wasted time with strlen in StringRef constructor.
> >>
> >>
> >> TypeCategoryImpl::ForEachCallbacks foreach;
> >> foreach
> >>   .SetExact([&result, &formatter_regex, &any_printed](
> >> ConstString name,
> >> const FormatterSharedPointer &format_sp) -> bool {
> >> if (formatter_regex) {
> >>   bool escape = true;
> >>   if (name.GetStringRef() == formatter_regex->GetText()) {
> >> escape = false;
> >>   } else if (formatter_regex->Execute(name.GetStringRef())) {
> >> escape = false;
> >>   }
> >>
> >> No copy saved. Just wasted time with strlen in StringRef constructor.
> >>
> >>   bool ParseCoordinate(const char *id_cstr) {
> >> RegularExpression regex;
> >> RegularExpression::Match regex_match(3);
> >>
> >> llvm::StringRef id_ref = llvm::StringRef::withNullAsEmpty(id_cstr);
> >> bool matched = false;
> >> if (regex.Compile(llvm::StringRef("^([0-9]+),([0-9]+),([0-9]+)$"))
> &&
> >> regex.Execute(id_ref, ®ex_match))
> >>   matched = true;
> >> else if (regex.Compile(llvm::StringRef("^([0-9]+),([0-9]+)$")) &&
> >>  regex.Execute(id_ref, ®ex_match))
> >>   matched = true;
> >> else if (regex.Compile(llvm::StringRef("^([0-9]+)$")) &&
> >>  regex.Execute(id_ref, ®ex_match))
> >>
> >> No copy saved. Just wasted time with strlen in StringRef constructor.
> >>
> >> void DWARFCompileUnit::ParseProducerInfo() {
> >> m_producer_version_major = UINT32_MAX;
> >> m_producer_version_minor = UINT32_MAX;
> >> m_producer_version_update = UINT32_MAX;
> >>
> >> const DWARFDebugInfoEntry *die = GetCompileUnitDIEPtrOnly();
> >> if (die) {
> >>
> >>   const char *producer_cstr = die->GetAttributeValueAsString(
> >>   m_dwarf2Data, this, DW_AT_producer, NULL);
> >>   if (producer_cstr) {
> >> RegularExpression llvm_gcc_regex(
> >> llvm::StringRef("^4\\.[012]\\.[01] \\(Based on Apple "
> >> "Inc\\. bui

Re: [Lldb-commits] [lldb] r282090 - Fix failing regex tests.

2016-09-21 Thread Greg Clayton via lldb-commits
And we should check for any "extended" mode regex stuff and get rid of it since 
as you said they are not portable. They tend to be shortcuts for classes of 
objects and we can just fix the regex to be more explicit about it. For now we 
would keep the lldb_private::RegularExpression class, have it have a 
llvm::Regex member and then lldbassert if the regex fails to compile so we can 
catch any extended cruft that we might miss so we can fix it...

> On Sep 21, 2016, at 5:15 PM, Greg Clayton  wrote:
> 
> To be clear: if we can make StringRef work efficiently, I am fine with that. 
> We can just cut over to using llvm::Regex where it uses the start and end 
> pointer. I would just like to avoid making string copies for no reason. I 
> don't have anything against using StringRef as long as we do it efficiently.
> 
> Greg
> 
> 
>> On Sep 21, 2016, at 5:13 PM, Greg Clayton via lldb-commits 
>>  wrote:
>> 
>>> 
>>> On Sep 21, 2016, at 4:43 PM, Zachary Turner  wrote:
>>> 
>>> You need to duplicate something on the heap once when you execute the 
>>> regex.  And in turn you save tens or hundreds or copies on the way there 
>>> because of inefficient string usage.  
>> 
>> Where? please show this.
>> 
>> I see the following callers:
>> 
>> const char *class_name =
>> iterator->second->GetClassName().AsCString("");
>> if (regex_up && class_name &&
>> !regex_up->Execute(llvm::StringRef(class_name)))
>> 
>> You are adding a strlen() call here to construct the StringRef, not saving 
>> anything.
>> 
>> 
>> bool CommandObjectRegexCommand::DoExecute(const char *command,
>> CommandReturnObject &result) {
>> if (command) {
>>   EntryCollection::const_iterator pos, end = m_entries.end();
>>   for (pos = m_entries.begin(); pos != end; ++pos) {
>> RegularExpression::Match regex_match(m_max_matches);
>> 
>> if (pos->regex.Execute(command, ®ex_match)) {
>>   std::string new_command(pos->command);
>>   std::string match_str;
>> 
>> No copy saved. Just wasted time with strlen in StringRef constructor.
>> 
>> 
>>   DataVisualization::Categories::ForEach(
>>   [®ex, &result](const lldb::TypeCategoryImplSP &category_sp) -> bool 
>> {
>> if (regex) {
>>   bool escape = true;
>>   if (regex->GetText() == category_sp->GetName()) {
>> escape = false;
>>   } else if (regex->Execute(llvm::StringRef::withNullAsEmpty(
>>  category_sp->GetName( {
>> escape = false;
>>   }
>> 
>>   if (escape)
>> return true;
>> }
>> 
>> No copy saved. Just wasted time with strlen in StringRef constructor.
>> 
>> 
>> TypeCategoryImpl::ForEachCallbacks foreach;
>> foreach
>>   .SetExact([&result, &formatter_regex, &any_printed](
>> ConstString name,
>> const FormatterSharedPointer &format_sp) -> bool {
>> if (formatter_regex) {
>>   bool escape = true;
>>   if (name.GetStringRef() == formatter_regex->GetText()) {
>> escape = false;
>>   } else if (formatter_regex->Execute(name.GetStringRef())) {
>> escape = false;
>>   }
>> 
>> No copy saved. Just wasted time with strlen in StringRef constructor.
>> 
>>   bool ParseCoordinate(const char *id_cstr) {
>> RegularExpression regex;
>> RegularExpression::Match regex_match(3);
>> 
>> llvm::StringRef id_ref = llvm::StringRef::withNullAsEmpty(id_cstr);
>> bool matched = false;
>> if (regex.Compile(llvm::StringRef("^([0-9]+),([0-9]+),([0-9]+)$")) &&
>> regex.Execute(id_ref, ®ex_match))
>>   matched = true;
>> else if (regex.Compile(llvm::StringRef("^([0-9]+),([0-9]+)$")) &&
>>  regex.Execute(id_ref, ®ex_match))
>>   matched = true;
>> else if (regex.Compile(llvm::StringRef("^([0-9]+)$")) &&
>>  regex.Execute(id_ref, ®ex_match))
>> 
>> No copy saved. Just wasted time with strlen in StringRef constructor.
>> 
>> void DWARFCompileUnit::ParseProducerInfo() {
>> m_producer_version_major = UINT32_MAX;
>> m_producer_version_minor = UINT32_MAX;
>> m_producer_version_update = UINT32_MAX;
>> 
>> const DWARFDebugInfoEntry *die = GetCompileUnitDIEPtrOnly();
>> if (die) {
>> 
>>   const char *producer_cstr = die->GetAttributeValueAsString(
>>   m_dwarf2Data, this, DW_AT_producer, NULL);
>>   if (producer_cstr) {
>> RegularExpression llvm_gcc_regex(
>> llvm::StringRef("^4\\.[012]\\.[01] \\(Based on Apple "
>> "Inc\\. build [0-9]+\\) \\(LLVM build "
>> "[\\.0-9]+\\)$"));
>> if (llvm_gcc_regex.Execute(llvm::StringRef(producer_cstr))) {
>>   m_producer = eProducerLLVMGCC;
>> } else if (strstr(producer_cstr, "clang")) {
>>   static RegularExpression g_clang_version_regex(
>>   llvm::StringRef("clang-([0-9]+)\\.([0-9]+)\\.([0-9]+)"));
>>   RegularExpre

Re: [Lldb-commits] [lldb] r282090 - Fix failing regex tests.

2016-09-21 Thread Greg Clayton via lldb-commits
To be clear: if we can make StringRef work efficiently, I am fine with that. We 
can just cut over to using llvm::Regex where it uses the start and end pointer. 
I would just like to avoid making string copies for no reason. I don't have 
anything against using StringRef as long as we do it efficiently.

Greg


> On Sep 21, 2016, at 5:13 PM, Greg Clayton via lldb-commits 
>  wrote:
> 
>> 
>> On Sep 21, 2016, at 4:43 PM, Zachary Turner  wrote:
>> 
>> You need to duplicate something on the heap once when you execute the regex. 
>>  And in turn you save tens or hundreds or copies on the way there because of 
>> inefficient string usage.  
> 
> Where? please show this.
> 
> I see the following callers:
> 
>  const char *class_name =
>  iterator->second->GetClassName().AsCString("");
>  if (regex_up && class_name &&
>  !regex_up->Execute(llvm::StringRef(class_name)))
> 
> You are adding a strlen() call here to construct the StringRef, not saving 
> anything.
> 
> 
> bool CommandObjectRegexCommand::DoExecute(const char *command,
>  CommandReturnObject &result) {
>  if (command) {
>EntryCollection::const_iterator pos, end = m_entries.end();
>for (pos = m_entries.begin(); pos != end; ++pos) {
>  RegularExpression::Match regex_match(m_max_matches);
> 
>  if (pos->regex.Execute(command, ®ex_match)) {
>std::string new_command(pos->command);
>std::string match_str;
> 
> No copy saved. Just wasted time with strlen in StringRef constructor.
> 
> 
>DataVisualization::Categories::ForEach(
>[®ex, &result](const lldb::TypeCategoryImplSP &category_sp) -> bool 
> {
>  if (regex) {
>bool escape = true;
>if (regex->GetText() == category_sp->GetName()) {
>  escape = false;
>} else if (regex->Execute(llvm::StringRef::withNullAsEmpty(
>   category_sp->GetName( {
>  escape = false;
>}
> 
>if (escape)
>  return true;
>  }
> 
> No copy saved. Just wasted time with strlen in StringRef constructor.
> 
> 
>  TypeCategoryImpl::ForEachCallbacks foreach;
>  foreach
>.SetExact([&result, &formatter_regex, &any_printed](
>  ConstString name,
>  const FormatterSharedPointer &format_sp) -> bool {
>  if (formatter_regex) {
>bool escape = true;
>if (name.GetStringRef() == formatter_regex->GetText()) {
>  escape = false;
>} else if (formatter_regex->Execute(name.GetStringRef())) {
>  escape = false;
>}
> 
> No copy saved. Just wasted time with strlen in StringRef constructor.
> 
>bool ParseCoordinate(const char *id_cstr) {
>  RegularExpression regex;
>  RegularExpression::Match regex_match(3);
> 
>  llvm::StringRef id_ref = llvm::StringRef::withNullAsEmpty(id_cstr);
>  bool matched = false;
>  if (regex.Compile(llvm::StringRef("^([0-9]+),([0-9]+),([0-9]+)$")) &&
>  regex.Execute(id_ref, ®ex_match))
>matched = true;
>  else if (regex.Compile(llvm::StringRef("^([0-9]+),([0-9]+)$")) &&
>   regex.Execute(id_ref, ®ex_match))
>matched = true;
>  else if (regex.Compile(llvm::StringRef("^([0-9]+)$")) &&
>   regex.Execute(id_ref, ®ex_match))
> 
> No copy saved. Just wasted time with strlen in StringRef constructor.
> 
> void DWARFCompileUnit::ParseProducerInfo() {
>  m_producer_version_major = UINT32_MAX;
>  m_producer_version_minor = UINT32_MAX;
>  m_producer_version_update = UINT32_MAX;
> 
>  const DWARFDebugInfoEntry *die = GetCompileUnitDIEPtrOnly();
>  if (die) {
> 
>const char *producer_cstr = die->GetAttributeValueAsString(
>m_dwarf2Data, this, DW_AT_producer, NULL);
>if (producer_cstr) {
>  RegularExpression llvm_gcc_regex(
>  llvm::StringRef("^4\\.[012]\\.[01] \\(Based on Apple "
>  "Inc\\. build [0-9]+\\) \\(LLVM build "
>  "[\\.0-9]+\\)$"));
>  if (llvm_gcc_regex.Execute(llvm::StringRef(producer_cstr))) {
>m_producer = eProducerLLVMGCC;
>  } else if (strstr(producer_cstr, "clang")) {
>static RegularExpression g_clang_version_regex(
>llvm::StringRef("clang-([0-9]+)\\.([0-9]+)\\.([0-9]+)"));
>RegularExpression::Match regex_match(3);
>if (g_clang_version_regex.Execute(llvm::StringRef(producer_cstr),
>  ®ex_match)) {
> 
> No copy saved. Just wasted time with strlen in StringRef constructor (2 of 
> them mind you).
> 
> void DWARFDebugPubnamesSet::Find(
>const RegularExpression ®ex,
>std::vector &die_offset_coll) const {
>  DescriptorConstIter pos;
>  DescriptorConstIter end = m_descriptors.end();
>  for (pos = m_descriptors.begin(); pos != end; ++pos) {
>if (regex.Execute(pos->name.c_str()))
>  die

Re: [Lldb-commits] [lldb] r282090 - Fix failing regex tests.

2016-09-21 Thread Greg Clayton via lldb-commits

> On Sep 21, 2016, at 4:43 PM, Zachary Turner  wrote:
> 
> You need to duplicate something on the heap once when you execute the regex.  
> And in turn you save tens or hundreds or copies on the way there because of 
> inefficient string usage.  

Where? please show this.

I see the following callers:

  const char *class_name =
  iterator->second->GetClassName().AsCString("");
  if (regex_up && class_name &&
  !regex_up->Execute(llvm::StringRef(class_name)))

You are adding a strlen() call here to construct the StringRef, not saving 
anything.


bool CommandObjectRegexCommand::DoExecute(const char *command,
  CommandReturnObject &result) {
  if (command) {
EntryCollection::const_iterator pos, end = m_entries.end();
for (pos = m_entries.begin(); pos != end; ++pos) {
  RegularExpression::Match regex_match(m_max_matches);

  if (pos->regex.Execute(command, ®ex_match)) {
std::string new_command(pos->command);
std::string match_str;

No copy saved. Just wasted time with strlen in StringRef constructor.


DataVisualization::Categories::ForEach(
[®ex, &result](const lldb::TypeCategoryImplSP &category_sp) -> bool {
  if (regex) {
bool escape = true;
if (regex->GetText() == category_sp->GetName()) {
  escape = false;
} else if (regex->Execute(llvm::StringRef::withNullAsEmpty(
   category_sp->GetName( {
  escape = false;
}

if (escape)
  return true;
  }

No copy saved. Just wasted time with strlen in StringRef constructor.


  TypeCategoryImpl::ForEachCallbacks foreach;
  foreach
.SetExact([&result, &formatter_regex, &any_printed](
  ConstString name,
  const FormatterSharedPointer &format_sp) -> bool {
  if (formatter_regex) {
bool escape = true;
if (name.GetStringRef() == formatter_regex->GetText()) {
  escape = false;
} else if (formatter_regex->Execute(name.GetStringRef())) {
  escape = false;
}

No copy saved. Just wasted time with strlen in StringRef constructor.

bool ParseCoordinate(const char *id_cstr) {
  RegularExpression regex;
  RegularExpression::Match regex_match(3);

  llvm::StringRef id_ref = llvm::StringRef::withNullAsEmpty(id_cstr);
  bool matched = false;
  if (regex.Compile(llvm::StringRef("^([0-9]+),([0-9]+),([0-9]+)$")) &&
  regex.Execute(id_ref, ®ex_match))
matched = true;
  else if (regex.Compile(llvm::StringRef("^([0-9]+),([0-9]+)$")) &&
   regex.Execute(id_ref, ®ex_match))
matched = true;
  else if (regex.Compile(llvm::StringRef("^([0-9]+)$")) &&
   regex.Execute(id_ref, ®ex_match))

No copy saved. Just wasted time with strlen in StringRef constructor.

void DWARFCompileUnit::ParseProducerInfo() {
  m_producer_version_major = UINT32_MAX;
  m_producer_version_minor = UINT32_MAX;
  m_producer_version_update = UINT32_MAX;

  const DWARFDebugInfoEntry *die = GetCompileUnitDIEPtrOnly();
  if (die) {

const char *producer_cstr = die->GetAttributeValueAsString(
m_dwarf2Data, this, DW_AT_producer, NULL);
if (producer_cstr) {
  RegularExpression llvm_gcc_regex(
  llvm::StringRef("^4\\.[012]\\.[01] \\(Based on Apple "
  "Inc\\. build [0-9]+\\) \\(LLVM build "
  "[\\.0-9]+\\)$"));
  if (llvm_gcc_regex.Execute(llvm::StringRef(producer_cstr))) {
m_producer = eProducerLLVMGCC;
  } else if (strstr(producer_cstr, "clang")) {
static RegularExpression g_clang_version_regex(
llvm::StringRef("clang-([0-9]+)\\.([0-9]+)\\.([0-9]+)"));
RegularExpression::Match regex_match(3);
if (g_clang_version_regex.Execute(llvm::StringRef(producer_cstr),
  ®ex_match)) {

No copy saved. Just wasted time with strlen in StringRef constructor (2 of them 
mind you).

void DWARFDebugPubnamesSet::Find(
const RegularExpression ®ex,
std::vector &die_offset_coll) const {
  DescriptorConstIter pos;
  DescriptorConstIter end = m_descriptors.end();
  for (pos = m_descriptors.begin(); pos != end; ++pos) {
if (regex.Execute(pos->name.c_str()))
  die_offset_coll.push_back(m_header.die_offset + pos->offset);
  }
}

No copy saved. Just wasted time with strlen in StringRef constructor (2 of them 
mind you).


  std::string slice_str;
  if (reg_info_dict->GetValueForKeyAsString("slice", slice_str, nullptr)) {
// Slices use the following format:
//  REGNAME[MSBIT:LSBIT]
// REGNAME - name of the register to grab a slice of
// MSBIT - the most significant bit at which the current register value
// starts at
// LSBIT - the least significant

Re: [Lldb-commits] [lldb] r282090 - Fix failing regex tests.

2016-09-21 Thread Zachary Turner via lldb-commits
Incidentally, it appears the version of regcomp() on apple supports REG_PEND

, although that is not a standard.  And apple is already the only platform
where enhanced regexes are a thing.  So if you want to keep enhanced
regexes, one thing we could do is:

a) Have all non Apple platforms delegate to the LLVM implementation.
b) Have Apple's implementation use the system library but set the end
pointer.

That said, if you don't care about enhanced mode, let's just have everyone
use LLVM's.

On Wed, Sep 21, 2016 at 5:03 PM Zachary Turner  wrote:

> It supports extended, just not *enhanced*.  Enhanced appears to be an
> apple specific thing, and even the developer documentation recommends not
> using it in portable code.
>
> On Wed, Sep 21, 2016 at 5:00 PM Jim Ingham  wrote:
>
> IIRC you said the llvm one doesn't support extended regular expressions.
> If that's true, please don't do this till that is fixed.  I think pretty
> much everybody who is using reg expressions expects to be able to use
> extended regular expressions.
>
> Jim
>
> > On Sep 21, 2016, at 4:58 PM, Zachary Turner via lldb-commits <
> lldb-commits@lists.llvm.org> wrote:
> >
> > Incidentally even the STL regex implementation supports specifying the
> end pointer.  So I would call the system one deficient in this regard and
> advocate for replacing it sooner rather than later.
> >
> > On Wed, Sep 21, 2016 at 4:52 PM Zachary Turner 
> wrote:
> > Looks like llvm's regex is better than LLDB's in this regard, since it
> supports explicitly setting the end pointer.  I can see a couple options:
> >
> > 1) Check if it's null terminated by peeking one past the end, and
> copying if it's not.  This is pretty hackish, not crazy about this idea.
> > 2) Un-delete the const char * version of the function but leave the
> StringRef overload, find all places where I added the explicit conversion
> and remove them so they invoke the const char* overload.
> > 3) Change lldb::RegularExpression to just delegate to llvm under the
> hood and set the end pointer.
> >
> > On Wed, Sep 21, 2016 at 4:44 PM Zachary Turner 
> wrote:
> > Actually wait, it doesn't.  It just explicitly sets the end pointer.
> >
> > On Wed, Sep 21, 2016 at 4:44 PM Zachary Turner 
> wrote:
> > Worth noting that llvm::Regex has this constructor:
> >
> >
> > Regex::Regex(StringRef regex, unsigned Flags) {
> >   unsigned flags = 0;
> >   preg = new llvm_regex();
> >   preg->re_endp = regex.end();
> >   if (Flags & IgnoreCase)
> > flags |= REG_ICASE;
> >   if (Flags & Newline)
> > flags |= REG_NEWLINE;
> >   if (!(Flags & BasicRegex))
> > flags |= REG_EXTENDED;
> >   error = llvm_regcomp(preg, regex.data(), flags|REG_PEND);
> > }
> >
> > So it assumes null termination even though you have a StringRef.
> >
> > On Wed, Sep 21, 2016 at 4:43 PM Zachary Turner 
> wrote:
> > You need to duplicate something on the heap once when you execute the
> regex.  And in turn you save tens or hundreds or copies on the way there
> because of inefficient string usage.
> >
> > We could also just un-delete the overload that takes a const char*, then
> the duplication would only ever happen when you explicitly use a StringRef.
> >
> > I don't agree this should be reverted.  In the process of doing this
> conversion I eliminated numerous careless string copies.
> >
> > On Wed, Sep 21, 2016 at 4:38 PM Greg Clayton  wrote:
> > This it the perfect example of why not to use a StringRef since the
> string needs to be null terminated. Why did we change this? Now even if you
> call this function:
> >
> > RegularExpression r(...);
> >
> > r.Execute("...", ...)
> >
> > You will need to duplicate the string on the heap just to execute this.
> Please revert this. Anything that requires null terminate is not a
> candidate for converting to StringRef.
> >
> >
> > > On Sep 21, 2016, at 10:13 AM, Zachary Turner via lldb-commits <
> lldb-commits@lists.llvm.org> wrote:
> > >
> > > Author: zturner
> > > Date: Wed Sep 21 12:13:51 2016
> > > New Revision: 282090
> > >
> > > URL: http://llvm.org/viewvc/llvm-project?rev=282090&view=rev
> > > Log:
> > > Fix failing regex tests.
> > >
> > > r282079 converted the regular expression interface to accept
> > > and return StringRefs instead of char pointers.  In one case
> > > a null pointer check was converted to an empty string check,
> > > but this was an incorrect conversion because an empty string
> > > is a valid regular expression.  Removing this check should
> > > fix the test failures.
> > >
> > > Modified:
> > >lldb/trunk/source/Core/RegularExpression.cpp
> > >
> > > Modified: lldb/trunk/source/Core/RegularExpression.cpp
> > > URL:
> http://llvm.org/viewvc/llvm-project/lldb/trunk/source/Core/RegularExpression.cpp?rev=282090&r1=282089&r2=282090&view=diff
> > >
> ==
> > > --- l

Re: [Lldb-commits] [lldb] r282090 - Fix failing regex tests.

2016-09-21 Thread Zachary Turner via lldb-commits
It supports extended, just not *enhanced*.  Enhanced appears to be an apple
specific thing, and even the developer documentation recommends not using
it in portable code.

On Wed, Sep 21, 2016 at 5:00 PM Jim Ingham  wrote:

> IIRC you said the llvm one doesn't support extended regular expressions.
> If that's true, please don't do this till that is fixed.  I think pretty
> much everybody who is using reg expressions expects to be able to use
> extended regular expressions.
>
> Jim
>
> > On Sep 21, 2016, at 4:58 PM, Zachary Turner via lldb-commits <
> lldb-commits@lists.llvm.org> wrote:
> >
> > Incidentally even the STL regex implementation supports specifying the
> end pointer.  So I would call the system one deficient in this regard and
> advocate for replacing it sooner rather than later.
> >
> > On Wed, Sep 21, 2016 at 4:52 PM Zachary Turner 
> wrote:
> > Looks like llvm's regex is better than LLDB's in this regard, since it
> supports explicitly setting the end pointer.  I can see a couple options:
> >
> > 1) Check if it's null terminated by peeking one past the end, and
> copying if it's not.  This is pretty hackish, not crazy about this idea.
> > 2) Un-delete the const char * version of the function but leave the
> StringRef overload, find all places where I added the explicit conversion
> and remove them so they invoke the const char* overload.
> > 3) Change lldb::RegularExpression to just delegate to llvm under the
> hood and set the end pointer.
> >
> > On Wed, Sep 21, 2016 at 4:44 PM Zachary Turner 
> wrote:
> > Actually wait, it doesn't.  It just explicitly sets the end pointer.
> >
> > On Wed, Sep 21, 2016 at 4:44 PM Zachary Turner 
> wrote:
> > Worth noting that llvm::Regex has this constructor:
> >
> >
> > Regex::Regex(StringRef regex, unsigned Flags) {
> >   unsigned flags = 0;
> >   preg = new llvm_regex();
> >   preg->re_endp = regex.end();
> >   if (Flags & IgnoreCase)
> > flags |= REG_ICASE;
> >   if (Flags & Newline)
> > flags |= REG_NEWLINE;
> >   if (!(Flags & BasicRegex))
> > flags |= REG_EXTENDED;
> >   error = llvm_regcomp(preg, regex.data(), flags|REG_PEND);
> > }
> >
> > So it assumes null termination even though you have a StringRef.
> >
> > On Wed, Sep 21, 2016 at 4:43 PM Zachary Turner 
> wrote:
> > You need to duplicate something on the heap once when you execute the
> regex.  And in turn you save tens or hundreds or copies on the way there
> because of inefficient string usage.
> >
> > We could also just un-delete the overload that takes a const char*, then
> the duplication would only ever happen when you explicitly use a StringRef.
> >
> > I don't agree this should be reverted.  In the process of doing this
> conversion I eliminated numerous careless string copies.
> >
> > On Wed, Sep 21, 2016 at 4:38 PM Greg Clayton  wrote:
> > This it the perfect example of why not to use a StringRef since the
> string needs to be null terminated. Why did we change this? Now even if you
> call this function:
> >
> > RegularExpression r(...);
> >
> > r.Execute("...", ...)
> >
> > You will need to duplicate the string on the heap just to execute this.
> Please revert this. Anything that requires null terminate is not a
> candidate for converting to StringRef.
> >
> >
> > > On Sep 21, 2016, at 10:13 AM, Zachary Turner via lldb-commits <
> lldb-commits@lists.llvm.org> wrote:
> > >
> > > Author: zturner
> > > Date: Wed Sep 21 12:13:51 2016
> > > New Revision: 282090
> > >
> > > URL: http://llvm.org/viewvc/llvm-project?rev=282090&view=rev
> > > Log:
> > > Fix failing regex tests.
> > >
> > > r282079 converted the regular expression interface to accept
> > > and return StringRefs instead of char pointers.  In one case
> > > a null pointer check was converted to an empty string check,
> > > but this was an incorrect conversion because an empty string
> > > is a valid regular expression.  Removing this check should
> > > fix the test failures.
> > >
> > > Modified:
> > >lldb/trunk/source/Core/RegularExpression.cpp
> > >
> > > Modified: lldb/trunk/source/Core/RegularExpression.cpp
> > > URL:
> http://llvm.org/viewvc/llvm-project/lldb/trunk/source/Core/RegularExpression.cpp?rev=282090&r1=282089&r2=282090&view=diff
> > >
> ==
> > > --- lldb/trunk/source/Core/RegularExpression.cpp (original)
> > > +++ lldb/trunk/source/Core/RegularExpression.cpp Wed Sep 21 12:13:51
> 2016
> > > @@ -102,7 +102,7 @@ bool RegularExpression::Compile(llvm::St
> > > //-
> > > bool RegularExpression::Execute(llvm::StringRef str, Match *match)
> const {
> > >   int err = 1;
> > > -  if (!str.empty() && m_comp_err == 0) {
> > > +  if (m_comp_err == 0) {
> > > // Argument to regexec must be null-terminated.
> > > std::string reg_str = str;
> > > if (match) {
> > >
> > >
> > > ___
> > > lldb-comm

Re: [Lldb-commits] [lldb] r282090 - Fix failing regex tests.

2016-09-21 Thread Jim Ingham via lldb-commits
IIRC you said the llvm one doesn't support extended regular expressions.  If 
that's true, please don't do this till that is fixed.  I think pretty much 
everybody who is using reg expressions expects to be able to use extended 
regular expressions.

Jim

> On Sep 21, 2016, at 4:58 PM, Zachary Turner via lldb-commits 
>  wrote:
> 
> Incidentally even the STL regex implementation supports specifying the end 
> pointer.  So I would call the system one deficient in this regard and 
> advocate for replacing it sooner rather than later.
> 
> On Wed, Sep 21, 2016 at 4:52 PM Zachary Turner  wrote:
> Looks like llvm's regex is better than LLDB's in this regard, since it 
> supports explicitly setting the end pointer.  I can see a couple options:
> 
> 1) Check if it's null terminated by peeking one past the end, and copying if 
> it's not.  This is pretty hackish, not crazy about this idea.
> 2) Un-delete the const char * version of the function but leave the StringRef 
> overload, find all places where I added the explicit conversion and remove 
> them so they invoke the const char* overload.
> 3) Change lldb::RegularExpression to just delegate to llvm under the hood and 
> set the end pointer.
> 
> On Wed, Sep 21, 2016 at 4:44 PM Zachary Turner  wrote:
> Actually wait, it doesn't.  It just explicitly sets the end pointer.
> 
> On Wed, Sep 21, 2016 at 4:44 PM Zachary Turner  wrote:
> Worth noting that llvm::Regex has this constructor:
> 
> 
> Regex::Regex(StringRef regex, unsigned Flags) {
>   unsigned flags = 0;
>   preg = new llvm_regex();
>   preg->re_endp = regex.end();
>   if (Flags & IgnoreCase) 
> flags |= REG_ICASE;
>   if (Flags & Newline)
> flags |= REG_NEWLINE;
>   if (!(Flags & BasicRegex))
> flags |= REG_EXTENDED;
>   error = llvm_regcomp(preg, regex.data(), flags|REG_PEND);
> }
> 
> So it assumes null termination even though you have a StringRef.
> 
> On Wed, Sep 21, 2016 at 4:43 PM Zachary Turner  wrote:
> You need to duplicate something on the heap once when you execute the regex.  
> And in turn you save tens or hundreds or copies on the way there because of 
> inefficient string usage.  
> 
> We could also just un-delete the overload that takes a const char*, then the 
> duplication would only ever happen when you explicitly use a StringRef.
> 
> I don't agree this should be reverted.  In the process of doing this 
> conversion I eliminated numerous careless string copies.
> 
> On Wed, Sep 21, 2016 at 4:38 PM Greg Clayton  wrote:
> This it the perfect example of why not to use a StringRef since the string 
> needs to be null terminated. Why did we change this? Now even if you call 
> this function:
> 
> RegularExpression r(...);
> 
> r.Execute("...", ...)
> 
> You will need to duplicate the string on the heap just to execute this. 
> Please revert this. Anything that requires null terminate is not a candidate 
> for converting to StringRef.
> 
> 
> > On Sep 21, 2016, at 10:13 AM, Zachary Turner via lldb-commits 
> >  wrote:
> >
> > Author: zturner
> > Date: Wed Sep 21 12:13:51 2016
> > New Revision: 282090
> >
> > URL: http://llvm.org/viewvc/llvm-project?rev=282090&view=rev
> > Log:
> > Fix failing regex tests.
> >
> > r282079 converted the regular expression interface to accept
> > and return StringRefs instead of char pointers.  In one case
> > a null pointer check was converted to an empty string check,
> > but this was an incorrect conversion because an empty string
> > is a valid regular expression.  Removing this check should
> > fix the test failures.
> >
> > Modified:
> >lldb/trunk/source/Core/RegularExpression.cpp
> >
> > Modified: lldb/trunk/source/Core/RegularExpression.cpp
> > URL: 
> > http://llvm.org/viewvc/llvm-project/lldb/trunk/source/Core/RegularExpression.cpp?rev=282090&r1=282089&r2=282090&view=diff
> > ==
> > --- lldb/trunk/source/Core/RegularExpression.cpp (original)
> > +++ lldb/trunk/source/Core/RegularExpression.cpp Wed Sep 21 12:13:51 2016
> > @@ -102,7 +102,7 @@ bool RegularExpression::Compile(llvm::St
> > //-
> > bool RegularExpression::Execute(llvm::StringRef str, Match *match) const {
> >   int err = 1;
> > -  if (!str.empty() && m_comp_err == 0) {
> > +  if (m_comp_err == 0) {
> > // Argument to regexec must be null-terminated.
> > std::string reg_str = str;
> > if (match) {
> >
> >
> > ___
> > lldb-commits mailing list
> > lldb-commits@lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
> 
> ___
> lldb-commits mailing list
> lldb-commits@lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

___
lldb-commits mailing list
lldb-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb

Re: [Lldb-commits] [lldb] r282090 - Fix failing regex tests.

2016-09-21 Thread Zachary Turner via lldb-commits
Incidentally even the STL regex implementation supports specifying the end
pointer.  So I would call the system one deficient in this regard and
advocate for replacing it sooner rather than later.

On Wed, Sep 21, 2016 at 4:52 PM Zachary Turner  wrote:

> Looks like llvm's regex is better than LLDB's in this regard, since it
> supports explicitly setting the end pointer.  I can see a couple options:
>
> 1) Check if it's null terminated by peeking one past the end, and copying
> if it's not.  This is pretty hackish, not crazy about this idea.
> 2) Un-delete the const char * version of the function but leave the
> StringRef overload, find all places where I added the explicit conversion
> and remove them so they invoke the const char* overload.
> 3) Change lldb::RegularExpression to just delegate to llvm under the hood
> and set the end pointer.
>
> On Wed, Sep 21, 2016 at 4:44 PM Zachary Turner  wrote:
>
> Actually wait, it doesn't.  It just explicitly sets the end pointer.
>
> On Wed, Sep 21, 2016 at 4:44 PM Zachary Turner  wrote:
>
> Worth noting that llvm::Regex has this constructor:
>
>
> Regex::Regex(StringRef regex, unsigned Flags) {
>   unsigned flags = 0;
>   preg = new llvm_regex();
>   preg->re_endp = regex.end();
>   if (Flags & IgnoreCase)
> flags |= REG_ICASE;
>   if (Flags & Newline)
> flags |= REG_NEWLINE;
>   if (!(Flags & BasicRegex))
> flags |= REG_EXTENDED;
>   error = llvm_regcomp(preg, regex.data(), flags|REG_PEND);
> }
>
> So it assumes null termination even though you have a StringRef.
>
> On Wed, Sep 21, 2016 at 4:43 PM Zachary Turner  wrote:
>
> You need to duplicate something on the heap once when you execute the
> regex.  And in turn you save tens or hundreds or copies on the way there
> because of inefficient string usage.
>
> We could also just un-delete the overload that takes a const char*, then
> the duplication would only ever happen when you explicitly use a StringRef.
>
> I don't agree this should be reverted.  In the process of doing this
> conversion I eliminated numerous careless string copies.
>
> On Wed, Sep 21, 2016 at 4:38 PM Greg Clayton  wrote:
>
> This it the perfect example of why not to use a StringRef since the string
> needs to be null terminated. Why did we change this? Now even if you call
> this function:
>
> RegularExpression r(...);
>
> r.Execute("...", ...)
>
> You will need to duplicate the string on the heap just to execute this.
> Please revert this. Anything that requires null terminate is not a
> candidate for converting to StringRef.
>
>
> > On Sep 21, 2016, at 10:13 AM, Zachary Turner via lldb-commits <
> lldb-commits@lists.llvm.org> wrote:
> >
> > Author: zturner
> > Date: Wed Sep 21 12:13:51 2016
> > New Revision: 282090
> >
> > URL: http://llvm.org/viewvc/llvm-project?rev=282090&view=rev
> > Log:
> > Fix failing regex tests.
> >
> > r282079 converted the regular expression interface to accept
> > and return StringRefs instead of char pointers.  In one case
> > a null pointer check was converted to an empty string check,
> > but this was an incorrect conversion because an empty string
> > is a valid regular expression.  Removing this check should
> > fix the test failures.
> >
> > Modified:
> >lldb/trunk/source/Core/RegularExpression.cpp
> >
> > Modified: lldb/trunk/source/Core/RegularExpression.cpp
> > URL:
> http://llvm.org/viewvc/llvm-project/lldb/trunk/source/Core/RegularExpression.cpp?rev=282090&r1=282089&r2=282090&view=diff
> >
> ==
> > --- lldb/trunk/source/Core/RegularExpression.cpp (original)
> > +++ lldb/trunk/source/Core/RegularExpression.cpp Wed Sep 21 12:13:51 2016
> > @@ -102,7 +102,7 @@ bool RegularExpression::Compile(llvm::St
> > //-
> > bool RegularExpression::Execute(llvm::StringRef str, Match *match) const
> {
> >   int err = 1;
> > -  if (!str.empty() && m_comp_err == 0) {
> > +  if (m_comp_err == 0) {
> > // Argument to regexec must be null-terminated.
> > std::string reg_str = str;
> > if (match) {
> >
> >
> > ___
> > lldb-commits mailing list
> > lldb-commits@lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
>
>
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


Re: [Lldb-commits] [lldb] r282090 - Fix failing regex tests.

2016-09-21 Thread Zachary Turner via lldb-commits
Looks like llvm's regex is better than LLDB's in this regard, since it
supports explicitly setting the end pointer.  I can see a couple options:

1) Check if it's null terminated by peeking one past the end, and copying
if it's not.  This is pretty hackish, not crazy about this idea.
2) Un-delete the const char * version of the function but leave the
StringRef overload, find all places where I added the explicit conversion
and remove them so they invoke the const char* overload.
3) Change lldb::RegularExpression to just delegate to llvm under the hood
and set the end pointer.

On Wed, Sep 21, 2016 at 4:44 PM Zachary Turner  wrote:

> Actually wait, it doesn't.  It just explicitly sets the end pointer.
>
> On Wed, Sep 21, 2016 at 4:44 PM Zachary Turner  wrote:
>
> Worth noting that llvm::Regex has this constructor:
>
>
> Regex::Regex(StringRef regex, unsigned Flags) {
>   unsigned flags = 0;
>   preg = new llvm_regex();
>   preg->re_endp = regex.end();
>   if (Flags & IgnoreCase)
> flags |= REG_ICASE;
>   if (Flags & Newline)
> flags |= REG_NEWLINE;
>   if (!(Flags & BasicRegex))
> flags |= REG_EXTENDED;
>   error = llvm_regcomp(preg, regex.data(), flags|REG_PEND);
> }
>
> So it assumes null termination even though you have a StringRef.
>
> On Wed, Sep 21, 2016 at 4:43 PM Zachary Turner  wrote:
>
> You need to duplicate something on the heap once when you execute the
> regex.  And in turn you save tens or hundreds or copies on the way there
> because of inefficient string usage.
>
> We could also just un-delete the overload that takes a const char*, then
> the duplication would only ever happen when you explicitly use a StringRef.
>
> I don't agree this should be reverted.  In the process of doing this
> conversion I eliminated numerous careless string copies.
>
> On Wed, Sep 21, 2016 at 4:38 PM Greg Clayton  wrote:
>
> This it the perfect example of why not to use a StringRef since the string
> needs to be null terminated. Why did we change this? Now even if you call
> this function:
>
> RegularExpression r(...);
>
> r.Execute("...", ...)
>
> You will need to duplicate the string on the heap just to execute this.
> Please revert this. Anything that requires null terminate is not a
> candidate for converting to StringRef.
>
>
> > On Sep 21, 2016, at 10:13 AM, Zachary Turner via lldb-commits <
> lldb-commits@lists.llvm.org> wrote:
> >
> > Author: zturner
> > Date: Wed Sep 21 12:13:51 2016
> > New Revision: 282090
> >
> > URL: http://llvm.org/viewvc/llvm-project?rev=282090&view=rev
> > Log:
> > Fix failing regex tests.
> >
> > r282079 converted the regular expression interface to accept
> > and return StringRefs instead of char pointers.  In one case
> > a null pointer check was converted to an empty string check,
> > but this was an incorrect conversion because an empty string
> > is a valid regular expression.  Removing this check should
> > fix the test failures.
> >
> > Modified:
> >lldb/trunk/source/Core/RegularExpression.cpp
> >
> > Modified: lldb/trunk/source/Core/RegularExpression.cpp
> > URL:
> http://llvm.org/viewvc/llvm-project/lldb/trunk/source/Core/RegularExpression.cpp?rev=282090&r1=282089&r2=282090&view=diff
> >
> ==
> > --- lldb/trunk/source/Core/RegularExpression.cpp (original)
> > +++ lldb/trunk/source/Core/RegularExpression.cpp Wed Sep 21 12:13:51 2016
> > @@ -102,7 +102,7 @@ bool RegularExpression::Compile(llvm::St
> > //-
> > bool RegularExpression::Execute(llvm::StringRef str, Match *match) const
> {
> >   int err = 1;
> > -  if (!str.empty() && m_comp_err == 0) {
> > +  if (m_comp_err == 0) {
> > // Argument to regexec must be null-terminated.
> > std::string reg_str = str;
> > if (match) {
> >
> >
> > ___
> > lldb-commits mailing list
> > lldb-commits@lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
>
>
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


Re: [Lldb-commits] [lldb] r282090 - Fix failing regex tests.

2016-09-21 Thread Zachary Turner via lldb-commits
Actually wait, it doesn't.  It just explicitly sets the end pointer.

On Wed, Sep 21, 2016 at 4:44 PM Zachary Turner  wrote:

> Worth noting that llvm::Regex has this constructor:
>
>
> Regex::Regex(StringRef regex, unsigned Flags) {
>   unsigned flags = 0;
>   preg = new llvm_regex();
>   preg->re_endp = regex.end();
>   if (Flags & IgnoreCase)
> flags |= REG_ICASE;
>   if (Flags & Newline)
> flags |= REG_NEWLINE;
>   if (!(Flags & BasicRegex))
> flags |= REG_EXTENDED;
>   error = llvm_regcomp(preg, regex.data(), flags|REG_PEND);
> }
>
> So it assumes null termination even though you have a StringRef.
>
> On Wed, Sep 21, 2016 at 4:43 PM Zachary Turner  wrote:
>
> You need to duplicate something on the heap once when you execute the
> regex.  And in turn you save tens or hundreds or copies on the way there
> because of inefficient string usage.
>
> We could also just un-delete the overload that takes a const char*, then
> the duplication would only ever happen when you explicitly use a StringRef.
>
> I don't agree this should be reverted.  In the process of doing this
> conversion I eliminated numerous careless string copies.
>
> On Wed, Sep 21, 2016 at 4:38 PM Greg Clayton  wrote:
>
> This it the perfect example of why not to use a StringRef since the string
> needs to be null terminated. Why did we change this? Now even if you call
> this function:
>
> RegularExpression r(...);
>
> r.Execute("...", ...)
>
> You will need to duplicate the string on the heap just to execute this.
> Please revert this. Anything that requires null terminate is not a
> candidate for converting to StringRef.
>
>
> > On Sep 21, 2016, at 10:13 AM, Zachary Turner via lldb-commits <
> lldb-commits@lists.llvm.org> wrote:
> >
> > Author: zturner
> > Date: Wed Sep 21 12:13:51 2016
> > New Revision: 282090
> >
> > URL: http://llvm.org/viewvc/llvm-project?rev=282090&view=rev
> > Log:
> > Fix failing regex tests.
> >
> > r282079 converted the regular expression interface to accept
> > and return StringRefs instead of char pointers.  In one case
> > a null pointer check was converted to an empty string check,
> > but this was an incorrect conversion because an empty string
> > is a valid regular expression.  Removing this check should
> > fix the test failures.
> >
> > Modified:
> >lldb/trunk/source/Core/RegularExpression.cpp
> >
> > Modified: lldb/trunk/source/Core/RegularExpression.cpp
> > URL:
> http://llvm.org/viewvc/llvm-project/lldb/trunk/source/Core/RegularExpression.cpp?rev=282090&r1=282089&r2=282090&view=diff
> >
> ==
> > --- lldb/trunk/source/Core/RegularExpression.cpp (original)
> > +++ lldb/trunk/source/Core/RegularExpression.cpp Wed Sep 21 12:13:51 2016
> > @@ -102,7 +102,7 @@ bool RegularExpression::Compile(llvm::St
> > //-
> > bool RegularExpression::Execute(llvm::StringRef str, Match *match) const
> {
> >   int err = 1;
> > -  if (!str.empty() && m_comp_err == 0) {
> > +  if (m_comp_err == 0) {
> > // Argument to regexec must be null-terminated.
> > std::string reg_str = str;
> > if (match) {
> >
> >
> > ___
> > lldb-commits mailing list
> > lldb-commits@lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
>
>
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


Re: [Lldb-commits] [lldb] r282090 - Fix failing regex tests.

2016-09-21 Thread Zachary Turner via lldb-commits
Worth noting that llvm::Regex has this constructor:


Regex::Regex(StringRef regex, unsigned Flags) {
  unsigned flags = 0;
  preg = new llvm_regex();
  preg->re_endp = regex.end();
  if (Flags & IgnoreCase)
flags |= REG_ICASE;
  if (Flags & Newline)
flags |= REG_NEWLINE;
  if (!(Flags & BasicRegex))
flags |= REG_EXTENDED;
  error = llvm_regcomp(preg, regex.data(), flags|REG_PEND);
}

So it assumes null termination even though you have a StringRef.

On Wed, Sep 21, 2016 at 4:43 PM Zachary Turner  wrote:

> You need to duplicate something on the heap once when you execute the
> regex.  And in turn you save tens or hundreds or copies on the way there
> because of inefficient string usage.
>
> We could also just un-delete the overload that takes a const char*, then
> the duplication would only ever happen when you explicitly use a StringRef.
>
> I don't agree this should be reverted.  In the process of doing this
> conversion I eliminated numerous careless string copies.
>
> On Wed, Sep 21, 2016 at 4:38 PM Greg Clayton  wrote:
>
> This it the perfect example of why not to use a StringRef since the string
> needs to be null terminated. Why did we change this? Now even if you call
> this function:
>
> RegularExpression r(...);
>
> r.Execute("...", ...)
>
> You will need to duplicate the string on the heap just to execute this.
> Please revert this. Anything that requires null terminate is not a
> candidate for converting to StringRef.
>
>
> > On Sep 21, 2016, at 10:13 AM, Zachary Turner via lldb-commits <
> lldb-commits@lists.llvm.org> wrote:
> >
> > Author: zturner
> > Date: Wed Sep 21 12:13:51 2016
> > New Revision: 282090
> >
> > URL: http://llvm.org/viewvc/llvm-project?rev=282090&view=rev
> > Log:
> > Fix failing regex tests.
> >
> > r282079 converted the regular expression interface to accept
> > and return StringRefs instead of char pointers.  In one case
> > a null pointer check was converted to an empty string check,
> > but this was an incorrect conversion because an empty string
> > is a valid regular expression.  Removing this check should
> > fix the test failures.
> >
> > Modified:
> >lldb/trunk/source/Core/RegularExpression.cpp
> >
> > Modified: lldb/trunk/source/Core/RegularExpression.cpp
> > URL:
> http://llvm.org/viewvc/llvm-project/lldb/trunk/source/Core/RegularExpression.cpp?rev=282090&r1=282089&r2=282090&view=diff
> >
> ==
> > --- lldb/trunk/source/Core/RegularExpression.cpp (original)
> > +++ lldb/trunk/source/Core/RegularExpression.cpp Wed Sep 21 12:13:51 2016
> > @@ -102,7 +102,7 @@ bool RegularExpression::Compile(llvm::St
> > //-
> > bool RegularExpression::Execute(llvm::StringRef str, Match *match) const
> {
> >   int err = 1;
> > -  if (!str.empty() && m_comp_err == 0) {
> > +  if (m_comp_err == 0) {
> > // Argument to regexec must be null-terminated.
> > std::string reg_str = str;
> > if (match) {
> >
> >
> > ___
> > lldb-commits mailing list
> > lldb-commits@lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
>
>
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


Re: [Lldb-commits] [lldb] r282090 - Fix failing regex tests.

2016-09-21 Thread Zachary Turner via lldb-commits
You need to duplicate something on the heap once when you execute the
regex.  And in turn you save tens or hundreds or copies on the way there
because of inefficient string usage.

We could also just un-delete the overload that takes a const char*, then
the duplication would only ever happen when you explicitly use a StringRef.

I don't agree this should be reverted.  In the process of doing this
conversion I eliminated numerous careless string copies.

On Wed, Sep 21, 2016 at 4:38 PM Greg Clayton  wrote:

> This it the perfect example of why not to use a StringRef since the string
> needs to be null terminated. Why did we change this? Now even if you call
> this function:
>
> RegularExpression r(...);
>
> r.Execute("...", ...)
>
> You will need to duplicate the string on the heap just to execute this.
> Please revert this. Anything that requires null terminate is not a
> candidate for converting to StringRef.
>
>
> > On Sep 21, 2016, at 10:13 AM, Zachary Turner via lldb-commits <
> lldb-commits@lists.llvm.org> wrote:
> >
> > Author: zturner
> > Date: Wed Sep 21 12:13:51 2016
> > New Revision: 282090
> >
> > URL: http://llvm.org/viewvc/llvm-project?rev=282090&view=rev
> > Log:
> > Fix failing regex tests.
> >
> > r282079 converted the regular expression interface to accept
> > and return StringRefs instead of char pointers.  In one case
> > a null pointer check was converted to an empty string check,
> > but this was an incorrect conversion because an empty string
> > is a valid regular expression.  Removing this check should
> > fix the test failures.
> >
> > Modified:
> >lldb/trunk/source/Core/RegularExpression.cpp
> >
> > Modified: lldb/trunk/source/Core/RegularExpression.cpp
> > URL:
> http://llvm.org/viewvc/llvm-project/lldb/trunk/source/Core/RegularExpression.cpp?rev=282090&r1=282089&r2=282090&view=diff
> >
> ==
> > --- lldb/trunk/source/Core/RegularExpression.cpp (original)
> > +++ lldb/trunk/source/Core/RegularExpression.cpp Wed Sep 21 12:13:51 2016
> > @@ -102,7 +102,7 @@ bool RegularExpression::Compile(llvm::St
> > //-
> > bool RegularExpression::Execute(llvm::StringRef str, Match *match) const
> {
> >   int err = 1;
> > -  if (!str.empty() && m_comp_err == 0) {
> > +  if (m_comp_err == 0) {
> > // Argument to regexec must be null-terminated.
> > std::string reg_str = str;
> > if (match) {
> >
> >
> > ___
> > lldb-commits mailing list
> > lldb-commits@lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
>
>
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


Re: [Lldb-commits] [lldb] r282090 - Fix failing regex tests.

2016-09-21 Thread Greg Clayton via lldb-commits
This it the perfect example of why not to use a StringRef since the string 
needs to be null terminated. Why did we change this? Now even if you call this 
function:

RegularExpression r(...);

r.Execute("...", ...)

You will need to duplicate the string on the heap just to execute this. Please 
revert this. Anything that requires null terminate is not a candidate for 
converting to StringRef.


> On Sep 21, 2016, at 10:13 AM, Zachary Turner via lldb-commits 
>  wrote:
> 
> Author: zturner
> Date: Wed Sep 21 12:13:51 2016
> New Revision: 282090
> 
> URL: http://llvm.org/viewvc/llvm-project?rev=282090&view=rev
> Log:
> Fix failing regex tests.
> 
> r282079 converted the regular expression interface to accept
> and return StringRefs instead of char pointers.  In one case
> a null pointer check was converted to an empty string check,
> but this was an incorrect conversion because an empty string
> is a valid regular expression.  Removing this check should
> fix the test failures.
> 
> Modified:
>lldb/trunk/source/Core/RegularExpression.cpp
> 
> Modified: lldb/trunk/source/Core/RegularExpression.cpp
> URL: 
> http://llvm.org/viewvc/llvm-project/lldb/trunk/source/Core/RegularExpression.cpp?rev=282090&r1=282089&r2=282090&view=diff
> ==
> --- lldb/trunk/source/Core/RegularExpression.cpp (original)
> +++ lldb/trunk/source/Core/RegularExpression.cpp Wed Sep 21 12:13:51 2016
> @@ -102,7 +102,7 @@ bool RegularExpression::Compile(llvm::St
> //-
> bool RegularExpression::Execute(llvm::StringRef str, Match *match) const {
>   int err = 1;
> -  if (!str.empty() && m_comp_err == 0) {
> +  if (m_comp_err == 0) {
> // Argument to regexec must be null-terminated.
> std::string reg_str = str;
> if (match) {
> 
> 
> ___
> lldb-commits mailing list
> lldb-commits@lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

___
lldb-commits mailing list
lldb-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits