Dino - do you want me to register a workitem? Also, if I wanted to patch v1.1.1, where should I start?
On Thu, Feb 21, 2008 at 8:48 PM, Dino Viehland <[EMAIL PROTECTED]> wrote: > After Birsh's last comment that they're calling compile I was thinking we > might be doing something really stupid. Turns out that we are. For our > supposedly compiled regex's we're not passing the Compiled option. So > they're all getting interpretted and of course that's going to suck big > time. > > So the fix for this is probably to just pass RegexOptions.Compiled to the > Regex object being contructed into RE_Pattern. Internally the Regex class > will compile to dynamic methods (much better than compiling to an assembly > because it'll be collectible). > > It's actually a little more complex than that because we create RE_Pattern > objects for the non-compiled so we need to flow the option in, but it's > still a simple fix. > > -----Original Message----- > From: [EMAIL PROTECTED] [mailto: > [EMAIL PROTECTED] On Behalf Of Dan Shechter > Sent: Thursday, February 21, 2008 10:19 AM > To: Discussion of IronPython > Subject: Re: [IronPython] Slow Performance of CPython libs? > > Just out of curiosity... > I'm guessing that there are two levels of caching that could be used. > One would be to hold a global dictionary of some sort of pattern -> > Regex that will save the lengthy parsing of the re. > > The other would be to actually call Regex.CompileToAssembly to get a > more efficient reperesentation (in runtime that is) of the re. > > Would you (the IP team) favor the former or the latter? > > Shechter. > > > Birsch wrote: > > I checked both sgmllib.py and BeautifulSoup.py - and it seems both are > > reusing the same regexps (searched for re.compile). > > > > I think your suggestion is very relevant in this case. It makes sense to > > replicate the "compile once use many" behavior that is commonly used > > with regexp. > > > > -Birsch > > > > On Thu, Feb 21, 2008 at 7:30 PM, Dino Viehland > > <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>> > wrote: > > > > Do you know if the same reg ex is being used repeatedly? If so > > maybe we can cache & compile the regex instead of interpretting it > > all the time. > > > > > > > > *From:* [EMAIL PROTECTED] > > <mailto:[EMAIL PROTECTED]> > > [mailto:[EMAIL PROTECTED] > > <mailto:[EMAIL PROTECTED]>] *On Behalf Of *Birsch > > *Sent:* Thursday, February 21, 2008 8:30 AM > > > > *To:* Discussion of IronPython > > *Subject:* Re: [IronPython] Slow Performance of CPython libs? > > > > > > > > I took on Cooper's advice and profiled with ANTS. Here are the top > > methods: > > > > *Namespace* > > > > > > > > *Method name* > > > > > > > > *Time (sec.)* > > > > > > > > *Time with children (sec.)* > > > > > > > > *Hit count* > > > > > > > > *Source file* > > > > System.Text.RegularExpressions > > > > > > > > RegexInterpreter.Go() > > > > > > > > 37.0189 > > > > > > > > 94.4676 > > > > > > > > 13689612 > > > > > > > > System.Text.RegularExpressions > > > > > > > > RegexInterpreter.Operator() > > > > > > > > 6.2411 > > > > > > > > 6.2411 > > > > > > > > 131146274 > > > > > > > > System.Text.RegularExpressions > > > > > > > > RegexInterpreter.Advance(int i) > > > > > > > > 5.9264 > > > > > > > > 8.7202 > > > > > > > > 66000263 > > > > > > > > System.Text.RegularExpressions > > > > > > > > RegexInterpreter.SetOperator(int op) > > > > > > > > 5.5750 > > > > > > > > 5.5750 > > > > > > > > 131146274 > > > > > > > > System.Text.RegularExpressions > > > > > > > > RegexInterpreter.Backtrack() > > > > > > > > 5.5692 > > > > > > > > 9.4895 > > > > > > > > 37781343 > > > > > > > > IronPython.Runtime.Operations > > > > > > > > Ops.CallWithContext(ICallerContext context, object func, object > > arg0, object arg1) > > > > > > > > 5.5572 > > > > > > > > 114.5245 > > > > > > > > 79754 > > > > > > > > IronPython.Runtime.Calls > > > > > > > > Method.Call(ICallerContext context, object arg0) > > > > > > > > 4.9052 > > > > > > > > 114.8251 > > > > > > > > 50886 > > > > > > > > IronPython.Runtime.Calls > > > > > > > > PythonFunction.CallInstance(ICallerContext context, object arg0, > > object arg1) > > > > > > > > 4.8876 > > > > > > > > 114.8059 > > > > > > > > 50886 > > > > > > > > IronPython.Runtime.Calls > > > > > > > > Function2.Call(ICallerContext context, object arg0, object arg1) > > > > > > > > 4.6400 > > > > > > > > 114.5471 > > > > > > > > 47486 > > > > > > > > IronPython.Runtime.Operations > > > > > > > > Ops.CallWithContext(ICallerContext context, object func, object > arg0) > > > > > > > > 4.2344 > > > > > > > > 114.1604 > > > > > > > > 146658 > > > > > > > > System.Text.RegularExpressions > > > > > > > > RegexBoyerMoore.Scan(string text, int index, int beglimit, int > endlimit) > > > > > > > > 3.6465 > > > > > > > > 3.6465 > > > > > > > > 13678131 > > > > > > > > System.Text.RegularExpressions > > > > > > > > RegexCharClass.CharInClassRecursive(char ch, string set, int start) > > > > > > > > 3.6288 > > > > > > > > 5.7113 > > > > > > > > 31508162 > > > > > > > > System.Text.RegularExpressions > > > > > > > > RegexInterpreter.Goto(int newpos) > > > > > > > > 3.2058 > > > > > > > > 5.1470 > > > > > > > > 27364668 > > > > > > > > System.Text.RegularExpressions > > > > > > > > RegexInterpreter.Operand(int i) > > > > > > > > 3.1923 > > > > > > > > 3.1923 > > > > > > > > 73230687 > > > > > > > > System.Text.RegularExpressions > > > > > > > > RegexRunner.EnsureStorage() > > > > > > > > 3.0803 > > > > > > > > 3.0803 > > > > > > > > 51474823 > > > > > > > > System.Text.RegularExpressions > > > > > > > > RegexCharClass.CharInClass(char ch, string set) > > > > > > > > 3.0713 > > > > > > > > 8.7827 > > > > > > > > 31508162 > > > > > > > > IronPython.Runtime.Calls > > > > > > > > Method.Call(ICallerContext context, object arg0, object arg1) > > > > > > > > 2.9821 > > > > > > > > 7.8675 > > > > > > > > 15012 > > > > > > > > IronPython.Runtime.Calls > > > > > > > > PythonFunction.CallInstance(ICallerContext context, object arg0, > > object arg1, object arg2) > > > > > > > > 2.9794 > > > > > > > > 7.8639 > > > > > > > > 15012 > > > > > > > > System.Text.RegularExpressions > > > > > > > > RegexInterpreter.Forwardcharnext() > > > > > > > > 2.8852 > > > > > > > > 2.8852 > > > > > > > > 62865185 > > > > > > > > System.Text.RegularExpressions > > > > > > > > RegexInterpreter.Forwardchars() > > > > > > > > 2.8279 > > > > > > > > 2.8279 > > > > > > > > 59436277 > > > > > > > > System.Text.RegularExpressions > > > > > > > > RegexCharClass.CharInClassInternal(char ch, string set, int start, > > int mySetLength, int myCategoryLength) > > > > > > > > 2.0632 > > > > > > > > 2.0826 > > > > > > > > 31508162 > > > > > > > > System.Text.RegularExpressions > > > > > > > > RegexRunner.Scan(Regex regex, string text, int textbeg, int textend, > > int textstart, int prevlen, bool quick) > > > > > > > > 1.8376 > > > > > > > > 101.7226 > > > > > > > > 43009 > > > > > > > > System.Text.RegularExpressions > > > > > > > > RegexInterpreter.FindFirstChar() > > > > > > > > 1.6405 > > > > > > > > 5.3456 > > > > > > > > 13701755 > > > > > > > > IronPython.Runtime.Types > > > > > > > > OldClass.TryLookupSlot(SymbolId name, out object ret) > > > > > > > > 1.5573 > > > > > > > > 2.8124 > > > > > > > > 389516 > > > > > > > > IronPython.Runtime.Operations > > > > > > > > Ops.GetAttr(ICallerContext context, object o, SymbolId name) > > > > > > > > 1.5365 > > > > > > > > 5.3456 > > > > > > > > 558524 > > > > > > > > System.Text.RegularExpressions > > > > > > > > RegexInterpreter.Textpos() > > > > > > > > 1.4020 > > > > > > > > 1.4020 > > > > > > > > 32648926 > > > > > > > > System.Text.RegularExpressions > > > > > > > > RegexInterpreter.Advance() > > > > > > > > 1.1916 > > > > > > > > 2.9526 > > > > > > > > 13703950 > > > > > > > > System.Text.RegularExpressions > > > > > > > > RegexInterpreter.Textto(int newpos) > > > > > > > > 1.1218 > > > > > > > > 1.1218 > > > > > > > > 24120890 > > > > > > > > System.Text.RegularExpressions > > > > > > > > RegexInterpreter.TrackPeek() > > > > > > > > 1.0579 > > > > > > > > 1.0579 > > > > > > > > 24120894 > > > > > > > > System.Text.RegularExpressions > > > > > > > > Regex.Run(bool quick, int prevlen, string input, int beginning, int > > length, int startat) > > > > > > > > 0.7280 > > > > > > > > 102.4644 > > > > > > > > 43009 > > > > > > > > System.Text.RegularExpressions > > > > > > > > RegexInterpreter.TrackPush(int I1) > > > > > > > > 0.6834 > > > > > > > > 0.6834 > > > > > > > > 13745149 > > > > > > > > System.Text.RegularExpressions > > > > > > > > RegexInterpreter.StackPush(int I1) > > > > > > > > 0.6542 > > > > > > > > 0.6542 > > > > > > > > 13703955 > > > > > > > > System.Text.RegularExpressions > > > > > > > > RegexInterpreter.TrackPop() > > > > > > > > 0.6068 > > > > > > > > 0.6068 > > > > > > > > 13663035 > > > > > > > > System.Text.RegularExpressions > > > > > > > > RegexInterpreter.TrackPush() > > > > > > > > 0.6049 > > > > > > > > 0.6049 > > > > > > > > 13708230 > > > > > > > > System.Text.RegularExpressions > > > > > > > > RegexInterpreter.StackPop() > > > > > > > > 0.5836 > > > > > > > > 0.5836 > > > > > > > > 13703956 > > > > > > > > System.Text.RegularExpressions > > > > > > > > RegexInterpreter.Bump() > > > > > > > > 0.4987 > > > > > > > > 0.4987 > > > > > > > > 10472790 > > > > > > > > System.Text.RegularExpressions > > > > > > > > RegexInterpreter.TrackPush(int I1, int I2) > > > > > > > > 0.4864 > > > > > > > > 0.4864 > > > > > > > > 10472790 > > > > > > > > System.Text.RegularExpressions > > > > > > > > RegexInterpreter.TrackPeek(int i) > > > > > > > > 0.4663 > > > > > > > > 0.4663 > > > > > > > > 10457859 > > > > > > > > System.Text.RegularExpressions > > > > > > > > RegexInterpreter.TrackPop(int framesize) > > > > > > > > 0.4396 > > > > > > > > 0.4396 > > > > > > > > 10457859 > > > > > > > > > > Moving up the stack of regex.Go(), most calls originate from > > sgmllib's parse_starttag. > > > > HTH, > > -Birsch > > > > On Thu, Feb 21, 2008 at 2:35 PM, Birsch <[EMAIL PROTECTED] > > <mailto:[EMAIL PROTECTED]>> wrote: > > > > Thanks Michael and Dino. > > > > I'll prof and send update. Got a good profiler recommendation for > .Net? > > Meanwhile I noticed the sample site below causes BeautifulSoup to > > generate quite a few [python] exceptions during __init__. Does > > IronPython handle exceptions significantly slower than CPtyhon? > > > > Repro code is simple (just build a BeautifulSoup obj with mininova's > > home page). > > Here are the .py and .cs I used to time the diffs: > > > > *bstest.py:* > > #Bypass CPython default socket implementation with IPCE/FePy > > import imp, os, sys > > sys.modules['socket'] = module = imp.new_module('socket') > > execfile('socket.py', module.__dict__) > > > > from BeautifulSoup import BeautifulSoup > > from urllib import urlopen > > import datetime > > > > def getContent(url): > > #Download html data > > startTime = datetime.datetime.now() > > print "Getting url", url > > html = urlopen(url).read() > > print "Time taken:", datetime.datetime.now() - startTime > > > > #Make soup > > startTime = datetime.datetime.now() > > print "Making soup..." > > soup = BeautifulSoup(markup=html) > > print "Time taken:", datetime.datetime.now() - startTime > > > > if __name__ == "__main__": > > print getContent("www.mininova.org <http://www.mininova.org>") > > > > > > *C#:* > > using System; > > using System.Collections.Generic; > > using System.Text; > > using IronPython.Hosting; > > > > namespace IronPythonBeautifulSoupTest > > { > > public class Program > > { > > public static void Main(string[] args) > > { > > //Init > > System.Console.WriteLine("Starting..."); > > DateTime start = DateTime.Now; > > PythonEngine engine = new PythonEngine(); > > > > //Add paths: > > //BeautifulSoup.py, socket.py, bstest.py located on exe > dir > > engine.AddToPath(@"."); > > //CPython Lib (replace with your own) > > engine.AddToPath(@"D:\Dev\Python\Lib"); > > > > //Import and load > > TimeSpan span = DateTime.Now - start; > > System.Console.WriteLine("[1] Import: " + > > span.TotalSeconds); > > DateTime d = DateTime.Now; > > engine.ExecuteFile(@"bstest.py"); > > span = DateTime.Now - d; > > System.Console.WriteLine("[2] Load: " + > span.TotalSeconds); > > > > //Execute > > d = DateTime.Now; > > > > engine.Execute("getContent(\"http://www.mininova.org\<http://www.mininova.org%5C> > ")"); > > span = DateTime.Now - d; > > System.Console.WriteLine("[3] Execute: " + > > span.TotalSeconds); > > span = DateTime.Now - start; > > System.Console.WriteLine("Total: " + span.TotalSeconds); > > > > > > } > > } > > } > > > > > > On Wed, Feb 20, 2008 at 6:57 PM, Dino Viehland > > <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>> > > wrote: > > > > We've actually had this issue reported once before a long time ago - > > it's a very low CodePlex ID - > > http://www.codeplex.com/IronPython/WorkItem/View.aspx?WorkItemId=651 > > > > We haven't had a chance to investigate the end-to-end scenario. If > > someone could come up with a smaller simpler repro that'd be great. > > Otherwise we haven't forgotten about it we've just had more > > immediately pressing issues to work on :(. > > > > > > -----Original Message----- > > From: [EMAIL PROTECTED] > > <mailto:[EMAIL PROTECTED]> > > [mailto:[EMAIL PROTECTED] > > <mailto:[EMAIL PROTECTED]>] On Behalf Of Michael > Foord > > Sent: Wednesday, February 20, 2008 5:20 AM > > To: Discussion of IronPython > > Subject: Re: [IronPython] Slow Performance of CPython libs? > > > > Birsch wrote: > > > Hi - We've been using IronPython successfully to allow > extensibility > > > of our application. > > > > > > Overall we are happy with the performance, with the exception of > > > BeautifulSoup which seems to run very slowly: x5 or more time to > > > execute compared to CPython. > > > > > > Most of the time seems to be spent during __init__() of BS, where > the > > > markup is parsed. > > > > > > We suspect this has to do with the fact that our CPython env is > > > executing .pyc files and can precompile its libs, while the > > IronPython > > > environment compiles each iteration. We couldn't find a way to > > > pre-compile the libs and then introduce them into the code, but > > in any > > > case this will result in a large management overhead since the > amount > > > of CPython libs we expose to our users contains 100's of modules. > > > > > > Any ideas on how to optimize? > > > > I think it is worth doing real profiling to find out where the time > is > > being spent during parsing. > > > > If it is spending most of the time in '__init__' then the time is > > probably not spent in importing - so compilation isn't relevant and > it > > is a runtime performance issue. (Importing is much slower with > > IronPython and at Resolver Systems we do use precompiled binaries - > but > > strangely enough it doesn't provide much of a performance gain.) > > > > Michael > > http://www.manning.com/foord > > > > > > > > Thanks, > > > -Birsch > > > > > > Note: we're using FePy/IPCE libs with regular IP v1.1.1 runtime > DLLs > > > (this was done to overcome library incompatibilities and network > > > errors). However, the relevant slow .py code (mainly SGMLParser > and > > > BeautifulSoup) is the same. > > > > > > ------------------------------------------------------------------------ > > > > > > _______________________________________________ > > > Users mailing list > > > Users@lists.ironpython.com <mailto:Users@lists.ironpython.com> > > > http://lists.ironpython.com/listinfo.cgi/users-ironpython.com > > > > > > > _______________________________________________ > > Users mailing list > > Users@lists.ironpython.com <mailto:Users@lists.ironpython.com> > > http://lists.ironpython.com/listinfo.cgi/users-ironpython.com > > _______________________________________________ > > Users mailing list > > Users@lists.ironpython.com <mailto:Users@lists.ironpython.com> > > http://lists.ironpython.com/listinfo.cgi/users-ironpython.com > > > > > > > > > > > > > > _______________________________________________ > > Users mailing list > > Users@lists.ironpython.com <mailto:Users@lists.ironpython.com> > > http://lists.ironpython.com/listinfo.cgi/users-ironpython.com > > > > > > > > ------------------------------------------------------------------------ > > > > _______________________________________________ > > Users mailing list > > Users@lists.ironpython.com > > http://lists.ironpython.com/listinfo.cgi/users-ironpython.com > _______________________________________________ > Users mailing list > Users@lists.ironpython.com > http://lists.ironpython.com/listinfo.cgi/users-ironpython.com > _______________________________________________ > Users mailing list > Users@lists.ironpython.com > http://lists.ironpython.com/listinfo.cgi/users-ironpython.com >
_______________________________________________ Users mailing list Users@lists.ironpython.com http://lists.ironpython.com/listinfo.cgi/users-ironpython.com