M.-A. Lemburg wrote:
> I think it's worthwhile reconsidering this approach for
> character type queries that do not involve a huge number
> of code points.
I would advise against that. I measured both versions (with your version named PyUnicode_IsLinebreak2) using the following code:

#include <stdio.h>
#include <time.h>

volatile int result;

void unibench()
{
#define REPS 10000000000LL
    long long i;
    clock_t s1, s2, s3, s4, s5;

    /* time REPS calls to each variant, once with a non-linebreak
       character and once with a linebreak character */
    s1 = clock();
    for (i = 0; i < REPS; i++)
        result = _PyUnicode_IsLinebreak('(');
    s2 = clock();
    for (i = 0; i < REPS; i++)
        result = PyUnicode_IsLinebreak2('(');
    s3 = clock();
    for (i = 0; i < REPS; i++)
        result = _PyUnicode_IsLinebreak('\n');
    s4 = clock();
    for (i = 0; i < REPS; i++)
        result = PyUnicode_IsLinebreak2('\n');
    s5 = clock();

    printf("f1, (:  %d\nf2, (:  %d\nf1, CR: %d\nf2, CR: %d\n",
           (int)(s2 - s1), (int)(s3 - s2), (int)(s4 - s3), (int)(s5 - s4));
}

and got these numbers:

  f1, (:  13210000
  f2, (:  13300000
  f1, CR: 13220000
  f2, CR: 13250000

What can be seen is that the performance of the two versions is nearly identical, with the code currently used being slightly faster. What can also be seen is that, on my machine, 1e10 calls to IsLinebreak take 13.2 seconds, so 51 million calls take about 70 ms. The reported performance problem more likely lies in the allocation of all those splitlines results and in the copying of the same strings over and over again (see the sketch in the P.S. below).

Regards,
Martin
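
P.S.: To illustrate where the time is more likely going, here is a rough, self-contained sketch. It is plain C, not CPython code, and the buffer size and average line length are arbitrary assumptions. It compares a pure scan for '\n' against a scan that also allocates and copies each line, which is roughly the extra work splitlines() has to do on top of the character classification:

/* Sketch only, not CPython code: compare the cost of classifying
   characters with the cost of also allocating and copying each line.
   BUF_SIZE and LINE_LEN are arbitrary assumptions. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BUF_SIZE (16 * 1024 * 1024)   /* 16 MB of text */
#define LINE_LEN 40                   /* assumed average line length */

int main(void)
{
    char *buf = malloc(BUF_SIZE);
    size_t i, start = 0;
    clock_t t0, t1, t2;
    long breaks = 0;

    if (buf == NULL)
        return 1;

    /* fill the buffer with 'a' and put a '\n' every LINE_LEN bytes */
    memset(buf, 'a', BUF_SIZE);
    for (i = LINE_LEN; i < BUF_SIZE; i += LINE_LEN)
        buf[i] = '\n';

    /* pass 1: classification only -- just count the line breaks */
    t0 = clock();
    for (i = 0; i < BUF_SIZE; i++)
        if (buf[i] == '\n')
            breaks++;
    t1 = clock();

    /* pass 2: classification plus an allocation and a copy per line */
    for (i = 0; i < BUF_SIZE; i++) {
        if (buf[i] == '\n') {
            size_t len = i - start;
            char *line = malloc(len + 1);
            if (line != NULL) {
                memcpy(line, buf + start, len);
                line[len] = '\0';
                free(line);
            }
            start = i + 1;
        }
    }
    t2 = clock();

    printf("scan only:   %ld ticks (%ld breaks)\n", (long)(t1 - t0), breaks);
    printf("scan + copy: %ld ticks\n", (long)(t2 - t1));
    free(buf);
    return 0;
}

On most systems the second pass should dominate by a wide margin, which is why I would look at the allocation and copying path rather than at IsLinebreak.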