Re: Python3.0 has more duplication in source code than Python2.5
> I have run the same analysis on some commercial source code, and the Dup60 rate is quite often significantly larger than 15%.

Commercial code often sucks... that's why they hide it :)

--
дамјан ( http://softver.org.mk/damjan/ )

Scarlett Johansson: You always see the glass half-empty.
Woody Allen: No. I see the glass half-full, but of poison.
-- http://mail.python.org/mailman/listinfo/python-list
Re: Python3.0 has more duplication in source code than Python2.5
I don't think the code duplication rate has a strong relationship to code quality.

On Sun, Feb 8, 2009 at 9:12 AM, Terry terry.yin...@gmail.com wrote:
> Hey! I have to say sorry because I found I made a mistake. Because Python-ast.c is auto-generated and shouldn't be counted here, the right duplication rate of Python3.0 is very small (5%). And I found the duplications are quite trivial; I would not say that all of them are acceptable, but they are certainly not strong enough evidence about code quality.
>
> I have run the same analysis on some commercial source code, and the Dup60 rate is quite often significantly larger than 15%.
Re: Python3.0 has more duplication in source code than Python2.5
On Sun, 08 Feb 2009 07:10:12 -0200, Henry Read henry...@gmail.com wrote:
> On Sun, Feb 8, 2009 at 9:12 AM, Terry terry.yin...@gmail.com wrote:
> > I have run the same analysis on some commercial source code, and the Dup60 rate is quite often significantly larger than 15%.
>
> I don't think the code duplication rate has a strong relationship to code quality.

Not directly; but large chunks of identical code repeated in many places aren't a good sign. I'd ask myself: are all of them equally tested? What if someone fixes a bug: will the change be propagated everywhere? Should the code be refactored?

--
Gabriel Genellina
Re: Python3.0 has more duplication in source code than Python2.5
On Feb 7, 3:36 pm, Martin v. Löwis mar...@v.loewis.de wrote:
> > Does that say something about the code quality of Python3.0?
>
> Not necessarily. IIUC, copying a single file with 2000 lines completely could already account for that increase.
>
> It would be interesting to see what specific files have gained large numbers of additional files, compared to 2.5.
>
> Regards, Martin

But the duplications are mostly not very big, ranging from about 100 lines (rare) down to fewer than 5 lines. As you can see, Rate30 is much bigger than Rate60, which means there are a lot of small duplications.
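To make the Rate30 versus Rate60 point concrete, here is a minimal sketch of a token-window duplicate counter. It is not how CPD works internally; the function name `duplicated_token_count` and the crude regex tokenizer are assumptions made only for illustration. It shows why a smaller minimum window reports more duplication than a larger one on the same input.

```python
import re
from collections import defaultdict


def duplicated_token_count(source, window):
    """Count tokens covered by any run of `window` tokens that occurs more than once.

    Hypothetical helper for illustration; a real detector such as CPD
    uses a language-aware tokenizer and reports lines, not tokens.
    """
    # Crude tokenizer: identifiers/numbers, or any single punctuation character.
    tokens = re.findall(r"\w+|[^\w\s]", source)

    # Record every start position of each window-long token run.
    positions = defaultdict(list)
    for start in range(len(tokens) - window + 1):
        positions[tuple(tokens[start:start + window])].append(start)

    # Mark every token that belongs to a run seen at least twice.
    covered = set()
    for starts in positions.values():
        if len(starts) > 1:
            for start in starts:
                covered.update(range(start, start + window))
    return len(covered)


sample = "a b c x a b c y"
print(duplicated_token_count(sample, 3))  # the run "a b c" repeats
print(duplicated_token_count(sample, 4))  # no 4-token run repeats
```

With a window of 3 the repeated run is found; with a window of 4 nothing is reported, which mirrors how a 30-token threshold picks up many small duplications that a 60-token threshold ignores.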
Re: Python3.0 has more duplication in source code than Python2.5
Terry wrote:
> But the duplications are mostly not very big, ranging from about 100 lines (rare) down to fewer than 5 lines. As you can see, Rate30 is much bigger than Rate60, which means there are a lot of small duplications.

Do you by any chance have a few examples of these? There is a lot of idiomatic code in python to e.g. acquire and release the GIL or doing refcount-stuff. If that happens to be done with rather generic names as arguments, I can well imagine that as being the cause.

Diez
Re: Python3.0 has more duplication in source code than Python2.5
On Feb 7, 7:10 pm, Diez B. Roggisch de...@nospam.web.de wrote:
> Do you by any chance have a few examples of these? There is a lot of idiomatic code in python to e.g. acquire and release the GIL or doing refcount-stuff. If that happens to be done with rather generic names as arguments, I can well imagine that as being the cause.
>
> Diez

Example 1:

Found a 64 line (153 tokens) duplication in the following files:
Starting at line 73 of D:\DOWNLOADS\Python-3.0\Python\thread_pth.h
Starting at line 222 of D:\DOWNLOADS\Python-3.0\Python\thread_pthread.h

        return (long) threadid;
    #else
        return (long) *(long *) threadid;
    #endif
    }

    static void
    do_PyThread_exit_thread(int no_cleanup)
    {
        dprintf(("PyThread_exit_thread called\n"));
        if (!initialized) {
            if (no_cleanup)
                _exit(0);
            else
                exit(0);
        }
    }

    void
    PyThread_exit_thread(void)
    {
        do_PyThread_exit_thread(0);
    }

    void
    PyThread__exit_thread(void)
    {
        do_PyThread_exit_thread(1);
    }

    #ifndef NO_EXIT_PROG
    static void
    do_PyThread_exit_prog(int status, int no_cleanup)
    {
        dprintf(("PyThread_exit_prog(%d) called\n", status));
        if (!initialized)
            if (no_cleanup)
                _exit(status);
            else
                exit(status);
    }

    void
    PyThread_exit_prog(int status)
    {
        do_PyThread_exit_prog(status, 0);
    }

    void
    PyThread__exit_prog(int status)
    {
        do_PyThread_exit_prog(status, 1);
    }
    #endif /* NO_EXIT_PROG */

    #ifdef USE_SEMAPHORES

    /*
     * Lock support.
     */

    PyThread_type_lock
    PyThread_allocate_lock(void)
    {
Re: Python3.0 has more duplication in source code than Python2.5
On Feb 7, 7:10 pm, Diez B. Roggisch de...@nospam.web.de wrote:
> Do you by any chance have a few examples of these? There is a lot of idiomatic code in python to e.g. acquire and release the GIL or doing refcount-stuff. If that happens to be done with rather generic names as arguments, I can well imagine that as being the cause.
>
> Diez

Example 2:

Found a 16 line (106 tokens) duplication in the following files:
Starting at line 4970 of D:\DOWNLOADS\Python-3.0\Python\Python-ast.c
Starting at line 5015 of D:\DOWNLOADS\Python-3.0\Python\Python-ast.c
Starting at line 5073 of D:\DOWNLOADS\Python-3.0\Python\Python-ast.c
Starting at line 5119 of D:\DOWNLOADS\Python-3.0\Python\Python-ast.c

                PyErr_Format(PyExc_TypeError, "GeneratorExp field \"generators\" must be a list, not a %.200s", tmp->ob_type->tp_name);
                goto failed;
            }
            len = PyList_GET_SIZE(tmp);
            generators = asdl_seq_new(len, arena);
            if (generators == NULL) goto failed;
            for (i = 0; i < len; i++) {
                comprehension_ty value;
                res = obj2ast_comprehension(PyList_GET_ITEM(tmp, i), &value, arena);
                if (res != 0) goto failed;
                asdl_seq_SET(generators, i, value);
            }
            Py_XDECREF(tmp);
            tmp = NULL;
        } else {
            PyErr_SetString(PyExc_TypeError, "required field \"generators\" missing from GeneratorExp");
Re: Python3.0 has more duplication in source code than Python2.5
On Feb 7, 7:10 pm, Diez B. Roggisch de...@nospam.web.de wrote:
> Do you by any chance have a few examples of these? There is a lot of idiomatic code in python to e.g. acquire and release the GIL or doing refcount-stuff. If that happens to be done with rather generic names as arguments, I can well imagine that as being the cause.
>
> Diez

Example of a small one (61 tokens duplicated):

Found a 19 line (61 tokens) duplication in the following files:
Starting at line 132 of D:\DOWNLOADS\Python-3.0\Python\modsupport.c
Starting at line 179 of D:\DOWNLOADS\Python-3.0\Python\modsupport.c

            PyTuple_SET_ITEM(v, i, w);
        }
        if (itemfailed) {
            /* do_mkvalue() should have already set an error */
            Py_DECREF(v);
            return NULL;
        }
        if (**p_format != endchar) {
            Py_DECREF(v);
            PyErr_SetString(PyExc_SystemError,
                            "Unmatched paren in format");
            return NULL;
        }
        if (endchar)
            ++*p_format;
        return v;
    }

    static PyObject *
Re: Python3.0 has more duplication in source code than Python2.5
On Feb 7, 7:10 pm, Diez B. Roggisch de...@nospam.web.de wrote:
> Do you by any chance have a few examples of these? There is a lot of idiomatic code in python to e.g. acquire and release the GIL or doing refcount-stuff. If that happens to be done with rather generic names as arguments, I can well imagine that as being the cause.
>
> Diez

Example of an even smaller one (30 tokens duplicated):

Found a 11 line (30 tokens) duplication in the following files:
Starting at line 2551 of D:\DOWNLOADS\Python-3.0\Python\Python-ast.c
Starting at line 3173 of D:\DOWNLOADS\Python-3.0\Python\Python-ast.c

        if (PyObject_SetAttrString(result, "ifs", value) == -1)
            goto failed;
        Py_DECREF(value);
        return result;
    failed:
        Py_XDECREF(value);
        Py_XDECREF(result);
        return NULL;
    }

    PyObject*
Re: Python3.0 has more duplication in source code than Python2.5
On Feb 7, 7:10 pm, Diez B. Roggisch de...@nospam.web.de wrote:
> Do you by any chance have a few examples of these? There is a lot of idiomatic code in python to e.g. acquire and release the GIL or doing refcount-stuff. If that happens to be done with rather generic names as arguments, I can well imagine that as being the cause.
>
> Diez

And I'm not saying that you cannot have duplication in code. But it seems that stable, successful software releases tend to have a relatively stable duplication rate.
Re: Python3.0 has more duplication in source code than Python2.5
Terry terry.yinzhe at gmail.com writes:
> On Feb 7, 7:10 pm, Diez B. Roggisch de...@nospam.web.de wrote:
> > Do you by any chance have a few examples of these? There is a lot of idiomatic code in python to e.g. acquire and release the GIL or doing refcount-stuff. If that happens to be done with rather generic names as arguments, I can well imagine that as being the cause.
>
> Starting at line 5119 of D:\DOWNLOADS\Python-3.0\Python\Python-ast.c

This isn't really fair because Python-ast.c is auto generated. ;)
Re: Python3.0 has more duplication in source code than Python2.5
Terry wrote:
> ... I'm not saying that you cannot have duplication in code. But it seems that stable, successful software releases tend to have a relatively stable duplication rate.

This analysis overlooks the fact that 3.0 _was_ a major change, and is likely to grow cut-and-paste solutions to some problems as we switch to Unicode strings from byte strings. You are comparing a .0 version to .5 versions. I expect the polishing that follows as we go up through .1, .2, and so on will lower those redundancy measures.

--Scott David Daniels
scott.dani...@acm.org
Re: Python3.0 has more duplication in source code than Python2.5
> And I'm not saying that you cannot have duplication in code. But it seems that stable, successful software releases tend to have a relatively stable duplication rate.

So if some software has an unstable duplication rate, it probably means that it is either not stable, or not successful. In the case of Python 3.0, it's fairly obvious which one it is: it's not stable. Indeed, Python 3.0 is a significant change from Python 2.x. Of course, anybody following the Python 3 development process could have told you so even without any code metrics.

I still find the raw numbers fairly useless. What matters more to me is what specific code duplications have been added. Furthermore, your Dup30 classification is not important to me; I'm rather after the nearly 2000 new chunks of code that have more than 60 consecutive tokens duplicated.

Regards, Martin
Re: Python3.0 has more duplication in source code than Python2.5
> But the duplications are mostly not very big, ranging from about 100 lines (rare) down to fewer than 5 lines. As you can see, Rate30 is much bigger than Rate60, which means there are a lot of small duplications.

I don't find that important for code quality. It's the large chunks that I would like to see de-duplicated (unless, of course, they are in generated code, in which case I couldn't care less).

Unfortunately, none of the examples you have posted so far are
- large chunks, and
- new in 3.0.

Regards, Martin
Re: Python3.0 has more duplication in source code than Python2.5
-On [20090207 18:25], Scott David Daniels (scott.dani...@acm.org) wrote:
> This analysis overlooks the fact that 3.0 _was_ a major change, and is likely to grow cut-and-paste solutions to some problems as we switch to Unicode strings from byte strings.

You'd best hope the copied section was thoroughly reviewed, otherwise you're duplicating a flaw across X other sections. And then you also best hope that whoever finds said flaw and fixes it is also smart enough to check for similar constructs around the code base.

--
Jeroen Ruigrok van der Werven asmodai(-at-)in-nomine.org / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
Earth to earth, ashes to ashes, dust to dust...
Re: Python3.0 has more duplication in source code than Python2.5
Jeroen Ruigrok van der Werven wrote:
> -On [20090207 18:25], Scott David Daniels (scott.dani...@acm.org) wrote:
> > This analysis overlooks the fact that 3.0 _was_ a major change, and is likely to grow cut-and-paste solutions to some problems as we switch to Unicode strings from byte strings.
>
> You'd best hope the copied section was thoroughly reviewed, otherwise you're duplicating a flaw across X other sections. And then you also best hope that whoever finds said flaw and fixes it is also smart enough to check for similar constructs around the code base.

This is probably preferable to five different developers solving the same problem five different ways and introducing three *different* bugs, no?

regards
 Steve
--
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC      http://www.holdenweb.com/
Re: Python3.0 has more duplication in source code than Python2.5
> This is probably preferable to five different developers solving the same problem five different ways and introducing three *different* bugs, no?

With the examples presented, I'm not convinced that there is actually significant code duplication going on in the first place.

Regards, Martin
Re: Python3.0 has more duplication in source code than Python2.5
-On [20090207 21:07], Steve Holden (st...@holdenweb.com) wrote:
> This is probably preferable to five different developers solving the same problem five different ways and introducing three *different* bugs, no?

I guess the answer would be 'that depends', but in most cases you would be correct, yes.

--
Jeroen Ruigrok van der Werven asmodai(-at-)in-nomine.org / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
Earth to earth, ashes to ashes, dust to dust...
Re: Python3.0 has more duplication in source code than Python2.5
Steve Holden wrote:
> > You'd best hope the copied section was thoroughly reviewed, otherwise you're duplicating a flaw across X other sections. And then you also best hope that whoever finds said flaw and fixes it is also smart enough to check for similar constructs around the code base.
>
> This is probably preferable to five different developers solving the same problem five different ways and introducing three *different* bugs, no?

someone posted some numbers that suggested that more code than normal was copied in python 3.0. that seems reasonable, as others have said, because it's a new major release. but as far as i know, this is the first time it's been raised. so it seems like a useful piece of information that might help improve python in some way. which should be welcomed.

yet the general tone of the responses has been more defensive than i would have expected. i don't really understand why. nothing really terrible, given the extremes you get on the net in general, but still a little disappointing.

the email quoted above is a typical example. as i said - nothing terrible, just a misleading false dichotomy. yes, five people solving it five different ways would be worse, but that doesn't mean there isn't some better solution. surely it would be preferable if there was one way, that didn't involve copying code, that everyone could use?

i'm not saying there is such a solution. i'm not even saying that there is certainly a problem. i'm just making the quiet observation that the original information is interesting, might be useful, and should be welcomed.

andrew
Re: Python3.0 has more duplication in source code than Python2.5
> yet the general tone of the responses has been more defensive than i would have expected. i don't really understand why. nothing really terrible, given the extremes you get on the net in general, but still a little disappointing.

I think this is fairly easy to explain. The OP closes with the question "Does that say something about the code quality of Python3.0?", thus suggesting that the quality of Python 3 is poor. Nobody likes to hear that the quality of his work is poor.

He then goes on to say "But it seems that stable, successful software releases tend to have a relatively stable duplication rate.", suggesting that Python 3.0 cannot be successful, because it doesn't have a relatively stable duplication rate. Nobody likes to hear that a project one has put many months into cannot be successful.

Hence the defensive responses.

> i'm not saying there is such a solution. i'm not even saying that there is certainly a problem. i'm just making the quiet observation that the original information is interesting, might be useful, and should be welcomed.

The information is interesting. I question whether it is useful as-is, as it doesn't tell me *what* code got duplicated (and it seems it is also incorrect, since it includes analysis of generated code). While I can welcome the information, I cannot welcome the conclusion that the OP apparently draws from it.

Regards, Martin
Re: Python3.0 has more duplication in source code than Python2.5
On Feb 8, 12:20 am, Benjamin Peterson benja...@python.org wrote:
> Terry terry.yinzhe at gmail.com writes:
> > Starting at line 5119 of D:\DOWNLOADS\Python-3.0\Python\Python-ast.c
>
> This isn't really fair because Python-ast.c is auto generated. ;)

Oops! I didn't know that! Then the analysis is not valid, since too many of the duplications come from there.
Re: Python3.0 has more duplication in source code than Python2.5
On Feb 8, 8:51 am, Terry terry.yin...@gmail.com wrote:
> On Feb 8, 12:20 am, Benjamin Peterson benja...@python.org wrote:
> > This isn't really fair because Python-ast.c is auto generated. ;)
>
> Oops! I didn't know that! Then the analysis is not valid, since too many of the duplications come from there.

Hey! I have to say sorry because I found I made a mistake. Because Python-ast.c is auto-generated and shouldn't be counted here, the right duplication rate of Python3.0 is very small (5%). And I found the duplications are quite trivial; I would not say that all of them are acceptable, but they are certainly not strong enough evidence about code quality.

I have run the same analysis on some commercial source code, and the Dup60 rate is quite often significantly larger than 15%.
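One way to keep generated sources such as Python-ast.c out of a duplication run is to filter the file list before handing it to the detector. The sketch below is only an illustration: the marker strings, the helper names `looks_generated` and `files_to_analyze`, and the sample file contents are all assumptions, not something taken from the analysis above or from the CPD tool.

```python
# Hypothetical marker phrases; a real project would tune these to its own
# generator banners.
GENERATED_MARKERS = ("automatically generated", "auto-generated", "do not edit")


def looks_generated(head):
    """Return True if the leading text of a file carries a generator banner."""
    lowered = head.lower()
    return any(marker in lowered for marker in GENERATED_MARKERS)


def files_to_analyze(files, head_chars=500):
    """Given {filename: contents}, return the sorted names worth analyzing.

    Only the first `head_chars` characters are inspected, on the assumption
    that generator banners sit at the top of the file.
    """
    return sorted(
        name for name, text in files.items()
        if not looks_generated(text[:head_chars])
    )


# Made-up contents, for illustration only.
sample_files = {
    "Python-ast.c": "/* File automatically generated by a script. */\nint x;",
    "modsupport.c": "/* Module support implementation */\nint y;",
}
print(files_to_analyze(sample_files))
```

A banner check like this is a heuristic; an explicit exclude list of known generated files would be the more predictable choice when the set is small.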
Python3.0 has more duplication in source code than Python2.5
I used CPD (the copy/paste detector in PMD) to analyze code duplication in the Python source code. I found that Python3.0 contains more duplicated code than the previous versions. The CPD tool is far from perfect, but I still feel the analysis makes some sense.

| Source Code     | NLOC  | Dup60 | Dup30 | Rate60 | Rate30 |
| Python1.5(Core) | 19418 | 1072  | 3023  | 6%     | 16%    |
| Python2.5(Core) | 35797 | 1656  | 6441  | 5%     | 18%    |
| Python3.0(Core) | 40737 | 3460  | 9076  | 8%     | 22%    |
| Apache(server)  | 18693 | 1114  | 2553  | 6%     | 14%    |

NLOC: net lines of code
Dup60: lines of code that have 60 consecutive tokens duplicated in other code (counted twice or more)
Dup30: the same, with 30 tokens duplicated
Rate60: Dup60/NLOC
Rate30: Dup30/NLOC

We can see that the usual duplication rate tends to be stable, but Python3.0's is slightly higher. Considering the small increase in NLOC, the duplication rate of Python3.0 might be too high.

Does that say something about the code quality of Python3.0?
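As a quick check of the arithmetic, the rates can be recomputed from the NLOC, Dup60 and Dup30 figures in the post above (reading Dup60 and Dup30 as four-digit values). The dictionary layout and the function name `duplication_rates` are illustrative choices, not part of the original analysis.

```python
# (NLOC, Dup60, Dup30) as reported in the post.
MEASUREMENTS = {
    "Python1.5(Core)": (19418, 1072, 3023),
    "Python2.5(Core)": (35797, 1656, 6441),
    "Python3.0(Core)": (40737, 3460, 9076),
    "Apache(server)": (18693, 1114, 2553),
}


def duplication_rates(nloc, dup60, dup30):
    """Return (Rate60, Rate30) as whole percentages of NLOC."""
    return round(100 * dup60 / nloc), round(100 * dup30 / nloc)


for name, (nloc, dup60, dup30) in MEASUREMENTS.items():
    rate60, rate30 = duplication_rates(nloc, dup60, dup30)
    print(f"{name}: Rate60={rate60}% Rate30={rate30}%")
```

Rounded to whole percentages, the results agree with the Rate60 and Rate30 columns as posted.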
Re: Python3.0 has more duplication in source code than Python2.5
> Does that say something about the code quality of Python3.0?

Not necessarily. IIUC, copying a single file with 2000 lines completely could already account for that increase.

It would be interesting to see what specific files have gained large numbers of additional files, compared to 2.5.

Regards, Martin
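Martin's estimate can be compared with the posted figures. The short sketch below takes the NLOC and Dup60 values for Python2.5 and Python3.0 from the original post (reading Dup60 as a four-digit value) and computes the increases; the variable names are illustrative only.

```python
# Figures from the original post: net lines of code and Dup60 lines.
PY25 = {"nloc": 35797, "dup60": 1656}
PY30 = {"nloc": 40737, "dup60": 3460}

# Growth between the two releases.
dup60_increase = PY30["dup60"] - PY25["dup60"]
nloc_increase = PY30["nloc"] - PY25["nloc"]

print(f"Dup60 grew by {dup60_increase} lines")  # 1804
print(f"NLOC grew by {nloc_increase} lines")    # 4940
```

The Dup60 increase is under 2000 lines, so a single fully copied file of that size would indeed be enough to explain it.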