Re: duplicate class names in ASF Java projects

2014-03-21 Thread Ralph Goers
In the case of logging-log4j2, the package and class names duplicate those of
log4j to provide a bridge, so that code does not need to be rewritten to
upgrade. However, if you look at the line counts you will see that they are not
the same, because the classes are very different.

Ralph
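
As a rough illustration of the line-count comparison, here is a minimal Java sketch. The class name LineCountCheck, its methods, and the tolerance parameter are hypothetical and do not come from log4j or from the linked data. It only shows one way that same-named classes of very different sizes could be told apart from likely copies.

```java
final class LineCountCheck {

    /** Relative difference between two line counts, from 0.0 (equal) to 1.0. */
    static double relativeDifference(int linesA, int linesB) {
        int larger = Math.max(linesA, linesB);
        if (larger == 0) {
            return 0.0; // two empty files count as identical in size
        }
        return Math.abs(linesA - linesB) / (double) larger;
    }

    /**
     * Heuristic only: same-named classes whose sizes are within the given
     * tolerance are flagged as possible copies. The tolerance is an assumption
     * chosen by the caller, not a value taken from the thread.
     */
    static boolean looksLikeCopy(int linesA, int linesB, double tolerance) {
        return relativeDifference(linesA, linesB) <= tolerance;
    }
}
```

A size match is weak evidence on its own; a content hash or diff would be needed to confirm that two files really share code.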




-
To unsubscribe, e-mail: community-unsubscr...@apache.org
For additional commands, e-mail: community-h...@apache.org



Re: duplicate class names in ASF Java projects

2014-03-21 Thread Christopher
You may want to filter out small files, or common file name
conventions: e.g.
https://github.com/apache/accumulo/blob/trunk/maven-plugin/src/it/plugin-test/postbuild.groovy
and 
https://github.com/apache/maven-plugins/blob/trunk/maven-invoker-plugin/src/it/script-additional-vars/src/it/groovy/postbuild.groovy
are not the same, but probably were both built from the same example
template.
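
A filter along these lines might look like the following Java sketch. The class name CandidateFilter, the 20-line threshold, and the list of conventional file names are assumptions made for illustration; only postbuild.groovy comes from the examples above.

```java
import java.util.Set;

final class CandidateFilter {

    // Assumed threshold: files shorter than this are treated as too small to matter.
    static final int MIN_LINES = 20;

    // Assumed list of file names that follow a common convention or template.
    static final Set<String> CONVENTIONAL_NAMES =
            Set.of("postbuild.groovy", "package-info.java");

    /** Returns true if the file should stay in the duplicate-name analysis. */
    static boolean isCandidate(String path, int lineCount) {
        // Strip the directory part; lastIndexOf returns -1 when there is none.
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        return lineCount >= MIN_LINES && !CONVENTIONAL_NAMES.contains(fileName);
    }
}
```

Both the threshold and the name list would need tuning against the real data set.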

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii







Re: duplicate class names in ASF Java projects

2014-03-21 Thread Pawel Slusarz
On 3/21/14, 8:58 AM, sebb wrote:
 Note that sanselan was renamed as commons imaging. However the package
 names were also changed so I'm not sure why they are shown as
 duplicates.

 sanselan: org.apache.sanselan
 imaging: org.apache.commons.imaging

 Perhaps the information has been derived from SVN rather than the
 published releases. In which case I suspect there are a lot of false
 positives. Not all SVN (or Git) source code is part of a release, and
 source code may go through various name changes.

It looks like the rename was committed to sanselan in the source code
repo before the project was decommissioned. Glad the rename didn't make
it to a release jar. Thanks for the explanation.
Paul




Re: duplicate class names in ASF Java projects

2014-03-21 Thread Pawel Slusarz
Ralph,
Thanks for the explanation. Is there a strategy for projects that have
both brought in transitively? Hunting down class name conflicts in a
multi-layered dependency tree was one of the reasons that I got
interested in the subject, and I haven't found a satisfying solution to
it yet.
Paul
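
One way to at least locate such conflicts is to scan the jars that end up on the classpath and report every class name that appears in more than one of them. The Java sketch below does that with the standard java.util.jar API. The class name DuplicateClassFinder and its methods are hypothetical, and it assumes the jar paths have already been collected (for example from the build tool's resolved dependency list). It finds conflicts but does not resolve them.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

final class DuplicateClassFinder {

    /** Lists the fully qualified class names stored in one jar file. */
    static List<String> classNamesIn(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String entry = entries.nextElement().getName();
                // Skip module descriptors and anything under META-INF to cut noise.
                if (entry.endsWith(".class") && !entry.startsWith("META-INF/")
                        && !entry.endsWith("module-info.class")) {
                    // "org/example/Foo.class" -> "org.example.Foo"
                    String path = entry.substring(0, entry.length() - ".class".length());
                    names.add(path.replace('/', '.'));
                }
            }
        }
        return names;
    }

    /** Maps each class name found in two or more artifacts to those artifacts. */
    static Map<String, List<String>> findDuplicates(Map<String, List<String>> classesByArtifact) {
        Map<String, List<String>> owners = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : classesByArtifact.entrySet()) {
            for (String className : e.getValue()) {
                List<String> artifacts = owners.computeIfAbsent(className, k -> new ArrayList<>());
                if (!artifacts.contains(e.getKey())) {
                    artifacts.add(e.getKey());
                }
            }
        }
        // Keep only the names that occur in more than one artifact.
        owners.values().removeIf(artifacts -> artifacts.size() < 2);
        return owners;
    }

    /** Usage: pass the jar paths to compare as command-line arguments. */
    public static void main(String[] args) throws IOException {
        Map<String, List<String>> classesByArtifact = new TreeMap<>();
        for (String jarPath : args) {
            classesByArtifact.put(jarPath, classNamesIn(jarPath));
        }
        findDuplicates(classesByArtifact).forEach(
                (className, jars) -> System.out.println(className + " -> " + jars));
    }
}
```

The report only says which jars collide. Which copy wins at runtime still depends on classpath order, so a fix would mean excluding or aligning one of the dependencies.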

On 3/21/14, 10:24 AM, Ralph Goers wrote:
 In the case of logging-log4j2 the package and class names are duplicated with 
 log4j to provide a bridge so that code does not need to be rewritten to 
 upgrade. However, if you look at the line counts you will see that they are 
 not the same as the classes are very different.

 Ralph





duplicate class names in ASF Java projects

2014-03-20 Thread Pawel Slusarz
Greetings,

When looking at the Apache SF Java projects as a group, I noticed that a
large number of projects have duplicate class names, e.g.
both openejb and tomee have a class named
jug.client.command.api.AbstractCommand

When edge cases, e.g. test.Foo and tomcat55, tomcat60, tomcat70 get
eliminated, it still appears that the practice of code sharing by
drag-drop-modify is quite prevalent. Over 14,000 (out of 165,000)
classes were shared that way in the ecosystem, and 103 projects (out of
300) are affected.

Sometimes a measurement and visualization is all it takes to realize a
problem and begin fixing it. Below is raw data that can help clarify
what is happening and how:

http://pslusarz.github.io/archeology3d/research/apache/conflicting-classes/index.html

Hope this is the right place to engage in this sort of conversation.

Paul Slusarz

PS: Who am I and what's my agenda? I am interested in looking at large
codebases in search of patterns. I picked Apache SF, because, unlike my
company code, the data can be independently verified. The issue with
conflicting class names became apparent as I was trying to identify and
understand classes that are shared in this ecosystem. Some more
background on this approach can be found on my blog:
http://10kftcode.blogspot.com/
