Re: duplicate class names in ASF Java projects

2014-03-21 Thread Pawel Slusarz
Ralph,
Thanks for the explanation. Is there a strategy for projects that end up
with both log4j and the log4j2 bridge brought in transitively? Hunting
down class name conflicts in a
multi-layered dependency tree was one of the reasons that I got
interested in the subject, and I haven't found a satisfying solution to
it yet.
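
For illustration, the kind of check I have in mind could be sketched roughly as below. This is only a minimal sketch, assuming the resolved dependency jars are already available as local files; the class name DuplicateClassFinder and its method are hypothetical, not part of any existing tool. It uses only the standard java.util.jar API to list the fully qualified class names that appear in more than one jar.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: reports class names that occur in more than one jar.
public class DuplicateClassFinder {

    /** Maps each fully qualified class name to the jars containing it, keeping only names found in two or more jars. */
    public static Map<String, List<Path>> findDuplicates(List<Path> jars) throws IOException {
        Map<String, List<Path>> owners = new TreeMap<>();
        for (Path jar : jars) {
            try (JarFile jarFile = new JarFile(jar.toFile())) {
                Enumeration<JarEntry> entries = jarFile.entries();
                while (entries.hasMoreElements()) {
                    String name = entries.nextElement().getName();
                    // Skip non-class entries, module descriptors and META-INF content.
                    if (!name.endsWith(".class")
                            || name.endsWith("module-info.class")
                            || name.startsWith("META-INF/")) {
                        continue;
                    }
                    String className = name
                            .substring(0, name.length() - ".class".length())
                            .replace('/', '.');
                    owners.computeIfAbsent(className, k -> new ArrayList<>()).add(jar);
                }
            }
        }
        // Keep only the names that more than one jar provides.
        owners.values().removeIf(list -> list.size() < 2);
        return owners;
    }

    public static void main(String[] args) throws IOException {
        List<Path> jars = new ArrayList<>();
        for (String arg : args) {
            jars.add(Paths.get(arg));
        }
        findDuplicates(jars).forEach((cls, where) -> System.out.println(cls + " -> " + where));
    }
}
```

Running it with the jar paths as arguments prints one line per conflicting class. It only detects name collisions; it says nothing about which copy the classloader will actually pick.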
Paul

On 3/21/14, 10:24 AM, Ralph Goers wrote:
> In the case of logging-log4j2 the package and class names are duplicated with 
> log4j to provide a bridge so that code does not need to be rewritten to 
> upgrade. However, if you look at the line counts you will see that they are 
> not the same as the classes are very different.
>
> Ralph
>

-
To unsubscribe, e-mail: community-unsubscr...@apache.org
For additional commands, e-mail: community-h...@apache.org



Re: duplicate class names in ASF Java projects

2014-03-21 Thread Pawel Slusarz
On 3/21/14, 8:58 AM, sebb wrote:
> Note that sanselan was renamed as commons imaging. However the package
> names were also changed so I'm not sure why they are shown as
> duplicates.
>
> sanselan: org.apache.sanselan
> imaging: org.apache.commons.imaging
>
> Perhaps the information has been derived from SVN rather than the
> published releases. In which case I suspect there are a lot of false
> positives. Not all SVN (or Git) source code is part of a release, and
> source code may go through various name changes.

It looks like the rename was committed to sanselan in the source code
repo before the project was decommissioned. Glad the rename didn't make
it to a release jar. Thanks for the explanation.
Paul




Re: duplicate class names in ASF Java projects

2014-03-21 Thread Christopher
You may want to filter out small files, or common file name
conventions: e.g.
https://github.com/apache/accumulo/blob/trunk/maven-plugin/src/it/plugin-test/postbuild.groovy
and 
https://github.com/apache/maven-plugins/blob/trunk/maven-invoker-plugin/src/it/script-additional-vars/src/it/groovy/postbuild.groovy
are not the same, but probably were both built from the same example
template.
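
A rough sketch of such a filter, purely as an illustration: the size threshold and the list of conventional names below are assumptions chosen for the example (only postbuild.groovy comes from the links above), and the class DuplicateCandidateFilter is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical pre-filter applied before files are compared by name.
public class DuplicateCandidateFilter {

    // Assumed cut-off: files smaller than this are treated as too trivial to count.
    static final long MIN_SIZE_BYTES = 512;

    // Assumed list of file names that recur by convention rather than by copying.
    static final Set<String> CONVENTIONAL_NAMES = new HashSet<>(Arrays.asList(
            "postbuild.groovy", "prebuild.groovy", "verify.groovy", "package-info.java"));

    /** Returns true when the file is large enough and not named after a common convention. */
    public static boolean isWorthComparing(String path, long sizeBytes) {
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        return sizeBytes >= MIN_SIZE_BYTES && !CONVENTIONAL_NAMES.contains(fileName);
    }
}
```

Both values would need tuning against the real data set before drawing any conclusions from the filtered counts.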

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii


On Fri, Mar 21, 2014 at 12:49 AM, Pawel Slusarz wrote:
> Greetings,
>
> When looking at the Apache SF Java projects as a group, I noticed that a
> large number of projects have duplicate class names, e.g.
> both openejb and tomee have a class named
> jug.client.command.api.AbstractCommand
>
> When edge cases, e.g. test.Foo and tomcat55, tomcat60, tomcat70, get
> eliminated, it still appears that the practice of code sharing by
> drag-drop-modify is quite prevalent. Over 14,000 (out of 165,000)
> classes were shared that way in the ecosystem, and 103 projects (out of
> 300) are affected.
>
> Sometimes a measurement and a visualization are all it takes to recognize
> a problem and begin fixing it. Below is raw data that can help to better
> understand what is happening and how:
>
> http://pslusarz.github.io/archeology3d/research/apache/conflicting-classes/index.html
>
> Hope this is the right place to engage in this sort of conversation.
>
> Paul Slusarz
>
> PS: Who am I and what's my agenda? I am interested in looking at large
> codebases in search of patterns. I picked Apache SF, because, unlike my
> company code, the data can be independently verified. The issue with
> conflicting class names became apparent as I was trying to identify and
> understand classes that are shared in this ecosystem. Some more
> background on this approach can be found on my blog:
> http://10kftcode.blogspot.com/




Re: duplicate class names in ASF Java projects

2014-03-21 Thread Ralph Goers
In the case of logging-log4j2 the package and class names are duplicated with 
log4j to provide a bridge so that code does not need to be rewritten to 
upgrade. However, if you look at the line counts you will see that they are not 
the same, because the classes are very different.
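
As a hedged illustration of that line-count check: the helper below is hypothetical, and the tolerance is an assumed parameter rather than anything used in the published data. It treats two same-named classes as a likely copy only when their line counts are close.

```java
// Hypothetical heuristic: same-named classes with very different line counts
// are probably separate implementations rather than copies.
public class LineCountComparison {

    /** Returns true when the two line counts differ by at most the given fraction of the larger one. */
    public static boolean looksLikeCopy(long linesA, long linesB, double tolerance) {
        if (linesA < 0 || linesB < 0) {
            throw new IllegalArgumentException("line counts must not be negative");
        }
        long larger = Math.max(linesA, linesB);
        if (larger == 0) {
            return true; // two empty files are trivially alike
        }
        return (double) Math.abs(linesA - linesB) / larger <= tolerance;
    }
}
```

Similar line counts do not prove that code was copied, so this can only rule out candidates, not confirm them.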

Ralph







Re: duplicate class names in ASF Java projects

2014-03-21 Thread sebb
On 21 March 2014 04:49, Pawel Slusarz wrote:
> Greetings,
>
> When looking at the Apache SF Java projects as a group, I noticed that a
> large number of projects have duplicate class names, e.g.
> both openejb and tomee have a class named
> jug.client.command.api.AbstractCommand
>
> When edge cases, e.g. test.Foo and tomcat55, tomcat60, tomcat70, get
> eliminated, it still appears that the practice of code sharing by
> drag-drop-modify is quite prevalent. Over 14,000 (out of 165,000)
> classes were shared that way in the ecosystem, and 103 projects (out of
> 300) are affected.

Note that sanselan was renamed as commons imaging.
However the package names were also changed so I'm not sure why they
are shown as duplicates.

sanselan: org.apache.sanselan
imaging: org.apache.commons.imaging

Perhaps the information has been derived from SVN rather than the
published releases.
In which case I suspect there are a lot of false positives.
Not all SVN (or Git) source code is part of a release, and source code
may go through various name changes.

