Re: Output Semantics

2009-05-01 Thread Robert Burrell Donkin
Jochen Wiedmann wrote:
 Hi,
 
 as you have probably noticed, I have created a new branch for
 experimenting with RAT. The reason for creating a branch was that I
 found RAT's way of emitting output plainly confusing, at least to me.
 I never fully understood the system with subject, predicate, and
 object. In particular, it was never clear to me, how header
 sample, license family, and so on relate. Apart from that, RAT-14
 strongly asked for a semantically richer output than basically a table
 with three columns.
 
 I have now (partially) resolved this in a way that satisfies me (though
 possibly not others): The output is now a series of IClaim
 objects with a class hierarchy that provides the semantic
 information. In particular (resolving RAT-14), running RAT will now
 result in the creation of a ClaimStatistic. This result can be
 viewed at
 
   
 https://svn.apache.org/repos/asf/incubator/rat/main/branches/rat-output-semantics/
 
 I would now like to ask for confirmation to treat this as the base for
 RAT 0.7. As I do now have a more thorough understanding, I should as
 well be able to roll back most of my changes and create the
 ClaimStatistic with comparatively minor changes. However, my feeling
 is that others would share my problems in the future.
 
 If no one else intervenes, then I'd move the current trunk to
 branches/apache-rat-project-0.6 and my private branch to the trunk.
 I'd also like to use the ClaimStatistics to create a set of
 so-called policies. Policies would be simple plugins for the RAT Maven
 Plugin that make it easy to configure the required behaviour.
 Typical policies might be "only ASL files", "only approved licenses",
 "at most 3 unknown files", and so on. This allows projects to
 integrate RAT into their standard build, failing the build if the
 policy isn't met.

in the long run, i think that more complex inferences are going to be
required for real life policies than rat can easily support
programmatically. reusing semantic web stuff seems like a reasonable
solution, allowing users to create their own license ontologies. hence
the simple, loosely coupled triple based approach.

but the design approach is too complex and confusing. it was definitely
a mistake.

i've merged in the branch locally and it seems like a good step forward.
unless there are objections, i'll commit the merged code.

- robert



Re: Output Semantics

2009-04-30 Thread Robert Burrell Donkin
Robert Burrell Donkin wrote:
 Jochen Wiedmann wrote:
 On Mon, Mar 16, 2009 at 10:22 PM, Robert Burrell Donkin
 robertburrelldon...@blueyonder.co.uk wrote:

 one worry i had about non-streaming approaches is that they're not easy
 to use with big data sets (for example, scanning all the source in the
 incubator) since all the data needs to be in before the report can be
 produced
 
 I believe there is a misunderstanding, Robert. True, I have merged
 some of the previously multiple events into one, but only
 per-resource. That's still streaming.
 
 agreed

(i plan to restart work on RAT sometime soonish)

i'm happy with these changes if anyone wants to dive in

- robert



Re: Output Semantics

2009-03-16 Thread Jukka Zitting
Hi,

On Mon, Mar 16, 2009 at 1:32 AM, Jochen Wiedmann
jochen.wiedm...@gmail.com wrote:
 as you have probably noticed, I have created a new branch for
 experimenting with RAT. The reason for creating a branch was that I
 found RAT's way of emitting output plainly confusing, at least to me.
 I never fully understood the system with subject, predicate, and
 object.

AFAIK that comes from the RDF data model. It's a pretty comprehensive
framework for expressing all sorts of metadata, but as you notice it
does require some higher level tools to answer questions like the ones
implied by the policy feature you mention.
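
To make the subject/predicate/object model concrete, here is a minimal Java sketch. It is not RAT's actual code: the Triple class, the predicate names "license-family" and "header-sample", and the file names are all assumptions made up for illustration. It only shows why even a simple policy question needs a small query layer on top of a flat three-column output.

```java
import java.util.Arrays;
import java.util.List;

public class TripleSketch {

    /** Hypothetical row of the three-column output: subject, predicate, object. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /** Counts the triples that carry the given predicate and object. */
    static long count(List<Triple> triples, String predicate, String object) {
        return triples.stream()
                .filter(t -> t.predicate.equals(predicate) && t.object.equals(object))
                .count();
    }

    public static void main(String[] args) {
        // Illustrative data only; the predicate vocabulary is an assumption.
        List<Triple> report = Arrays.asList(
                new Triple("src/Foo.java", "license-family", "ASL"),
                new Triple("src/Foo.java", "header-sample", "Licensed to the ASF ..."),
                new Triple("src/Bar.java", "license-family", "unknown"));

        // A question such as "how many unknown files?" becomes a filter over rows.
        System.out.println("unknown files: " + count(report, "license-family", "unknown"));
    }
}
```

The rows themselves say nothing about how a header sample and a license family relate; that knowledge has to live in whatever code or tool interprets the predicates.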

BR,

Jukka Zitting


Re: Output Semantics

2009-03-16 Thread Robert Burrell Donkin
Jukka Zitting wrote:
 Hi,
 
 On Mon, Mar 16, 2009 at 1:32 AM, Jochen Wiedmann
 jochen.wiedm...@gmail.com wrote:
 as you have probably noticed, I have created a new branch for
 experimenting with RAT. The reason for creating a branch was that I
 found RAT's way of emitting output plainly confusing, at least to me.
 I never fully understood the system with subject, predicate, and
 object.
 
 AFAIK that comes from the RDF data model. 

+1

 It's a pretty comprehensive
 framework for expressing all sorts of metadata, but as you notice it
 does require some higher level tools to answer questions like the ones
 implied by the policy feature you mention.

+1

in general, using semantics allows more complex policy problems to be
solved (in particular, thinking about licensing families). the streaming
RDF approach taken in RAT is not the right one, though. it seemed like a
reasonable design at the time but it adds complexity and it's not really
reasonable to try to perform streaming logic.

- robert



Re: Output Semantics

2009-03-16 Thread Robert Burrell Donkin
Jochen Wiedmann wrote:
 Hi,
 
 as you have probably noticed, I have created a new branch for
 experimenting with RAT. The reason for creating a branch was that I
 found RAT's way of emitting output plainly confusing, at least to me.
 I never fully understood the system with subject, predicate, and
 object. In particular, it was never clear to me, how header
 sample, license family, and so on relate. Apart from that, RAT-14
 strongly asked for a semantically richer output than basically a table
 with three columns.

RDF is surprisingly powerful (but the streaming design was a mistake),
and the power lies in the loose coupling between concepts. probably a
meta-data store design would have been better (and easier to understand).

 I have now (partially) resolved this in a way that satisfies me (though
 possibly not others): The output is now a series of IClaim
 objects with a class hierarchy that provides the semantic
 information. In particular (resolving RAT-14), running RAT will now
 result in the creation of a ClaimStatistic. This result can be
 viewed at
 
   
 https://svn.apache.org/repos/asf/incubator/rat/main/branches/rat-output-semantics/

 I would now like to ask for confirmation to treat this as the base for
 RAT 0.7. As I do now have a more thorough understanding, I should as
 well be able to roll back most of my changes and create the
 ClaimStatistic with comparatively minor changes. However, my feeling
 is that others would share my problems in the future.

one worry i had about non-streaming approaches is that they're not easy
to use with big data sets (for example, scanning all the source in the
incubator) since all the data needs to be in before the report can be
produced

but i haven't found much time for RAT so feel free to take the design in
whatever direction you want. experience with scan is that the code reuse
has turned out to be limited.

 If no one else intervenes, then I'd move the current trunk to
 branches/apache-rat-project-0.6 and my private branch to the trunk.
 I'd also like to use the ClaimStatistics to create a set of
 so-called policies. Policies would be simple plugins for the RAT Maven
 Plugin that make it easy to configure the required behaviour.
 Typical policies might be "only ASL files", "only approved licenses",
 "at most 3 unknown files", and so on. This allows projects to
 integrate RAT into their standard build, failing the build if the
 policy isn't met.

i introduced the semantic stuff to handle policy

IMHO the right way to approach policies is through ontologies and RDF.
the problem is that there are only so many ways to handle the first
order logic that's required to solve this in the general case.

- robert



Re: Output Semantics

2009-03-16 Thread Jochen Wiedmann
On Mon, Mar 16, 2009 at 10:22 PM, Robert Burrell Donkin
robertburrelldon...@blueyonder.co.uk wrote:

 one worry i had about non-streaming approaches is that they're not easy
 to use with big data sets (for example, scanning all the source in the
 incubator) since all the data needs to be in before the report can be
 produced

I believe there is a misunderstanding, Robert. True, I have merged
some of the previously multiple events into one, but only
per-resource. That's still streaming.
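
A rough Java sketch of what "one event per resource, still streaming" could look like. This is not the code on the branch: ResourceClaim, ClaimReporter, CountingReporter and the placeholder detection rule are hypothetical names and logic chosen only for illustration. The point is that the reporter sees one merged claim per resource and keeps nothing but counters, so the full data set never has to be held in memory.

```java
import java.util.Arrays;
import java.util.List;

public class StreamingSketch {

    /** Hypothetical merged claim: everything known about one resource. */
    static final class ResourceClaim {
        final String resource;
        final String licenseFamily; // null when no license was recognised

        ResourceClaim(String resource, String licenseFamily) {
            this.resource = resource;
            this.licenseFamily = licenseFamily;
        }
    }

    /** Receives claims one at a time, in scan order. */
    interface ClaimReporter {
        void report(ResourceClaim claim);
    }

    /** Keeps running totals only; the claims themselves are not retained. */
    static final class CountingReporter implements ClaimReporter {
        int total;
        int unknown;

        @Override
        public void report(ResourceClaim claim) {
            total++;
            if (claim.licenseFamily == null) {
                unknown++;
            }
        }
    }

    /** Emits exactly one claim per resource as soon as that resource is examined. */
    static void scan(Iterable<String> resources, ClaimReporter reporter) {
        for (String resource : resources) {
            // Placeholder rule standing in for real header matching (assumption).
            String family = resource.endsWith(".java") ? "ASL" : null;
            reporter.report(new ResourceClaim(resource, family));
        }
    }

    public static void main(String[] args) {
        List<String> resources = Arrays.asList("src/Foo.java", "src/Bar.java", "notes.txt");
        CountingReporter counter = new CountingReporter();
        scan(resources, counter);
        System.out.println(counter.total + " resources, " + counter.unknown + " unknown");
    }
}
```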


Jochen


-- 
I have always wished for my computer to be as easy to use as my
telephone; my wish has come true because I can no longer figure out
how to use my telephone.

-- (Bjarne Stroustrup,
http://www.research.att.com/~bs/bs_faq.html#really-say-that
   My guess: Nokia E50)


Re: Output Semantics

2009-03-16 Thread Robert Burrell Donkin
Jochen Wiedmann wrote:
 On Mon, Mar 16, 2009 at 10:22 PM, Robert Burrell Donkin
 robertburrelldon...@blueyonder.co.uk wrote:
 
 one worry i had about non-streaming approaches is that they're not easy
 to use with big data sets (for example, scanning all the source in the
 incubator) since all the data needs to be in before the report can be
 produced
 
 I believe there is a misunderstanding, Robert. True, I have merged
 some of the previously multiple events into one, but only
 per-resource. That's still streaming.

agreed

- robert



Output Semantics

2009-03-15 Thread Jochen Wiedmann
Hi,

as you have probably noticed, I have created a new branch for
experimenting with RAT. The reason for creating a branch was that I
found RAT's way of emitting output plainly confusing, at least to me.
I never fully understood the system with subject, predicate, and
object. In particular, it was never clear to me, how header
sample, license family, and so on relate. Apart from that, RAT-14
strongly asked for a semantically richer output than basically a table
with three columns.

I have now (partially) resolved this in a way that satisfies me (though
possibly not others): The output is now a series of IClaim
objects with a class hierarchy that provides the semantic
information. In particular (resolving RAT-14), running RAT will now
result in the creation of a ClaimStatistic. This result can be
viewed at

  
https://svn.apache.org/repos/asf/incubator/rat/main/branches/rat-output-semantics/

I would now like to ask for confirmation to treat this as the base for
RAT 0.7. As I do now have a more thorough understanding, I should as
well be able to roll back most of my changes and create the
ClaimStatistic with comparatively minor changes. However, my feeling
is that others would share my problems in the future.

If no one else intervenes, then I'd move the current trunk to
branches/apache-rat-project-0.6 and my private branch to the trunk.
I'd also like to use the ClaimStatistics to create a set of
so-called policies. Policies would be simple plugins for the RAT Maven
Plugin that make it easy to configure the required behaviour.
Typical policies might be "only ASL files", "only approved licenses",
"at most 3 unknown files", and so on. This allows projects to
integrate RAT into their standard build, failing the build if the
policy isn't met.
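
As a minimal sketch of how such policies might sit on top of a statistic, assuming names that do not come from the branch: the Statistic class below is a hypothetical stand-in for ClaimStatistic, and the Policy interface, the two factory methods and the license names are invented for illustration only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class PolicySketch {

    /** Hypothetical stand-in for the statistic gathered during a run. */
    static final class Statistic {
        private final Map<String, Integer> filesPerFamily = new HashMap<>();
        private int unknownFiles;

        void addLicensed(String family) {
            filesPerFamily.merge(family, 1, Integer::sum);
        }

        void addUnknown() {
            unknownFiles++;
        }

        Set<String> families() {
            return filesPerFamily.keySet();
        }

        int unknownFiles() {
            return unknownFiles;
        }
    }

    /** A policy is a yes/no check against the finished statistic. */
    interface Policy {
        boolean isMet(Statistic statistic);
    }

    /** "only approved licenses": every license family seen must be in the approved set. */
    static Policy onlyApprovedLicenses(Set<String> approved) {
        return s -> approved.containsAll(s.families());
    }

    /** "at most N unknown files". */
    static Policy atMostUnknownFiles(int limit) {
        return s -> s.unknownFiles() <= limit;
    }

    public static void main(String[] args) {
        Statistic statistic = new Statistic();
        statistic.addLicensed("ASL");
        statistic.addLicensed("ASL");
        statistic.addUnknown();

        Set<String> approved = new HashSet<>(Arrays.asList("ASL", "MIT"));
        // A build integration would fail the build when a policy is not met.
        System.out.println("approved only: " + onlyApprovedLicenses(approved).isMet(statistic));
        System.out.println("at most 3 unknown: " + atMostUnknownFiles(3).isMet(statistic));
    }
}
```

Because each policy only reads counters from the statistic, a Maven plugin could evaluate a configured list of them after the scan and refuse the build on the first one that returns false.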

Jochen

