Re: Output Semantics
Jochen Wiedmann wrote:

Hi, as you have probably noticed, I have created a new branch for experimenting with RAT. The reason for creating a branch was that I found RAT's way of emitting output plainly confusing, at least to me. I never fully understood the system with subject, predicate, and object. In particular, it was never clear to me how header sample, license family, and so on relate. Apart from that, RAT-14 strongly asked for a semantically richer output than basically a table with three columns.

I have now (partially) resolved this in a way that satisfies me (but possibly not others): The output is now a series of IClaim objects with a class hierarchy that provides the semantic information. In particular (resolving RAT-14), running RAT will now result in the creation of a ClaimStatistic. This result can be viewed at https://svn.apache.org/repos/asf/incubator/rat/main/branches/rat-output-semantics/

I would now like to ask for confirmation to treat this as the base for RAT 0.7. As I now have a more thorough understanding, I should also be able to roll back most of my changes and create the ClaimStatistic with comparatively minor changes. However, my feeling is that others would share my problems in the future. If no one else intervenes, I'd move the current trunk to branches/apache-rat-project-0.6 and my private branch to the trunk.

I'd also like to use the ClaimStatistics to create a set of so-called policies. Policies would be simple plugins for the RAT Maven Plugin that make it easy to configure the required behaviour. Typical policies might be: only ASL files, only approved licenses, at most 3 unknown files, and so on. This allows projects to integrate RAT into their standard build and fail the build if the policy isn't met.

in the long run, i think that more complex inferences are going to be required for real life policies than rat can easily support programmatically. reusing semantic web stuff seems like a reasonable solution, allowing users to create their own license ontologies. hence the simple, loosely coupled triple based approach. but the design approach is too complex and confusing. it was definitely a mistake.

i've merged in the branch locally and it seems like a good step forward. unless there are objections, i'll commit the merged code.

- robert
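For readers following the policy idea in the message above, here is a minimal Java sketch of what "only approved licenses" and "at most 3 unknown files" could look like as checks over a per-run summary. All names in it (RunSummary, Policy, MaxUnknownPolicy, OnlyApprovedPolicy) are hypothetical and are not taken from the RAT code base; in particular, RunSummary only stands in for whatever ClaimStatistic actually exposes. Treating unknown files separately from unapproved ones is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical summary of one run; NOT the actual ClaimStatistic API. */
final class RunSummary {
    final int approved;   // files under an approved license
    final int unapproved; // files under a known but unapproved license
    final int unknown;    // files whose license could not be determined

    RunSummary(int approved, int unapproved, int unknown) {
        this.approved = approved;
        this.unapproved = unapproved;
        this.unknown = unknown;
    }
}

/** Hypothetical policy contract: returns the violations, empty if the policy is met. */
interface Policy {
    List<String> check(RunSummary summary);
}

/** "At most N unknown files". */
final class MaxUnknownPolicy implements Policy {
    private final int max;

    MaxUnknownPolicy(int max) {
        this.max = max;
    }

    @Override
    public List<String> check(RunSummary summary) {
        List<String> violations = new ArrayList<>();
        if (summary.unknown > max) {
            violations.add(summary.unknown + " unknown files, at most " + max + " allowed");
        }
        return violations;
    }
}

/** "Only approved licenses" (assumption: unknown files are judged by a separate policy). */
final class OnlyApprovedPolicy implements Policy {
    @Override
    public List<String> check(RunSummary summary) {
        List<String> violations = new ArrayList<>();
        if (summary.unapproved > 0) {
            violations.add(summary.unapproved + " files with unapproved licenses");
        }
        return violations;
    }
}
```

A build integration would presumably run each configured policy against the summary and fail the build if any returned list is non-empty.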
Re: Output Semantics
Robert Burrell Donkin wrote:

Jochen Wiedmann wrote:

On Mon, Mar 16, 2009 at 10:22 PM, Robert Burrell Donkin robertburrelldon...@blueyonder.co.uk wrote:

one worry i had about non-streaming approaches is that they're not easy to use with big data sets (for example, scanning all the source in the incubator) since all the data needs to be in before the report can be produced

I believe there is a misunderstanding, Robert. True, I have merged some of the previously separate events into one, but only per-resource. That's still streaming.

agreed (i plan to restart work on RAT sometime soonish) i'm happy with these changes if anyone wants to dive in

- robert
Re: Output Semantics
Hi,

On Mon, Mar 16, 2009 at 1:32 AM, Jochen Wiedmann jochen.wiedm...@gmail.com wrote:

as you have probably noticed, I have created a new branch for experimenting with RAT. The reason for creating a branch was that I found RAT's way of emitting output plainly confusing, at least to me. I never fully understood the system with subject, predicate, and object.

AFAIK that comes from the RDF data model. It's a pretty comprehensive framework for expressing all sorts of metadata, but as you notice it does require some higher level tools to answer questions like the ones implied by the policy feature you mention.

BR,

Jukka Zitting
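As an illustration of the point above, the following Java sketch models output as plain subject-predicate-object triples and shows that even a simple policy-style question ("how many files have an unknown license family?") needs a query layer on top of them. The Triple and TripleQueries classes and the predicate names used in the usage note are assumptions made for this example; they are not RAT's vocabulary or an RDF library API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** One subject-predicate-object statement, in the spirit of the RDF data model. */
final class Triple {
    final String subject;
    final String predicate;
    final String object;

    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

/** Hypothetical query helper: the "higher level tool" a flat triple list needs. */
final class TripleQueries {
    private TripleQueries() {
    }

    /** Counts the distinct subjects that carry the given predicate/object pair. */
    static int countSubjects(List<Triple> triples, String predicate, String object) {
        Set<String> subjects = new HashSet<>();
        for (Triple t : triples) {
            if (t.predicate.equals(predicate) && t.object.equals(object)) {
                subjects.add(t.subject);
            }
        }
        return subjects.size();
    }
}
```

With triples such as ("A.java", "license-family", "unknown"), a call like `TripleQueries.countSubjects(triples, "license-family", "unknown")` yields the number an "at most N unknown files" policy would need. Note that the whole list must be available before the count is final.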
Re: Output Semantics
Jukka Zitting wrote:

Hi,

On Mon, Mar 16, 2009 at 1:32 AM, Jochen Wiedmann jochen.wiedm...@gmail.com wrote:

as you have probably noticed, I have created a new branch for experimenting with RAT. The reason for creating a branch was that I found RAT's way of emitting output plainly confusing, at least to me. I never fully understood the system with subject, predicate, and object.

AFAIK that comes from the RDF data model.

+1

It's a pretty comprehensive framework for expressing all sorts of metadata, but as you notice it does require some higher level tools to answer questions like the ones implied by the policy feature you mention.

+1

in general, using semantics allows more complex policy problems to be solved (in particular, thinking about licensing families). the streaming RDF approach taken in RAT is not the right one, though. seemed like a reasonable design at the time but it adds complexity and it's not really reasonable to try to perform streaming logic.

- robert
Re: Output Semantics
Jochen Wiedmann wrote:

Hi, as you have probably noticed, I have created a new branch for experimenting with RAT. The reason for creating a branch was that I found RAT's way of emitting output plainly confusing, at least to me. I never fully understood the system with subject, predicate, and object. In particular, it was never clear to me how header sample, license family, and so on relate. Apart from that, RAT-14 strongly asked for a semantically richer output than basically a table with three columns.

RDF is surprisingly powerful (but the streaming design was a mistake), and the power lies in the loose coupling between concepts. probably a meta-data store design would have been better (and easier to understand).

I have now (partially) resolved this in a way that satisfies me (but possibly not others): The output is now a series of IClaim objects with a class hierarchy that provides the semantic information. In particular (resolving RAT-14), running RAT will now result in the creation of a ClaimStatistic. This result can be viewed at https://svn.apache.org/repos/asf/incubator/rat/main/branches/rat-output-semantics/

I would now like to ask for confirmation to treat this as the base for RAT 0.7. As I now have a more thorough understanding, I should also be able to roll back most of my changes and create the ClaimStatistic with comparatively minor changes. However, my feeling is that others would share my problems in the future.

one worry i had about non-streaming approaches is that they're not easy to use with big data sets (for example, scanning all the source in the incubator) since all the data needs to be in before the report can be produced

but i haven't found much time for RAT so feel free to take the design in whatever direction you want. experience with scan is that the code reuse has turned out to be limited.

If no one else intervenes, I'd move the current trunk to branches/apache-rat-project-0.6 and my private branch to the trunk.

I'd also like to use the ClaimStatistics to create a set of so-called policies. Policies would be simple plugins for the RAT Maven Plugin that make it easy to configure the required behaviour. Typical policies might be: only ASL files, only approved licenses, at most 3 unknown files, and so on. This allows projects to integrate RAT into their standard build and fail the build if the policy isn't met.

i introduced the semantic stuff to handle policy

IMHO the right way to approach policies is through ontologies and RDF. the problem is that there are only so many ways to handle the first order logic that's required to solve this in the general case.

- robert
Re: Output Semantics
On Mon, Mar 16, 2009 at 10:22 PM, Robert Burrell Donkin robertburrelldon...@blueyonder.co.uk wrote:

one worry i had about non-streaming approaches is that they're not easy to use with big data sets (for example, scanning all the source in the incubator) since all the data needs to be in before the report can be produced

I believe there is a misunderstanding, Robert. True, I have merged some of the previously separate events into one, but only per-resource. That's still streaming.

Jochen

--
I have always wished for my computer to be as easy to use as my telephone; my wish has come true because I can no longer figure out how to use my telephone.
-- (Bjarne Stroustrup, http://www.research.att.com/~bs/bs_faq.html#really-say-that My guess: Nokia E50)
Re: Output Semantics
Jochen Wiedmann wrote:

On Mon, Mar 16, 2009 at 10:22 PM, Robert Burrell Donkin robertburrelldon...@blueyonder.co.uk wrote:

one worry i had about non-streaming approaches is that they're not easy to use with big data sets (for example, scanning all the source in the incubator) since all the data needs to be in before the report can be produced

I believe there is a misunderstanding, Robert. True, I have merged some of the previously separate events into one, but only per-resource. That's still streaming.

agreed

- robert
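The "one event per resource is still streaming" point can be sketched in Java as follows: each file produces a single claim, and the consumer updates counters and then discards the claim, so memory use does not grow with the number of files scanned. The names LicenseKind, ResourceClaim, and StreamingStatistic are hypothetical and only illustrate the idea; they are not the IClaim hierarchy or ClaimStatistic from the branch.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical classification of a scanned file. */
enum LicenseKind { APPROVED, UNAPPROVED, UNKNOWN }

/** Hypothetical per-resource claim: everything known about one file, delivered as one event. */
final class ResourceClaim {
    final String resource;
    final LicenseKind kind;

    ResourceClaim(String resource, LicenseKind kind) {
        this.resource = resource;
        this.kind = kind;
    }
}

/** Consumes claims one at a time and keeps only counters, never the claims themselves. */
final class StreamingStatistic {
    private final Map<LicenseKind, Integer> counts = new EnumMap<>(LicenseKind.class);

    /** Called once per scanned resource. */
    void report(ResourceClaim claim) {
        counts.merge(claim.kind, 1, Integer::sum);
    }

    /** Number of resources reported so far with the given kind. */
    int count(LicenseKind kind) {
        return counts.getOrDefault(kind, 0);
    }
}
```

Because the statistic holds one integer per kind, a scan over a very large tree only ever keeps the current claim and the counters in memory.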
Output Semantics
Hi,

as you have probably noticed, I have created a new branch for experimenting with RAT. The reason for creating a branch was that I found RAT's way of emitting output plainly confusing, at least to me. I never fully understood the system with subject, predicate, and object. In particular, it was never clear to me how header sample, license family, and so on relate. Apart from that, RAT-14 strongly asked for a semantically richer output than basically a table with three columns.

I have now (partially) resolved this in a way that satisfies me (but possibly not others): The output is now a series of IClaim objects with a class hierarchy that provides the semantic information. In particular (resolving RAT-14), running RAT will now result in the creation of a ClaimStatistic. This result can be viewed at https://svn.apache.org/repos/asf/incubator/rat/main/branches/rat-output-semantics/

I would now like to ask for confirmation to treat this as the base for RAT 0.7. As I now have a more thorough understanding, I should also be able to roll back most of my changes and create the ClaimStatistic with comparatively minor changes. However, my feeling is that others would share my problems in the future. If no one else intervenes, I'd move the current trunk to branches/apache-rat-project-0.6 and my private branch to the trunk.

I'd also like to use the ClaimStatistics to create a set of so-called policies. Policies would be simple plugins for the RAT Maven Plugin that make it easy to configure the required behaviour. Typical policies might be: only ASL files, only approved licenses, at most 3 unknown files, and so on. This allows projects to integrate RAT into their standard build and fail the build if the policy isn't met.

Jochen