Re: [FOSSology] License highlighting (Bob Gobeille)

2011-08-01 Thread Bob Gobeille
On Jul 29, 2011, at 3:57 PM, Dragoslav Mitrinovic wrote:

 I'll add my vote for license match highlighting. This was an extremely useful 
 feature in bsam, and for us it is the single most missed feature since the 
 transition to nomos. In fact, while I like nomos for its speed and easier 
 review of results, I wish bsam was not removed and was instead left as an 
 alternative scanning agent. We are still using 1.3.0, where both agents 
 co-exist side-by-side, and we sometimes run scans with both. When we scan 
 with both agents, we rely primarily on nomos scan, but also check bsam 
 results for a few particular licenses of high interest to us. At times, bsam
 would find things that nomos missed, so it's a good complement to nomos. 

A problem we have with bsam is that we don't have a maintainer.  That's why we
don't want to include it in our release.

Since nomos is your primary scanner, why don't we work on it so that you don't 
need bsam?  If you find files where nomos misses a license, could you send them 
to the list (or me or any developer)?  Or better yet, go to 
http://bugs.linux-foundation.org/ and file a bug?


 One particular type of files that nomos was not stellar with is a class of 
 debian copyright files. Those files tend to be long and list many licenses 
 aggregated from many source files, which is in contrast to single source 
 files with which nomos heuristics were primarily developed.  

I was going to file this bug for you, but most of the debian copyright files I 
see are short and only have a single license.  Could you file a bug on a 
specific file?

 Another issue we ran into with nomos is scanning of native executables. For 
 instance, if you scan GNU tar executable with bSAM and nomos, bSAM will 
 report GPLv3 while nomos will stay silent. GNU tar has an embedded license 
 statement (type tar --version to see it), and bSAM finds this string. Nomos 
 on the other hand expects files to be in a form of a single 0-terminated 
 string, and so regexp search on a binary file will typically terminate 
 prematurely.

This was fixed in 1.4.1
http://bugs.linux-foundation.org/show_bug.cgi?id=723
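
The NUL-termination problem described above can be illustrated with a small
sketch. This is not nomos code, and the thread does not say how the 1.4.1 fix
was implemented. The sketch only assumes a strings(1)-like preprocessing step:
keep runs of printable bytes, then run the regexp over those. The pattern
GPL3_HINT and the sample bytes are hypothetical, made up for illustration.

```python
import re

# Hypothetical license pattern for illustration only; not a nomos signature.
GPL3_HINT = re.compile(rb"GNU GPL version 3|GPLv3")


def printable_runs(data: bytes, min_len: int = 4) -> list[bytes]:
    """Return runs of at least min_len printable ASCII bytes (strings(1)-like)."""
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return pattern.findall(data)


def c_string_view(data: bytes) -> bytes:
    """What a scanner sees if it treats the buffer as one 0-terminated string."""
    return data.split(b"\x00", 1)[0]


def scan_binary(data: bytes) -> bool:
    """Search for the license hint in the printable runs of a binary buffer."""
    text = b"\n".join(printable_runs(data))
    return GPL3_HINT.search(text) is not None


# Made-up bytes standing in for an executable with an embedded license string.
sample = (
    b"\x7fELF\x02\x01\x00\x00junk\x00"
    b"License GPLv3+: GNU GPL version 3 or later\x00\x01\x02"
)

# The 0-terminated view stops before the license text is reached.
print(GPL3_HINT.search(c_string_view(sample)) is not None)
# The strings-like view still contains the license text.
print(scan_binary(sample))
```

The design choice here is to filter before matching, so the regexp never sees
the NUL bytes that would end a C string early.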

 This is a longish post, so I'll summarize briefly:
 - license match highlighting is super-important for us
 - we would love to see bSAM reappear in its 1.3.0 form, even if no further
   development is planned for it
 - I'd suggest running Unix strings(1) command on native executables prior to
   passing the data to nomos

I think I've got these addressed above.

 By the way, license match highlighting was important enough for us that I've 
 spent some time studying how nomos works and thinking about how it could be 
 done. I have a very rough proof of concept thing that works about 90% of the 
 time. (By works I mean it gives you some idea about where the license is 
 found in the file). I'd be happy to share the ideas (and code if there is 
 interest). This would be probably better suited for fossology-devel mailing 
 list. Let me know if you are interested, and I could join the list and talk 
 there...

You have modified nomos code so the highlighting works 90% of the time?  That's 
excellent.  Yes, that's a fossology-devel subject.  Doing it yourself is the 
best way to get what you want.  ;-)   Please share your code.

You didn't mention adding licenses to nomos.  I'd think that's something
nobody is happy with.

Thanks!
Bob Gobeille

___
fossology mailing list
fossology@fossology.org
http://fossology.org/mailman/listinfo/fossology


Re: [FOSSology] License highlighting (Bob Gobeille)

2011-08-01 Thread Dragoslav Mitrinovic
On Mon, Aug 1, 2011 at 10:17 AM, Bob Gobeille bob.gobei...@hp.com wrote:

 On Jul 29, 2011, at 3:57 PM, Dragoslav Mitrinovic wrote:

 I'll add my vote for license match highlighting. This was an extremely
 useful feature in bsam, and for us it is the single most missed feature
 since the transition to nomos. In fact, while I like nomos for its speed and
 easier review of results, I wish bsam was not removed and was instead left
 as an alternative scanning agent. We are still using 1.3.0, where both
 agents co-exist side-by-side, and we sometimes run scans with both. When we
 scan with both agents, we rely primarily on nomos scan, but also check bsam
 results for a few particular licenses of high interest to us. At times, bsam
 would find things that nomos missed, so it's a good complement to nomos.


 A problem we have with bsam is that we don't have a maintainer.  That's why
 we don't want to include it in our release.

 Since nomos is your primary scanner, why don't we work on it so that you
 don't need bsam?  If you find files where nomos misses a license, could you
 send them to the list (or me or any developer)?  Or better yet, go to
 http://bugs.linux-foundation.org/ and file a bug?


While I'd love to keep bSAM as an alternative scanner for a while, I myself
don't have the resources to volunteer to be a maintainer. :-)  So yeah, I
understand you perfectly well.

I'll start feeding specific files and examples to the bug tracking system.


 One particular type of files that nomos was not stellar with is a class of
 debian copyright files. Those files tend to be long and list many licenses
 aggregated from many source files, which is in contrast to single source
 files with which nomos heuristics were primarily developed.


 I was going to file this bug for you, but most of the debian copyright
 files I see are short and only have a single license.  Could you file a bug
 on a specific file?


Sure, next time we scan something with a lot of Debian copyright files, I'll
note specific examples and file bugs.

 Another issue we ran into with nomos is scanning of native executables. For
 instance, if you scan GNU tar executable with bSAM and nomos, bSAM will
 report GPLv3 while nomos will stay silent. GNU tar has an embedded license
 statement (type tar --version to see it), and bSAM finds this string. Nomos
 on the other hand expects files to be in a form of a single 0-terminated
 string, and so regexp search on a binary file will typically terminate
 prematurely.


 This was fixed in 1.4.1
 http://bugs.linux-foundation.org/show_bug.cgi?id=723


We are yet to switch to 1.4.x, but it's great to know this was fixed,
thanks!

SNIP


 You have modified nomos code so the highlighting works 90% of the time?
 That's excellent.  Yes, that's a fossology-devel subject.  Doing it
 yourself is the best way to get what you want.  ;-)   Please share your
 code.


OK, I'll join the fossology-devel list and share it. I work for a large
company, so it might take a week or so to get the required approvals to
share the code. Just don't get your hopes up too high - what I have is more of
a proof of concept than a fully integrated solution.


 You didn't mention adding licenses to nomos.  I'd think that's something
 nobody is happy with.


Well, yes, it would be great if one could add a new license to nomos from the
GUI, without changing code and recompiling.

Speaking of new licenses, one thing we miss from bSAM is phrases. Thanks to
those phrases, bSAM was fairly good at detecting proprietary code, which is
important, for instance, when you are vetting code to be released as OSS. We
added a few new regexps to our version of nomos to catch proprietary code, but
I'd need to clean them up before they could be contributed - there are still
too many false positives.


 Thanks!
 Bob Gobeille




Re: [FOSSology] License highlighting (Bob Gobeille)

2011-08-01 Thread Bob Gobeille

On Aug 1, 2011, at 10:03 AM, Dragoslav Mitrinovic wrote:

 Speaking of new licenses, one thing we miss from bSAM is phrases. Thanks to
 those phrases, bSAM was fairly good at detecting proprietary code, which is 
 important for instance when you are vetting code to be released as OSS.

Nomos signatures (STRINGS.in) are phrases.  Unfortunately, as you know, adding
one does require you to recompile.


 We added a few new regexps to our version of nomos to catch proprietary code,
 but I'd need to clean it up before it could be contributed - there are too 
 many false positives still. 

Feel free to file a bug/enhancement if you want to describe what phrases you 
want to catch.   Make sure you attach sample files.  What we do to test new 
license signatures is first run a baseline nomos from the command line on the 
100MB RedHat.tar.gz in http://fossology.org/testing/testFiles/
Then add the license/phrase, rerun, then compare results.
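
The compare step could be sketched roughly as follows. This is not the
project's own test tooling. It assumes each run's output was saved as text
with one line per file of the form "File <path> contains license(s) <names>";
that line shape, the function names, and the sample data are assumptions made
for illustration.

```python
import re

# Assumed output line shape; adjust the pattern if your nomos output differs.
LINE = re.compile(r"^File (?P<path>.+) contains license\(s\) (?P<names>.+)$")


def parse_results(text: str) -> dict[str, set[str]]:
    """Map each scanned path to the set of license names reported for it."""
    results: dict[str, set[str]] = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            names = {n.strip() for n in match["names"].split(",")}
            results[match["path"]] = names
    return results


def compare(baseline: dict[str, set[str]], rerun: dict[str, set[str]]):
    """Return {path: (added, removed)} for every file whose result changed."""
    changes = {}
    for path in sorted(baseline.keys() | rerun.keys()):
        before = baseline.get(path, set())
        after = rerun.get(path, set())
        if before != after:
            changes[path] = (after - before, before - after)
    return changes


# Made-up sample output from a baseline run and a rerun with a new phrase.
baseline_text = (
    "File a.c contains license(s) GPL_v2\n"
    "File b.c contains license(s) No_license_found\n"
)
rerun_text = (
    "File a.c contains license(s) GPL_v2\n"
    "File b.c contains license(s) Proprietary\n"
)

for path, (added, removed) in compare(
    parse_results(baseline_text), parse_results(rerun_text)
).items():
    print(path, "added:", sorted(added), "removed:", sorted(removed))
```

Any changed file that was not expected to match the new phrase is a candidate
false positive worth inspecting before contributing the signature.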

Bob Gobeille