Re: [FOSSology] License highlighting (Bob Gobeille)
On Jul 29, 2011, at 3:57 PM, Dragoslav Mitrinovic wrote:

> I'll add my vote for license match highlighting. This was an extremely useful feature in bSAM, and for us it is the single most missed feature since the transition to nomos.
>
> In fact, while I like nomos for its speed and easier review of results, I wish bSAM had not been removed and had instead been left as an alternative scanning agent. We are still using 1.3.0, where both agents co-exist side by side, and we sometimes run scans with both. When we scan with both agents, we rely primarily on the nomos scan, but also check the bSAM results for a few particular licenses of high interest to us. At times, bSAM would find things that nomos missed, so it's a good complement to nomos.

A problem we have with bSAM is that we don't have a maintainer. That's why we don't want to include it in our release. Since nomos is your primary scanner, why don't we work on it so that you don't need bSAM? If you find files where nomos misses a license, could you send them to the list (or to me or any developer)? Or better yet, go to http://bugs.linux-foundation.org/ and file a bug.

> One particular type of file that nomos was not stellar with is Debian copyright files. Those files tend to be long and list many licenses aggregated from many source files, in contrast to the single source files for which the nomos heuristics were primarily developed.

I was going to file this bug for you, but most of the Debian copyright files I see are short and only have a single license. Could you file a bug on a specific file?

> Another issue we ran into with nomos is scanning of native executables. For instance, if you scan the GNU tar executable with bSAM and nomos, bSAM will report GPLv3 while nomos will stay silent. GNU tar has an embedded license statement (type tar --version to see it), and bSAM finds this string. Nomos, on the other hand, expects a file to be a single 0-terminated string, so a regexp search on a binary file will typically stop prematurely at the first 0 byte.

This was fixed in 1.4.1: http://bugs.linux-foundation.org/show_bug.cgi?id=723

> This is a longish post, so I'll summarize briefly:
> - license match highlighting is super-important for us
> - we would love to see bSAM reappear in its 1.3.0 form, even if no further development is planned for it
> - I'd suggest running the Unix strings(1) command on native executables prior to passing the data to nomos

I think I've got these addressed above.

> By the way, license match highlighting was important enough for us that I've spent some time studying how nomos works and thinking about how it could be done. I have a very rough proof of concept that works about 90% of the time. (By "works" I mean it gives you some idea of where the license is found in the file.) I'd be happy to share the ideas (and code, if there is interest). This would probably be better suited for the fossology-devel mailing list. Let me know if you are interested, and I could join the list and talk there...

You have modified nomos code so the highlighting works 90% of the time? That's excellent. Yes, that's a fossology-devel subject. Doing it yourself is the best way to get what you want. ;-) Please share your code.

You didn't mention adding licenses to nomos. That's something I'd think nobody would be happy with.

Thanks!
Bob Gobeille
___
fossology mailing list
fossology@fossology.org
http://fossology.org/mailman/listinfo/fossology
Re: [FOSSology] License highlighting (Bob Gobeille)
On Mon, Aug 1, 2011 at 10:17 AM, Bob Gobeille bob.gobei...@hp.com wrote:

> On Jul 29, 2011, at 3:57 PM, Dragoslav Mitrinovic wrote:
>> I'll add my vote for license match highlighting. This was an extremely useful feature in bSAM, and for us it is the single most missed feature since the transition to nomos.
>>
>> In fact, while I like nomos for its speed and easier review of results, I wish bSAM had not been removed and had instead been left as an alternative scanning agent. We are still using 1.3.0, where both agents co-exist side by side, and we sometimes run scans with both. When we scan with both agents, we rely primarily on the nomos scan, but also check the bSAM results for a few particular licenses of high interest to us. At times, bSAM would find things that nomos missed, so it's a good complement to nomos.
>
> A problem we have with bSAM is that we don't have a maintainer. That's why we don't want to include it in our release. Since nomos is your primary scanner, why don't we work on it so that you don't need bSAM? If you find files where nomos misses a license, could you send them to the list (or to me or any developer)? Or better yet, go to http://bugs.linux-foundation.org/ and file a bug.

While I'd love to keep bSAM as an alternative scanner for a while, I myself don't have the resources to volunteer as a maintainer. :-) So yeah, I understand you perfectly well. I'll start feeding specific files and examples to the bug tracking system.

>> One particular type of file that nomos was not stellar with is Debian copyright files. Those files tend to be long and list many licenses aggregated from many source files, in contrast to the single source files for which the nomos heuristics were primarily developed.
>
> I was going to file this bug for you, but most of the Debian copyright files I see are short and only have a single license. Could you file a bug on a specific file?

Sure, next time we scan something with a lot of Debian copyright files, I'll note specific examples and file bugs.
>> Another issue we ran into with nomos is scanning of native executables. For instance, if you scan the GNU tar executable with bSAM and nomos, bSAM will report GPLv3 while nomos will stay silent. GNU tar has an embedded license statement (type tar --version to see it), and bSAM finds this string. Nomos, on the other hand, expects a file to be a single 0-terminated string, so a regexp search on a binary file will typically stop prematurely at the first 0 byte.
>
> This was fixed in 1.4.1: http://bugs.linux-foundation.org/show_bug.cgi?id=723

We have yet to switch to 1.4.x, but it's great to know this was fixed, thanks!

SNIP

> You have modified nomos code so the highlighting works 90% of the time? That's excellent. Yes, that's a fossology-devel subject. Doing it yourself is the best way to get what you want. ;-) Please share your code.

OK, I'll join the fossology-devel list and share it. I work for a large company, so it might take a week or so to get the required approvals to share the code. Just don't get your hopes up too high: what I have is more of a proof of concept than a fully integrated solution.

> You didn't mention adding licenses to nomos. That's something I'd think nobody would be happy with.

Well, yes, it would be great if one could add a new license to nomos from the GUI, without changing code and recompiling.

Speaking of new licenses, one thing we miss from bSAM is phrases. Thanks to those phrases, bSAM was fairly good at detecting proprietary code, which is important, for instance, when you are vetting code to be released as OSS. We added a few new regexps to our version of nomos to catch proprietary code, but I'd need to clean them up before they could be contributed: they still produce too many false positives.

> Thanks!
> Bob Gobeille
Re: [FOSSology] License highlighting (Bob Gobeille)
On Aug 1, 2011, at 10:03 AM, Dragoslav Mitrinovic wrote:

> Speaking of new licenses, one thing we miss from bSAM is phrases. Thanks to those phrases, bSAM was fairly good at detecting proprietary code, which is important, for instance, when you are vetting code to be released as OSS.

Nomos signatures (STRINGS.in) are phrases. Unfortunately, as you know, adding one does require you to recompile.

> We added a few new regexps to our version of nomos to catch proprietary code, but I'd need to clean them up before they could be contributed: they still produce too many false positives.

Feel free to file a bug/enhancement if you want to describe what phrases you want to catch. Make sure you attach sample files. To test new license signatures, we first run a baseline nomos scan from the command line on the 100MB RedHat.tar.gz in http://fossology.org/testing/testFiles/, then add the license/phrase, rerun, and compare the results.

Bob Gobeille