Re: [FOSSology] License highlighting (Bob Gobeille)
On Jul 29, 2011, at 3:57 PM, Dragoslav Mitrinovic wrote:

> I'll add my vote for license match highlighting. This was an extremely useful feature in bsam, and for us it is the single most missed feature since the transition to nomos. In fact, while I like nomos for its speed and easier review of results, I wish bsam was not removed and was instead left as an alternative scanning agent. We are still using 1.3.0, where both agents co-exist side-by-side, and we sometimes run scans with both. When we scan with both agents, we rely primarily on the nomos scan, but also check bsam results for a few particular licenses of high interest to us. At times, bsam would find things that nomos missed, so it's a good complement to nomos.

A problem we have with bsam is that we don't have a maintainer. That's why we don't want to include it in our release. Since nomos is your primary scanner, why don't we work on it so that you don't need bsam? If you find files where nomos misses a license, could you send them to the list (or me or any developer)? Or better yet, go to http://bugs.linux-foundation.org/ and file a bug?

> One particular type of file that nomos was not stellar with is a class of debian copyright files. Those files tend to be long and list many licenses aggregated from many source files, which is in contrast to the single source files with which nomos heuristics were primarily developed.

I was going to file this bug for you, but most of the debian copyright files I see are short and only have a single license. Could you file a bug on a specific file?

> Another issue we ran into with nomos is scanning of native executables. For instance, if you scan the GNU tar executable with bSAM and nomos, bSAM will report GPLv3 while nomos will stay silent. GNU tar has an embedded license statement (type tar --version to see it), and bSAM finds this string. Nomos on the other hand expects files to be in the form of a single 0-terminated string, and so a regexp search on a binary file will typically terminate prematurely.

This was fixed in 1.4.1: http://bugs.linux-foundation.org/show_bug.cgi?id=723

> This is a longish post, so I'll summarize briefly:
> 1. license match highlighting is super-important for us
> 2. we would love to see bSAM reappear in its 1.3.0 form, even if no further development is planned for it
> 3. I'd suggest running the Unix strings(1) command on native executables prior to passing the data to nomos

I think I've got these addressed above.

> By the way, license match highlighting was important enough for us that I've spent some time studying how nomos works and thinking about how it could be done. I have a very rough proof of concept thing that works about 90% of the time. (By works I mean it gives you some idea about where the license is found in the file). I'd be happy to share the ideas (and code if there is interest). This would be probably better suited for the fossology-devel mailing list. Let me know if you are interested, and I could join the list and talk there...

You have modified nomos code so the highlighting works 90% of the time? That's excellent. Yes, that's a fossology-devel subject. Doing it yourself is the best way to get what you want. ;-) Please share your code.

You didn't mention adding licenses to nomos. That's something I'd think nobody would be happy with.

Thanks!
Bob Gobeille
___
fossology mailing list
fossology@fossology.org
http://fossology.org/mailman/listinfo/fossology
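The strings(1) suggestion and the 0-terminated-string problem discussed in this message can be illustrated with a small Python sketch. It is not nomos code: the `SIGNATURE` pattern, the sample `blob`, and both helper functions are hypothetical, and `printable_strings` only approximates what strings(1) does (runs of printable ASCII of a minimum length).

```python
import re

def printable_strings(data: bytes, min_len: int = 4) -> list[bytes]:
    """Return runs of printable ASCII (plus tab) at least min_len bytes long."""
    return re.findall(rb"[\x20-\x7e\t]{%d,}" % min_len, data)

def c_string_view(data: bytes) -> bytes:
    """What a scanner sees if it treats the buffer as one NUL-terminated string."""
    return data.split(b"\0", 1)[0]

# Hypothetical signature; real nomos signatures live in STRINGS.in.
SIGNATURE = re.compile(rb"GNU GPL version (\d)")

# Made-up stand-in for a native executable with an embedded license statement.
blob = (b"\x7fELF\x02\x01\x00\x00\x00garbage\x00"
        b"License GPLv3+: GNU GPL version 3 or later\x00\x00more")

# Searching the C-string view stops at the first NUL and finds nothing.
print(SIGNATURE.search(c_string_view(blob)))

# Filtering through the strings-like step first exposes the statement.
filtered = b"\n".join(printable_strings(blob))
print(SIGNATURE.search(filtered).group(1))
```

The filtering step throws away byte offsets into the original file, so a scanner that also wants to highlight the match would need to keep a mapping back to the original positions.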
Re: [FOSSology] License highlighting (Bob Gobeille)
On Mon, Aug 1, 2011 at 10:17 AM, Bob Gobeille bob.gobei...@hp.com wrote:

> On Jul 29, 2011, at 3:57 PM, Dragoslav Mitrinovic wrote:
>
> > I'll add my vote for license match highlighting. This was an extremely useful feature in bsam, and for us it is the single most missed feature since the transition to nomos. In fact, while I like nomos for its speed and easier review of results, I wish bsam was not removed and was instead left as an alternative scanning agent. We are still using 1.3.0, where both agents co-exist side-by-side, and we sometimes run scans with both. When we scan with both agents, we rely primarily on the nomos scan, but also check bsam results for a few particular licenses of high interest to us. At times, bsam would find things that nomos missed, so it's a good complement to nomos.
>
> A problem we have with bsam is that we don't have a maintainer. That's why we don't want to include it in our release. Since nomos is your primary scanner, why don't we work on it so that you don't need bsam? If you find files where nomos misses a license, could you send them to the list (or me or any developer)? Or better yet, go to http://bugs.linux-foundation.org/ and file a bug?

While I'd love to keep bSAM as an alternative scanner for a while, I myself don't have the resources to volunteer to be a maintainer. :-) So yeah, I understand you perfectly well. I'll start feeding specific files and examples to the bug tracking system.

> > One particular type of file that nomos was not stellar with is a class of debian copyright files. Those files tend to be long and list many licenses aggregated from many source files, which is in contrast to the single source files with which nomos heuristics were primarily developed.
>
> I was going to file this bug for you, but most of the debian copyright files I see are short and only have a single license. Could you file a bug on a specific file?

Sure, next time we scan something with a lot of debian copyright files, I'll note specific examples and file bugs.

> > Another issue we ran into with nomos is scanning of native executables. For instance, if you scan the GNU tar executable with bSAM and nomos, bSAM will report GPLv3 while nomos will stay silent. GNU tar has an embedded license statement (type tar --version to see it), and bSAM finds this string. Nomos on the other hand expects files to be in the form of a single 0-terminated string, and so a regexp search on a binary file will typically terminate prematurely.
>
> This was fixed in 1.4.1: http://bugs.linux-foundation.org/show_bug.cgi?id=723

We are yet to switch to 1.4.x, but it's great to know this was fixed, thanks!

[SNIP]

> You have modified nomos code so the highlighting works 90% of the time? That's excellent. Yes, that's a fossology-devel subject. Doing it yourself is the best way to get what you want. ;-) Please share your code.

OK, I'll join the fossology-devel list and share it. I work for a large company, so it might take a week or so to get the required approvals to share the code. Just don't keep your hopes too high - what I have is more of a proof of concept than a fully integrated solution.

> You didn't mention adding licenses to nomos. That's something I'd think nobody would be happy with.

Well, yes, it would be great if one could add a new license to nomos from the GUI, without changing code and recompiling.

Speaking of new licenses, one thing we miss from bSAM is phrases. Thanks to those phrases, bSAM was fairly good at detecting proprietary code, which is important for instance when you are vetting code to be released as OSS. We added a few new regexps to our version of nomos to catch proprietary code, but I'd need to clean it up before it could be contributed - there are too many false positives still.

> Thanks!
> Bob Gobeille
Re: [FOSSology] License highlighting (Bob Gobeille)
On Aug 1, 2011, at 10:03 AM, Dragoslav Mitrinovic wrote:

> Speaking of new licenses, one thing we miss from bSAM is phrases. Thanks to those phrases, bSAM was fairly good at detecting proprietary code, which is important for instance when you are vetting code to be released as OSS.

Nomos signatures (STRINGS.in) are phrases. Unfortunately, as you know, changing them does require you to recompile.

> We added a few new regexps to our version of nomos to catch proprietary code, but I'd need to clean it up before it could be contributed - there are too many false positives still.

Feel free to file a bug/enhancement if you want to describe what phrases you want to catch. Make sure you attach sample files.

What we do to test new license signatures is first run a baseline nomos from the command line on the 100MB RedHat.tar.gz in http://fossology.org/testing/testFiles/. Then add the license/phrase, rerun, and compare the results.

Bob Gobeille
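The baseline, rerun, compare procedure described above could be automated roughly as follows. This is only a sketch: it assumes each run's results have been saved as text with one `path<TAB>comma-separated licenses` line per file, which is a made-up format rather than actual nomos output, and the license names in the sample data are illustrative.

```python
def parse_results(text: str) -> dict[str, frozenset[str]]:
    """Parse 'path<TAB>lic1,lic2' lines (hypothetical format) into a mapping."""
    results = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        path, _, licenses = line.partition("\t")
        results[path] = frozenset(
            name.strip() for name in licenses.split(",") if name.strip()
        )
    return results

def diff_results(baseline, candidate):
    """Return {path: (gained, lost)} for every file whose license set changed."""
    changes = {}
    for path in sorted(baseline.keys() | candidate.keys()):
        old = baseline.get(path, frozenset())
        new = candidate.get(path, frozenset())
        if old != new:
            changes[path] = (sorted(new - old), sorted(old - new))
    return changes

# Illustrative data: the second run adds a hypothetical "Proprietary" signature.
baseline = parse_results("a.c\tGPL_v2\nb.c\tNo_license_found\n")
candidate = parse_results("a.c\tGPL_v2\nb.c\tProprietary\n")

changes = diff_results(baseline, candidate)
for path, (gained, lost) in changes.items():
    print(f"{path}: gained {gained}, lost {lost}")
```

Reviewing only the changed files keeps attention on what the new signature actually affected, including any new false positives.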
Re: [FOSSology] License highlighting (Bob Gobeille)
I'll add my vote for license match highlighting. This was an extremely useful feature in bsam, and for us it is the single most missed feature since the transition to nomos. In fact, while I like nomos for its speed and easier review of results, I wish bsam was not removed and was instead left as an alternative scanning agent. We are still using 1.3.0, where both agents co-exist side-by-side, and we sometimes run scans with both. When we scan with both agents, we rely primarily on the nomos scan, but also check bsam results for a few particular licenses of high interest to us. At times, bsam would find things that nomos missed, so it's a good complement to nomos.

One particular type of file that nomos was not stellar with is a class of debian copyright files. Those files tend to be long and list many licenses aggregated from many source files, which is in contrast to the single source files with which nomos heuristics were primarily developed.

Another issue we ran into with nomos is scanning of native executables. For instance, if you scan the GNU tar executable with bSAM and nomos, bSAM will report GPLv3 while nomos will stay silent. GNU tar has an embedded license statement (type tar --version to see it), and bSAM finds this string. Nomos on the other hand expects files to be in the form of a single 0-terminated string, and so a regexp search on a binary file will typically terminate prematurely. Of course, I realize that nomos and fossology were designed primarily for scanning source code (possibly deeply archived), but the bSAM agent's ability to find license strings in native executables was very nice, and we miss it in nomos. (By the way, one possible improvement would be to filter native executables and possibly other non-text files through the Unix strings command prior to passing the data to nomos - e.g. that works for my GNU tar example).

This is a longish post, so I'll summarize briefly:

1. license match highlighting is super-important for us
2. we would love to see bSAM reappear in its 1.3.0 form, even if no further development is planned for it
3. I'd suggest running the Unix strings(1) command on native executables prior to passing the data to nomos

By the way, license match highlighting was important enough for us that I've spent some time studying how nomos works and thinking about how it could be done. I have a very rough proof of concept thing that works about 90% of the time. (By works I mean it gives you some idea about where the license is found in the file). I'd be happy to share the ideas (and code if there is interest). This would be probably better suited for the fossology-devel mailing list. Let me know if you are interested, and I could join the list and talk there...

Best regards,
Drago Mitrinovic
Motorola Mobility Open Source Review Board

On Fri, Jul 29, 2011 at 10:22 AM, Laser, Mary mary.la...@hp.com wrote:

> Hello FOSSologists!
>
> Thank you all for your votes. It's very important for us to hear from our users so we know how to prioritize the many features and requests we have on our to-do list. We REALLY do listen and value your feedback. Keep it coming!
>
> The FOSSology Project
> http://fossology.org
>
> -----Original Message-----
> From: fossology-boun...@fossology.org [mailto:fossology-boun...@fossology.org] On Behalf Of Dabrowski, Ivo
> Sent: Friday, July 29, 2011 6:12 AM
> Subject: Re: [FOSSology] License highlighting (Bob Gobeille)
>
> Here's my vote, too. BSAM as used in older versions of FOSSology reveals matches (and derivations) easily.
>
> Ivo
>
> -----Original Message-----
> From: fossology-boun...@fossology.org [mailto:fossology-boun...@fossology.org] On Behalf Of Bob Gobeille
> Sent: Friday, July 29, 2011 01:27
> Subject: Re: [FOSSology] License highlighting (Bob Gobeille)
>
> Oh - multiple pages. That is painful. Thanks for voting Dave.
>
> Bob Gobeille
>
> On Jul 28, 2011, at 4:43 PM, Dave McLoughlin wrote:
>
> > I'll cast my vote, highlighting is very important to us. We spend a lot of time searching, scrolling and manually scanning contents to find a match when there's no highlighting. It's extremely painful when the contents are displayed across multiple pages.
> >
> > Dave
> >
> > Date: Wed, 27 Jul 2011 12:05:56 -0600
> > From: Bob Gobeille bob.gobei...@hp.com
> > Subject: Re: [FOSSology] License highlighting
> >
> > Hello Volker,
> >
> > The database table license_file is where nomos records what license matched in what file. There are columns for where in the file the match occurred, but they are not currently populated by nomos.
> >
> > Would anyone else like to vote on how important highlighting the license match is?
> >
> > Bob Gobeille
Re: [FOSSology] License highlighting (Bob Gobeille)
Here's my vote, too. BSAM as used in older versions of FOSSology reveals matches (and derivations) easily.

Ivo

-----Original Message-----
From: fossology-boun...@fossology.org [mailto:fossology-boun...@fossology.org] On Behalf Of Bob Gobeille
Sent: Friday, July 29, 2011 01:27
To: Dave McLoughlin
Cc: fossology@fossology.org
Subject: Re: [FOSSology] License highlighting (Bob Gobeille)

> Oh - multiple pages. That is painful. Thanks for voting Dave.
>
> Bob Gobeille
>
> On Jul 28, 2011, at 4:43 PM, Dave McLoughlin wrote:
>
> > I'll cast my vote, highlighting is very important to us. We spend a lot of time searching, scrolling and manually scanning contents to find a match when there's no highlighting. It's extremely painful when the contents are displayed across multiple pages.
> >
> > Dave
Re: [FOSSology] License highlighting (Bob Gobeille)
Hello FOSSologists!

Thank you all for your votes. It's very important for us to hear from our users so we know how to prioritize the many features and requests we have on our to-do list. We REALLY do listen and value your feedback. Keep it coming!

The FOSSology Project
http://fossology.org

-----Original Message-----
From: fossology-boun...@fossology.org [mailto:fossology-boun...@fossology.org] On Behalf Of Dabrowski, Ivo
Sent: Friday, July 29, 2011 6:12 AM
Subject: Re: [FOSSology] License highlighting (Bob Gobeille)

> Here's my vote, too. BSAM as used in older versions of FOSSology reveals matches (and derivations) easily.
>
> Ivo
Re: [FOSSology] License highlighting (Bob Gobeille)
Thank you very much for taking care of the community requests! Efficient license analysis is key to making open source software business friendly and bringing down the overall cost of using it.

Another great thing is the evolution of the license templates and identifiers inspired by the SPDX specification.

Good Job!

-Roger

-----Original Message-----
From: fossology-boun...@fossology.org [mailto:fossology-boun...@fossology.org] On Behalf Of Laser, Mary
Sent: Friday, July 29, 2011 17:23
To: Dabrowski, Ivo; Gobeille, Robert; Dave McLoughlin
Cc: fossology@fossology.org
Subject: Re: [FOSSology] License highlighting (Bob Gobeille)

> Hello FOSSologists!
>
> Thank you all for your votes. It's very important for us to hear from our users so we know how to prioritize the many features and requests we have on our to-do list. We REALLY do listen and value your feedback. Keep it coming!
>
> The FOSSology Project
> http://fossology.org
Re: [FOSSology] License highlighting
Bob,

yes, this is exactly what would be most useful. I think it is very important to understand why a license was detected, esp. for deciding on false positives. Certainly the signature match is the interesting information there.

My plan: in combination with the tags feature (to mark false positives or give approval to a specific file/license pair), it can be used to manage the findings more properly.

So of course from my perspective this is quite important compared to most other items ;-)

Does nomos store the match information in the DB already? Could we just use some query in the php to display it?

Best regards

Volker Mader
Robert Bosch GmbH (AA-DGP/ESD2)
www.bosch.com
Tel. +49 7153/666-182
volker.ma...@de.bosch.com

From: Bob Gobeille [mailto:bob.gobei...@hp.com]
Sent: Tuesday, July 26, 2011 17:01
To: Mader Volker (AA-DGP/ESD2)
Cc: fossology@fossology.org
Subject: Re: [FOSSology] License highlighting

> Yes, this was one good feature we lost in the switch to nomos. Rather than doing pattern matching on entire licenses, nomos just looks for signatures (regular expressions in context). So it isn't possible to highlight the whole license (nomos doesn't know it).
>
> Personally, I would like nomos to at least highlight where it found the signature match. That would at least lead your eyes to where the license was found, just not highlight the whole license.
>
> What do you think? How important is this to you compared to the other items in http://fossology.org/task_list#everything_else ?
>
> Thanks,
> Bob Gobeille
>
> On Jul 26, 2011, at 8:27 AM, Mader Volker (AA-DGP/ESD2) wrote:
>
> > Hi,
> >
> > I am using Fossology 1.4.0 and I am missing the highlighting in the View License view (as far as I remember it was possible with earlier versions). Is it due to the switch to nomos? Is it possible to reactivate this feature? It would be really interesting to find out on which strings the license was found.
> >
> > Volker
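Bob's point that a signature-based scanner can only point at the signature, not the whole license, can be made concrete with a short Python sketch. It is an illustration under assumptions, not nomos code: the `SIG` pattern, the `Match` record, and the sample source text are all hypothetical.

```python
import re
from typing import NamedTuple

class Match(NamedTuple):
    start: int    # character offset of the signature in the text
    length: int   # length of the matched signature
    line_no: int  # 1-based line on which the match starts

def find_signature(text: str, pattern: re.Pattern) -> list[Match]:
    """Record where each signature match occurs so a viewer could jump to it."""
    hits = []
    for m in pattern.finditer(text):
        line_no = text.count("\n", 0, m.start()) + 1
        hits.append(Match(m.start(), m.end() - m.start(), line_no))
    return hits

# Hypothetical signature: a short phrase, not the full license text.
SIG = re.compile(r"GNU\s+General\s+Public\s+License", re.IGNORECASE)

source = (
    "/* demo */\n"
    "/* Released under the GNU General Public\n"
    "   License, version 2. */\n"
    "int main(void) { return 0; }\n"
)

hits = find_signature(source, SIG)
for hit in hits:
    print(hit.line_no, repr(source[hit.start:hit.start + hit.length]))
```

Only the phrase itself is located; where the surrounding license text begins and ends stays unknown, which matches the limitation described in the message.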
Re: [FOSSology] License highlighting
Hello Volker,

The database table license_file is where nomos records what license matched in what file. There are columns for where in the file the match occurred, but they are not currently populated by nomos.

Would anyone else like to vote on how important highlighting the license match is?

Bob Gobeille

On Jul 27, 2011, at 2:49 AM, Mader Volker (AA-DGP/ESD2) wrote:

> yes, this is exactly what would be most useful. I think it is very important to understand why a license was detected, esp. for deciding on false positives. Certainly the signature match is the interesting information there.
>
> My plan: in combination with the tags feature (to mark false positives or give approval to a specific file/license pair), it can be used to manage the findings more properly.
>
> So of course from my perspective this is quite important compared to most other items ;-)
>
> Does nomos store the match information in the DB already? Could we just use some query in the php to display it?
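If the position columns mentioned above were populated, the viewer could use them to mark the matched region. The sketch below shows the idea in Python rather than the PHP of the FOSSology UI. The thread does not name the columns, so the `(start, length)` pairs here are a hypothetical stand-in for whatever license_file would actually store.

```python
from html import escape

def highlight(text: str, spans: list[tuple[int, int]]) -> str:
    """Wrap each (start, length) span of text in <mark> tags, escaping HTML."""
    out = []
    pos = 0
    for start, length in sorted(spans):
        end = min(start + length, len(text))
        start = max(start, pos)  # clip any overlap with the previous span
        if start >= end:
            continue
        out.append(escape(text[pos:start]))
        out.append("<mark>" + escape(text[start:end]) + "</mark>")
        pos = end
    out.append(escape(text[pos:]))
    return "".join(out)

# Hypothetical offsets, as if read from the match-position columns.
rendered = highlight("a < GPL b", [(4, 3)])
print(rendered)
```

Escaping the text before inserting markup matters because scanned files are untrusted input and may contain HTML themselves.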
[FOSSology] License highlighting
Hi,

I am using Fossology 1.4.0 and I am missing the highlighting in the View License view (as far as I remember it was possible with earlier versions). Is it due to the switch to nomos? Is it possible to reactivate this feature? It would be really interesting to find out on which strings the license was found.

Volker