Re: legal questions regarding machine learning models
Hello, Francesco!

You wrote to debian-legal@lists.debian.org on Fri, 29 May 2009 00:29:18 +0200:

>> In the US and some other places, bitmap fonts can't be copyrighted. You can make a free bitmap font by rendering a non-free font at a particular size.
>
> Interesting: could you point me at the specific article that states this rule in http://www.copyright.gov/title17/ ?

The issue was raised on this list before. I tried to describe my understanding of it in http://lists.debian.org/debian-legal/2003/12/msg3.html . But it was long ago and I haven't revisited it since then.

Alexander Cherepanov

-- To UNSUBSCRIBE, email to debian-legal-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Re: legal questions regarding machine learning models
On Fri, 05 Jun 2009 23:56:29 +0400 Alexander Cherepanov wrote:

> Hello, Francesco!
>
> You wrote to debian-legal@lists.debian.org on Fri, 29 May 2009 00:29:18 +0200:
>
>>> In the US and some other places, bitmap fonts can't be copyrighted. You can make a free bitmap font by rendering a non-free font at a particular size.
>>
>> Interesting: could you point me at the specific article that states this rule in http://www.copyright.gov/title17/ ?
>
> The issue was raised on this list before. I tried to describe my understanding of it in http://lists.debian.org/debian-legal/2003/12/msg3.html . But it was long ago and I haven't revisited it since then.

Very interesting reading.

What I wonder now is: why do I see many copyright holders and copyright licenses in, say, /usr/share/doc/xfonts-base/copyright ?

--
New location for my website! Update your bookmarks!
http://www.inventati.org/frx
Francesco Poli
GnuPG key fpr == C979 F34B 27CE 5CD8 DC12 31B5 78F4 279B DD6D FCF4
Re: legal questions regarding machine learning models
Hello, Francesco!

You wrote to debian-legal@lists.debian.org on Fri, 5 Jun 2009 22:49:28 +0200:

>>>> In the US and some other places, bitmap fonts can't be copyrighted. You can make a free bitmap font by rendering a non-free font at a particular size.
>>>
>>> Interesting: could you point me at the specific article that states this rule in http://www.copyright.gov/title17/ ?
>>
>> The issue was raised on this list before. I tried to describe my understanding of it in http://lists.debian.org/debian-legal/2003/12/msg3.html . But it was long ago and I haven't revisited it since then.
>
> Very interesting reading. What I wonder now is: why I see many copyright holders and copyright licenses in, say, /usr/share/doc/xfonts-base/copyright ?

Because it's tradition? :-)

And after you are done with bitmap fonts you can look at autotraced fonts (e.g. cm-super). Both the question of source and the question of copyright/license are interesting for them.

BTW, here is an idea for how to get many free scalable fonts: rasterize all interesting non-free fonts at high resolution and autotrace them. Quality will be lost to some degree, but with luck you will get good enough fonts.

There are many things to wonder about (apart from the legal system :-)...

Alexander Cherepanov
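To make the "rendering a font at a particular size" step discussed above concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the "glyph" is a hypothetical triangle outline rather than a real font, and the helper names (`point_in_polygon`, `rasterize`, `TRIANGLE`) are invented for this example. It shows that one scalable outline yields a different fixed bitmap for each size, and that each bitmap no longer carries the outline it came from.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd rule test: is the point (x, y) inside the polygon?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(outline, size):
    """Render an outline defined on the unit square to a size x size bitmap."""
    rows = []
    for row in range(size):
        y = 1.0 - (row + 0.5) / size  # sample pixel centres, top row first
        line = ""
        for col in range(size):
            x = (col + 0.5) / size
            line += "#" if point_in_polygon(x, y, outline) else "."
        rows.append(line)
    return rows


# Hypothetical "glyph": a scalable triangle outline (an assumption, not a font).
TRIANGLE = [(0.1, 0.1), (0.9, 0.1), (0.5, 0.9)]

if __name__ == "__main__":
    for size in (8, 16):
        print(f"{size}x{size} rendering:")
        print("\n".join(rasterize(TRIANGLE, size)))
```

Going the other way (autotracing) has to guess an outline from the pixels, which is why some quality loss is expected.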
Re: legal questions regarding machine learning models
Francesco Poli f...@firenze.linux.it wrote:

> On Thu, 28 May 2009 14:11:29 -0700 (PDT) Ken Arromdee wrote:
>
>> In the US and some other places, bitmap fonts can't be copyrighted. You can make a free bitmap font by rendering a non-free font at a particular size.
>
> Interesting: could you point me at the specific article that states this rule in http://www.copyright.gov/title17/ ?

Like the UK, US law can be created by case law deciding any grey areas and not only rules stated in legislation. That may have happened here and then it wouldn't appear in that document. I don't know.

> Anyway, even assuming that those bitmap fonts are DFSG-free in the US and some other places, what about other jurisdictions?

I think that the Berne Convention Article 7 part (8) http://www.law.cornell.edu/treaties/berne/7.html exports the US zero protection duration in this case. We can't rely on US Fair Use because Article 10 (2) allows national law to vary it in each country.

IANAL and I could be wrong about this, so would welcome correction.

--
MJR/slef
My Opinion Only: see http://people.debian.org/~mjr/
Please follow http://www.uk.debian.org/MailingLists/#codeofconduct
Re: legal questions regarding machine learning models
On Sun, 31 May 2009 16:52:23 -0700 Steve Langasek wrote:

> On Wed, May 27, 2009 at 11:42:46PM +0200, Francesco Poli wrote:
>> On Wed, 27 May 2009 11:37:56 +0200 Steve Langasek wrote:
>> [...]
>>> Better yet: he should recognize that the reason he needs to add all these acronyms is because his posts are an inappropriate use of this mailing list and not productive, and stop posting.
>>
>> You're not new to such impolite replies, and I don't think your reputation benefits from them.
>
> I think it has no negative impact on my reputation with anyone whose opinion I value. Which most explicitly does not include you. You use this mailing list as your own personal soap box for advancing positions that have been *rejected* by Debian.

No, I participate in the discussions of this mailing list because I care about Free Software and the Debian Project. When debian-legal participants are asked for an opinion about something, I feel I am allowed to provide my *own* opinion, while explicitly saying that it's not necessarily *identical* to the (current) official Debian position. When I disagree with a decision by the Debian Project or by the FTP masters, I think I am allowed to express my disagreement.

> That is not acceptable.

Says who? The description of this mailing list states: Discussions about legality issues such as copyrights, patents etc. This list is not moderated; posting is allowed by anyone.

> The purpose of this list is to help Debian developers and upstreams understand Debian's policy for the main archive, and as a forum for Debian as a whole to work on refining that policy.

Exactly: how can that policy be refined, if absolutely *no* disagreement with current practice is allowed?

> You don't appear to contribute anything to Debian except the crap you spew on this mailing list.

Please search better. I report bugs, I send patches from time to time, I've recently become co-maintainer of a package, ... It's not much, I admit. But the attitude of people like you has been constantly discouraging me from getting more involved in the Project.

> You are therefore *not* part of Debian. IANADD disclaimers do *not* excuse you abusing this list in order to shove your opinions down others' throats, when you know damn well that the project does not agree with you. So shut up already.

Again this snob attitude: you are not part of Debian, so shut up. How open minded...

>> Anyway, if disagreeing with FTP masters and expressing one's own opinion (while *explicitly* clarifying that what is expressed is just one's own opinion, and not necessarily the official Debian position) is an inappropriate use of this mailing list, then I suggest that the list is shut down as soon as possible and that debian-le...@l.d.o is turned into a forwarder to ftpmas...@d.o ...
>
> I would be much happier with having debian-legal shut down than with continuing a status quo that permits opinionated hangers-on like you to repeatedly twist the discussion to suit your personal agenda.

I am being accused of inappropriate use of this mailing list and of twisting the discussion by a person whose only contribution to the present thread consists of two rude ad hominem attacks. Oh, the irony...

--
Francesco Poli
Re: legal questions regarding machine learning models
On Wed, May 27, 2009 at 11:42:46PM +0200, Francesco Poli wrote:

> On Wed, 27 May 2009 11:37:56 +0200 Steve Langasek wrote:
>> On Wed, May 27, 2009 at 10:33:52AM +0200, Josselin Mouette wrote:
>>>> Disclaimers, of course: IANADD, TINASOTODP (and IANAL, TINLA).
>>>
>>> If you really feel the urge to add meaningless acronyms to all your emails, please do so in your signature.
>>
>> Better yet: he should recognize that the reason he needs to add all these acronyms is because his posts are an inappropriate use of this mailing list and not productive, and stop posting.
>
> You're not new to such impolite replies, and I don't think your reputation benefits from them.

I think it has no negative impact on my reputation with anyone whose opinion I value. Which most explicitly does not include you.

You use this mailing list as your own personal soap box for advancing positions that have been *rejected* by Debian. That is not acceptable. The purpose of this list is to help Debian developers and upstreams understand Debian's policy for the main archive, and as a forum for Debian as a whole to work on refining that policy. You don't appear to contribute anything to Debian except the crap you spew on this mailing list. You are therefore *not* part of Debian. IANADD disclaimers do *not* excuse you abusing this list in order to shove your opinions down others' throats, when you know damn well that the project does not agree with you. So shut up already.

> Anyway, if disagreeing with FTP masters and expressing one's own opinion (while *explicitly* clarifying that what is expressed is just one's own opinion, and not necessarily the official Debian position) is an inappropriate use of this mailing list, then I suggest that the list is shut down as soon as possible and that debian-le...@l.d.o is turned into a forwarder to ftpmas...@d.o ...

I would be much happier with having debian-legal shut down than with continuing a status quo that permits opinionated hangers-on like you to repeatedly twist the discussion to suit your personal agenda.

--
Steve Langasek              Give me a lever long enough and a Free OS
Debian Developer              to set it on, and I can move the world.
Ubuntu Developer                               http://www.debian.org/
slanga...@ubuntu.com                                vor...@debian.org
Re: legal questions regarding machine learning models
>>>> * xfonts-* (bitmap renderings of non-free vector fonts)
>>>
>>> Are you saying that xfonts-* are derived from non-free fonts? How can they be DFSG-free, then?
>>
>> In the US and some other places, bitmap fonts can't be copyrighted. You can make a free bitmap font by rendering a non-free font at a particular size.
>
> Interesting: could you point me at the specific article that states this rule in http://www.copyright.gov/title17/ ?
>
> Anyway, even assuming that those bitmap fonts are DFSG-free in the US and some other places, what about other jurisdictions? It has been often said that the Debian Project cannot (and should not) rely on the parts of copyright law which vary wildly across jurisdictions (e.g.: fair use/fair dealing and other national counterparts) in order to declare something DFSG-free. Why is a different standard being applied here?

Am I missing something? I would think that even if in all jurisdictions the font is non-copyrightable, that still would not imply DFSG-freeness, only that it is fit for non-free.

Best regards,
Mark Weyer
Re: legal questions regarding machine learning models
2009/5/29 Mark Weyer we...@informatik.hu-berlin.de:

> Am I missing something? I would think that even if in all jurisdictions the font is non-copyrightable, that still would not imply DFSG-freeness, only that it is fit for non-free.
>
> Best regards,
> Mark Weyer

That's what I thought as well, because source is not available in the preferred form of modification.

But imagine if no one in Debian knew that this raster font was generated from something else, then it would be DFSG-free.

So just expanding on that. DFSG source requirement is concluded by judging each time what is source. And this is biased sometimes as we see in this example.

The model (I presume in some kind of human or machine parsable format) if distributed under free license does allow to view all parameters and tweak them. (It would be wrong from a scientific point of view, but just for fun: what if I twist this knob without actually parsing any data for many days in a row?) I would argue that is still source.

Now imagine we have many of these models and someone writes a hyper-model simulator which takes all of these models and makes a new one based on some cunning statistical processing... what will be source then? All input data to all models, with all the generation parameters and environments (I bet some of them use urandom as well)? Or will the models become source for the hyper-model?

--
With best regards
Dmitrijs Ledkovs (for short Dima), Ледков Дмитрий Юрьевич
Re: legal questions regarding machine learning models
Dmitrijs Ledkovs dmitrij.led...@gmail.com writes:

> That's what I thought as well, because source is not available in the preferred form of modification.

I don't understand this. The definition that has been used in this thread is that the preferred form of the work for modifying that work *is* the source form.

> But imagine if no one in Debian knew that this raster font was generated from something else, then it would be DFSG-free.

No. If it violates DFSG, but that fact is not yet known, that does not make it DFSG-free. If people act on an assumption (as we must frequently do), that does not alter the truth.

> So just expanding on that. DFSG source requirement is concluded by judging each time what is source. And this is biased sometimes as we see in this example.

It is indeed open to interpretation, and that interpretation is necessary in order to make a judgement of whether a work should be considered to meet the DFSG.

> The model (I presume in some kind of human or machine parsable format) if distributed under free license does allow to view all parameters and tweak them.

Just as distributing a program as a binary blob “allows” the recipient to alter any bytes they like. That doesn't mean such a distribution is sufficient to be free under the DFSG.

> Now imagine we have […] what will be source then?

If your intent is to demonstrate that taking something to extremes leads to absurdities, you have a very easy task that has been done many times before. That doesn't help in making judgements about *actual* works and the *actual* freedoms recipients have in them.

--
 \     “Know what I hate most? Rhetorical questions.” -- Henry N. Camp |
  `\                                                                   |
_o__)                                                                  |
Ben Finney
Re: legal questions regarding machine learning models
On Wed, 27 May 2009, Francesco Poli wrote:

>>> I instead think that FTP masters should change their minds about 2D images rendered from 3D models.
>>
>> I suggest you start your own distribution, in which you won’t ship:
>> * xfonts-* (bitmap renderings of non-free vector fonts)
>
> Are you saying that xfonts-* are derived from non-free fonts? How can they be DFSG-free, then?

In the US and some other places, bitmap fonts can't be copyrighted. You can make a free bitmap font by rendering a non-free font at a particular size.
Re: legal questions regarding machine learning models
On Thu, 28 May 2009 14:11:29 -0700 (PDT) Ken Arromdee wrote:

> On Wed, 27 May 2009, Francesco Poli wrote:
>
>>>> I instead think that FTP masters should change their minds about 2D images rendered from 3D models.
>>>
>>> I suggest you start your own distribution, in which you won’t ship:
>>> * xfonts-* (bitmap renderings of non-free vector fonts)
>>
>> Are you saying that xfonts-* are derived from non-free fonts? How can they be DFSG-free, then?
>
> In the US and some other places, bitmap fonts can't be copyrighted. You can make a free bitmap font by rendering a non-free font at a particular size.

Interesting: could you point me at the specific article that states this rule in http://www.copyright.gov/title17/ ?

Anyway, even assuming that those bitmap fonts are DFSG-free in the US and some other places, what about other jurisdictions? It has been often said that the Debian Project cannot (and should not) rely on the parts of copyright law which vary wildly across jurisdictions (e.g.: fair use/fair dealing and other national counterparts) in order to declare something DFSG-free. Why is a different standard being applied here?

--
Francesco Poli
Re: legal questions regarding machine learning models
>> This looks very similar to distributing a picture which is a 2D rendering of a 3D model without distributing the original model. This is already accepted in the archive, and the reason is that a 2D picture is its own source, and can serve as a base for modified versions this way.
>
> I disagree with this decision by the FTP masters. I personally think that, in most cases, the 2D rendering is not the actual source, since many modifications would be best made by changing the 3D model and re-rendering the 2D image.

I agree with you. In particular, in many cases a single 3D model is used to create many 2D images. If you don't have the model, you need to do the modification many times. And then there is the case of increasing the resolution...

> Disclaimers, of course: IANADD, TINASOTODP (and IANAL, TINLA).

Same here.

Best regards,
Mark Weyer
Re: legal questions regarding machine learning models
> I mentioned Voxforge in my previous email. Their goal is to use their free speech data to train models with HTK and use the models with Julius. You can get the source code of HTK after registration on their website but the license has severe restrictions so HTK is not free software. Julius is a free software speech recognition engine that can use models trained with HTK. Note that HTK is pretty much THE speech recognition framework in the speech recognition community. If you consider that the ultimate source of a model is not only the data but also the software used to train it, then Voxforge models built with HTK can't be free, even though the data were free. Is it forbidden for someone to release an image made with Photoshop as free?

As I understand it, this depends on what you mean by free. It is quite possible to distribute these models under a free license, even under one which requires distribution of source. The source code would then be the Voxforge data plus the parameters given to HTK. It would not include the source code of HTK, as HTK acts in this process like a compiler.

However, a corresponding Debian package would be in contrib at best (and that only if HTK can be shipped in non-free), because the package would have a build-dependency on HTK. I guess, in the long run, your community needs a free replacement for HTK.

Again, this is only how I understand things.

Best regards,
Mark Weyer
Re: legal questions regarding machine learning models
On Wednesday 27 May 2009 at 00:36 +0200, Francesco Poli wrote:

>> Of course, the decision is up to the FTP masters, but I think this should be accepted for the sake of consistency with things we already cannot decently exclude from the archive.
>
> I instead think that FTP masters should change their minds about 2D images rendered from 3D models.

I suggest you start your own distribution, in which you won’t ship:

* xfonts-* (bitmap renderings of non-free vector fonts)
* all icons shipped without SVG source
* all pictures shipped without XCF/PSD source (oh yeah, that makes a lot)
* actually, all pictures that are initially photographs of an object (the preferred form of modification is the original object; if you want to see it at another angle, you need to take another photograph)
* all sound files shipped without the full genetic code of the speaker

You could call it something like gNewSense, and you could discuss for hours with RMS how much better it is this way.

> Disclaimers, of course: IANADD, TINASOTODP (and IANAL, TINLA).

If you really feel the urge to add meaningless acronyms to all your emails, please do so in your signature.

--
 .''`.      Josselin Mouette
: :' :
`. `'   “I recommend you to learn English in hope that you in
  `-     future understand things” -- Jörg Schilling
Re: legal questions regarding machine learning models
2009/5/27 Mark Weyer we...@informatik.hu-berlin.de:

>>> This looks very similar to distributing a picture which is a 2D rendering of a 3D model without distributing the original model. This is already accepted in the archive, and the reason is that a 2D picture is its own source, and can serve as a base for modified versions this way.
>>
>> I disagree with this decision by the FTP masters. I personally think that, in most cases, the 2D rendering is not the actual source, since many modifications would be best made by changing the 3D model and re-rendering the 2D image.
>
> I agree with you. In particular, in many cases a single 3D model is used to create many 2D images. If you don't have the model, you need to do the modification many times. And then there is the case of increasing the resolution...

I don't know if it would be technically possible to go to those extremes. The source of all the music and video intros for all the games, and of all the sounds, could probably be 100 times bigger than the current archives. Well, you get the idea. I don't think it's a single package that we're talking about.

I remember there was a thread some time ago about what would happen if we applied the requirement of a wholly free source and toolchain to music, and how it would be absolutely impossible to achieve, at least right now.

Any idea on what to do in those situations?

Greetings,
Miry

PS: I'm CC'ing the Debian Games Team mailing list.
Re: legal questions regarding machine learning models
>> I agree with you. In particular, in many cases a single 3D model is used to create many 2D images. If you don't have the model, you need to do the modification many times. And then there is the case of increasing the resolution...
>
> I don't know if it would be technically possible to go to that extremes. Having the source code of all the music and video intros for all the games, of all the sounds, could be probably 100 times bigger than the current archives. Well, you get the idea. I don't think it's a single package what we're talking about. I remember there was a thread some time ago on what would happen if we took the having a whole free source and toolchain when applied to music, and how it would be absolutely impossible to achieve, at least right now. Any idea on what to do in those situations?

That's a mixture of questions. I'll add my 2e-2 Euro to each separately.

Archive size: The case that I had in mind is that the data is purely synthetic. In those cases the source form is negligibly small when compared to the binary form. Especially in the cases you mention: game intros rendered from some 3D scene, game music created from some music score, sounds which are programmed.

I assume that you have non-synthetic data in mind: music which is actually recorded, videos which are shot with real actors, sounds recorded from the real world. And that what is shipped is a severely compressed form of the original. In that case I guess one can argue that the source requirement is void: I always understand source to be the preferred form for modifications among the digital forms of the software. The kind of modifications I see for e.g. music (replace the violin player by someone who actually can play the instrument; correct a discord which is due to a typo in the score) is impossible to achieve without rerecording, so a big digital version of the music is just as useless as a small one.

Building time: Coming back to purely synthetic data, building time can be a real pain. Waiting 24 hours (on fast machines) for a build is fine for me as upstream, but not something I would want to cause to your buildd when my software is just one out of thousands of packages. There, I do see a practical problem. With my upstream hat on, I will continue to ship my data under licenses that do require source, but I will not care whether you redo the building or whether you just copy the precompiled data which I give you. Provided, of course, that you also ship the source.

Extremes: I do not agree with this classification of my view. I value a free game for the fact that I can fool around with the source to make it better. Adding features, levels, characters. If this means that I have to add long ears to some sprite (which is obviously generated from some 3D model), then I want to have access to that model and to the toolchain used to turn the model into the sprite. Because that is much simpler and more robust, and creates a much more consistent set of sprite animation parts, than doing it with gimp on each part of each animation sequence individually. Free data is important for the very same reason that free programs are!

What to do: As always it is a tradeoff between quantity and quality, in this case of packages. Maintaining a high freeness standard has an impact on the resources needed, so it limits the number of costly packages that you can support for any given amount of available resources. I value Debian because (and as long as) it puts the emphasis on freeness.

> PS:I'm CC'ing to the Debian Games Team mailing list.

Done as well, but I am not subscribed to that list.
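The "Archive size" point above, that for purely synthetic data the source is tiny compared to what is built from it, can be illustrated with a small Python sketch. Everything in it is an assumption made for the example: the three-note `SCORE`, the 8000 Hz `SAMPLE_RATE`, and the helper names `synthesize_tone` and `build` are hypothetical and do not come from any real game or package.

```python
import math
import struct

SAMPLE_RATE = 8000  # samples per second (illustrative assumption)


def synthesize_tone(frequency_hz, seconds):
    """Return raw 16-bit little-endian mono PCM for a sine tone."""
    n_samples = int(SAMPLE_RATE * seconds)
    samples = (
        int(32767 * math.sin(2 * math.pi * frequency_hz * i / SAMPLE_RATE))
        for i in range(n_samples)
    )
    return b"".join(struct.pack("<h", s) for s in samples)


# The "source": a tiny hypothetical score of (frequency in Hz, duration in s).
SCORE = [(440.0, 0.5), (660.0, 0.5), (880.0, 1.0)]


def build(score):
    """The "build step": turn the score into the shipped binary form."""
    return b"".join(synthesize_tone(freq, dur) for freq, dur in score)


if __name__ == "__main__":
    pcm = build(SCORE)
    print(f"score entries: {len(SCORE)}, generated PCM bytes: {len(pcm)}")
```

Changing a note means editing one tuple and rebuilding; doing the same change on the generated samples would mean rewriting thousands of bytes by hand, which is the practical sense in which the score is the preferred form for modification.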
Re: legal questions regarding machine learning models
On Wed, May 27, 2009 at 10:33:52AM +0200, Josselin Mouette wrote:

>> Disclaimers, of course: IANADD, TINASOTODP (and IANAL, TINLA).
>
> If you really feel the urge to add meaningless acronyms to all your emails, please do so in your signature.

Better yet: he should recognize that the reason he needs to add all these acronyms is because his posts are an inappropriate use of this mailing list and not productive, and stop posting.

--
Steve Langasek
Re: legal questions regarding machine learning models
I know I should not reply to polemic posts because it is just one step short of troll-feeding, but anyway:

> I suggest you start your own distribution, in which you won’t ship:
> * xfonts-* (bitmap renderings of non-free vector fonts)

I agree that these do not belong in a free distribution. There should be plenty of free alternatives, ness pah?

> * all icons shipped without SVG source
> * all pictures shipped without XCF/PSD source (oh yeah, that makes a lot)

I would handle these on a case-by-case basis. For a 64x64 icon which has no connection to other icons (apart from what can easily be done by copy and paste), I would say the icon itself is just as good as its source. For SVG: Yes, the ability to scale the icon to a new resolution is very important.

I assume that your next move will be something like "But then, we cannot ship GNOME or KDE!". I have seen such arguments before (don't know if it was from you, though). This is just blackmail. In the same way you could argue for the inclusion of [insert shiny proprietary software that only runs on Windows]. And, personally, I do not care whether GNOME or KDE are in Debian.

> * actually, all pictures that are initially photographs of an object (the preferred form of modification is the original object; if you want to see it at another angle, you need to take another photograph)
> * all sound files shipped without the full genetic code of the speaker

You are being ridiculous on purpose. Source, as I understand it, is always something digital.

> You could call it something like gNewSense, and you could discuss during hours with RMS how much better it is this way.

Just because GNU and RMS have similar views, that does not immediately make the view invalid. This has to be judged on a case-by-case basis.

Best regards,
Mark Weyer
Re: legal questions regarding machine learning models
On Wed, 27 May 2009 11:25:09 +0900 Mathieu Blondel wrote: On Wed, May 27, 2009 at 7:36 AM, Francesco Poli wrote: I think that in the case of machine learning models, source form is even more clearly distinct from compiled object. We can consider an artificial neural network, for instance (Mathieu, correct me if it's a wrong example). I am under the impression that basically nobody would change connection weights by hand, in order to modify a neural network. Yes the connection weights of an artificial neural network are a good example of the parameters I was talking about. In practice, nobody would change a connection weight by hand because it's impossible to predict the effect of this particular weight on the overall performance of the model. Training algorithms are mostly clever ways to find a good model without trying the infinity of parameter combinations. Good, this confirms my supposition. So in practice yes, a model would be barely useful for further work on the model without the original data. In that regard, the original data AND the program used to train the model (this includes the implementations and the options passed to the algorithm) can be seen as the only real source. The program used to train the model is not necessarily part of the source, IMHO. The GNU GPL v3 states (in Section 1): | However, it [the Corresponding Source for a work] does not include | the work's System Libraries, or general-purpose tools or generally | available free programs which are used unmodified in performing | those activities [generate, install, and run the object, and modify | the work] but which are not part of the work. But yet again, I could pretend that I just happened to find the model parameters by hand. Free Software is not about pretending you are a sort of oracle who can guess magic numbers! Otherwise, any source availability requirement would be moot: I could always pretend I wrote the machine code by hand, but that won't be true, in most cases. 
Afterall, a model is just a big set of numbers. Machine code is just a long sequence of 0s and 1s... [...] However, this is not good on the long term since that makes the model dependent on the person who holds the data. Definitely. [...] Is it forbidden for someone to release an image made with Photoshop as free? You *can* create a DFSG-free image with Adobe Photoshop. If the source form may be read and modified with DFSG-free tools (e.g.: The Gimp), then everything is OK and the image may be included in Debian main. If, on the other hand, the source form of the image may *only* be manipulated with Photoshop and other non-free tools, then I think that the image may still be DFSG-free, but belongs in the Debian contrib archive, at best. At least, this is how I understand it. Regarding Debian packaging, I think it's a wise decision to rebuild the model whenever the data and the training program are free, the data is not too large and the computation not too long. Should objective criterion of what is too large and what is too large be decided or should that be left to the DD? Then a remaining question is what to do with models for which we don't have the original data or the original training program? My personal take on the matter is that, in order for a package to be included in Debian main: * the package must comply with the DFSG * source must be distributed in the source package * tools needed to generate (or to use) the object must be DFSG-free and included in Debian main This is how I interpret Policy 2.2.1: http://www.debian.org/doc/debian-policy/ch-archive.html#s-main However, it is my understanding that, in some cases (e.g. long rebuilding times), it is acceptable to also ship pre-built (architecture-independent) objects in the source package, *along with* the corresponding source. One should however be extremely careful in doing this, since it makes it harder to check and be sure that Policy 2.2.1 requirements are satisfied. I hope I clarified my opinions. 
As stated before, I should stress again that what I expressed above are my own opinions. Usual disclaimers: IANAL, TINLA, IANADD, TINASOTODP. -- Francesco Poli
Re: legal questions regarding machine learning models
On Wed, 27 May 2009 11:36:55 +0200 Mark Weyer wrote: [...] Extremes: I do not agree with this classification of my view. I value a free game for the fact, that I can fool around with the source to make it better. Adding features, levels, characters. If this means that I have to add long ears to some sprite (which is obviously generated from some 3D model), then I want to have access to that model and to the toolchain used to turn the model into the sprite. Because that is much more simple and robust, and creates a much more consistent set of sprite animation parts, than doing it with gimp on each part of each animation sequence individually. Free data is important for the very same reason that free programs are! Exactly so. I agree that this is the key aspect to take into account when talking about this issue. Unfortunately some people seem to think that getting more games (or images, or music, or ...) is worth sacrificing the important freedoms... :-( What to do: As always it is a tradeoff between quantity and quality, in this case of packages. Maintaining a high freeness standard has an impact on the resources needed, so it limits the number of costly packages that you can support for any given amount of available resources. I value Debian because (and as long as) it puts the emphasis on freeness. 100 % agreement here. I also think that Debian *should* value Freeness standards over the mere quantity of packages in main. PS:I'm CC'ing to the Debian Games Team mailing list. Done as well, but I am not subscribed to that list. Same here: I am subscribed to debian-legal, but not to debian-devel-games. -- New location for my website! Update your bookmarks! http://www.inventati.org/frx . Francesco Poli . GnuPG key fpr == C979 F34B 27CE 5CD8 DC12 31B5 78F4 279B DD6D FCF4 pgp6qLCiuxHb9.pgp Description: PGP signature
Re: legal questions regarding machine learning models
On Wed, 27 May 2009 10:33:52 +0200 Josselin Mouette wrote: Le mercredi 27 mai 2009 à 00:36 +0200, Francesco Poli a écrit : [...] I instead think that FTP masters should change their minds about 2D images rendered from 3D models. I suggest you start your own distribution, in which you won’t ship: * xfonts-* (bitmap renderings of non-free vector fonts) Are you saying that xfonts-* are derived from non-free fonts? How can they be DFSG-free, then? * all icons shipped without SVG source When an icon is actually created in SVG format, what's so strange about insisting that its real source (i.e.: SVG) is shipped in the Debian (main) source package? * all pictures shipped without XCF/PSD source (oh yeah, that makes a lot) Again, for pictures that are created in XCF format, the preferred form for making modifications is the .xcf file, in most cases. Why are you insisting that source-less works should be accepted in Debian main? * actually, all pictures that are initially photographs of an object (the preferred form of modification is the original object; if you want to see it at another angle, you need to take another photograph) For photographs, the physical object is *not* the preferred form for making modifications to the work, it's the preferred form for *recreating* the work from scratch. I think we have already had this discussion. See http://lists.debian.org/debian-legal/2008/12/msg00085.html You may argue that the same reasoning applies to 3D models, but I think the key difference stays in the word preferred. Since you cannot transfer physical objects through a network, or copy modify them, and so forth, they are not preferred for making modifications to photographs. 3D models are instead digital information that may well be the preferred form for making modifications to a work. Of course, in some cases, the huge size of a 3D model could well move the preference to some other form. 
As I said, it's always a case-by-case decision, but not one that should be taken lightly, IMHO. * all sound files shipped without the full genetic code of the speaker As for photographs, I don't think that this is the actual source. You could call it something like gNewSense, and you could discuss for hours with RMS how much better it is this way. Naah, I disagree with RMS on a number of matters, so I don't think that my own distro would be more similar to gNewSense than to Debian... Disclaimers, of course: IANADD, TINASOTODP (and IANAL, TINLA). If you really feel the urge to add meaningless acronyms to all your emails, please do so in your signature. Not all my messages require the same set of disclaimers, if at all. -- Francesco Poli
Re: legal questions regarding machine learning models
On Wed, 27 May 2009 11:37:56 +0200 Steve Langasek wrote: On Wed, May 27, 2009 at 10:33:52AM +0200, Josselin Mouette wrote: Disclaimers, of course: IANADD, TINASOTODP (and IANAL, TINLA). If you really feel the urge to add meaningless acronyms to all your emails, please do so in your signature. Better yet: he should recognize that the reason he needs to add all these acronyms is because his posts are an inappropriate use of this mailing list and not productive, and stop posting. You're not new to such impolite replies, and I don't think your reputation benefits from them. Anyway, if disagreeing with FTP masters and expressing one's own opinion (while *explicitly* clarifying that what is expressed is just one's own opinion, and not necessarily the official Debian position) is an inappropriate use of this mailing list, then I suggest that the list is shut down as soon as possible and that debian-le...@l.d.o is turned into a forwarder to ftpmas...@d.o ... That way you have the guarantee that *no* reply from debian-le...@l.d.o can possibly include heretic and sacrilegious opinions that dare to disagree with the FTP masters! I am not sure that the FTP masters would be overly happy to have to deal with all the questions that are directed to debian-le...@l.d.o, but one does not have to care about little details like these... I used to think that the Debian Project cared about Free Software and maybe even about free speech, but something apparently went wrong... :-( -- New location for my website! Update your bookmarks! http://www.inventati.org/frx . Francesco Poli . GnuPG key fpr == C979 F34B 27CE 5CD8 DC12 31B5 78F4 279B DD6D FCF4 pgpLvxuASEGTr.pgp Description: PGP signature
Re: legal questions regarding machine learning models
On Thu, May 28, 2009 at 5:51 AM, Francesco Poli f...@firenze.linux.it wrote: After all, a model is just a big set of numbers. Machine code is just a long sequence of 0s and 1s... I knew someone would come up with this :-) Let me summarize, and please correct me if I'm wrong.
* The model alone can be distributed under a free license.
  - As a consequence, neither the original data nor the program used to build the model needs to be free.
* The DFSG is more restrictive and requires the source of any software in Debian.
  - If you consider that the model is its own source, as was accepted for a picture that is a 2D rendering of a 3D model, then you can package the model directly.
  - Otherwise, the data must be included in the source package and the tools to build the model must be in Debian main.
  - To cope with models that take too long to compute, it should be possible to ship a pre-built architecture-independent model together with the data. However, this doesn't solve the problem that the data may be too large to be hosted in the archive.
  - If data size becomes a problem, then one could resort to the non-free archive in order to ship the model only.
Thank you, Mathieu Blondel
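The summary above can be read as a small decision procedure. The following is only a sketch of that reading, written in Python for illustration: the class, field names and return strings are all hypothetical, it encodes opinions voiced in this thread rather than Debian policy, and the case of non-free data or non-free training tools is deliberately reported as unresolved because the thread leaves it open.

```python
from dataclasses import dataclass


@dataclass
class ModelPackage:
    """Hypothetical description of a trained model someone wants to package."""
    model_is_own_source: bool   # the model itself is accepted as the source form
    data_is_free: bool          # training data is under a free license
    trainer_in_main: bool       # training tools are DFSG-free and in Debian main
    data_fits_in_archive: bool  # data is small enough for the source package
    rebuild_is_fast: bool       # training time is acceptable at build time


def suggest_section(pkg: ModelPackage) -> str:
    """Return the outcome suggested by the summary in this thread."""
    if pkg.model_is_own_source:
        return "main: package the model directly"
    if not (pkg.data_is_free and pkg.trainer_in_main):
        # The thread raises this case but does not settle it.
        return "unresolved: no free data or no free training tools"
    if not pkg.data_fits_in_archive:
        return "non-free: ship the model only"
    if pkg.rebuild_is_fast:
        return "main: ship the data and rebuild the model at build time"
    return "main: ship the data together with a pre-built model"


voxforge_like = ModelPackage(
    model_is_own_source=False,
    data_is_free=True,
    trainer_in_main=False,      # e.g. a trainer with a restrictive license
    data_fits_in_archive=True,
    rebuild_is_fast=False,
)
print(suggest_section(voxforge_like))
```

The first test, whether the model counts as its own source, is exactly the point on which posters in this thread disagree, so the function takes it as an input rather than trying to decide it.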
Re: legal questions regarding machine learning models
Mathieu Blondel math...@mblondel.org writes: * The model alone can be distributed under a free license. - As a consequence of this, neither the original data nor the program to build the model need to be free. Going by the FSF definition of a free work, specifically freedoms 1 and 3 <http://www.gnu.org/philosophy/free-sw.html>, a necessary precondition for a work to be free is for its recipients to have free access to the source form of the work. What does “the source form of the work” mean for these models? Whatever the answer to that is, it describes something that needs to be freely available to every recipient in order to consider the work free. * The DFSG is more restrictive and requires the source of any software in Debian. The DFSG has different restrictions from the FSF definition, true. I don't think it differs on this point though: free access to the source form of the work is part of the definition of free software. -- Ben Finney
Re: legal questions regarding machine learning models
Mathieu Blondel writes: My first question is : is it possible to distribute the model under a free software license without distributing the original data that were used to train the model? Likewise, is it possible to package directly a model in Debian? The answer to your first question is easy: Yes. Many free software licenses do not require the distribution of source code for any generated data that is distributed. Packaging it for Debian is more complicated, because the DFSG *does* require the distribution of the source form for any software that is part of Debian. If the input data used to generate the models is not large, it can be included in a source package. On the other hand, if the input data is one of those multi-gigabyte data sets that you mention, the easiest solution might be to package just the model, and put it in the non-free archive. Depending on whether a small data set can be used to generate a default model, having a large-input-data model in non-free may imply that the executable software belongs in contrib rather than in main. Michael Poole (Neither a lawyer nor a DD.)
Re: legal questions regarding machine learning models
Michael Poole mdpo...@troilus.org wrote: Mathieu Blondel writes: My first question is : is it possible to distribute the model under a free software license without distributing the original data that were used to train the model? Likewise, is it possible to package directly a model in Debian? The answer to your first question is easy: Yes. Many free software licenses do not require the distribution of source code for any generated data that is distributed. Packaging it for Debian is more complicated, because the DFSG *does* require the distribution of the source form for any software that is part of Debian. As I understand it, Debian does not have to distribute the source code, but must be able to distribute it if it wishes. So the model could still go into main and the ftpmasters could decide whether they want to host the data. Regards, Walter Landry wlan...@caltech.edu
Re: legal questions regarding machine learning models
On Wednesday 27 May 2009 at 01:17 +0900, Mathieu Blondel wrote: For efficient storage, the model may be stored in binary format but human-readable formats (such as XML) may be used, thus allowing easy access to the parameters of the models. My first question is : is it possible to distribute the model under a free software license without distributing the original data that were used to train the model? Likewise, is it possible to package directly a model in Debian? Although it's very unlikely, I could pretend that I found the parameters of the models by hand. In that case, the parameters can be seen as magical numbers with no explanation whatsoever as to how I found them. This looks very similar to distributing a picture which is a 2D rendering of a 3D model without distributing the original model. This is already accepted in the archive, and the reason is that a 2D picture is its own source, and can serve as a base for modified versions this way. The same reasoning applies to the model: as long as it is useful to tune the parameters by hand to produce derived versions, there’s no reason not to consider it as the source. Of course, the decision is up to the FTP masters, but I think this should be accepted for the sake of consistency with things we already cannot decently exclude from the archive. My second question is: Given the difficulty of proving what data were actually used to train a model, how can we prevent non-free software from using free data such as those of Voxforge? A widely-used technique is to cleverly hide some minor bugs in the data. If a non-free model shows the same bugs, you can prove the data was used illegally. Of course this only works if you manage to keep the bugs secret. Cheers, -- Josselin Mouette
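The "hidden bugs" idea in the message above can be sketched in a few lines. This is only an illustration under strong assumptions: the function names and the toy pronunciation entries are made up, and the "model" here is a plain lookup table that memorizes its training data, which is far simpler than a real acoustic model. A real check would need some statistical argument that the matches are not coincidental.

```python
def plant_canaries(records, canaries):
    """Return a copy of the dataset with deliberately wrong entries mixed in."""
    return list(records) + list(canaries.items())


def canary_hits(predict, canaries):
    """Count how many planted mistakes a suspect model reproduces."""
    return sum(1 for key, wrong in canaries.items() if predict(key) == wrong)


# Toy pronunciation data; the canaries are nonsense words with arbitrary
# transcriptions that an independently built model has no reason to know.
clean_data = [("cat", "K AE T"), ("dog", "D AO G")]
secret_canaries = {"zxqv": "Z IH K S", "blorft": "B L AO F"}

published_data = plant_canaries(clean_data, secret_canaries)

# Stand-ins for trained models: one built from the published data,
# one built only from independently collected data.
suspect_model = dict(published_data).get
independent_model = dict(clean_data).get

print(canary_hits(suspect_model, secret_canaries))      # 2
print(canary_hits(independent_model, secret_canaries))  # 0
```

As the message notes, this only works while the planted entries stay secret; anyone who knows the canary list can filter them out before training.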
Re: legal questions regarding machine learning models
On Tue, 26 May 2009 22:55:32 +0200 Josselin Mouette wrote: Le mercredi 27 mai 2009 à 01:17 +0900, Mathieu Blondel a écrit : [...] My first question is : is it possible to distribute the model under a free software license without distributing the original data that were used to train the model? Likewise, is it possible to package directly a model in Debian? [...] This looks very similar to distributing a picture which is a 2D rendering of a 3D model without distributing the original model. This is already accepted in the archive, and the reason is that a 2D picture is its own source, and can serve as a base for modified versions this way. I disagree with this decision by the FTP masters. I personally think that, in most cases, the 2D rendering is not the actual source, since many modifications would be best made by changing the 3D model and re-rendering the 2D image. The most widely-accepted definition of source is found in the GNU GPL: source code of a work is essentially defined as the preferred form for making modifications to it. Identifying the source form is always a case-by-case decision, but I think that, in most cases, someone who needs to modify a 2D image originally rendered from a 3D model, would prefer changing the 3D model and re-perform the rendering, at least for a great number of possible modifications. The same reasoning applies to the model: as long as it is useful to tune the parameters by hand to produce derived versions, there’s no reason not to consider it as the source. I think that in the case of machine learning models, source form is even more clearly distinct from compiled object. We can consider an artificial neural network, for instance (Mathieu, correct me if it's a wrong example). I am under the impression that basically nobody would change connection weights by hand, in order to modify a neural network. 
Of course, the decision is up to the FTP masters, but I think this should be accepted for the sake of consistency with things we already cannot decently exclude from the archive. I instead think that FTP masters should change their minds about 2D images rendered from 3D models. Disclaimers, of course: IANADD, TINASOTODP (and IANAL, TINLA). -- Francesco Poli
Re: legal questions regarding machine learning models
On Wed, May 27, 2009 at 7:36 AM, Francesco Poli f...@firenze.linux.it wrote: I think that in the case of machine learning models, source form is even more clearly distinct from compiled object. We can consider an artificial neural network, for instance (Mathieu, correct me if it's a wrong example). I am under the impression that basically nobody would change connection weights by hand, in order to modify a neural network. Yes, the connection weights of an artificial neural network are a good example of the parameters I was talking about. In practice, nobody would change a connection weight by hand because it's impossible to predict the effect of this particular weight on the overall performance of the model. Training algorithms are mostly clever ways to find a good model without trying the infinite number of parameter combinations. So in practice yes, a model would be barely useful for further work on the model without the original data. In that regard, the original data AND the program used to train the model (this includes the implementations and the options passed to the algorithm) can be seen as the only real source. But yet again, I could pretend that I just happened to find the model parameters by hand. After all, a model is just a big set of numbers. Who could tell what data I did use to train my model? Due to the lack of quality free data, it's quite tempting to use non-free data in order to create free models. However, this is not good in the long term since that makes the model dependent on the person who holds the data. I mentioned Voxforge in my previous email. Their goal is to use their free speech data to train models with HTK and use the models with Julius. You can get the source code of HTK after registration on their website but the license has severe restrictions so HTK is not free software. Julius is a free software speech recognition engine that can use models trained with HTK.
Note that HTK is pretty much THE speech recognition framework in the speech recognition community. If you consider that the ultimate source of a model is not only the data but also the software used to train it, then Voxforge models built with HTK can't be free, even though the data were free. Is it forbidden for someone to release an image made with Photoshop as free? Regarding Debian packaging, I think it's a wise decision to rebuild the model whenever the data and the training program are free, the data is not too large and the computation not too long. Should an objective criterion of what is too large and what is too long be decided, or should that be left to the DD? Then a remaining question is what to do with models for which we don't have the original data or the original training program? Thank you, Mathieu
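To make the "data AND training program AND options" point concrete, here is a minimal sketch in plain Python. It is not how HTK or any real toolkit works: the `train` function, the toy data and the option values are all assumptions chosen for illustration. It fits a two-parameter linear model by stochastic gradient descent and shows that the parameters are reproducible from data, code and options, but are not something one would write down or edit by hand.

```python
import random


def train(data, *, seed, epochs, lr):
    """Fit y = w*x + b by stochastic gradient descent on squared error.

    The result depends on the data, this code, and the options
    (seed, epochs, lr): together they play the role of 'source'.
    """
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # random initial parameters
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


# Toy training data generated from y = 2x + 1 (hypothetical).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
options = dict(seed=0, epochs=2000, lr=0.05)

model = train(data, **options)
rebuilt = train(data, **options)             # same data, code and options
changed = train(data + [(4.0, 0.0)], **options)  # one extra data point

print(model == rebuilt)   # the rebuild reproduces the same parameters
print(model == changed)   # different data gives a different model
```

With two parameters the numbers are still readable; with the thousands of weights in a neural network or an acoustic model, rebuilding from data is the only practical way to produce a modified version, which is the argument made in the message above.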
Re: legal questions regarding machine learning models
On Wed, May 27, 2009 at 10:25 AM, Mathieu Blondel math...@mblondel.org wrote: I mentioned Voxforge in my previous email. Their goal is to use their free speech data to train models with HTK and use the models with Julius. You can get the source code of HTK after registration on their website but the license has severe restrictions so HTK is not free software. Julius is a free software speech recognition engine that can use models trained with HTK. Note that HTK is pretty much THE speech recognition framework in the speech recognition community. If you consider that the ultimate source of a model is not only the data but also the software used to train it, then Voxforge models built with HTK can't be free, even though the data were free. Is it forbidden for someone to release an image made with Photoshop as free? Wouldn't you want speech recognition software to be trained to your specific voice? What is the practical reason for wanting these Voxforge models in Debian? Isn't human speech too diverse (many languages, each with variants and many accents) to make these models useful? -- bye, pabs http://wiki.debian.org/PaulWise
Re: legal questions regarding machine learning models
Replying to Paul Wise (sorry, I'm not subscribed to the mailing list; I saw your message through the archive). Modern speech recognition engines are usually speaker-independent. In order to support speaker-dependent models, users would have to record their voice in order to train the models. This may be fine for small vocabularies but it is not feasible for large vocabularies. Moreover, there exist techniques to adapt speaker-independent models to one speaker. These techniques require much less training data than creating a speaker-dependent model from scratch would require. And since humans are able to recognize many different voices, shouldn't we have the same goal for speech recognition? Thank you, Mathieu Blondel
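The message does not say which adaptation techniques are meant. As one illustration of why adaptation needs little data, here is a sketch of MAP (maximum a posteriori) adaptation of a single Gaussian mean, a standard textbook method that I am substituting as an example. The function name, the `tau` weight and all numbers are assumptions, and a real recognizer adapts many such parameters at once.

```python
def map_adapt_mean(prior_mean, samples, tau=10.0):
    """MAP update of one Gaussian mean.

    prior_mean: mean from the speaker-independent model
    samples:    observations from the new speaker
    tau:        how many observations the prior is 'worth' (assumed value)
    """
    n = len(samples)
    return (tau * prior_mean + sum(samples)) / (tau + n)


speaker_independent_mean = 120.0   # hypothetical feature value

no_data = map_adapt_mean(speaker_independent_mean, [])
few_samples = map_adapt_mean(speaker_independent_mean, [100.0] * 5)
many_samples = map_adapt_mean(speaker_independent_mean, [100.0] * 1000)

print(no_data, few_samples, many_samples)
```

With no speaker data the adapted mean stays at the speaker-independent value; a handful of samples already moves it toward the new speaker, and with a lot of data it approaches what a speaker-dependent model would learn. The speaker-independent model serves as the starting point, so the user does not have to supply enough recordings to train everything from scratch.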