Re: [VOTE] POM Element for Source File Encoding
On Saturday, 12 April 2008, Brian E. Fox wrote:
> All the work is being put on a branch, right? That was where I saw the
> discussion with Jason going.

I did the work on 2 plugins in a branch:
- jxr: http://svn.apache.org/viewvc?rev=645260&view=rev
- javadoc: http://svn.apache.org/viewvc?rev=645262&view=rev

As you can see, the change to the plugins themselves is really tiny: it is mostly about convention and very little about code.

Sample use is in the branch too, to let Maven developers see the concrete positive impact on users:
- currently, every plugin has to be configured separately (the POM is bigger), each one having its own parameter name for encoding (confusion): http://svn.apache.org/viewvc/maven/sandbox/branches/MNG-2216/before/pom.xml?view=markup
- after the plugin change, there is one property that every plugin uses as a default value, hiding the fact that the parameter name is different for each plugin: http://svn.apache.org/viewvc/maven/sandbox/branches/MNG-2216/after/pom.xml?view=markup

There is still exactly the same work to be done on at least 7 other Apache plugins and 4 Codehaus ones. The change to some plugins will require more code, since they don't even support an encoding parameter yet, but the proposal on which we need to agree is about the convention to unify the parameter's value.

IMHO the actual work on 2 plugins shows everything. I think it is sufficient to adopt, reject, or transform any aspect of the proposal: http://docs.codehaus.org/display/MAVENUSER/POM+Element+for+Source+File+Encoding

Any objection?

Hervé
RE: [VOTE] POM Element for Source File Encoding
All the work is being put on a branch, right? That was where I saw the discussion with Jason going.

-----Original Message-----
From: Hervé BOUTEMY [mailto:[EMAIL PROTECTED]
Sent: Saturday, April 12, 2008 10:06 AM
To: Maven Developers List
Subject: Re: [VOTE] POM Element for Source File Encoding

> I created http://svn.apache.org/viewvc/maven/sandbox/branches/MNG-2216/
> with javadoc and jxr plugins branches to test the change, and sample use
> case.

no reaction: I suppose this is lazy consensus :)
I'll start to merge to plugins trunks tomorrow

regards

Hervé
Re: [VOTE] POM Element for Source File Encoding
> I created http://svn.apache.org/viewvc/maven/sandbox/branches/MNG-2216/
> with javadoc and jxr plugins branches to test the change, and sample use
> case.

no reaction: I suppose this is lazy consensus :)
I'll start to merge to plugins trunks tomorrow

regards

Hervé
Re: [VOTE] POM Element for Source File Encoding
On Wednesday, 9 April 2008, Jason van Zyl wrote:
> All sounds fine. Just wanted you to keep the bigger picture in mind.
>
> Please do the work on a branch and go through the rigor of Brian's
> example and make sure it works before you merge it into something we
> would release to users. This is not an insignificant change.

I created http://svn.apache.org/viewvc/maven/sandbox/branches/MNG-2216/ with javadoc and jxr plugins branches to test the change, and sample use case.

Isn't it sufficient?

Hervé
Re: [VOTE] POM Element for Source File Encoding
On Wednesday, 9 April 2008, Benjamin Bentmann wrote:
> > I see your point. Worth another vote? Or should this switch be postponed
> > to 2.1, trading consistency in minor version upgrades for a longer time
> > for these Latin1 defaults to be established?
> > [...]
> > So while I agree that a change in default either now or in the future is
> > ugly, it is not taboo, and I believe worth the gain.
>
> Latin-1 being the default value was part of our proposal and not many
> people complained about that nor changed their previous votes. So I believe
> another vote won't deliver a different outcome.
>
> Besides, Brian's honorable efforts to ban regressions are a good argument
> to keep the already started route with Latin-1. It might not be the best
> default value, but it's only a one liner to change it.

I have one argument in favor of ISO-8859-1 as the default: it's the default encoding of properties files, as defined by the JDK's java.util.Properties class. Once Maven requires JDK 1.5+, we'll be able to switch to XML properties files, and then UTF-8 as the default is no problem...

> Benjamin
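As an illustration of the properties-file argument, here is a minimal Java sketch. The class and method names are made up for this example and are not Maven code, and it uses java.nio.charset.StandardCharsets, which needs a newer JDK than the ones discussed in this thread. It relies only on the documented behavior of java.util.Properties: load(InputStream) decodes bytes as ISO-8859-1, while storeToXML and loadFromXML use UTF-8 unless told otherwise.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Illustrative only: class and method names are hypothetical, not Maven code.
final class PropertiesEncodingDemo {

    /** Reads the "name" key from classic .properties bytes; load(InputStream) decodes as ISO-8859-1. */
    static String loadPlain(byte[] bytes) {
        try {
            Properties props = new Properties();
            props.load(new ByteArrayInputStream(bytes));
            return props.getProperty("name");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes the value in the XML properties format (UTF-8 by default) and reads it back. */
    static String roundTripXml(String value) {
        try {
            Properties out = new Properties();
            out.setProperty("name", value);
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            out.storeToXML(buffer, null);
            Properties in = new Properties();
            in.loadFromXML(new ByteArrayInputStream(buffer.toByteArray()));
            return in.getProperty("name");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String line = "name=Herv\u00e9";
        // Latin-1 bytes are read back intact.
        System.out.println(loadPlain(line.getBytes(StandardCharsets.ISO_8859_1)));
        // UTF-8 bytes are misread: each byte becomes one Latin-1 character.
        System.out.println(loadPlain(line.getBytes(StandardCharsets.UTF_8)));
        // The XML form survives the round trip.
        System.out.println(roundTripXml("Herv\u00e9"));
    }
}
```

The second call shows why a UTF-8 default would not help classic properties files: the loader never consults the project encoding.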
Re: [VOTE] POM Element for Source File Encoding
On Wed, Apr 9, 2008 at 7:36 PM, Benjamin Bentmann <[EMAIL PROTECTED]> wrote:
> > Make sure you consider the case where you have people developing the same
> > code base all over the world, and the possible reasoning of falling back to
> > platform default encoding. Consider the team spread across the US, Russia,
> > and China and what do they do normally?
>
> This international spread of developers is in particular the case we have
> in mind. I mean, how should such a team (say the Maven community) deliver
> reliable build output if not all developers have agreed to use the same file
> encoding for the sources? Say the US devs would have ASCII as default
> encoding, the Europeans Latin-1 and the Asians Big5 for our nice potpourri.
> Even if all have agreed to use English for coding, you still might encounter
> Non-ASCII characters that get messed up, e.g. in javadoc comments that carry
> the name of the contributor/committer. Other developers might experience
> build failures because of encoding mismatch, at best other people's names
> are disfigured which is rather impolite.
>
> The Eclipse folks had a similar problem [0]. The solution: Lock the
> encoding down for the entire project.

Just for the record, netbeans.org projects all use UTF-8. We have devs in the US, the Czech Republic, Russia and elsewhere. NetBeans allows setting a default encoding per project; for Maven projects I currently look up how maven-compiler-plugin is configured. If no configuration is in place, I fall back to the platform encoding.

Encoding differs not only across countries but also across platforms. While most Linux distributions use UTF-8, on Windows you get a different encoding depending on which localized version you buy, I think. The Eastern European set is different from the Western European one. My Mac falls back to something called MacRoman as its default encoding.

Milos
Re: [VOTE] POM Element for Source File Encoding
> I see your point. Worth another vote? Or should this switch be postponed
> to 2.1, trading consistency in minor version upgrades for a longer time
> for these Latin1 defaults to be established?
> [...]
> So while I agree that a change in default either now or in the future is
> ugly, it is not taboo, and I believe worth the gain.

Latin-1 being the default value was part of our proposal, and not many people complained about that or changed their previous votes. So I believe another vote won't deliver a different outcome.

Besides, Brian's honorable efforts to ban regressions are a good argument to keep the already started route with Latin-1. It might not be the best default value, but it's only a one-liner to change it.

Benjamin
Re: [VOTE] POM Element for Source File Encoding
All sounds fine. Just wanted you to keep the bigger picture in mind.

Please do the work on a branch and go through the rigor of Brian's example and make sure it works before you merge it into something we would release to users. This is not an insignificant change.

On 9-Apr-08, at 10:36 AM, Benjamin Bentmann wrote:
> [...]

Thanks,

Jason

--
Jason van Zyl
Founder, Apache Maven
jason at sona
Re: [VOTE] POM Element for Source File Encoding
> Make sure you consider the case where you have people developing the same
> code base all over the world, and the possible reasoning of falling back to
> platform default encoding. Consider the team spread across the US, Russia,
> and China and what do they do normally?

This international spread of developers is precisely the case we have in mind. I mean, how should such a team (say the Maven community) deliver reliable build output if not all developers have agreed to use the same file encoding for the sources? Say the US devs have ASCII as their default encoding, the Europeans Latin-1 and the Asians Big5 for our nice potpourri. Even if all have agreed to use English for coding, you still might encounter non-ASCII characters that get messed up, e.g. in javadoc comments that carry the name of the contributor/committer. Other developers might experience build failures because of an encoding mismatch; at best, other people's names are disfigured, which is rather impolite.

The Eclipse folks had a similar problem [0]. The solution: lock the encoding down for the entire project.

> Is it possible to specify an encoding in one place that doesn't work
> somewhere else?

Yes, in theory you can have one user specify an encoding that another user's JVM does not support. As the class javadoc for Charset [1] states, only a few encodings, including Latin-1 and UTF-8, are required to be supported, although the reference implementation from Sun supports quite a few more [2]. However, I don't consider this a practical concern. Given that support for UTF-8 is mandatory, there exists an encoding that can handle nearly any character people would like to enter and Java can handle. Hence there exists a solution that works for everyone on the team.

> I am fortunate in that I've never seen an encoding problem in Maven
> personally. In your proposal you talk about aligning the encoding value but
> my question in what cases have you found the default encoding not working
> as you don't talk about that at all in the proposal.

Well, choose your favorite from a search for "encoding" on all Maven 2 projects in JIRA ;-)
- http://jira.codehaus.org/browse/MNG-2932
- http://jira.codehaus.org/browse/MANTTASKS-14
- http://jira.codehaus.org/browse/MTAGLIST-27
- http://jira.codehaus.org/browse/MRELEASE-302
- http://jira.codehaus.org/browse/DOXIA-103
- http://jira.codehaus.org/browse/MCHANGES-71
- (about 300 more hits)

ASCII is quite safe, but anything that requires more than those 7 bits needs special care.

> Do you know what happens with all the tools that people use. Like checking
> into all SCMs, and what happens when people checkout on to their system,
> editors, IDEs. I'm merely suggesting that there might be a reason most
> things fall back to the default encoding on the system because it's
> generally been a hard thing to corral.

In principle you're right, most of the tools are intended for usage with the platform's encoding. This seems to include the popular diff/patch tools used by many SCMs; they do not really support different encodings [3] (yet another historic design flaw, next to the two-digit year format).

Also, the SCMs themselves do not seem to care about (file content) encoding yet. I have found proposals for Subversion [5] and Bazaar [4], but that's it. However, as far as I can tell, since SCMs know nothing about file encoding, they also do not perform any conversions on the file content but simply assume a simple byte-to-char mapping like ASCII when doing EOL normalization or keyword substitution.

As for editors and IDEs: even this tiny thing "Notepad" from Windows supports UTF-8 nowadays, and I wouldn't call that an editor. Does anybody know of a popular editor/IDE that calls itself mature but does not allow configuring the file encoding?

Benjamin

[0] https://bugs.eclipse.org/bugs/show_bug.cgi?id=132898
[1] http://java.sun.com/javase/6/docs/api/java/nio/charset/Charset.html
[2] http://java.sun.com/j2se/1.4.2/docs/guide/intl/encoding.doc.html
[3] http://www.gnu.org/software/diffutils/manual/html_mono/diff.html#Internationalization
[4] http://bazaar-vcs.org/UnicodeSupport?action=show&redirect=EncodingSupport#head-43c0111da063796da433179faaf8d065bda5c42e
[5] http://svn.haxx.se/dev/archive-2006-03/1182.shtml
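The point about JVM charset support can be checked with a few lines of Java. This is only a sketch: the class and helper names are hypothetical, and the list of names is the set of standard charsets that the Charset javadoc requires every Java platform implementation to provide. Anything else, such as Big5, depends on the JVM at hand.

```java
import java.nio.charset.Charset;

// Illustrative only: class and method names are hypothetical.
final class CharsetSupportCheck {

    /** Charset names that every Java platform implementation must support, per the Charset javadoc. */
    static final String[] REQUIRED = {
        "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"
    };

    /** Returns true if the running JVM has support for the named charset. */
    static boolean isUsable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // Thrown for names that are not even syntactically legal.
            return false;
        }
    }

    public static void main(String[] args) {
        for (String name : REQUIRED) {
            System.out.println(name + " supported: " + isUsable(name));
        }
        // Optional charset: the answer may differ from one JVM to another.
        System.out.println("Big5 supported: " + isUsable("Big5"));
    }
}
```

A plugin could run such a check on the configured encoding early and fail with a clear message instead of a late UnsupportedEncodingException.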
Re: [VOTE] POM Element for Source File Encoding
Benjamin Bentmann wrote:
> In general, I completely agree with your preference for Unicode and
> fail-fast behavior. If I had been involved when the Maven story started, I
> would have proposed UTF-8 as the default value, no doubt.
>
> As for today, I tried to consider consistency with existing behavior. The
> Maven Site Plugin was already using Latin-1 as the default value for
> inputEncoding and outputEncoding and so I proposed this for other plugins,
> too. Indeed, one of the patches (MJAVADOC-165) was just released such that
> already two plugins teach users this default value. Therefore I fear it
> might be too late to introduce another default value. If the community
> believes this change is worth the confusion caused on users, I'm the first
> one running the other way round ;-)

I see your point. Worth another vote? Or should this switch be postponed to 2.1, trading consistency in minor version upgrades for a longer time for these Latin1 defaults to be established?

Given the fail-fast nature of the UTF-8 default, we won't have to worry about the switch going unnoticed. Developers switching from a version defaulting to Latin1 to one defaulting to UTF-8 will notice the change immediately, and for development in a heterogeneous environment they can simply override the super-POM with their own default.

So while I agree that a change in default either now or in the future is ugly, it is not taboo, and I believe worth the gain.

> That's a good point. It appears we need to do some extra homework here:
> The simplistic use of InputStreamReader and OutputStreamWriter will
> silently convert unmappable byte sequences to a default character ('?', see
> also [0]). I guess we could nicely hide the required implementation by
> means of the existing methods in Reader-/WriterFactory from plexus-utils.

That works for plugins doing the conversion in code under our control. Other plugins that use external libraries or tools might be more difficult.

> > Note that ASCII-only sources will compile cleanly no matter the default
> > encoding
>
> Most of the time, but UTF-16 or EBCDIC do not even have ASCII in common.

I was thinking about the default of the default, i.e. the value to be set in the super-POM. We certainly won't choose UTF-16 or EBCDIC for this global default, and as files encoded in UTF-16 or EBCDIC don't count as ASCII-only, my

Martin
RE: [VOTE] POM Element for Source File Encoding
> As for today, I tried to consider consistency with existing behavior. The
> Maven Site Plugin was already using Latin-1 as the default value for
> inputEncoding and outputEncoding and so I proposed this for other plugins,
> too. Indeed, one of the patches (MJAVADOC-165) was just released such that
> already two plugins teach users this default value. Therefore I fear it
> might be too late to introduce another default value. If the community
> believes this change is worth the confusion caused on users, I'm the first
> one running the other way round ;-)

Don't break existing builds. "No regressions." ;-)
Re: [VOTE] POM Element for Source File Encoding
> Taking this together, one might argue to have UTF-8 the default, not
> ISO-8859-1.

In general, I completely agree with your preference for Unicode and fail-fast behavior. If I had been involved when the Maven story started, I would have proposed UTF-8 as the default value, no doubt.

As for today, I tried to consider consistency with existing behavior. The Maven Site Plugin was already using Latin-1 as the default value for inputEncoding and outputEncoding, and so I proposed this for other plugins, too. Indeed, one of the patches (MJAVADOC-165) was just released, so already two plugins teach users this default value. Therefore I fear it might be too late to introduce another default value. If the community believes this change is worth the confusion caused to users, I'm the first one running the other way round ;-)

> It should be checked whether plugins really die for invalid UTF-8
> sequences, and what the output looks like.

That's a good point. It appears we need to do some extra homework here: the simplistic use of InputStreamReader and OutputStreamWriter will silently convert unmappable byte sequences to a default character ('?', see also [0]). I guess we could nicely hide the required implementation by means of the existing methods in Reader-/WriterFactory from plexus-utils.

> Note that ASCII-only sources will compile cleanly no matter the default
> encoding

Most of the time, but UTF-16 or EBCDIC do not even have ASCII in common.

Benjamin

[0] http://java.sun.com/javase/6/docs/api/java/io/OutputStreamWriter.html
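The "extra homework" mentioned above could look roughly like the following sketch. It is not the plexus-utils ReaderFactory API; newStrictReader and the other names are hypothetical helpers invented for this example. The idea is to hand InputStreamReader a CharsetDecoder configured with CodingErrorAction.REPORT, so that invalid input raises a CharacterCodingException instead of being replaced silently.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Illustrative only: these helpers are hypothetical, not part of plexus-utils.
final class StrictReaders {

    /** Creates a Reader that throws on invalid input instead of substituting a replacement character. */
    static Reader newStrictReader(InputStream in, Charset charset) {
        return new InputStreamReader(in, charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT));
    }

    /** Returns true if the bytes can be read to the end through the strict reader. */
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try (Reader reader = newStrictReader(new ByteArrayInputStream(bytes), charset)) {
            char[] buffer = new char[1024];
            while (reader.read(buffer) != -1) {
                // Drain the stream; only the absence of decoding errors matters here.
            }
            return true;
        } catch (CharacterCodingException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = "Herv\u00e9 B".getBytes(StandardCharsets.ISO_8859_1);
        // Lenient decoding hides the problem behind a replacement character.
        System.out.println(new String(latin1, StandardCharsets.UTF_8));
        // Strict decoding reports it.
        System.out.println("valid UTF-8: " + decodesCleanly(latin1, StandardCharsets.UTF_8));
    }
}
```

The writing side would need the same treatment with a CharsetEncoder passed to OutputStreamWriter. Plugins that delegate to external tools cannot be covered this way.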
Re: [VOTE] POM Element for Source File Encoding
Benjamin Bentmann wrote:
> With regard to user errors, my general suggestion is to fail the build.
> This unforgiving attitude should not be that unfamiliar to users: it has
> been chosen for a popular format like XML, which is also employed by Maven
> for a few files.
>
> The problems depend on the encodings: if one feeds Latin-1 into a UTF-8
> decoder, one most likely encounters invalid byte sequences, making the
> decoder fail. That's my favorite case, as it clearly shows the user that
> something is wrong and needs his attention. The other case is worse because
> it is more subtle: feeding UTF-8 into a Latin-1 decoder will pass but
> produces output that only a human can recognize as garbage, by closely
> analyzing the few non-ASCII characters.

Taking this together, one might argue to have UTF-8 the default, not ISO-8859-1. Almost anything that passes UTF-8 encoding constraints will indeed be UTF-8, as non-ASCII files that are not UTF-8 will almost certainly contain sequences not valid in UTF-8. So if a user fails to specify the encoding he uses, and if this encoding isn't UTF-8, then things will break for him. This has two advantages:

1. Fail-fast behaviour. If there is a misconfiguration, the Maven run will die, and the developer can fix the issue. You don't have to wait for some other developer complaining about garbled strings or a user complaining about a broken website until you can fix things.

2. Promote Unicode. While there are a lot of encodings out there for historic reasons, most of them suffer severe drawbacks in an international software project, because they either can't express all needed characters, or they are not common outside a small region. So while Taiwanese developers might be happy to develop an English/Chinese project in Big5, prospective American contributors might not get their editor to load files as Big5. UTF-8, on the other hand, is used worldwide and provides the whole Unicode range. For new projects, I guess UTF-8 would be a reasonable best practice, and making this best practice the default in Maven might promote it.

Of course this conflicts with previous discussions about Latin1 ensuring that any project can get compiled, as it has no invalid byte sequences. The choice is whether, in the absence of configuration,

A) you want your compile to succeed all the time, possibly generating the wrong results, or
B) you want your build to fail in case of a misconfiguration (including missing configuration), but ensure correct results if it does not fail.

If I understood him correctly, Jason voted for A). I took his request for non-dying builds as a requirement and pointed out that this is possible with Latin1. Now that I think about it, I believe I would rather want B), as I'm all for fail-fast deterministic behaviour.

It should be checked whether plugins really die for invalid UTF-8 sequences, and what the output looks like. If possible, plugins should point out that a misconfiguration of the encoding in the POM (either the plugin configuration or the proposed global configuration property) is possibly the cause of the error, if it's not a developer using another encoding.

Note that ASCII-only sources will compile cleanly no matter the default encoding, so all projects that don't need to worry about encoding won't be forced to do so. Only international projects where encoding is relevant will force their developers to either follow best practices or explicitly state their policy.

Greetings,
Martin
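The two mismatch cases behind choices A) and B) can be reproduced with a short Java sketch. The class and method names are hypothetical, and the sample string is arbitrary; the sketch only uses the standard CharsetDecoder API, with the error actions set explicitly to REPORT for clarity.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Illustrative only: class and method names are hypothetical.
final class EncodingMismatchDemo {

    /** Decodes strictly; returns null when the bytes are not valid in the given charset. */
    static String decodeOrNull(byte[] bytes, Charset charset) {
        try {
            return charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String text = "Herv\u00e9 B";

        // Case B: Latin-1 bytes read as UTF-8 are rejected, so the mistake is visible at once.
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("Latin-1 read as UTF-8 accepted: "
                + (decodeOrNull(latin1, StandardCharsets.UTF_8) != null));

        // Case A: UTF-8 bytes read as Latin-1 always decode, but the text is silently garbled.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        String garbled = decodeOrNull(utf8, StandardCharsets.ISO_8859_1);
        System.out.println("UTF-8 read as Latin-1 equals original: " + text.equals(garbled));
    }
}
```

Latin-1 maps every byte value to a character, which is why it can never fail and also why it can never detect a wrong setting.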
Re: [VOTE] POM Element for Source File Encoding
Paul Benedict wrote:
> Just a proposal: Maven could loosen its parsing rules when it detects versions greater than it is configured to accept. Forward compatibility would be nice.

For anyone seriously interested in interoperability, I suggest a look at http://www.w3.org/2005/05/xsd-versioning-resources.html, especially the use cases, which illustrate several issues quite well.

Martin
Re: [VOTE] POM Element for Source File Encoding
Benjamin Bentmann wrote:
> > You could of course write an encoding detection plugin which could examine the code and set the required property accordingly.
>
> Personally, I don't see the use case for that. If there are really users out there that don't know what file encoding they are using when writing up their sources, they are most probably happy with the proposed default value of Latin-1. Alternatively, this encoding detection plugin could be as simple as printing out the Java system property ${file.encoding}, which obviously worked well enough for the user.

${file.encoding} will only work if the file originated on the same machine. I am thinking of semi-automatic conversions of inhomogeneous code into Maven. For example, some teacher collects homework from his students as a bunch of zip files containing only source, has a script to turn each into a Maven project, and has a master project interacting with them, like letting them compete with one another or whatever. In this case one might wish to automatically detect the encoding of every module, especially in locales with several commonly used encodings, so that string literals in these classes are handled correctly without the students even knowing what an encoding is.

But that's a corner case, so I guess we should stop the discussion about the use of such a program here, until someone actually requires it.

Greetings,
Martin
Re: [VOTE] POM Element for Source File Encoding
On 8-Apr-08, at 4:09 PM, Benjamin Bentmann wrote:
> Jason van Zyl wrote:
> > What happens when the encoding is different than what is stated? Same problem really, in how to deal with the actual versus declared.
>
> If the declared encoding does not match the actual one, I simply call this a user error.

Make sure you consider the case where you have people developing the same code base all over the world, and the possible reasoning for falling back to the platform default encoding. Consider a team spread across the US, Russia, and China: what do they normally do? Is it possible to specify an encoding in one place that doesn't work somewhere else?

I am fortunate in that I've never personally seen an encoding problem in Maven. In your proposal you talk about aligning the encoding value, but my question is: in what cases have you found the default encoding not working? You don't talk about that at all in the proposal. Do you know what happens with all the tools that people use: checking into all the SCMs, what happens when people check out onto their systems, editors, IDEs? I'm merely suggesting that there might be a reason most things fall back to the default encoding on the system: it has generally been a hard thing to corral.

> Either he explicitly set the wrong value or forgot to overwrite the default value. With regard to user errors, my general suggestion is to fail the build. This unforgiving attitude should not be that unfamiliar to users: it has been chosen for a popular format like XML, which Maven also employs for a few files.
>
> > That would depend on what kinds of problems can arise if things are not consistent.
>
> The problems depend on the encodings: if one feeds Latin-1 into a UTF-8 decoder, one will most likely encounter invalid byte sequences, making the decoder fail. That's my favorite case, as it clearly shows the user that something is wrong and needs his attention. The other case is worse because it is more subtle: feeding UTF-8 into a Latin-1 decoder will pass, but produces output that only a human can recognize as garbage, by closely analyzing the few non-ASCII characters.
>
> > You have to deal with the very real possibility that no one is going to set it, or know what it is, and that they will report issues related to encoding even if the whole system works.
>
> I don't think that lack of knowledge is a state that should be supported. Java is an international platform, designed for platform independence (more or less). If developers don't know about file encoding, they are likely producing bad code. Therefore, I find it easy to say: have users report issues about encoding and let's tell them how to do it properly, i.e. teach them another best practice. Then, maybe some day, we won't ever face programs that were written without file encoding in mind ;-)
>
> > For the system you are proposing there would be touch points at which you would look for encoding parameters. If those values are not stated you will need a strategy to detect them, or you will never be able to support any encoding alignment in older versions of Maven without the encoding parameterization.
>
> Hm, maybe we talk a lot just because we didn't illustrate our proposal properly: a key point is that there will *always* be a specific encoding value. The proposal expects all affected plugins to fall back to Latin-1 (or whatever, just a fixed value) if they don't get an explicit setting from the POM. I.e. once a user employs a particular version of a plugin, he can immediately tell which encoding it will use to process text files. In other words, he can immediately tell whether the plugin will behave correctly. In contrast, if we followed your suggestion of encoding guessing, the user would have to try out the plugin and verify that it guessed correctly.
>
> The encoding parameterization is primarily a task for the individual plugins and not bound to a Maven version. Having a dedicated POM property/element is just sugar, not a requirement. The important aspect is unification of encoding handling in the plugins.
>
> > Of course it is, but that doesn't negate the fact that people don't necessarily follow best practices.
>
> That's right. But I believe we have to distinguish bad practice from mistake. What people call good practice might be controversial, but stating that a Latin-1 encoded file should be read using UTF-8 is in general just wrong and leaves no room for discussion. Hence I believe that Maven has every right to fail the build and report an error if a user does not properly set up the file encoding, forcing users to fix the error.
>
> > Absolutely, but look at all the questions on the mailing list that expect many of these things to just be detected.
>
> I don't want to upset those users, but I believe that not every request is justified, and a request can be rejected if the rejection is properly backed by a reasonable argument. Until somebody shows me a feasible and *reliable* algo to
Re: [VOTE] POM Element for Source File Encoding
> IMHO, the best hint for a user to choose his encoding, when the default ISO-8859-1 isn't a good value for him, is displaying the platform encoding (in "mvn -v" output, for example): it's easy, reliable, and corresponds to the value he would have got before the change.

+1, just created MNG-3509 for this.

Benjamin
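For reference, the platform encoding mentioned here is what the JVM exposes through the file.encoding system property and Charset.defaultCharset(). A minimal sketch of reading it follows; the class name PlatformEncoding and the output wording are made up for illustration and are not what "mvn -v" prints.

```java
import java.nio.charset.Charset;

class PlatformEncoding {
    /** Builds a one-line description of the JVM's platform encoding. */
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", default charset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The value differs between machines, which is exactly the reproducibility concern.
        System.out.println(describe());
    }
}
```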
Re: [VOTE] POM Element for Source File Encoding
Jason van Zyl wrote:
> What happens when the encoding is different than what is stated? Same problem really, in how to deal with the actual versus declared.

If the declared encoding does not match the actual one, I simply call this a user error. Either he explicitly set the wrong value or forgot to overwrite the default value. With regard to user errors, my general suggestion is to fail the build. This unforgiving attitude should not be that unfamiliar to users: it has been chosen for a popular format like XML, which Maven also employs for a few files.

> That would depend on what kinds of problems can arise if things are not consistent.

The problems depend on the encodings: if one feeds Latin-1 into a UTF-8 decoder, one will most likely encounter invalid byte sequences, making the decoder fail. That's my favorite case, as it clearly shows the user that something is wrong and needs his attention. The other case is worse because it is more subtle: feeding UTF-8 into a Latin-1 decoder will pass, but produces output that only a human can recognize as garbage, by closely analyzing the few non-ASCII characters.

> You have to deal with the very real possibility that no one is going to set it, or know what it is, and that they will report issues related to encoding even if the whole system works.

I don't think that lack of knowledge is a state that should be supported. Java is an international platform, designed for platform independence (more or less). If developers don't know about file encoding, they are likely producing bad code. Therefore, I find it easy to say: have users report issues about encoding and let's tell them how to do it properly, i.e. teach them another best practice. Then, maybe some day, we won't ever face programs that were written without file encoding in mind ;-)

> For the system you are proposing there would be touch points at which you would look for encoding parameters. If those values are not stated you will need a strategy to detect them, or you will never be able to support any encoding alignment in older versions of Maven without the encoding parameterization.

Hm, maybe we talk a lot just because we didn't illustrate our proposal properly: a key point is that there will *always* be a specific encoding value. The proposal expects all affected plugins to fall back to Latin-1 (or whatever, just a fixed value) if they don't get an explicit setting from the POM. I.e. once a user employs a particular version of a plugin, he can immediately tell which encoding it will use to process text files. In other words, he can immediately tell whether the plugin will behave correctly. In contrast, if we followed your suggestion of encoding guessing, the user would have to try out the plugin and verify that it guessed correctly.

The encoding parameterization is primarily a task for the individual plugins and not bound to a Maven version. Having a dedicated POM property/element is just sugar, not a requirement. The important aspect is unification of encoding handling in the plugins.

> Of course it is, but that doesn't negate the fact that people don't necessarily follow best practices.

That's right. But I believe we have to distinguish bad practice from mistake. What people call good practice might be controversial, but stating that a Latin-1 encoded file should be read using UTF-8 is in general just wrong and leaves no room for discussion. Hence I believe that Maven has every right to fail the build and report an error if a user does not properly set up the file encoding, forcing users to fix the error.

> Absolutely, but look at all the questions on the mailing list that expect many of these things to just be detected.

I don't want to upset those users, but I believe that not every request is justified, and a request can be rejected if the rejection is properly backed by a reasonable argument. Until somebody shows me a feasible and *reliable* algo to tell ISO-8859-1 and ISO-8859-15 apart, I don't want the dumb machine to start guessing. I, and I hope all the other users, aim for a correct build, and if the machine cannot derive the required parameters, it is the user's duty to specify the proper values. Besides, this is nothing that really hurts much: add the line to your POM and be fine for the rest of your life.

Benjamin
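The ISO-8859-1 versus ISO-8859-15 problem can be made concrete: both are single-byte encodings in which byte 0xA4 is valid, so no validity check can tell them apart. A minimal sketch follows; the class and method names are made up, and it assumes the runtime provides ISO-8859-15, which common JDKs do but which is not among the charsets every JVM is required to support.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Latin1VsLatin9 {
    /** Decodes a single byte with the given charset and returns the resulting character. */
    static char decode(int unsignedByte, Charset charset) {
        return new String(new byte[] { (byte) unsignedByte }, charset).charAt(0);
    }

    public static void main(String[] args) {
        // Assumption: ISO-8859-15 is available on this JVM.
        Charset latin9 = Charset.forName("ISO-8859-15");
        char asLatin1 = decode(0xA4, StandardCharsets.ISO_8859_1); // U+00A4, currency sign
        char asLatin9 = decode(0xA4, latin9);                      // U+20AC, euro sign
        System.out.printf("ISO-8859-1: U+%04X, ISO-8859-15: U+%04X%n", (int) asLatin1, (int) asLatin9);
    }
}
```

Both decodings succeed, so a detector would have to judge which character the author meant, which is a question about intent rather than about bytes.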
Re: [VOTE] POM Element for Source File Encoding
Herve,

Just a proposal: Maven could loosen its parsing rules when it detects versions greater than it is configured to accept. This can't be without limits, of course; perhaps in the range of a single point release: 4.0 <= 4.0.x < 4.1. But perhaps within the 4.0.x series, it would accept undeclared elements instead of strictly parsing against the XSD. So if a 4.0.0 parser is given a 4.0.1 POM, the POM must at least match 4.0.0, but the parser also accepts undeclared elements.

Forward compatibility would be nice.

Paul

On Tue, Apr 8, 2008 at 4:16 PM, Hervé BOUTEMY <[EMAIL PROTECTED]> wrote:
> On Tuesday 08 April 2008, Paul Benedict wrote:
> > In Commons Validator, we updated the DTD even in point releases. I don't
> > see the harm in doing the same here. After all, if the POM is 4.0.0, why
> > not create a 4.0.1? It sounds like Maven 2.1 will have a 4.1 version.
> >
> > Paul
>
> because if you use 4.0.1 for your project, and upload your component to a
> repository, everybody depending on your component will need to support 4.0.1
> or they'll get a failure parsing a 4.0.1 pom with their Maven runtime
> supporting only 4.0.0 pom
>
> to support a 4.1 version, I imagine there will be some trick to implement to
> upload simultaneously the original 4.1 pom version to the repository and a
> generated 4.0.0 for compatibility with Maven 2.0.x
>
> Hervé
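The rule proposed above (4.0 <= 4.0.x < 4.1) can be sketched as a small decision function. This is purely illustrative and is not how Maven's model reader works: the class ModelVersionPolicy, the enum Mode and the method modeFor are hypothetical names, and treating every other major.minor series as rejected is an assumption the proposal does not spell out.

```java
class ModelVersionPolicy {
    enum Mode { STRICT, LENIENT, REJECT }

    /** Splits "major.minor.patch" into three integers. */
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected major.minor.patch: " + version);
        }
        int[] numbers = new int[3];
        for (int i = 0; i < 3; i++) {
            numbers[i] = Integer.parseInt(parts[i]);
        }
        return numbers;
    }

    /** Decides how a parser supporting one model version should treat a POM declaring another. */
    static Mode modeFor(String supported, String declared) {
        int[] s = parse(supported);
        int[] d = parse(declared);
        if (s[0] != d[0] || s[1] != d[1]) {
            return Mode.REJECT; // outside the 4.0.x series (assumption of this sketch)
        }
        // Same series: a newer patch level is parsed leniently, tolerating undeclared elements.
        return d[2] > s[2] ? Mode.LENIENT : Mode.STRICT;
    }

    public static void main(String[] args) {
        System.out.println(modeFor("4.0.0", "4.0.0"));
        System.out.println(modeFor("4.0.0", "4.0.1"));
        System.out.println(modeFor("4.0.0", "4.1.0"));
    }
}
```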
Re: [VOTE] POM Element for Source File Encoding
Jason van Zyl wrote:
> Possibly, but you're guessing.

Guessing about how much slower it will be, yes; guessing that it will be slower, no. Additional work, additional time. Wouldn't you agree? Then the question becomes whether it is worth taking on this overhead, or how much benefit you expect from the encoding guess over the simple default value.

> Obviously checking the encoding on every file would be unwise.

As Martin nicely illustrated, you would have to do exactly this. Otherwise, you could simply shortcut the detection to ASCII, because that's what you see most of the time. The characters that require the proper encoding are in the minority. My passion for this proposal is not about "works most of the time"; I would like to see "works always".

> Trying to detect where it's not provided (mistakes)

We proposed to set a default value in the super POM such that the encoding will always be specified. To handle Maven 2.0.9-, we further proposed that each plugin consistently falls back to this agreed default value in case it doesn't get a value from the POM. Is there a case I am missing?

> or can't be provided (not supported as an option in the model) you're going to have to do something. So what are you going to do in those cases?

I am not sure what you mean when referring to "model". Are you referring to a plugin that is currently not aware of the encoding issue, i.e. one that simply uses the JVM's default value and does not provide a configuration parameter to the user? For this case, we should simply fix this plugin and release a new version of it to deliver "consistently high quality software".

Benjamin
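The fallback chain described above (explicit plugin setting, then the shared project value, then a fixed agreed default) can be sketched in a few lines. This is an illustration only, not code from any plugin: the class EncodingResolver and its method are hypothetical, and the fixed default is the ISO-8859-1 value proposed in this thread.

```java
class EncodingResolver {
    /** Fixed fallback proposed in the thread; an assumption of this sketch, not a released default. */
    static final String FALLBACK = "ISO-8859-1";

    static boolean isSet(String value) {
        return value != null && !value.trim().isEmpty();
    }

    /**
     * Picks the encoding a plugin should use.
     * pluginParameter: the plugin's own encoding parameter, if the user configured it.
     * projectProperty: the shared value, e.g. what ${project.build.sourceEncoding} resolves to.
     */
    static String resolve(String pluginParameter, String projectProperty) {
        if (isSet(pluginParameter)) {
            return pluginParameter;
        }
        if (isSet(projectProperty)) {
            return projectProperty;
        }
        return FALLBACK; // never the platform default, so the result is the same on every machine
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, "UTF-8"));
        System.out.println(resolve(null, null));
    }
}
```

The point of the design is that the result never depends on the machine running the build, so a user can tell in advance which encoding a given plugin version will use.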
Re: [VOTE] POM Element for Source File Encoding
Martin von Gagern wrote:
> if a newcomer like me is allowed to vote.

The more people participate in a discussion, the more likely the result is to match public consensus rather than an individual's preferences.

> Suppose you have a huge source tree, mostly English ASCII, but somewhere in there is a single degree sign, '\u00b0'. How would you detect it, short of scanning every ASCII file until you hit that one?

Exactly. If the automatic guessing is to have any chance of delivering the proper result, it's doomed to scan all the files, and this is additional I/O. Please remember, I/O is one of the most expensive operations in terms of time, in particular with a Maven build being quite sequential.

> You could of course write an encoding detection plugin which could examine the code and set the required property accordingly.

Personally, I don't see the use case for that. If there are really users out there that don't know what file encoding they are using when writing up their sources, they are most probably happy with the proposed default value of Latin-1. Alternatively, this encoding detection plugin could be as simple as printing out the Java system property ${file.encoding}, which obviously worked well enough for the user.

For those users that know about file encoding, it won't be a problem to specify this in the POM. In particular, those users will not fail to specify the right encoding, unlike a dumb machine which merely tests whether a particular byte stream obeys the syntax rules of an encoding.

Benjamin
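The cost argument can be illustrated with the simplest possible detector: finding the first byte outside 7-bit ASCII. In the worst case (a single degree sign at the very end) it has to read everything. A minimal sketch with made-up names follows; a real detector such as JChardet does considerably more work per byte.

```java
import java.util.Arrays;

class AsciiScan {
    /** Returns the index of the first byte outside 7-bit ASCII, or -1 if there is none. */
    static int firstNonAscii(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] < 0) { // bytes 0x80..0xFF are negative as signed Java bytes
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // One megabyte of plain ASCII with a single ISO-8859-1 degree sign (0xB0) at the end.
        byte[] source = new byte[1_000_000];
        Arrays.fill(source, (byte) 'a');
        source[source.length - 1] = (byte) 0xB0;
        System.out.println(firstNonAscii(source)); // 999999: every byte had to be inspected
    }
}
```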
Re: [VOTE] POM Element for Source File Encoding
On Tuesday 08 April 2008, Martin von Gagern wrote:
> +1 for the original proposal, if a newcomer like me is allowed to vote.
>
> The concept with the property, which can be set with the properties
> until the model is updated, and which can be the default expression for
> affected plugins, is simply elegant.

+1

> I support concerns here that the cost of encoding detection may in many
> cases be prohibitively high. Maven runs too slow as it is, imho. You
> could of course write an encoding detection plugin which could examine
> the code and set the required property accordingly. But enabling that by
> default feels bad to me.

+1

Encoding detection, i.e. guessing the encoding, is unreliable by nature. Why not in a browser, where:
- the encoding can change on every page
- a user looks at the rendered characters, sees a problem easily, and fixes the value by simply trying another value and seeing if it is better

But embedded in Maven, where the encoding is not so volatile and the consequences of a bad guess will be more subtle (for example, the compiled classes will be run and display bad output), I find it a really bad idea.

> It should be noted that plugins that generate code to be used by other
> plugins should have their output encoding default to the general input
> encoding, so that there are no breaks in the chain.

It's noted in the proposal, in the list of affected plugins (modello, for example, which generates Java source code).

> As Jason writes about consistency, I guess the danger of inconsistent
> input handling, as different plugins might be configured to read it
> using different charsets, is exactly the kind of inconsistency to be
> addressed by this proposal, so I'd expect more consistency after it has
> been implemented, not less.

+1

Until now, few people cared about the encoding of non-XML sources, and it worked: yes, that's the magic of platform encoding (the drawback is reproducibility).

IMHO, the best hint for a user to choose his encoding, when the default ISO-8859-1 isn't a good value for him, is displaying the platform encoding (in "mvn -v" output, for example): it's easy, reliable, and corresponds to the value he would have got before the change.

Hervé
Re: [VOTE] POM Element for Source File Encoding
On Tuesday 08 April 2008, Paul Benedict wrote:
> In Commons Validator, we updated the DTD even in point releases. I don't
> see the harm in doing the same here. After all, if the POM is 4.0.0, why
> not create a 4.0.1? It sounds like Maven 2.1 will have a 4.1 version.
>
> Paul

Because if you use 4.0.1 for your project and upload your component to a repository, everybody depending on your component will need to support 4.0.1, or they'll get a failure parsing a 4.0.1 POM with their Maven runtime supporting only the 4.0.0 POM.

To support a 4.1 version, I imagine there will be some trick to implement: uploading simultaneously the original 4.1 POM version to the repository and a generated 4.0.0 one for compatibility with Maven 2.0.x.

Hervé

> On Mon, Apr 7, 2008 at 6:03 PM, Jason van Zyl <[EMAIL PROTECTED]> wrote:
> > On 7-Apr-08, at 3:58 PM, Jason van Zyl wrote:
> > > Would being able to detect the encoding help with making this less
> > > complicated. Something JChardet?
> >
> > Sorry, something like JChardet:
> >
> > http://jchardet.sourceforge.net/
> >
> > > On 7-Apr-08, at 2:31 PM, Hervé BOUTEMY wrote:
> > > > On Sunday 06 April 2008, Jason van Zyl wrote:
> > > > > I specifically meant the core changes, but I would still recommend
> > > > > what Milos did, which was to create branches for a few of the
> > > > > affected plugins to try it all together.
> > > >
> > > > ok, I created
> > > > http://svn.apache.org/viewvc/maven/sandbox/branches/MNG-2216/
> > > > with javadoc and jxr plugins branches to test the change, and sample
> > > > use case.
> > > >
> > > > > Most certainly to test new elements in the POM you need to use a
> > > > > branch because we still don't have a strategy for dealing with
> > > > > model changes.
> > > >
> > > > this one is more tricky, even if the change in pom.xml is a simple
> > > > addition of an element... Don't really know how to handle this
> > > > without breaking things for Maven 2.0 when an artifact with this
> > > > addition is deployed to a repository.
> > > >
> > > > > If plugins can be changed, used with the existing versions of Maven
> > > > > with no disruption then do it in-situ.
> > > >
> > > > No problem here, no disruption, as proven by the test.
> > > > The only risk is that the property chosen,
> > > > ${project.build.sourceEncoding}, makes users think of a new element
> > > > in the pom, but we still don't know how we will implement it: we bet
> > > > on a solution we don't have currently.
> > > >
> > > > Hervé
> > >
> > > Thanks,
> > >
> > > Jason
> > >
> > > --
> > > Jason van Zyl
> > > Founder, Apache Maven
> > > jason at sonatype dot com
> > > --
> > >
> > > A man enjoys his work when he understands the whole and when he
> > > is responsible for the quality of the whole
> > >
> > > -- Christopher Alexander, A Pattern Language
> >
> > Thanks,
> >
> > Jason
> >
> > --
> > Jason van Zyl
> > Founder, Apache Maven
> > jason at sonatype dot com
> > --
> >
> > Simplex sigillum veri. (Simplicity is the seal of truth.)
Re: [VOTE] POM Element for Source File Encoding
+1 for the original proposal, if a newcomer like me is allowed to vote.

The concept with the property, which can be set with the properties until the model is updated, and which can be the default expression for affected plugins, is simply elegant.

Jason van Zyl wrote:
> It would be reasonable to assume the detection could be based on a subset. For an organization on one project you could reasonably assume the same encoding. That would not be the case in an open source project as tools would vary.

Suppose you have a huge source tree, mostly English ASCII, but somewhere in there is a single degree sign, '\u00b0'. How would you detect it, short of scanning every ASCII file until you hit that one?

I support concerns here that the cost of encoding detection may in many cases be prohibitively high. Maven runs too slow as it is, imho. You could of course write an encoding detection plugin which could examine the code and set the required property accordingly. But enabling that by default feels bad to me.

> What happens when the encoding is different than what is stated? Same problem really, in how to deal with the actual versus declared.

Up to the plugins, I guess, as it is now. No change there, only a central place to set defaults for all plugins. Of course you could write an encoding checking plugin which ensures that your sources are valid in the specified encoding.

> > My impression is that usage of JChardet will significantly increase code complexity without giving me a solid build.
>
> That would depend on what kinds of problems can arise if things are not consistent.

There are three possible cases:

1. The code agrees with the setting => all right.
2. The code disagrees with the setting, but is still valid under the specified encoding => mojibake.
3. The code is invalid under the specified encoding => exception or unmappable character symbol, depending on context. The exception may be handled by the plugin.

By specifying ISO-8859-1 as the default input encoding, there are no unmappable characters, avoiding case 3. All input should be readable, though the output generated from it might not look as expected.

It should be noted that plugins that generate code to be used by other plugins should have their output encoding default to the general input encoding, so that there are no breaks in the chain.

As Jason writes about consistency, I guess the danger of inconsistent input handling, as different plugins might be configured to read it using different charsets, is exactly the kind of inconsistency to be addressed by this proposal, so I'd expect more consistency after it has been implemented, not less.

Greetings,
Martin von Gagern
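The claim that ISO-8859-1 avoids case 3 follows from the fact that each of the 256 possible byte values maps to exactly one character (U+0000 to U+00FF), so no input is ever invalid and decoding is lossless. A minimal Java sketch that checks this property follows; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1RoundTrip {
    /** Decodes all 256 byte values as ISO-8859-1 and encodes them back again. */
    static boolean roundTripsAllBytes() {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        String decoded = new String(all, StandardCharsets.ISO_8859_1);
        byte[] encoded = decoded.getBytes(StandardCharsets.ISO_8859_1);
        // One character per byte, and the original bytes come back unchanged.
        return decoded.length() == 256 && Arrays.equals(all, encoded);
    }

    public static void main(String[] args) {
        System.out.println(roundTripsAllBytes()); // true
    }
}
```

This is why a Latin-1 default can never make a build die on its own: it only ever produces cases 1 and 2.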
Re: [VOTE] POM Element for Source File Encoding
On 8-Apr-08, at 11:11 AM, Milos Kleint wrote:
> +1 on Benjamin's objections to detection. It will slow down the build (possibly significantly) while providing little added value.

Possibly, but you're guessing. Obviously checking the encoding on every file would be unwise. Where it's not provided (mistakes), or can't be provided (not supported as an option in the model), you're going to have to do something. So what are you going to do in those cases?

> Milos
>
> On Tue, Apr 8, 2008 at 8:27 PM, Benjamin Bentmann <[EMAIL PROTECTED]> wrote:
> > Jason van Zyl wrote:
> > > If it's right most of the time, and it saves the user from having to know or worry about it then yes I would use it.
> >
> > Could you elaborate on this a little more? Say we start easy and have a build with just about 100 Java source files. Do you suggest peeking at each of them before passing them to a tool like javac, or just at a subset, and how should this subset be determined? What should be done when the charset detection reports different encodings for the set of files to process? Will the charset detection happen over and over again for each plugin (javac, javadoc, jxr)? What do you consider "most of the time"? Telling the various ISO-8859 families apart is not really easy. My impression is that usage of JChardet will significantly increase code complexity without giving me a solid build.
> >
> > Also, I believe it's a bad idea to free users from worrying about the encoding. This would be similar to the doubtful magic the JRE provides with its default encoding: it encourages developers to ignore the encoding issue, leading to platform-dependent behavior. Platform-dependent Java code is a bad practice and Maven, as far as I heard, aims at promoting best practices. File encoding is a parameter affecting your build output just like the source/target settings used for the compiler and hence should be explicitly controlled.
> >
> > As we talk about it: What is the agreed file encoding for the Maven sources (MNGSITE-46)?
> >
> > Benjamin

Thanks,

Jason

--
Jason van Zyl
Founder, Apache Maven
jason at sonatype dot com
--

happiness is like a butterfly: the more you chase it, the more it will elude you, but if you turn your attention to other things, it will come and sit softly on your shoulder ...

-- Thoreau
Re: [VOTE] POM Element for Source File Encoding
On 8-Apr-08, at 11:27 AM, Benjamin Bentmann wrote:
> Jason van Zyl wrote:
> > If it's right most of the time, and it saves the user from having to know or worry about it then yes I would use it.
>
> Could you elaborate on this a little more? Say we start easy and have a build with just about 100 Java source files. Do you suggest peeking at each of them before passing them to a tool like javac, or just at a subset, and how should this subset be determined?

It would be reasonable to assume the detection could be based on a subset. For an organization on one project you could reasonably assume the same encoding. That would not be the case in an open source project as tools would vary.

> What should be done when the charset detection reports different encodings for the set of files to process?

What happens when the encoding is different than what is stated? Same problem really, in how to deal with the actual versus declared.

> Will the charset detection happen over and over again for each plugin (javac, javadoc, jxr)? What do you consider "most of the time"? Telling the various ISO-8859 families apart is not really easy. My impression is that usage of JChardet will significantly increase code complexity without giving me a solid build.

That would depend on what kinds of problems can arise if things are not consistent.

> Also, I believe it's a bad idea to free users from worrying about the encoding.

You have to deal with the very real possibility that no one is going to set it, or know what it is, and that they will report issues related to encoding even if the whole system works. I'm all for literal and declarative. In practice this does not happen all the time. I also didn't say use one over the other, but the detection may help in cases where it's not stated. The JChardet library was created for a reason, and this looks like one of them.

For the system you are proposing there would be touch points at which you would look for encoding parameters. If those values are not stated you will need a strategy to detect them, or you will never be able to support any encoding alignment in older versions of Maven without the encoding parameterization.

> This would be similar to the doubtful magic the JRE provides with its default encoding: it encourages developers to ignore the encoding issue, leading to platform-dependent behavior. Platform-dependent Java code is a bad practice and Maven, as far as I heard, aims at promoting best practices.

Of course it is, but that doesn't negate the fact that people don't necessarily follow best practices. But you are 1) going to need to deal with versions of Maven that don't support this encoding parameterization, and 2) going to have to deal with the case where it's stated wrong. We should know the combinations of encoding parameters that will work together, and if they aren't stated, or are stated wrong, it's better to provide some fallback instead of just dying.

> File encoding is a parameter affecting your build output just like the source/target settings used for the compiler and hence should be explicitly controlled.

Absolutely, but look at all the questions on the mailing list that expect many of these things to just be detected. People using Java 1.5 just expect you to be able to compile 1.5 code. That's not the case. Users in this case expect the right thing to happen. I'm willing to bet that if you asked the average user about encoding, they would have no clue and wonder why it wasn't detected. It was a suggestion based on experience of typical users.

> As we talk about it: What is the agreed file encoding for the Maven sources (MNGSITE-46)?
>
> Benjamin

Thanks,

Jason

--
Jason van Zyl
Founder, Apache Maven
jason at sonatype dot com
--

We all have problems. How we deal with them is a measure of our worth.

-- Unknown
Re: [VOTE] POM Element for Source File Encoding
+1 on Benjamin's objections to detection. It will slow down the build (possibly significantly) while providing little added value.

Milos

On Tue, Apr 8, 2008 at 8:27 PM, Benjamin Bentmann <[EMAIL PROTECTED]> wrote:
> Jason van Zyl wrote:
> > If it's right most of the time, and it saves the user from having to know
> > or worry about it then yes I would use it.
>
> Could you elaborate on this a little more? Say we start easy and have a build
> with just about 100 Java source files. Do you suggest peeking at each of
> them before passing them to a tool like javac, or just at a subset, and how
> should this subset be determined? What should be done when the charset
> detection reports different encodings for the set of files to process? Will
> the charset detection happen over and over again for each plugin (javac,
> javadoc, jxr)? What do you consider "most of the time"? Telling the various
> ISO-8859 families apart is not really easy. My impression is that usage of
> JChardet will significantly increase code complexity without giving me a
> solid build.
>
> Also, I believe it's a bad idea to free users from worrying about the
> encoding. This would be similar to the doubtful magic the JRE provides with
> its default encoding: it encourages developers to ignore the encoding issue,
> leading to platform-dependent behavior. Platform-dependent Java code is a
> bad practice and Maven, as far as I heard, aims at promoting best practices.
> File encoding is a parameter affecting your build output just like the
> source/target settings used for the compiler and hence should be explicitly
> controlled.
>
> As we talk about it: What is the agreed file encoding for the Maven sources
> (MNGSITE-46)?
>
> Benjamin
Re: [VOTE] POM Element for Source File Encoding
Jason van Zyl wrote:
> If it's right most of the time, and it saves the user from having to know or worry about it then yes I would use it.

Could you elaborate on this a little more? Say we start easy and have a build with just about 100 Java source files. Do you suggest peeking at each of them before passing them to a tool like javac, or just at a subset, and how should this subset be determined? What should be done when the charset detection reports different encodings for the set of files to process? Will the charset detection happen over and over again for each plugin (javac, javadoc, jxr)? What do you consider "most of the time"? Telling the various ISO-8859 families apart is not really easy. My impression is that using JChardet will significantly increase code complexity without giving me a solid build.

Also, I believe it's a bad idea to free users from worrying about the encoding. This would be similar to the doubtful magic the JRE provides with its default encoding: it encourages developers to ignore the encoding issue, leading to platform-dependent behavior. Platform-dependent Java code is a bad practice and Maven, as far as I have heard, aims at promoting best practices. File encoding is a parameter affecting your build output just like the source/target settings used for the compiler and hence should be explicitly controlled.

As we talk about it: what is the agreed file encoding for the Maven sources (MNGSITE-46)?

Benjamin
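To make the point about the ISO-8859 families concrete, here is a minimal sketch (not taken from the thread; the class name EncodingAmbiguity is made up for illustration). It decodes the same single byte with two closely related charsets. Both decoders accept the byte, so a detector looking only at byte validity has nothing to distinguish them by, yet the resulting characters differ: 0xA4 is the currency sign U+00A4 in ISO-8859-1 and the euro sign U+20AC in ISO-8859-15.

```java
import java.nio.charset.Charset;

// Hypothetical illustration class, not part of Maven or any plugin.
class EncodingAmbiguity {

    /** Decodes raw bytes using the charset with the given name. */
    public static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // 0xA4 is a legal byte in both charsets, so neither decoder complains.
        byte[] bytes = { (byte) 0xA4 };
        for (String name : new String[] { "ISO-8859-1", "ISO-8859-15" }) {
            int codePoint = decode(bytes, name).codePointAt(0);
            System.out.printf("%-11s -> U+%04X%n", name, codePoint);
        }
        // The JVM default is what a tool silently falls back to when no
        // encoding is configured; it varies with platform and JVM settings.
        System.out.println("JVM default: " + Charset.defaultCharset());
    }
}
```

The last line of main shows the value that differs between machines, which is the platform dependence the message above objects to.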
Re: [VOTE] POM Element for Source File Encoding
On 8-Apr-08, at 1:09 AM, Benjamin Bentmann wrote:
> Jason van Zyl wrote:
>> Would being able to detect the encoding help with making this less complicated. Something JChardet?
>
> I'm not really sure what you meant to say. JChardet is a library that performs a best *guess* on file encoding by peeking at a byte stream. We don't want to base our builds on heuristics, do we?

If it's right most of the time, and it saves the user from having to know or worry about it then yes I would use it.

Thanks,
Jason
--
Jason van Zyl
Founder, Apache Maven
jason at sonatype dot com
--
We all have problems. How we deal with them is a measure of our worth.
-- Unknown
Re: [VOTE] POM Element for Source File Encoding
Hervé Boutemy wrote:
> this one is more tricky, even if the change in pom.xml is a simple addition of an element... Don't really know how to handle this without breaking things for Maven 2.0 when an artifact with this addition is deployed to a repository.

Handling POM additions is a more general concern and not really the point of our proposal.

For Maven 2.0.x, adding a normal property ... to the super POM won't hurt the model validation for 4.0.0. For now, the simple question to answer is: will the element be named as proposed? Once we get consensus about this name, we can continue to patch the plugins to use this property for their parameters, knowing that it will be forward-compatible with Maven 2.1.

For Maven 2.1, a new model version will be introduced. Users that choose to employ this version will always experience build failures with Maven 2.0.x due to the failed model validation. Again, this is nothing specific to our proposal. We just added another element to the list of required POM additions:
- custom profile activators
- site directory
- plugin management for reporting
- ...

> The only risk is that the property chosen, ${project.build.sourceEncoding}, makes user think to a new element in the pom

Yes, we will have to document this properly, just like for the new import scope.

Benjamin
Re: [VOTE] POM Element for Source File Encoding
Jason van Zyl wrote:
> Would being able to detect the encoding help with making this less complicated. Something JChardet?

I'm not really sure what you meant to say. JChardet is a library that performs a best *guess* on file encoding by peeking at a byte stream. We don't want to base our builds on heuristics, do we?

Benjamin
Re: [VOTE] POM Element for Source File Encoding
Paul Benedict wrote:
> My only concern is that the encoding kind of assumes one kind of source file.

We are well aware that different kinds of text files may use different encodings. A simple example is using UTF-8 for Java source files and Latin-1 for properties files. However, the primary goal of the proposal is to replace the default encoding defined by the JVM (platform-dependent) with a value defined by the POM (platform-independent). Hence, we started off with a single default value. The emphasis lies on *default*, i.e. the proposed POM property/element is not intended as the final means to configure the employed file encoding throughout the entire project. It is just a value plugins can use to initialize their configuration in case the user did not explicitly specify an encoding.

> I am never in a position to have multiple encodings on my projects

And I would argue that quite a few people follow the same approach. Otherwise I can hardly understand why users have not already complained that some plugins don't provide an encoding parameter at all yet. Besides, not every IDE allows users to configure different file encodings in a single project, so this really seems to be the major use case.

> but I suppose if you're compiling many different types of sources, people would want to tie the source to the extension type.

A file extension is just one method to distinguish files; another one is context of use. I believe that having the possibility to configure file encoding on a per-plugin basis is good enough to capture different types of files. If the need to set up encodings per file extension really arises someday, we can think more closely about that. But even then, I wouldn't like to write something like this in my POM to lock down the encoding for every file extension that might hang around in the project:

  txt,java,groovy,aj,bsh,apt,... UTF-8

I would want to have a single default value to catch the major case, and this default value should in no case depend on my JVM. So I'm back on ${project.build.sourceEncoding}.

Benjamin
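The "default, not final setting" idea described above can be sketched as a small lookup. This is only an illustration under stated assumptions: the class SourceEncodingDefault and its resolve method are hypothetical and do not correspond to any Maven or plugin API, and project properties are modelled as a plain java.util.Properties. Only the property name project.build.sourceEncoding and the ISO-8859-1 fallback come from the proposal discussed in this thread.

```java
import java.util.Properties;

// Hypothetical helper, not an actual Maven class.
class SourceEncodingDefault {

    /** Property name proposed in the thread. */
    static final String PROPERTY = "project.build.sourceEncoding";

    /** Fallback proposed in the thread instead of the platform encoding. */
    static final String FALLBACK = "ISO-8859-1";

    /**
     * Resolution order: explicit plugin setting first, then the
     * project-wide property, then the fixed fallback. The JVM default
     * charset is never consulted.
     */
    public static String resolve(String pluginEncoding, Properties projectProperties) {
        if (pluginEncoding != null && !pluginEncoding.trim().isEmpty()) {
            return pluginEncoding.trim();
        }
        String fromPom = projectProperties.getProperty(PROPERTY);
        if (fromPom != null && !fromPom.trim().isEmpty()) {
            return fromPom.trim();
        }
        return FALLBACK;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(resolve(null, props));      // nothing configured
        props.setProperty(PROPERTY, "UTF-8");
        System.out.println(resolve(null, props));      // project-wide default
        System.out.println(resolve("US-ASCII", props)); // per-plugin override
    }
}
```

The per-plugin override in the last call corresponds to the case mentioned above where one kind of file (for example properties files) needs a different encoding than the project-wide default.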
Re: [VOTE] POM Element for Source File Encoding
In Commons Validator, we updated the DTD even in point releases. I don't see the harm in doing the same here. After all, if the POM is 4.0.0, why not create a 4.0.1? It sounds like Maven 2 will have a 4.1 version.

Paul

On Mon, Apr 7, 2008 at 6:03 PM, Jason van Zyl <[EMAIL PROTECTED]> wrote:
> [...]
Re: [VOTE] POM Element for Source File Encoding
On 7-Apr-08, at 3:58 PM, Jason van Zyl wrote:
> Would being able to detect the encoding help with making this less complicated. Something JChardet?

Sorry, something like JChardet: http://jchardet.sourceforge.net/

On 7-Apr-08, at 2:31 PM, Hervé BOUTEMY wrote:
> [...]

Thanks,
Jason
--
Jason van Zyl
Founder, Apache Maven
jason at sonatype dot com
--
Simplex sigillum veri. (Simplicity is the seal of truth.)
Re: [VOTE] POM Element for Source File Encoding
Would being able to detect the encoding help with making this less complicated. Something JChardet?

On 7-Apr-08, at 2:31 PM, Hervé BOUTEMY wrote:
> [...]

Thanks,
Jason
--
Jason van Zyl
Founder, Apache Maven
jason at sonatype dot com
--
A man enjoys his work when he understands the whole and when he is responsible for the quality of the whole
-- Christopher Alexander, A Pattern Language
Re: [VOTE] POM Element for Source File Encoding
On Sunday 06 April 2008, Jason van Zyl wrote:
> I specifically meant the core changes, but I would still recommend what Milos did, which was to create branches for a few of the affected plugins to try it all together.

OK, I created http://svn.apache.org/viewvc/maven/sandbox/branches/MNG-2216/ with javadoc and jxr plugin branches to test the change, plus a sample use case.

> Most certainly to test new elements in the POM you need to use a branch because we still don't have a strategy for dealing with model changes.

This one is trickier, even if the change in pom.xml is a simple addition of an element... I don't really know how to handle this without breaking things for Maven 2.0 when an artifact with this addition is deployed to a repository.

> If plugins can be changed, used with the existing versions of Maven with no disruption then do it in-situ.

No problem here, no disruption, as proven by the test. The only risk is that the property chosen, ${project.build.sourceEncoding}, makes users think of a new element in the pom, but we still don't know how we will implement it: we are betting on a solution we don't currently have.

Hervé
Re: [VOTE] POM Element for Source File Encoding
On Monday 07 April 2008, Asgeir S. Nilsen wrote:
> 2008/4/5, Hervé BOUTEMY <[EMAIL PROTECTED]>:
> > Hi,
> >
> > Since the discussion on the list about Maven and encoding 2 weeks ago, Benjamin and I worked on a proposal to have:
> > 1. a central point of configuration of sources encoding, to be used by each and every plugin,
> > 2. a default value set to ISO-8859-1 (instead of platform encoding) to have build reproducibility by default
>
> Out of curiosity, why would you go for 8859-1 and not UTF-8 or US-ASCII? I would think it would be safer to either support any extended character or no extended characters, and not something halfway there?
>
> Asgeir

US-ASCII: why limit ourselves to ASCII only when ISO-8859-1 is a superset?

UTF-8: seems interesting at first thought, but:
- there are already plugins having ISO-8859-1 as their default value
- you can have byte combinations that are invalid in UTF-8, causing failures

ISO-8859-1 seems the best compromise.

Hervé
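The second argument above (byte sequences that are invalid in UTF-8) can be checked with the JDK's strict decoders. The following is a minimal sketch, with a made-up class name StrictDecoding: it encodes "Hervé" as ISO-8859-1 and then tries to decode those bytes strictly as UTF-8 and as ISO-8859-1. The lone byte 0xE9 announces a multi-byte UTF-8 sequence that never arrives, so the UTF-8 decoder rejects the input, whereas ISO-8859-1 maps every byte value to a character and therefore never fails.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of Maven or any plugin.
class StrictDecoding {

    /** Returns true if the bytes decode without any malformed or unmappable input. */
    public static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "Hervé" written with a Unicode escape so this source file itself
        // stays pure ASCII and compiles the same under any encoding.
        byte[] latin1Bytes = "Herv\u00E9".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("as UTF-8:      " + decodesCleanly(latin1Bytes, StandardCharsets.UTF_8));
        System.out.println("as ISO-8859-1: " + decodesCleanly(latin1Bytes, StandardCharsets.ISO_8859_1));
    }
}
```

Note that "never fails" is not the same as "always right": decoding UTF-8 bytes as ISO-8859-1 succeeds silently but yields the wrong characters, which is the trade-off the choice of default accepts.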
Re: [VOTE] POM Element for Source File Encoding
2008/4/5, Hervé BOUTEMY <[EMAIL PROTECTED]>:
> Hi,
>
> Since the discussion on the list about Maven and encoding 2 weeks ago, Benjamin and I worked on a proposal to have:
> 1. a central point of configuration of sources encoding, to be used by each and every plugin,
> 2. a default value set to ISO-8859-1 (instead of platform encoding) to have build reproducibility by default

Out of curiosity, why would you go for 8859-1 and not UTF-8 or US-ASCII? I would think it would be safer to either support any extended character or no extended characters, and not something halfway there?

Asgeir
Re: [VOTE] POM Element for Source File Encoding
+1

On Sat, Apr 5, 2008 at 2:20 PM, Hervé BOUTEMY <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Since the discussion on the list about Maven and encoding 2 weeks ago, Benjamin and I worked on a proposal to have:
> 1. a central point of configuration of sources encoding, to be used by each and every plugin,
> 2. a default value set to ISO-8859-1 (instead of platform encoding) to have build reproducibility by default
>
> The full proposal is here:
>
> http://docs.codehaus.org/display/MAVENUSER/POM+Element+for+Source+File+Encoding
>
> As you'll see, we've already found 8 Apache plugins to change, and 4 Codehaus ones. Before starting the code modifications, we need everybody to agree on the proposal (and complete it if you know other places to change).
>
> The vote will be open for 72 hours.
>
> [ ] +1
> [ ] +0
> [ ] -1
>
> Here is my +1
>
> Regards,
>
> Hervé
Re: [VOTE] POM Element for Source File Encoding
My only concern is that the encoding kind of assumes one kind of source file. I am never in a position to have multiple encodings on my projects, but I suppose if you're compiling many different types of sources, people would want to tie the source to the extension type.

Paul

On Mon, Apr 7, 2008 at 10:10 AM, Benjamin Bentmann <[EMAIL PROTECTED]> wrote:
> [...]
Re: [VOTE] POM Element for Source File Encoding
> I'd like to know if this could also be achieved via toolchains.

As Hervé already tried to explain, these two proposals do not have much in common. To my understanding, the toolchain proposal aims at providing a facade on a user's development kit (native tools, boot class path, etc.) such that projects can be built using a specific JDK regardless of the JRE running Maven. I don't see any relation between
a) the selection of a native tool from a user's system
b) the configuration of file encoding for project source files

Indeed, I consider these two orthogonal concerns. Each of the combinations

        | JRE 1.4 | JRE 1.5 | JRE 1.6 | ...
--------+---------+---------+---------+----
UTF-8   |    X    |    X    |    X    |
Latin-1 |    X    |    X    |    X    |
...     |    X    |    X    |    X    |

represents a valid use case for a project configuration.

What both proposals share is the intention to address these tasks via a *central* configuration in the POM, i.e. configure target JRE and file encoding once, not repeatedly for each plugin.

If you feel that toolchains and file encoding fit nicely together and don't violate separation of concerns, please sketch your thoughts.

Benjamin
Re: [VOTE] POM Element for Source File Encoding
> Please clarify the proposal. When you say "source" files, you mean things like Java files not POM files?

Yes, "source file" is meant to refer to a plain text file that, unlike XML, does not carry an encoding declaration or similar. XML is fine: it's ugly to parse but provides the user with a means to specify the file encoding used. Our proposal is about all the other text files that rely on external configuration to convey the file encoding used. As such, the proposal is not about POM, FML, XDOC or whatever XML file you can imagine.

Benjamin
Re: [VOTE] POM Element for Source File Encoding
+1 .. I'd like to know if this could also be achieved via toolchains.

Hervé BOUTEMY wrote:
> [...]
Re: [VOTE] POM Element for Source File Encoding
Please clarify the proposal. When you say "source" files, you mean things like Java files not POM files?

Paul

On Sun, Apr 6, 2008 at 2:56 PM, Dennis Lundberg <[EMAIL PROTECTED]> wrote:
> +1
>
> Hervé BOUTEMY wrote:
> > [...]
>
> --
> Dennis Lundberg
Re: [VOTE] POM Element for Source File Encoding
+1

Hervé BOUTEMY wrote:
> [...]

--
Dennis Lundberg
Re: [VOTE] POM Element for Source File Encoding
On 5-Apr-08, at 3:13 PM, Benjamin Bentmann wrote:
> Jason van Zyl wrote:
>> You don't need a 72 hour vote, I would try it in a branch first and then get people to look at it.
>
> Just wondering: If I would fill in JIRAs for each affected plugin to request
> a) adding an encoding parameter if not already existent
> b) making this parameter default to Latin-1
> would we start branches on the plugins for each of these issues?
>
> I mean this proposal is not about a revolutionary new feature, it's merely the attempt to create a guideline for consistent encoding handling in the various source processing plugins. More precisely, we're seeking consensus that
> a) the core team will eventually introduce a new POM element for this in Maven 2.1, named project.build.sourceEncoding or whatever we agree upon

I specifically meant the core changes, but I would still recommend what Milos did, which was to create branches for a few of the affected plugins to try it all together. Most certainly to test new elements in the POM you need to use a branch because we still don't have a strategy for dealing with model changes. If plugins can be changed and used with the existing versions of Maven with no disruption, then do it in-situ.

> b) in the meantime, Maven 2.0.x will define an equally named property for this in its super POM
> c) it's OK to have Latin-1 as default encoding rather than the platform encoding
>
> [...]

Thanks,
Jason
--
Jason van Zyl
Founder, Apache Maven
jason at sonatype dot com
--
the course of true love never did run smooth ...
-- Shakespeare
Re: [VOTE] POM Element for Source File Encoding
Jason van Zyl wrote:
> You don't need a 72 hour vote, I would try it in a branch first and then get people to look at it.

Just wondering: if I were to file JIRA issues for each affected plugin to request
a) adding an encoding parameter if not already existent
b) making this parameter default to Latin-1
would we start branches on the plugins for each of these issues?

I mean, this proposal is not about a revolutionary new feature; it's merely an attempt to create a guideline for consistent encoding handling in the various source processing plugins. More precisely, we're seeking consensus that
a) the core team will eventually introduce a new POM element for this in Maven 2.1, named project.build.sourceEncoding or whatever we agree upon
b) in the meantime, Maven 2.0.x will define an equally named property for this in its super POM
c) it's OK to have Latin-1 as default encoding rather than the platform encoding

Also, this is not going to be a code change that plops out one day as a huge merge back into trunk. Rather, it's an incremental process where the required improvements to plugin X can be made independently of the development on plugin Y. For example, MPLUGIN-101 and MINVOKER-30 already have patches for this topic pending. Is it really expected to open a branch, apply the patches to the branch and merge back (the same day) instead of applying them directly to trunk? Am I underestimating this?

Benjamin
Re: [VOTE] POM Element for Source File Encoding
On Saturday 05 April 2008, nicolas de loof wrote:
> +1
>
> Is there any overlap with the tool chain proposal ?

As I understand the tool chain proposal, there is no overlap at all. The tool chain provides a central place to configure tools in every developer's environment (like where javac 1.5 is). Source file encoding is not tied to a developer's environment: it's precisely the contrary, it has to be configured in the project and the project only (hence the problem with the default value being the platform encoding, which is implicitly dependent on the developer's environment).

> Nico
>
> 2008/4/5, Hervé BOUTEMY <[EMAIL PROTECTED]>:
> > [...]
Re: [VOTE] POM Element for Source File Encoding
You don't need a 72 hour vote, I would try it in a branch first and then get people to look at it. It's a good idea, just don't do it on trunk directly so that we have the before and after to compare.

On 5-Apr-08, at 10:20 AM, Hervé BOUTEMY wrote:
> [...]

Thanks,
Jason
--
Jason van Zyl
Founder, Apache Maven
jason at sonatype dot com
--
A party which is not afraid of letting culture, business, and welfare go to ruin completely can be omnipotent for a while.
-- Jakob Burckhardt
Re: [VOTE] POM Element for Source File Encoding
On Sat, Apr 5, 2008 at 7:20 PM, Hervé BOUTEMY <[EMAIL PROTECTED]> wrote:
[...]
> The full proposal is here:
>
> http://docs.codehaus.org/display/MAVENUSER/POM+Element+for+Source+File+Encoding

Non-binding +1

Regards,
Tomek
Re: [VOTE] POM Element for Source File Encoding
+1

Benjamin

Hervé BOUTEMY wrote:
> [...]
Re: [VOTE] POM Element for Source File Encoding
+1

Is there any overlap with the tool chain proposal ?

Nico

2008/4/5, Hervé BOUTEMY <[EMAIL PROTECTED]>:
> [...]