Re: [VOTE] POM Element for Source File Encoding
Benjamin Bentmann wrote:

In general, I completely agree with your preference for Unicode and fail-fast behavior. If I had been involved when the Maven story started, I would have proposed UTF-8 as the default value, no doubt. As for today, I tried to consider consistency with existing behavior. The Maven Site Plugin was already using Latin-1 as the default value for inputEncoding and outputEncoding, and so I proposed this for other plugins, too. Indeed, one of the patches (MJAVADOC-165) was just released, so that two plugins already teach users this default value. Therefore I fear it might be too late to introduce another default value. If the community believes this change is worth the confusion it causes for users, I'm the first one running the other way round ;-)

I see your point. Worth another vote? Or should this switch be postponed to 2.1, trading consistency in minor version upgrades for a longer time for these Latin1 defaults to become established? Given the fail-fast nature of the UTF-8 default, we won't have to worry about the switch going unnoticed. Developers switching from a version defaulting to Latin1 to one defaulting to UTF-8 will notice the change immediately, and for development in a heterogeneous environment they can simply override the super-POM with their own default. So while I agree that a change in default either now or in the future is ugly, it is not taboo, and I believe it is worth the gain.

That's a good point. It appears we need to do some extra homework here: The simplistic use of InputStreamReader and OutputStreamWriter will silently convert unmappable byte sequences to a default replacement character ('?', see also [0]). I guess we could nicely hide the required implementation by means of the existing methods in Reader-/WriterFactory from plexus-utils.

That works for plugins doing the conversion in code under our control. Other plugins that use external libraries or tools might be more difficult.
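As a rough illustration of the "extra homework" mentioned above, here is a minimal sketch of a reader that fails instead of substituting a replacement character. It uses only the JDK; the class StrictReaders and its two methods are hypothetical names for this example and are not the plexus-utils ReaderFactory API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper for illustration only; not part of plexus-utils.
class StrictReaders {

    /** Returns a Reader that throws on bad input instead of replacing it. */
    static Reader newStrictReader(InputStream in, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return new InputStreamReader(in, decoder);
    }

    /** Reads all bytes strictly; returns null if they are invalid in the charset. */
    static String readAllOrNull(byte[] bytes, Charset charset) {
        try (Reader reader = newStrictReader(new ByteArrayInputStream(bytes), charset)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            // CharacterCodingException is a subclass of IOException.
            return null;
        }
    }
}
```

A plugin that reads sources through something like newStrictReader would surface an exception it can turn into a build failure. As noted above, this only helps where the decoding happens in code under our control.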
Note that ASCII-only sources will compile cleanly no matter the default encoding

Most of the time, but UTF-16 and EBCDIC do not even have ASCII in common.

I was thinking about the default of the default, i.e. the value to be set in the super-POM. We certainly won't choose UTF-16 or EBCDIC for this global default, and as files encoded in UTF-16 or EBCDIC don't count as ASCII-only, my

Martin
Re: [VOTE] POM Element for Source File Encoding
Benjamin Bentmann wrote:

With regard to user errors, my general suggestion is to fail the build. This unforgiving attitude should not be that unfamiliar to users: It has been chosen for a popular format like XML, which is also employed by Maven for a few files. The problems depend on the encodings: If one feeds Latin-1 into a UTF-8 decoder, one will most likely encounter invalid byte sequences, making the decoder fail. That's my favorite case as it clearly shows the user that something is wrong and needs his attention. The other case is worse because it is more subtle: Feeding UTF-8 into a Latin-1 decoder will pass but produces output that only a human can recognize as garbage, by closely analyzing the few non-ASCII characters.

Taking this together, one might argue for making UTF-8 the default, not ISO-8859-1. Almost anything that passes UTF-8 encoding constraints will indeed be UTF-8, as non-ASCII files that are not UTF-8 will almost certainly contain sequences not valid in UTF-8. So if a user fails to specify the encoding he uses, and if this encoding isn't UTF-8, then things will break for him. This has two advantages:

1. Fail-fast behaviour. If there is a misconfiguration, the Maven run will die, and the developer can fix the issue. You don't have to wait for some other developer complaining about garbled strings or a user complaining about a broken website until you can fix things.

2. Promoting Unicode. While there are a lot of encodings out there for historic reasons, most of them suffer severe drawbacks in an international software project, because they either can't express all needed characters, or they are not common outside a small region. So while Taiwanese developers might be happy to develop an English/Chinese project in Big5, prospective American contributors might not get their editor to load files as Big5. UTF-8, on the other hand, is used worldwide and provides the whole Unicode range.
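The asymmetry between the two mismatch cases can be seen in a few lines of plain JDK code. This is only a sketch: the class EncodingMismatch and its method names are made up for this example, and it relies on the documented default of a fresh CharsetDecoder, which is to report malformed input rather than replace it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
class EncodingMismatch {

    /** True if the bytes are a valid byte sequence in the given charset. */
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            // A new decoder reports malformed input by default, so decode() throws.
            charset.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Simulates writing text as UTF-8 and reading it back as Latin-1. */
    static String utf8ReadAsLatin1(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }
}
```

With these helpers, the Latin-1 bytes of "café!" are rejected by decodesCleanly(..., UTF_8), while utf8ReadAsLatin1("café") quietly returns the five-character string "cafÃ©". Pure ASCII input such as "cafe" comes out unchanged either way.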
For new projects, I guess UTF-8 would be a reasonable best practice, and making this best practice the default in Maven might promote it. Of course this conflicts with previous discussions about Latin1 ensuring that any project can get compiled, as it has no invalid byte sequences. The choice is whether, in the absence of configuration,

A) you want your compile to succeed all the time, possibly generating the wrong results, or
B) you want your build to fail in case of a misconfiguration (including missing configuration), but ensure correct results if it does not fail.

If I understood him correctly, Jason voted for A). I took his request for non-dying builds as a requirement and pointed out that this is possible with Latin1. Now that I think about it, I believe I would rather want B), as I'm all for fail-fast, deterministic behaviour.

It should be checked whether plugins really die on invalid UTF-8 sequences, and what the output looks like. If possible, plugins should point out that a misconfiguration of the encoding in the POM (either the plugin configuration or the proposed global configuration property) is possibly the cause of the error, if it's not a developer using another encoding.

Note that ASCII-only sources will compile cleanly no matter the default encoding, so all projects that don't need to worry about encoding won't be forced to do so. Only international projects where encoding is relevant will force their developers to either follow best practices or explicitly state their policy.

Greetings,
Martin
Re: [VOTE] POM Element for Source File Encoding
Paul Benedict wrote:

Just a proposal: Maven could loosen its parsing rules when it detects versions greater than it is configured to accept. Forward compatibility would be nice.

For anyone seriously interested in interoperability, I suggest a look at http://www.w3.org/2005/05/xsd-versioning-resources.html , especially the use cases, which illustrate several issues quite well.

Martin
Re: [VOTE] POM Element for Source File Encoding
Benjamin Bentmann wrote:

You could of course write an encoding detection plugin which could examine the code and set the required property accordingly.

Personally, I don't see the use case for that. If there are really users out there who don't know what file encoding they are using when writing up their sources, they are most probably happy with the proposed default value of Latin-1. Alternatively, this encoding detection plugin could be as simple as printing out the Java system property ${file.encoding}, which obviously worked well enough for the user.

${file.encoding} will only work if the file originated on the same machine. I am thinking of semi-automatic conversions of inhomogeneous code into Maven. E.g. some teacher collects homework from his students as a bunch of zip files containing only source, has a script to turn each into a Maven project, and a master project interacting with them, like letting them compete with one another or whatever. In this case one might wish to automatically detect the encoding of every module, especially in locales with several commonly used encodings, so that string literals in these classes are handled correctly without the students even knowing what an encoding is.

But that's a corner case, so I guess we should stop discussing the use of such a program here until someone actually requires it.

Greetings,
Martin
Re: [VOTE] POM Element for Source File Encoding
+1 for the original proposal, if a newcomer like me is allowed to vote. The concept with the property, which can be set with the properties until the model is updated, and which can be the default expression for affected plugins, is simply elegant.

Jason van Zyl wrote:

It would be reasonable to assume the detection could be based on a subset. For an organization on one project you could reasonably assume the same encoding. That would not be the case in an open source project as tools would vary.

Suppose you have a huge source tree, mostly English ASCII, but somewhere in there is a single degree sign, '\u00b0'. How would you detect it, short of scanning every ASCII file until you hit that one? I support the concerns raised here that the cost of encoding detection may in many cases be prohibitively high. Maven runs too slowly as it is, imho.

You could of course write an encoding detection plugin which could examine the code and set the required property accordingly. But enabling that by default feels bad to me.

What happens when the encoding is different than what is stated? Same problem really, in how to deal with the actual versus declared.

Up to the plugins, I guess, as it is now. No change there, only a central place to set defaults for all plugins. Of course you could write an encoding checking plugin which ensures that your sources are valid in the specified encoding.

My impression is that usage of JChardet will significantly increase code complexity without giving me a solid build.

That would depend on what kinds of problems can arise if things are not consistent.

There are three possible cases:

1. code agrees with setting => all right
2. code disagrees with setting, but is still valid under specified encoding => Mojibake
3. code is invalid under specified encoding => exception or unmappable character symbol, depending on context. The exception may be handled by the plugin.
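To make the cost argument from the degree-sign example concrete, here is a minimal sketch of the cheapest possible check, finding the first non-ASCII byte in a file's contents. The class and method names are hypothetical and this is not JChardet; it only shows that a pure-ASCII file has to be read to the end before it can be ruled out.

```java
// Hypothetical helper for illustration; not taken from any existing plugin.
class NonAsciiScan {

    /**
     * Returns the index of the first byte with the high bit set,
     * or -1 if the whole array is 7-bit ASCII.
     */
    static int firstNonAscii(byte[] bytes) {
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] < 0) { // Java bytes are signed: values 0x80..0xFF are negative
                return i;
            }
        }
        return -1; // every byte had to be inspected to reach this conclusion
    }
}
```

For a tree that is almost entirely ASCII, a detector built on this has to touch nearly every byte of every file, so its running time grows with the size of the whole source tree and not with the amount of non-ASCII text in it.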
By specifying ISO-8859-1 as the default input encoding, there are no unmappable characters, avoiding case 3. All input should be readable, though the output generated from it might not look as expected.

It should be noted that plugins that generate code to be used by other plugins should have their output encoding default to the general input encoding, so that there are no breaks in the chain.

As Jason writes about consistency: I guess the danger of inconsistent input handling, with different plugins possibly configured to read the sources using different charsets, is exactly the kind of inconsistency this proposal addresses, so I'd expect more consistency after it has been implemented, not less.

Greetings,
Martin von Gagern
Compiling for API compliance
Hi!

I would like to compile code not only for a given class file format version, but also against the corresponding Java API specification. There should be different settings for main and test code, as the main code should be highly portable, while test code might make use of quick and dirty features that only became available more recently.

I had already started a mail with this subject in users@, heard that what I want to do isn't possible so far, and now I want to change things so it becomes possible.
http://www.nabble.com/Compiling-for-API-compliance-td16538018s177.html

I'd like my projects to simply set two variables, like main.java.version and test.java.version, which should affect the source and target version of the compiler, and also cause either the JDK version to be selected or the bootclasspath for the compiler to be set accordingly. The latter would have the benefit that it would still use the newer compiler, thus allowing access to newer compiler flags, e.g. for lint, while still ensuring API compatibility with an older version.

Originally I intended to write my own plugin, derived from maven-compiler-plugin, to implement this. However, I found out that the compiler plugin provides no access to its private fields, and hacking at them with reflection feels very nasty, so I'll not do that. Instead of maintaining my private branch of the Java compiler, maybe this thing is interesting enough for people out there to warrant implementation in the main compiler plugin. Would you agree?

My primary concern is how to proceed with this. I could write a feature request for the compiler plugin in JIRA, start a wiki page on docs.codehaus, or continue the discussion here. Or I could do several of these things at the same time. There are also some open questions, which I will list below.

== 1. POM Configuration ==

How to configure this in the POM? I guess backwards compatibility should be preserved. The current compiler plugin allows for <source> and <target>.
I could think of these settings:

expression="${maven.compiler.source}" default="${main.java.version}"
expression="${maven.compiler.target}" default="${main.java.version}"
expression="${main.java.version}"

expression="${maven.compiler.source}" default="${test.java.version}"
expression="${maven.compiler.target}" default="${test.java.version}"
expression="${test.java.version}"

This way, all current settings should continue to work, unless someone used one of the newly introduced properties for some different purpose. More complex mixing of property variables could be done in a parent plugin configuration, with the corresponding properties set in submodules as required. Do you agree that this set of configuration parameters would indeed make sense?
http://jira.codehaus.org/browse/MCOMPILER-15

== 2. Toolchain ==

The modified plugin would need information about what compilers are available, and where the corresponding executables can be located. This information is system-specific, not project-specific, so it should reside in a Maven config file. This sounds a lot like the toolchains proposal, so maybe there is some way to leverage that.
http://jira.codehaus.org/browse/MNG-468
http://docs.codehaus.org/display/MAVEN/Toolchains
http://docs.codehaus.org/display/MAVEN/Toolchains?showComments=true&focusedCommentId=77693099#comment-77693099

== 3. Compiler arguments ==

I only know about javac, so I'm not sure whether this way of setting the bootclasspath would work for other compilers as well. It would be nice to have this handled in a consistent way in the plexus compiler manager component, but I guess that would mean changing quite a lot of code. Should this be targeted at javac only for the time being? Does someone know about corresponding settings for other compilers?

== 4. Default behaviour ==

What should be done when the requirements cannot be met? I guess a warning would be a good solution. You might even want to get a hard error for release builds.
Should there be a parameter to modify this behaviour?

Let me know what you think of all this, how you would suggest I proceed, and what other resources might be useful. I don't have too much time to spare for this issue, but I believe it is important enough to do some work on it from time to time, and when I do so, I might as well do it in a way that others can profit from as well.

Greetings,
Martin von Gagern
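For the javac case, the bootclasspath variant described in this message boils down to passing three options to the newer compiler. The sketch below only assembles that argument list; the class CrossCompileArgs, its method, and the example path are hypothetical and not part of maven-compiler-plugin or plexus-compiler. It assumes javac's -source, -target and -bootclasspath options.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; illustrates the idea only, not existing plugin code.
class CrossCompileArgs {

    /**
     * Builds javac options for one source set (main or test).
     *
     * @param javaVersion   value of e.g. main.java.version, such as "1.4"
     * @param bootClasspath class library of the matching JDK, or null if unknown
     */
    static List<String> javacArgs(String javaVersion, String bootClasspath) {
        List<String> args = new ArrayList<>();
        args.add("-source");
        args.add(javaVersion);
        args.add("-target");
        args.add(javaVersion);
        if (bootClasspath != null) {
            // Compile against the older API while still running the newer compiler.
            args.add("-bootclasspath");
            args.add(bootClasspath);
        }
        return args;
    }
}
```

A call such as javacArgs("1.4", "/opt/jdk1.4/jre/lib/rt.jar") (the path is an assumed example) yields the options for the main code, and a second call with the value of test.java.version would serve the test code. The null case corresponds to the open question in section 4, where the requirement cannot be met and a warning or error would be needed.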
Re: ANSI color logging in Maven
James William Dumay wrote:

Rahul,
Something like this library might help you in your quest...
http://sourceforge.net/projects/javacurses/
James

CHARVA might be useful as well:
http://www.pitman.co.za/projects/charva/

It seems both require a native DLL in order to work properly. This makes sense for things like single character input, echo control and similar terminal settings. I would assume that color output would work without curses, simply using the escape sequences as mentioned. So I'd keep javacurses and charva as a fallback, or use them if available without depending on them.

Greetings,
Martin

P.S.: I just recently subscribed to the list, and didn't receive the mail I'm responding to, so maybe this answer will break the thread in some views. Sorry about that.
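The escape-sequence approach needs no native library at all. Here is a minimal sketch in plain Java, assuming a terminal that understands ANSI SGR sequences; the class AnsiColor and its constants are hypothetical names and not existing Maven code.

```java
// Hypothetical helper; assumes the terminal interprets ANSI SGR escape sequences.
class AnsiColor {
    static final int RED = 31;
    static final int GREEN = 32;
    static final int YELLOW = 33;

    private static final String CSI = "\u001B[";    // ESC followed by '['
    private static final String RESET = CSI + "0m"; // back to default attributes

    /** Wraps text in a colour sequence followed by a reset. */
    static String colorize(String text, int sgrCode) {
        return CSI + sgrCode + "m" + text + RESET;
    }
}
```

Calling System.out.println(AnsiColor.colorize("[ERROR] build failed", AnsiColor.RED)) should print the line in red on such a terminal. On a console without ANSI support the raw sequences would show up as literal characters, which is where a library fallback or a switch to disable colour would come in.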