[jira] [Comment Edited] (MRESOURCES-171) ISO8859-1 properties files get changed into UTF-8 when filtered
[ https://issues.apache.org/jira/browse/MRESOURCES-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159155#comment-17159155 ]

Dennis Lundberg edited comment on MRESOURCES-171 at 7/16/20, 2:09 PM:
--

I'm trying to work out how to warn the users properly, if they have not set the new parameter for properties file encoding.

The method DefaultMavenResourcesFiltering#filterResources() in maven-filtering currently handles warning users if they have not set the regular encoding parameter. So I would like to add the new warning there as well. That way all plugins will benefit from this, without having to write any code for it.

Locally I tried to add the warning in a similar way as for regular encoding. That, however, gets very noisy, since all the builds in the world would get this warning if they just update to the new plugin. So I'm trying to make a fail-fast, and not too time-consuming, educated guess as to whether a project warrants a warning or not.

Here's what I've thought of so far. Show a warning to the user if *all* of the following are true:
# the propertiesEncoding parameter has not been set
# properties is a filtered extension
# filtering is enabled for at least one resource
# there is at least one properties file in one of the resources that has filtering enabled

Thoughts and comments are most welcome!

was (Author: dennisl):
I'm trying to work out how to warn the users properly, if they have not set the new parameter for properties file encoding.

The method DefaultMavenResourcesFiltering#filterResources() in maven-filtering currently handles warning users if they have not set the regular encoding parameter. So I would like to add the new warning there as well. That way all plugins will benefit from this, without having to write any code for it.

Locally I tried to add the warning in a similar way as for regular encoding. That, however, gets very noisy, since all the builds in the world would get this warning if they just update to the new plugin. So I'm trying to make a fail-fast, and not too time-consuming, educated guess as to whether a project warrants a warning or not.

Here's what I've thought of so far. Show a warning to the user if *all* of the following are true:
# .properties is a filtered extension, taking into account any configured nonFilteredFileExtensions
# resource.isFiltering() is true for any of the resources
# there is at least one properties file among all the files in all the resources

Thoughts and comments are most welcome!

> ISO8859-1 properties files get changed into UTF-8 when filtered
> ---
>
> Key: MRESOURCES-171
> URL: https://issues.apache.org/jira/browse/MRESOURCES-171
> Project: Maven Resources Plugin
> Issue Type: Bug
> Components: filtering
> Reporter: Alex Collins
> Priority: Minor
> Attachments: filtering-bug.zip
>
>
> Create:
> src/main/resources/test.properties
> And add an ISO8859-1 character that is not ASCII or UTF-8, do not use \u formatting.
> When adding this line:
> <resource><directory>src/main/resources</directory><filtering>true</filtering></resource>
> Expected:
> ISO8859-1 encoded file in jar.
> Actual:
> UTF-8 encoded file in jar.
> ---
> If there are any property files (which can only be ISO8859-1) they appear to be converted into UTF-8 in the jar.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
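The four conditions listed in the edited comment can be sketched as a single check. This is only an illustration, not code from maven-filtering: the class PropertiesEncodingWarning, the nested Resource type, and the shouldWarn method are hypothetical names, and a resource is modelled as a filtering flag plus a list of relative file names instead of a real directory scan.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

final class PropertiesEncodingWarning {

    /** Hypothetical stand-in for a resource: its filtering flag and the file names it contains. */
    static final class Resource {
        final boolean filtering;
        final List<String> fileNames;

        Resource(boolean filtering, List<String> fileNames) {
            this.filtering = filtering;
            this.fileNames = fileNames;
        }
    }

    static boolean shouldWarn(String propertiesEncoding,
                              Set<String> nonFilteredFileExtensions,
                              List<Resource> resources) {
        // Condition 1: the propertiesEncoding parameter has not been set.
        if (propertiesEncoding != null && !propertiesEncoding.isEmpty()) {
            return false;
        }
        // Condition 2: "properties" is still a filtered extension.
        if (nonFilteredFileExtensions.contains("properties")) {
            return false;
        }
        // Conditions 3 and 4: a resource with filtering enabled contains a properties file.
        // Returning on the first hit keeps the check cheap.
        for (Resource resource : resources) {
            if (!resource.filtering) {
                continue;
            }
            for (String name : resource.fileNames) {
                if (name.endsWith(".properties")) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Resource> resources = Arrays.asList(
                new Resource(false, Arrays.asList("static/logo.png")),
                new Resource(true, Arrays.asList("app.properties", "banner.txt")));
        Set<String> nonFiltered = Collections.<String>emptySet();
        System.out.println(shouldWarn(null, nonFiltered, resources));         // true
        System.out.println(shouldWarn("ISO-8859-1", nonFiltered, resources)); // false
    }
}
```

The cheap parameter checks come first, so builds that have already set the parameter or excluded properties files never reach the file scan.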
[jira] [Comment Edited] (MRESOURCES-171) ISO8859-1 properties files get changed into UTF-8 when filtered
[ https://issues.apache.org/jira/browse/MRESOURCES-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159155#comment-17159155 ]

Dennis Lundberg edited comment on MRESOURCES-171 at 7/16/20, 12:02 PM:
---

I'm trying to work out how to warn the users properly, if they have not set the new parameter for properties file encoding.

The method DefaultMavenResourcesFiltering#filterResources() in maven-filtering currently handles warning users if they have not set the regular encoding parameter. So I would like to add the new warning there as well. That way all plugins will benefit from this, without having to write any code for it.

Locally I tried to add the warning in a similar way as for regular encoding. That, however, gets very noisy, since all the builds in the world would get this warning if they just update to the new plugin. So I'm trying to make a fail-fast, and not too time-consuming, educated guess as to whether a project warrants a warning or not.

Here's what I've thought of so far. Show a warning to the user if *all* of the following are true:
# .properties is a filtered extension, taking into account any configured nonFilteredFileExtensions
# resource.isFiltering() is true for any of the resources
# there is at least one properties file among all the files in all the resources

Thoughts and comments are most welcome!

was (Author: dennisl):
I'm trying to work out how to warn the users properly, if they have not set the new parameter for properties file encoding.

The DefaultMavenResourcesFiltering#filterResources() in maven-filtering currently handles warning users if they have not set the regular encoding parameter. So I would like to add the new warning there as well. That way all plugins will benefit from this, without having to write any code for it.

Locally I tried to add the warning in a similar way as for regular encoding. That, however, gets very noisy, since all the builds in the world would get this warning if they just update to the new plugin. So I'm trying to make a fail-fast, and not too time-consuming, educated guess as to whether a project warrants a warning or not.

Here's what I've thought of so far. Show a warning to the user if *all* of the following are true:
# .properties is a filtered extension, taking into account any configured nonFilteredFileExtensions
# resource.isFiltering() is true for any of the resources
# there is at least one properties file among all the files in all the resources

Thoughts and comments are most welcome!

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (MRESOURCES-171) ISO8859-1 properties files get changed into UTF-8 when filtered
[ https://issues.apache.org/jira/browse/MRESOURCES-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159155#comment-17159155 ]

Dennis Lundberg edited comment on MRESOURCES-171 at 7/16/20, 12:01 PM:
---

I'm trying to work out how to warn the users properly, if they have not set the new parameter for properties file encoding.

The DefaultMavenResourcesFiltering#filterResources() in maven-filtering currently handles warning users if they have not set the regular encoding parameter. So I would like to add the new warning there as well. That way all plugins will benefit from this, without having to write any code for it.

Locally I tried to add the warning in a similar way as for regular encoding. That, however, gets very noisy, since all the builds in the world would get this warning if they just update to the new plugin. So I'm trying to make a fail-fast, and not too time-consuming, educated guess as to whether a project warrants a warning or not.

Here's what I've thought of so far. Show a warning to the user if *all* of the following are true:
# .properties is a filtered extension, taking into account any configured nonFilteredFileExtensions
# resource.isFiltering() is true for any of the resources
# there is at least one properties file among all the files in all the resources

Thoughts and comments are most welcome!

was (Author: dennisl):
I'm trying to work out how to warn the users properly, if they have not set the new parameter for properties file encoding.

The in DefaultMavenResourcesFiltering#filterResources() in maven-filtering currently handles warning users if they have not set the regular encoding parameter. So I would like to add the new warning there as well. That way all plugins will benefit from this, without having to write any code for it.

Locally I tried to add the warning in a similar way as for regular encoding. That, however, gets very noisy, since all the builds in the world would get this warning if they just update to the new plugin. So I'm trying to make a fail-fast, and not too time-consuming, educated guess as to whether a project warrants a warning or not.

Here's what I've thought of so far. Show a warning to the user if *all* of the following are true:
# .properties is a filtered extension, taking into account any configured nonFilteredFileExtensions
# resource.isFiltering() is true for any of the resources
# there is at least one properties file among all the files in all the resources

Thoughts and comments are most welcome!

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (MRESOURCES-171) ISO8859-1 properties files get changed into UTF-8 when filtered
[ https://issues.apache.org/jira/browse/MRESOURCES-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158128#comment-17158128 ]

Aaron Digulla edited comment on MRESOURCES-171 at 7/15/20, 12:40 PM:
-

Short discussion regarding the default value:

project.build.sourceEncoding:

Pro: It's not a breaking change.

Con: 99% of all Java developers are not aware that the problem even exists. Many are US developers who don't care about characters outside the ASCII charset, so they're not affected. This would mean that most builds will stay broken without anyone noticing. Only when translations into other languages are added will weird things happen, and people will be confused. Frankly, even many developers in Europe don't understand the problem (just look at the comments here where people argue UTF-8 is good/better/valid when it's clearly not).

ISO-8859-1:

Pro: That's what it should have been all along. ISO-8859-1 can process UTF-8 unchanged since the encoding is binary stable (every byte of input maps to the same byte of output). So while a human would see those UTF-8 sequences for umlauts and special characters, the computer doesn't care. This can only fail when people use resource filtering and try to replace a variable with a System property with special characters. Pure ASCII replacements still work. That's the only corner case where we get the dreaded UTF-8 sequence unrolling (where you start to see those à characters).

Con: There is a chance that builds will break if people added the wrong workaround to fix the issue. One fix would be the complex config above. As far as I can tell, the fix above is compatible with ISO-8859-1 as default. It can get messy when people have changed the loading code to use UTF-8.

That being said, if you chose to keep UTF-8 as the default, projects would silently fail for a long time without anyone noticing. I think this is bad. When something is broken, it should blow up in a way that people can see and do something about it.

So as I see it, using the correct default (as Java defines it) will break a small number of builds but the fix is easy: Remove all workarounds. If people really don't like it, they can stay with the old version of the plugin. That's just a two-minute change in the POM.

What I would like is a warning or error when you're affected. Maybe we should check for characters with codePoint >= 128 && check whether resource filtering is enabled and print a warning?

was (Author: digulla):
Short discussion regarding the default value:

project.build.sourceEncoding:

Pro: It's not a breaking change.

Con: 99% of all Java developers are not aware that the problem even exists. Many are US developers who don't care about characters outside the ASCII charset, so they're not affected. This would mean that most builds will stay broken without anyone noticing. Only when translations into other languages are added will weird things happen, and people will be confused.

ISO-8859-1:

Pro: That's what it should have been all along. ISO-8859-1 can process UTF-8 unchanged since the encoding is binary stable (every byte of input maps to the same byte of output). So while a human would see those UTF-8 sequences for umlauts and special characters, the computer doesn't care. This can only fail when people use resource filtering and try to replace a variable with a System property with special characters. Pure ASCII replacements still work. That's the only corner case where we get the dreaded UTF-8 sequence unrolling (where you start to see those à characters).

Con: There is a chance that builds will break if people added the wrong workaround to fix the issue. One fix would be the complex config above. As far as I can tell, the fix above is compatible with ISO-8859-1 as default. It can get messy when people have changed the loading code to use UTF-8.

That being said, if you chose to keep UTF-8 as the default, projects would silently fail for a long time without anyone noticing. I think this is bad. When something is broken, it should blow up in a way that people can see and do something about it.

So as I see it, using the correct default (as Java defines it) will break a small number of builds but the fix is easy: Remove all workarounds.

What I would like is a warning or error when you're affected. Maybe we should check for characters with codePoint >= 128 && check whether resource filtering is enabled and print a warning?
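The "binary stable" claim and the suggested codePoint >= 128 check can be tried out with a small sketch. The class Latin1RoundTrip and its method names are made up for this illustration; only standard JDK charset calls are used. It decodes bytes with a charset and encodes them again: with ISO-8859-1 the bytes come back unchanged, while ISO-8859-1 input pushed through UTF-8 does not survive, which is the corruption reported in this issue.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Latin1RoundTrip {

    /** Decodes the bytes with the given charset and encodes the result again. */
    static byte[] roundTrip(byte[] input, Charset charset) {
        String decoded = new String(input, charset);
        return decoded.getBytes(charset);
    }

    /** True if the text contains any code point outside ASCII (code point >= 128). */
    static boolean hasNonAscii(String text) {
        return text.codePoints().anyMatch(codePoint -> codePoint >= 128);
    }

    public static void main(String[] args) {
        // The German word for "greetings", written with escapes so the source stays ASCII.
        String text = "Gr\u00fc\u00dfe";

        // UTF-8 bytes passed through ISO-8859-1: every byte maps to one char and back.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(utf8, roundTrip(utf8, StandardCharsets.ISO_8859_1))); // true

        // ISO-8859-1 bytes passed through UTF-8: invalid sequences are replaced, bytes change.
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(latin1, roundTrip(latin1, StandardCharsets.UTF_8))); // false

        System.out.println(hasNonAscii(text));  // true
        System.out.println(hasNonAscii("abc")); // false
    }
}
```

A check like hasNonAscii would only be meaningful after the file has been decoded with the right charset; on raw bytes the equivalent test is whether any byte has its high bit set.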
[jira] [Comment Edited] (MRESOURCES-171) ISO8859-1 properties files get changed into UTF-8 when filtered
[ https://issues.apache.org/jira/browse/MRESOURCES-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801493#comment-16801493 ]

Aaron Digulla edited comment on MRESOURCES-171 at 3/26/19 9:03 AM:
---

My preferred solution is: The plugin uses ISO-8859-1 when reading and writing .properties files by default and this encoding can be overridden with a config option.

So when people have found a way to work around the encoding in their code (like creating their own {{Reader}}), they can configure the plugin to use ${project.build.sourceEncoding}.

was (Author: digulla):
My preferred solution is: The plugin uses ISO-8859-1 when reading and writing .properties files by default and this encoding can be overridden with a config option.

So when people have found a way to work around the encoding in their code (like creating their own {{Reader}}), they can configure the plugin to use ${{{project.build.sourceEncoding}}}.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
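The preferred behaviour described in this comment amounts to a small decision rule. The sketch below is hypothetical and is not the plugin's code: ResourceEncodingChooser and encodingFor are invented names, and the two string arguments stand for an optional properties-specific override and the value of project.build.sourceEncoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ResourceEncodingChooser {

    /**
     * Picks the charset used to read and write a resource file.
     * Both encoding arguments may be null, meaning "not configured".
     */
    static Charset encodingFor(String fileName, String propertiesEncoding, String sourceEncoding) {
        if (fileName.endsWith(".properties")) {
            // Properties files default to ISO-8859-1 unless explicitly overridden.
            return propertiesEncoding != null
                    ? Charset.forName(propertiesEncoding)
                    : StandardCharsets.ISO_8859_1;
        }
        // Everything else follows the regular source encoding, else the platform default.
        return sourceEncoding != null
                ? Charset.forName(sourceEncoding)
                : Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println(encodingFor("messages.properties", null, "UTF-8"));    // ISO-8859-1
        System.out.println(encodingFor("messages.properties", "UTF-8", "UTF-8")); // UTF-8
        System.out.println(encodingFor("config.xml", null, "UTF-8"));             // UTF-8
    }
}
```

The second call is the opt-out for projects that already read their properties files through their own UTF-8 Reader.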
[jira] [Comment Edited] (MRESOURCES-171) ISO8859-1 properties files get changed into UTF-8 when filtered
[ https://issues.apache.org/jira/browse/MRESOURCES-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16765910#comment-16765910 ]

Dennis Lundberg edited comment on MRESOURCES-171 at 2/13/19 9:47 AM:
-

Hi,

We stumbled across this issue today for one of our projects that has {{project.build.sourceEncoding=UTF-8}} and includes properties files that use ISO-8859-1 encoding. The properties files are filtered. We have both regular properties files and properties files that are used as ResourceBundles. There are also non-properties files in the resources directory that need to be filtered.

Here is a proposal on how such a situation can be handled.
{code:xml}
<build>
  <resources>
    <resource>
      <directory>src/main/resources</directory>
      <filtering>true</filtering>
      <excludes>
        <exclude>**/*.properties</exclude>
      </excludes>
    </resource>
  </resources>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-resources-plugin</artifactId>
      <executions>
        <execution>
          <id>resources-properties</id>
          <goals>
            <goal>resources</goal>
          </goals>
          <configuration>
            <encoding>ISO-8859-1</encoding>
            <resources>
              <resource>
                <directory>src/main/resources</directory>
                <filtering>true</filtering>
                <includes>
                  <include>**/*.properties</include>
                </includes>
              </resource>
            </resources>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
{code}
The <build>/<resources> part takes care of all the resources, except properties files. They will be filtered using whatever encoding is specified by project.build.sourceEncoding.

The execution with id=resources-properties will filter all the properties files using ISO-8859-1 encoding, but will not touch any other resource file.

was (Author: denn...@apache.org):
Hi,

We stumbled across this issue today for one of our projects that has {{project.build.sourceEncoding=UTF-8}} and includes properties files that use ISO-8859-1 encoding. The properties files are filtered. We have both regular properties files and properties files that are used as ResourceBundles. There are also non-properties files in the resources directory that need to be filtered.

Here is a proposal on how such a situation can be handled.
{code:xml}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-resources-plugin</artifactId>
  <executions>
    <execution>
      <id>default-resources</id>
      <goals>
        <goal>resources</goal>
      </goals>
      <configuration>
        <encoding>UTF-8</encoding>
        <resources>
          <resource>
            <directory>src/main/resources</directory>
            <filtering>true</filtering>
            <excludes>
              <exclude>**/*.properties</exclude>
            </excludes>
          </resource>
        </resources>
      </configuration>
    </execution>
    <execution>
      <id>resources-properties</id>
      <goals>
        <goal>resources</goal>
      </goals>
      <configuration>
        <encoding>ISO-8859-1</encoding>
        <resources>
          <resource>
            <directory>src/main/resources</directory>
            <filtering>true</filtering>
            <includes>
              <include>**/*.properties</include>
            </includes>
          </resource>
        </resources>
      </configuration>
    </execution>
  </executions>
</plugin>
{code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (MRESOURCES-171) ISO8859-1 properties files get changed into UTF-8 when filtered
[ https://issues.apache.org/jira/browse/MRESOURCES-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16765910#comment-16765910 ]

Dennis Lundberg edited comment on MRESOURCES-171 at 2/12/19 1:35 PM:
-

Hi,

We stumbled across this issue today for one of our projects that has {{project.build.sourceEncoding=UTF-8}} and includes properties files that use ISO-8859-1 encoding. The properties files are filtered. We have both regular properties files and properties files that are used as ResourceBundles. There are also non-properties files in the resources directory that need to be filtered.

Here is a proposal on how such a situation can be handled.
{code:xml}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-resources-plugin</artifactId>
  <executions>
    <execution>
      <id>default-resources</id>
      <goals>
        <goal>resources</goal>
      </goals>
      <configuration>
        <encoding>UTF-8</encoding>
        <resources>
          <resource>
            <directory>src/main/resources</directory>
            <filtering>true</filtering>
            <excludes>
              <exclude>**/*.properties</exclude>
            </excludes>
          </resource>
        </resources>
      </configuration>
    </execution>
    <execution>
      <id>resources-properties</id>
      <goals>
        <goal>resources</goal>
      </goals>
      <configuration>
        <encoding>ISO-8859-1</encoding>
        <resources>
          <resource>
            <directory>src/main/resources</directory>
            <filtering>true</filtering>
            <includes>
              <include>**/*.properties</include>
            </includes>
          </resource>
        </resources>
      </configuration>
    </execution>
  </executions>
</plugin>
{code}

was (Author: denn...@apache.org):
Hi,

We stumbled across this issue today for one of our projects that has {{project.build.sourceEncoding=UTF-8}} and includes properties files that use ISO-8859-1 encoding. The properties files are filtered. We have both regular properties files and properties files that are used as ResourceBundles. There are also non-properties files in the resources directory that need to be filtered.

Here is a proposal on how such a situation can be handled.
{code:xml}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-resources-plugin</artifactId>
  <executions>
    <execution>
      <id>resources-non-properties</id>
      <goals>
        <goal>resources</goal>
      </goals>
      <configuration>
        <encoding>UTF-8</encoding>
        <resources>
          <resource>
            <directory>src/main/resources</directory>
            <filtering>true</filtering>
            <excludes>
              <exclude>**/*.properties</exclude>
            </excludes>
          </resource>
        </resources>
      </configuration>
    </execution>
    <execution>
      <id>resources-properties</id>
      <goals>
        <goal>resources</goal>
      </goals>
      <configuration>
        <encoding>ISO-8859-1</encoding>
        <resources>
          <resource>
            <directory>src/main/resources</directory>
            <filtering>true</filtering>
            <includes>
              <include>**/*.properties</include>
            </includes>
          </resource>
        </resources>
      </configuration>
    </execution>
  </executions>
</plugin>
{code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (MRESOURCES-171) ISO8859-1 properties files get changed into UTF-8 when filtered
[ https://issues.apache.org/jira/browse/MRESOURCES-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15076468#comment-15076468 ]

Aaron Digulla edited comment on MRESOURCES-171 at 1/2/16 9:19 AM:
--

We have all been in your place and thought "this can't be, can it? How is that supposed to work?"

Well, it may be a bug or a feature but the fact is: The Java VM *always* uses ISO-8859-1 to read and write properties files. It's hard coded in Properties.java. There is no system property; it's a string constant. Unless you get Oracle and all the other JVMs to change the source code, that's the way it is. And all the IDEs and all the other tools and ... you get the idea.

You want Chinese in a properties file? You need to use escape sequences. I don't like it either but no amount of dislike is going to change the facts.

was (Author: digulla):
We have all been in your place and thought "this can't be, can it? How is that supposed to work?"

Well, it may be a bug or a feature but the fact is: The Java VM *always* uses ISO-8859-1 to read and write properties files. Unless you get Oracle and all the other JVMs to change the source code, that's the way it is. And all the IDEs and all the other tools and ... you get the idea.

You want Chinese in a properties file? You need to use escape sequences. I don't like it either but no amount of dislike is going to change the facts.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
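The behaviour described in this comment can be observed with the stream-based methods of java.util.Properties, which are documented to use ISO 8859-1. The following is a minimal sketch; the class PropertiesLatin1Demo and its helper methods are invented for the demonstration. It shows that a UTF-8 encoded file read through load(InputStream) yields mojibake, and that store(OutputStream, ...) writes characters outside the encoding as escape sequences that load back correctly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

final class PropertiesLatin1Demo {

    /** Loads the bytes as a properties file via load(InputStream) and returns one value. */
    static String loadValue(byte[] fileBytes, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    /** Stores a single entry via store(OutputStream, ...) and returns the file content as text. */
    static String store(String key, String value) {
        Properties props = new Properties();
        props.setProperty(key, value);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            props.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // A name with a u-umlaut, written with an escape so this source file stays ASCII.
        String line = "name=J\u00fcrgen";

        // An ISO-8859-1 encoded file is read back correctly.
        System.out.println(loadValue(line.getBytes(StandardCharsets.ISO_8859_1), "name"));

        // The same line encoded as UTF-8 comes back as two wrong characters per umlaut.
        System.out.println(loadValue(line.getBytes(StandardCharsets.UTF_8), "name"));

        // Chinese text is written as escape sequences and survives a round trip.
        String stored = store("greeting", "\u4f60\u597d");
        System.out.println(stored);
        System.out.println(loadValue(stored.getBytes(StandardCharsets.ISO_8859_1), "greeting"));
    }
}
```

The second call is what an application sees at runtime when a filtered properties file has been rewritten as UTF-8 and then loaded this way.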
[jira] [Comment Edited] (MRESOURCES-171) ISO8859-1 properties files get changed into UTF-8 when filtered
[ https://issues.apache.org/jira/browse/MRESOURCES-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15076375#comment-15076375 ]

Aaron Digulla edited comment on MRESOURCES-171 at 1/1/16 9:15 PM:
--

[~khmarbaise] You obviously didn't understand the bug. In a nutshell: UTF-8 encoding is *illegal* for properties files. GIF files have a defined layout; properties files have one, too. They *look* like text files but they are text files with ISO-8859-1 encoding *everywhere*. No exceptions, not even for Maven.

If you don't like that or you think that is wrong, please get in contact with Oracle and have the file format changed.

So please reopen the bug, it's real, serious, and causes data corruption.

was (Author: digulla):
[~khmarbaise] You obviously didn't understand the bug. In a nutshell: UTF-8 encoding is *illegal* for properties files. GIF files have a defined layout; properties files have one, too. They *look* like text files but they are text files with ISO-8859-1 encoding *everywhere*. No exceptions, not even for Maven.

If you don't like that or you think that is wrong, please get in contact with Oracle and have the file format changed.

So please reopen the bug, it's a real, serious, and causes data corruption.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)