[jira] [Commented] (COMPRESS-132) Add support for unix dump files
[ https://issues.apache.org/jira/browse/COMPRESS-132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085020#comment-13085020 ] Stefan Bodewig commented on COMPRESS-132:
-
svn revision 1157769 contains a repackaged version of the main tree of your code. Things I've changed:
* repackaged to live in org.apache.commons land
* removed all @author tags and instead added you to the POM as a contributor; hope this is OK with you (we don't do @author tags). Should this be a problem for you, I'll simply remove the code again.
* merged POSIXArchiveEntry into DumpArchiveEntry for now
* renamed getModTime to getLastModifiedDate, as your class didn't implement that method (it was added in Compress 1.1)

Missing for me in order to close this are tests - I will add some once I have access to a machine that has dump installed - and initial documentation for the site. I'll take care of that as well.

Add support for unix dump files
---
Key: COMPRESS-132
URL: https://issues.apache.org/jira/browse/COMPRESS-132
Project: Commons Compress
Issue Type: New Feature
Components: Archivers
Reporter: Bear Giles
Priority: Minor
Fix For: 1.3
Attachments: dump-20110722.zip, dump.zip, test-z.dump, test.dump

I'm submitting a series of patches to the ext2/3/4 dump utility and noticed that the commons-compress library doesn't have an archiver for it. It's as old as tar and fills a similar niche, but the latter has become much more widely used. Dump includes support for sparse files, extended attributes, Mac OS Finder, SELinux labels (I think), and more. Incremental dumps can capture that files have been deleted. I should have initial support for a decoder this weekend. I can read the directory entries and inode information (file permissions, etc.) but need a bit more work on extracting the content as an InputStream.

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MATH-621) BOBYQA is missing in optimization
[ https://issues.apache.org/jira/browse/MATH-621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dr. Dietmar Wolz updated MATH-621:
--
Attachment: BOBYQAOptimizer0.4.zip

No changes from the perl-generated code besides the ones necessary to get INDEX_OFFSET=0 working. Introduced INDEX_OFFSET where possible, but there were many other adaptations necessary (just compare the perl-generated code with the attachment). Version 0.3 had some useful additional minor changes/refactorings missing here (see remarks below), but the main work for 0.3 was the index change, and this we have here again.

Remarks:
1) The perl script has damaged the for-loop indentation
2) n, npt and nptm should be global variables and not set separately in each method
3) System-generated locals: declare variables in the scope they are needed, not method-globally, where possible
4) testDiagonalRosen() is a copy/paste leftover from CMAES and should be removed
5) We should think about removing rescue, as proposed by Mike Powell.

BOBYQA is missing in optimization
-
Key: MATH-621
URL: https://issues.apache.org/jira/browse/MATH-621
Project: Commons Math
Issue Type: New Feature
Affects Versions: 3.0
Reporter: Dr. Dietmar Wolz
Fix For: 3.0
Attachments: BOBYQA.math.patch, BOBYQA.v02.math.patch, BOBYQAOptimizer0.4.zip, bobyqa.zip, bobyqa_convert.pl, bobyqaoptimizer0.4.zip, bobyqav0.3.zip
Original Estimate: 8h
Remaining Estimate: 8h

During experiments with space flight trajectory optimizations I recently observed that the direct optimization algorithm BOBYQA http://plato.asu.edu/ftp/other_software/bobyqa.zip from Mike Powell is significantly better than the simple Powell algorithm already in commons.math. It needs significantly fewer function calls and is more reliable for high-dimensional problems. You can replace CMA-ES in many more application cases by BOBYQA than by the simple Powell optimizer. I would like to contribute a Java port of the algorithm. I maintained the structure of the original FORTRAN code, so the code is fast but not very nice.

License status: Michael Powell has sent the agreement via snail mail - it hasn't arrived yet.

Progress: The attached patch relative to the trunk contains both the optimizer and the related unit tests - which are all green now.

Performance: Performance difference (number of function evaluations) PowellOptimizer / BOBYQA for different test functions (taken from the unit test of BOBYQA, dimension=13 for most of the tests):
Rosen = 9350 / 1283
MinusElli = 118 / 59
Elli = 223 / 58
ElliRotated = 8626 / 1379
Cigar = 353 / 60
TwoAxes = 223 / 66
CigTab = 362 / 60
Sphere = 223 / 58
Tablet = 223 / 58
DiffPow = 421 / 928
SsDiffPow = 614 / 219
Ackley = 757 / 97
Rastrigin = 340 / 64
The number for DiffPow should be discussed with Michael Powell; I will send him the details.

Open Problems: Some checkstyle violations because of the original Fortran source:
- Original method comments were copied - they don't follow the javadoc standard
- Multiple variable declarations in one line, as in the original source
- Problems related to goto conversions: gotos not convertible into loops were translated into a finite automaton (switch statement); the "no default in switch" and "fall through from previous case in switch" warnings, which usually flag bad style, make no sense here.

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
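The goto-to-switch translation mentioned under "Open Problems" can be illustrated with a small self-contained sketch. This is my own hypothetical example, not code from the patch; it shows why the missing default and the case fall-through are deliberate when a Fortran goto graph is encoded as a state machine.

```java
// Hypothetical sketch, not taken from the BOBYQA port: a Fortran goto flow
// translated into a "finite automaton" switch.
//
// Fortran original (schematic):
//   10 CALL STEP_A
//      IF (X .GT. LIMIT) GOTO 30
//   20 CALL STEP_B
//      GOTO 10
//   30 RETURN
public class GotoConversionDemo {

    /** Returns how many times "STEP_A" ran before x exceeded the limit. */
    static int run(int x, int limit) {
        int state = 10;              // which Fortran label executes next
        int stepACalls = 0;
        while (true) {
            switch (state) {         // no default: every label is covered
            case 10:                 // STEP_A
                stepACalls++;
                x++;
                if (x > limit) {     // IF (...) GOTO 30
                    state = 30;
                    break;
                }
                // deliberate fall-through: Fortran control simply flows
                // into the next labelled statement (label 20)
            case 20:                 // STEP_B
                x++;
                state = 10;          // GOTO 10
                break;
            case 30:
                return stepACalls;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(run(0, 5)); // prints 4
    }
}
```

The state variable plays the role of the Fortran label, so checkstyle's usual objections to fall-through and a missing default case do not apply: both are faithful encodings of the original control flow.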
[jira] [Commented] (MATH-621) BOBYQA is missing in optimization
[ https://issues.apache.org/jira/browse/MATH-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085061#comment-13085061 ] Gilles commented on MATH-621:
-
Thanks for the work. However, if I change the INDEX_OFFSET constant (setting it back to 1), the tests fail. I see that you hard-coded the offset in most places instead of using INDEX_OFFSET. I still think that this place-holder would be useful to keep track of places where the index variables might have been set to fit with the Fortran 1-based counting... Don't you?
{quote} The perl script has damaged the for-loop indentation {quote}
Sorry, I didn't see that. But that's easy to fix. I'll do it after the issue with INDEX_OFFSET is settled.
{quote} n, npt and nptm should be global variables and not set separately in each method {quote}
Yes, I agree. But there are probably many other variables for which this is true (zmat, bmat, etc.).
{quote} System-generated locals: declare variables in the scope they are needed [...] {quote}
Agreed, of course. I had started to do that, mainly with d__1; then there are many cases where the same variable was reused, whereas we would prefer to create yet another one with a more explicit name.
{quote} testDiagonalRosen() is a copy/paste leftover from CMAES and should be removed {quote}
OK, I'll do it in the next commit.
{quote} We should think about removing rescue, as proposed by Mike Powell. {quote}
I'm all for anything that leads to removing unnecessary lines of code :) If you are indeed confident that, in most cases, the added complexity is not worth it, I'll just delete it.

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MATH-621) BOBYQA is missing in optimization
[ https://issues.apache.org/jira/browse/MATH-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085074#comment-13085074 ] Dr. Dietmar Wolz commented on MATH-621:
---
{quote} I see that you hard-coded the offset in most places instead of using INDEX_OFFSET. I still think that this place-holder would be useful to keep track of places where the index variables might have been set to fit with the Fortran 1-based counting... Don't you? {quote}
I am not convinced yet. I thought of INDEX_OFFSET as a tool to support the conversion. If you don't use INDEX_OFFSET in the for loops (for (int i = INDEX_OFFSET; ...)) I don't see why to introduce it artificially in other places. The final aim should be to get rid of the Fortran arrays/matrices and have 0-based access. I don't see it as essential to maintain INDEX_OFFSET as a kind of back reference to the old Fortran code in the future. We have the unit tests as a regression test. Just try to convert one method - let's say prelim - the way you want to have it. The working 0-based version 0.4 should make this easy. Then let's have a look at it. I suspect it will become rather ugly using INDEX_OFFSET in all places. But then we also should convert the for loops as (for (int i = INDEX_OFFSET; ...)) so that the code runs again with INDEX_OFFSET=1. If you then really think it is better this way, I will help to convert the other methods.

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
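The loop style under discussion can be sketched in a few lines. This is my own minimal example, not the actual BOBYQA code; it shows how a loop written against INDEX_OFFSET works for both the Fortran 1-based and the Java 0-based convention.

```java
// Hypothetical sketch (not the actual BOBYQA code): a loop written so that
// INDEX_OFFSET marks every spot still carrying the Fortran 1-based counting.
public class IndexOffsetDemo {

    // 0 for plain Java arrays; 1 to mimic the Fortran original
    // (which would then need a dummy slot at array index 0).
    static final int INDEX_OFFSET = 0;

    /** Sum of xpt(1..n) in Fortran terms, i.e. all "live" entries. */
    static double sum(double[] xpt, int n) {
        double s = 0;
        // Fortran: DO 10 I = 1, N ... uses XPT(I)
        for (int i = INDEX_OFFSET; i < n + INDEX_OFFSET; i++) {
            s += xpt[i];
        }
        return s;
    }

    public static void main(String[] args) {
        double[] xpt = {1.0, 2.0, 3.0};
        System.out.println(sum(xpt, 3)); // prints 6.0
    }
}
```

Gilles's position amounts to keeping INDEX_OFFSET in every converted expression as a marker; Dietmar's is that once the unit tests pass with 0-based access, the placeholder has served its purpose and can be hard-coded away.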
[jira] [Commented] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085104#comment-13085104 ] Gary D. Gregory commented on CODEC-127:
---
Sebb: I get errors when I try your perl script on Windows with the latest perl (64-bit) from ActiveState. Rather than use this space to figure out why, can you please run it again and check whether we are done with this ticket? Thank you, Gary

Non-ascii characters in source files
Key: CODEC-127
URL: https://issues.apache.org/jira/browse/CODEC-127
Project: Commons Codec
Issue Type: Bug
Reporter: Sebb

Some of the test cases include characters in a native encoding (possibly UTF-8), rather than using Unicode escapes. This can cause a problem for IDEs if they don't know the encoding (e.g. cause compilation errors, which is how I found the issue), and some transformations may corrupt the contents, e.g. fixing EOL. I think we should have a rule of using Unicode escapes for all such non-ASCII characters. It's particularly important for non-ISO-8859-1 characters.

Some example classes with non-ASCII characters:
{code}
binary\Base64Test.java:96 byte[] decode = b64.decode("SGVsbG{������}8gV29ybGQ=");
language\ColognePhoneticTest.java:110 {"mönchengladbach", "664645214"},
language\ColognePhoneticTest.java:130 String[][] data = {{"bergisch-gladbach", "174845214"}, {"Müller-Lüdenscheidt", "65752682"}};
language\ColognePhoneticTest.java:137 {"Meyer", "Müller"},
language\ColognePhoneticTest.java:143 {"ganz", "Gänse"},
language\DoubleMetaphoneTest.java:1222 this.getDoubleMetaphone().isDoubleMetaphoneEqual("�", "S");
language\DoubleMetaphoneTest.java:1227 this.getDoubleMetaphone().isDoubleMetaphoneEqual("�", "N");
language\SoundexTest.java:367 if (Character.isLetter('�')) {
language\SoundexTest.java:369 Assert.assertEquals("�000", this.getSoundexEncoder().encode("�"));
language\SoundexTest.java:375 Assert.assertEquals("", this.getSoundexEncoder().encode("�"));
language\SoundexTest.java:387 if (Character.isLetter('�')) {
language\SoundexTest.java:389 Assert.assertEquals("�000", this.getSoundexEncoder().encode("�"));
language\SoundexTest.java:395 Assert.assertEquals("", this.getSoundexEncoder().encode("�"));
{code}
The characters are probably not correct above, because I used a crude perl script to find them:
{code}
perl ne $.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV; */*.java
{code}
language\SoundexTest.java:367 in particular is incorrect, because it's supposed to be a single character. Now one might think that native2ascii -encoding UTF-8 would fix that, but it gives: if (Character.isLetter('\ufffd')) which is an unknown character. Similarly for binary\Base64Test.java:96. It's not all that clear what the Unicode escapes should be in these cases, but probably not the unknown character. [Possibly the characters got mangled at some point, or maybe they have always been wrong.]

The ColognePhoneticTest.java cases are less serious, as the characters are valid ISO-8859-1 (accented German), but given that the rest of the file uses Unicode escapes, I think they should be changed too (but add comments to say what they are, e.g. o-umlaut, u-umlaut).

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
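As an illustration of the rule Sebb proposes (my own example, not taken from the codec tests): javac replaces a Unicode escape with the character it denotes before the source is parsed, so an escaped literal behaves exactly like the raw one while keeping the file pure ASCII.

```java
public class UnicodeEscapeDemo {
    public static void main(String[] args) {
        // \u00f6 is o-umlaut, \u00fc is u-umlaut: the compiler treats the
        // escape and the raw character as the same code point, so the
        // escaped form lets the source file stay pure ASCII.
        String escaped = "m\u00f6nchengladbach"; // pure-ASCII source line
        String raw = "mönchengladbach";          // needs correct file encoding
        System.out.println(escaped.equals(raw)); // prints true
    }
}
```

This is also why a comment naming the character (e.g. "o-umlaut") is worth adding next to each escape, as suggested above: the escape alone is opaque to readers.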
[jira] [Commented] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085110#comment-13085110 ] Sebb commented on CODEC-127:
-
What error do you get? Just curious. I now get:
{code}
commons-codec-generics/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:110 {"mönchengladbach", "664645214"},
commons-codec-generics/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:130 String[][] data = {{"bergisch-gladbach", "174845214"}, {"Müller-Lüdenscheidt", "65752682"}};
commons-codec-generics/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:137 {"Meyer", "Müller"},
commons-codec-generics/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:143 {"ganz", "Gänse"},
commons-codec-generics/src/test/org/apache/commons/codec/language/DoubleMetaphoneTest.java:1222 this.getDoubleMetaphone().isDoubleMetaphoneEqual("�", "S");
commons-codec-generics/src/test/org/apache/commons/codec/language/DoubleMetaphoneTest.java:1227 this.getDoubleMetaphone().isDoubleMetaphoneEqual("�", "N");
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/BeiderMorseEncoderTest.java:93 String[] names = { "ácz", "átz", "Ignácz", "Ignátz", "Ignác" };
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:47 { "Nuñez", "spanish", EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:49 { "Čapek", "czech", EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:52 { "Küçük", "turkish", EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:55 { "Ceauşescu", "romanian", EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:57 { "Αγγελόπουλος", "greek", EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:58 { "Пушкин", "cyrillic", EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:59 { "כהן", "hebrew", EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:60 { "ácz", "any", EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:61 { "átz", "any", EXACT } });
{code}
and
{code}
commons-codec/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:110 {"mönchengladbach", "664645214"},
commons-codec/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:130 String[][] data = {{"bergisch-gladbach", "174845214"}, {"Müller-Lüdenscheidt", "65752682"}};
commons-codec/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:137 {"Meyer", "Müller"},
commons-codec/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:143 {"ganz", "Gänse"},
commons-codec/src/test/org/apache/commons/codec/language/DoubleMetaphoneTest.java:1227 this.getDoubleMetaphone().isDoubleMetaphoneEqual("�", "S");
commons-codec/src/test/org/apache/commons/codec/language/DoubleMetaphoneTest.java:1232 this.getDoubleMetaphone().isDoubleMetaphoneEqual("�", "N");
commons-codec/src/test/org/apache/commons/codec/language/bm/BeiderMorseEncoderTest.java:93 String[] names = { "ácz", "átz", "Ignácz", "Ignátz", "Ignác" };
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:47 { "Nuñez", "spanish", EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:49 { "Čapek", "czech", EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:52 { "Küçük", "turkish", EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:55 { "Ceauşescu", "romanian", EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:57 { "Αγγελόπουλος", "greek", EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:58 { "Пушкин", "cyrillic", EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:59 { "כהן", "hebrew", EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:60 { "ácz", "any", EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:61 { "átz", "any", EXACT } });
{code}
This was using an updated version of the script that uses File::Find to handle directory traversal better. (Some lines were shortened above by manually removing leading spaces.) I think all the actual errors have now been
[jira] [Commented] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085115#comment-13085115 ] Gary D. Gregory commented on CODEC-127:
---
That sounds good. Today, the code is not editable/maintainable. There does not seem to be anything I can do in Eclipse to fix this just for viewing the chars correctly. If the comments are left mangled, then they are not maintainable. If you change the code, then the comments should match. So I would not leave the comments mangled.

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085116#comment-13085116 ] Gary D. Gregory commented on CODEC-127:
---
If I run the command as is, I get:
{quote} Can't open perl script "ne": No such file or directory {quote}

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
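That error comes from the dropped -ne switch: without it, perl takes the first word of the program (ne) to be a script filename. As a quoting-proof alternative for Windows, the one-liner's job can also be done with a small self-contained Java program. This sketch is my own, not something from the thread:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// My own sketch, not from the thread: like the perl one-liner, print
// file:line for every source line containing a non-ASCII character.
public class NonAsciiScanner {

    static boolean hasNonAscii(String line) {
        return line.chars().anyMatch(c -> c > 127);
    }

    static void scan(Path root) throws IOException {
        List<Path> javaFiles;
        try (Stream<Path> s = Files.walk(root)) {
            javaFiles = s.filter(p -> p.toString().endsWith(".java"))
                         .collect(Collectors.toList());
        }
        for (Path p : javaFiles) {
            // Assumes the sources really are UTF-8, as the POM says.
            List<String> lines = Files.readAllLines(p, StandardCharsets.UTF_8);
            for (int i = 0; i < lines.size(); i++) {
                if (hasNonAscii(lines.get(i))) {
                    System.out.println(p + ":" + (i + 1) + " " + lines.get(i));
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        scan(Paths.get(args.length > 0 ? args[0] : "."));
    }
}
```

Unlike a shell one-liner, this avoids quoting differences between Windows and Unix shells entirely, at the cost of a compile step.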
[jira] [Updated] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb updated CODEC-127:
---
Description:

Some of the test cases include characters in a native encoding (possibly UTF-8), rather than using Unicode escapes. This can cause a problem for IDEs if they don't know the encoding (e.g. cause compilation errors, which is how I found the issue), and some transformations may corrupt the contents, e.g. fixing EOL. I think we should have a rule of using Unicode escapes for all such non-ASCII characters. It's particularly important for non-ISO-8859-1 characters.

Some example classes with non-ASCII characters:
{code}
binary\Base64Test.java:96 byte[] decode = b64.decode("SGVsbG{������}8gV29ybGQ=");
language\ColognePhoneticTest.java:110 {"mönchengladbach", "664645214"},
language\ColognePhoneticTest.java:130 String[][] data = {{"bergisch-gladbach", "174845214"}, {"Müller-Lüdenscheidt", "65752682"}};
language\ColognePhoneticTest.java:137 {"Meyer", "Müller"},
language\ColognePhoneticTest.java:143 {"ganz", "Gänse"},
language\DoubleMetaphoneTest.java:1222 this.getDoubleMetaphone().isDoubleMetaphoneEqual("�", "S");
language\DoubleMetaphoneTest.java:1227 this.getDoubleMetaphone().isDoubleMetaphoneEqual("�", "N");
language\SoundexTest.java:367 if (Character.isLetter('�')) {
language\SoundexTest.java:369 Assert.assertEquals("�000", this.getSoundexEncoder().encode("�"));
language\SoundexTest.java:375 Assert.assertEquals("", this.getSoundexEncoder().encode("�"));
language\SoundexTest.java:387 if (Character.isLetter('�')) {
language\SoundexTest.java:389 Assert.assertEquals("�000", this.getSoundexEncoder().encode("�"));
language\SoundexTest.java:395 Assert.assertEquals("", this.getSoundexEncoder().encode("�"));
{code}
The characters are probably not correct above, because I used a crude perl script to find them:
{code}
perl -ne $.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV; */*.java
{code}
language\SoundexTest.java:367 in particular is incorrect, because it's supposed to be a single character. Now one might think that native2ascii -encoding UTF-8 would fix that, but it gives: if (Character.isLetter('\ufffd')) which is an unknown character. Similarly for binary\Base64Test.java:96. It's not all that clear what the Unicode escapes should be in these cases, but probably not the unknown character. [Possibly the characters got mangled at some point, or maybe they have always been wrong.]

The ColognePhoneticTest.java cases are less serious, as the characters are valid ISO-8859-1 (accented German), but given that the rest of the file uses Unicode escapes, I think they should be changed too (but add comments to say what they are, e.g. o-umlaut, u-umlaut).

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085128#comment-13085128 ] Sebb commented on CODEC-127: If you change Eclipse to set the container / resource / text file encoding to UTF-8 (since that is what the POM says) the files should display correctly assuming they really are UTF-8. 
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085134#comment-13085134 ] Gary D. Gregory commented on CODEC-127: --- All better with the test source folder set to UTF-8, which I thought I had done, but obviously not. I am now a lot less worried about maintenance because the files are editable given the right editor settings. I am inclined to leave things as is. Perhaps each file needs a prominent Javadoc note about using UTF-8 in editors. 
[jira] [Commented] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085135#comment-13085135 ] Sebb commented on CODEC-127: See my fix to ColognePhoneticTest in trunk. That now shows native comments for all unicode escapes. Two of the otherwise lowercase names were previously converted to the Unicode for upper case umlauts; I wonder if that was a mistake? 
[jira] [Commented] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085137#comment-13085137 ] Gary D. Gregory commented on CODEC-127: --- If I run: {quote} perl -n -e $.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV; */*.java {quote} I get: {quote} Can't open */*.java: Invalid argument. {quote} 
[jira] [Issue Comment Edited] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085137#comment-13085137 ] Gary D. Gregory edited comment on CODEC-127 at 8/15/11 3:51 PM: If I run: {noformat} perl -n -e $.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV; */*.java {noformat} I get: {noformat} Can't open */*.java: Invalid argument. {noformat} was (Author: garydgregory): If I run: {quote} perl -n -e $.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV; */*.java {quote} I get: {quote} Can't open */*.java: Invalid argument. {quote} 
[jira] [Commented] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085139#comment-13085139 ] Gary D. Gregory commented on CODEC-127: --- WRT: {noformat} Author: sebb Date: Mon Aug 15 15:47:42 2011 New Revision: 1157892 URL: http://svn.apache.org/viewvc?rev=1157892view=rev Log: CODEC-127 Convert to use Unicode in strings, but add comments in native encoding (utf-8) {noformat} I am having second thoughts here. If you cannot edit UTF-8, you cannot edit and maintain the files because if you change the Unicode escape in the code, you must change the comment to match. So now, I am favoring leaving the code as it was before... Thoughts? 
[jira] [Commented] (POOL-99) Test for idle time exceeded in borrowObject
[ https://issues.apache.org/jira/browse/POOL-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085143#comment-13085143 ] Rob Eamon commented on POOL-99: --- In some cases for object pools, when the object idle time exceeds a threshold it is no longer a valid/usable object (e.g. a DB connection). Pool clients need to be able to determine if an object has been idle for more than X seconds so that such objects will not be used (they are no longer valid and will cause exceptions to be thrown). Either the pool itself should enforce it via settings or provide the information necessary for the pool client to do it in testOnBorrow. Test for idle time exceeded in borrowObject Key: POOL-99 URL: https://issues.apache.org/jira/browse/POOL-99 Project: Commons Pool Issue Type: Improvement Affects Versions: 1.3 Reporter: Rob Eamon Priority: Minor Fix For: 2.0 For GenericObjectPool, the evictor thread performs a calculation to determine if an idle object has expired. If it has, the object is destroyed. Would like borrowObject to perform the same test and destroy behavior. I explored using the testOnBorrow facility but the time that the object went idle is not available. Only the pool has access to the ObjectTimestampPair object that is used to record the time that the object was placed in the pool. I explored placing a timestamp in the pooled object and can do that but it would seem better if the pool managed that test itself. 
[jira] [Commented] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085145#comment-13085145 ] Sebb commented on CODEC-127: Sorry, forgot I was using a local module which handles DOS wildcards, see http://docs.activestate.com/activeperl/5.14/lib/pods/perlwin32.html#command_line_wildcard_expansion Either pass each file in separately, or create Wild.pm and use: {code} perl -MWild -ne $.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV; */*.java {code} Wild.pm only works for one level of directories. 
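For readers without perl (or without the Wild.pm workaround for Windows wildcard expansion), a rough Java equivalent of the one-liner discussed above might look like the sketch below. The class and method names are invented; this is not part of any codec build:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical Java port of the perl scanner: walk a directory tree and
// print file:line for every line of a .java file that contains a
// non-ASCII character. File matching is done by Files.walk, so it does
// not depend on shell wildcard expansion (the Windows problem above).
public class NonAsciiScanner {

    // A line is flagged if any code unit falls outside 7-bit ASCII.
    static boolean hasNonAscii(String line) {
        return line.chars().anyMatch(c -> c > 127);
    }

    public static void scan(Path root) throws IOException {
        try (Stream<Path> files = Files.walk(root)) {
            files.filter(p -> p.toString().endsWith(".java")).forEach(p -> {
                try {
                    int lineNo = 0;
                    for (String line : Files.readAllLines(p, StandardCharsets.UTF_8)) {
                        lineNo++;
                        if (hasNonAscii(line)) {
                            System.out.println(p + ":" + lineNo + " " + line);
                        }
                    }
                } catch (IOException e) {
                    // Covers malformed-UTF-8 files as well as I/O failures.
                    System.err.println(p + ": " + e.getMessage());
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        scan(Path.of(args.length > 0 ? args[0] : "."));
    }
}
```

Note that decoding as UTF-8 is itself an assumption; a file in another encoding may trip the malformed-input branch rather than print, which mirrors the mangled-character caveat in the perl output above.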
[jira] [Commented] (POOL-99) Test for idle time exceeded in borrowObject
[ https://issues.apache.org/jira/browse/POOL-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085147#comment-13085147 ] Mark Thomas commented on POOL-99: - In that scenario, simply execute a validation query (which is good practice anyway for DB connections, which can fail for all sorts of reasons). 
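The timestamp-in-the-pooled-object workaround mentioned in the issue could be sketched like this. All names are invented for illustration; this is not the Commons Pool API, just the shape of the idea:

```java
// Hypothetical sketch of the workaround discussed above: wrap the
// pooled object, stamp it whenever it goes idle, and let a
// testOnBorrow-style validation reject it once the idle threshold is
// exceeded (a pool would then destroy it and try another instance).
public class IdleAwarePooledObject<T> {
    private final T delegate;
    private final long maxIdleMillis;
    private volatile long idleSinceMillis;

    public IdleAwarePooledObject(T delegate, long maxIdleMillis) {
        this.delegate = delegate;
        this.maxIdleMillis = maxIdleMillis;
        markIdle();
    }

    // Call when the object is returned to the pool (i.e. on passivate).
    public final void markIdle() {
        idleSinceMillis = System.currentTimeMillis();
    }

    // Call from the factory's validate step with testOnBorrow enabled;
    // returning false signals the pool to destroy this instance.
    public boolean isStillFresh() {
        return System.currentTimeMillis() - idleSinceMillis <= maxIdleMillis;
    }

    public T get() {
        return delegate;
    }
}
```

As Mark's comment implies, for DB connections a real validation query is still worthwhile on top of any idle-time check, since a connection can die long before its idle threshold.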
[jira] [Commented] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085149#comment-13085149 ] Sebb commented on CODEC-127: It's not that one cannot edit UTF-8; the problem is that it is easy to mangle non-ASCII characters by mistake. The safest is to only use ASCII, i.e. Unicode escapes, which are valid in both UTF-8 and ISO-8859-1 and all likely default encodings. However, they are difficult to read, hence the comments on the lines. If the comments get mangled, it will be obvious, because they won't look right; and it's relatively easy to fix them from the Unicode. I don't think it's an option to use native characters in the non-comment code, because we already know they can get corrupted, and the corruption won't necessarily cause errors. I don't see the harm in translating the code into comments; after all, the translation can be done again. 
[jira] [Commented] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085153#comment-13085153 ] Gary D. Gregory commented on CODEC-127: --- Roger that. I'm sold then. 
this.getSoundexEncoder().encode(´┐¢)); {code} The characters are probably not correct above, because I used a crude perl script to find them: {code} perl -ne $.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV; */*.java {code} language\SoundexTest.java:367 in particular is incorrect, because it's supposed to be a single character. Now one might think that native2ascii -encoding UTF-8 would fix that, but it gives: if (Character.isLetter('\ufffd')) which is an unknown character. Similarly for binary\Base64Test.java:96. It's not all that clear what the Unicode escapes should be in these cases, but probably not the unknown character. [Possibly the characters got mangled at some point, or maybe they have always been wrong] The ColognePhoneticTest.java cases are less serious, as the characters are valid ISO-8859-1 (accented German), but given that the rest of the file uses unicode escaps, I think they should be changed too (but add comments to say what they are, e.g. o-umlaut, u-umlaut) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
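A minimal sketch of the convention proposed above: non-ASCII characters written as Unicode escapes, each annotated with a comment naming the character. The class and field names are illustrative, not taken from the actual Codec test sources.

```java
// Sketch of the "Unicode escapes + naming comment" rule proposed in CODEC-127.
// Identifiers here are hypothetical examples, not the real test code.
public class UnicodeEscapeExample {
    // \u00f6 is o-umlaut, \u00fc is u-umlaut, \u00e4 is a-umlaut
    static final String MOENCHENGLADBACH = "m\u00f6nchengladbach";
    static final String MUELLER = "M\u00fcller";
    static final String GAENSE = "G\u00e4nse";

    public static void main(String[] args) {
        // The escape denotes the same code point as the raw character would,
        // but survives any source-encoding confusion in editors and builds.
        System.out.println((int) MUELLER.charAt(1)); // prints 252 (U+00FC)
    }
}
```

Written this way, the file is pure ASCII, so it compiles identically regardless of the encoding an IDE or transformation assumes.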
[jira] [Created] (CHAIN-53) Global Update of Chain - Generics, JDK 1.5, Update Dependency Versions
Global Update of Chain - Generics, JDK 1.5, Update Dependency Versions

Key: CHAIN-53
URL: https://issues.apache.org/jira/browse/CHAIN-53
Project: Commons Chain
Issue Type: Improvement
Reporter: Elijah Zupancic

As posted in the mailing list, I've done this work outside of an official branch. Here is the source: http://elijah.zupancic.name/projects/commons-chain-v2-proof-of-concept.tar.gz And here is a diff: http://elijah.zupancic.name/projects/uber-diff

In this patch:
* Global upgrade to JDK 1.5
* Added @Override annotations
* Upgraded to the Servlet 2.5 API
* Upgraded to the Faces 2.1 API
* Upgraded to the Portlet 2.0 API
* Upgraded the Maven Parent POM version
* Added generics support to Command so that Command's API looks like: public interface Command&lt;T extends Context&gt; { ... boolean execute(T context) throws Exception; }

I'm very much new to the ASF and I was advised to file a bug in order to get the process started for these changes to be integrated.
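The generified Command signature from the patch can be sketched as below. Context here is a minimal stand-in (the real org.apache.commons.chain.Context is a Map-based interface), and the concrete class names are illustrative, not part of the patch.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for org.apache.commons.chain.Context, which is Map-based.
interface Context extends Map<String, Object> { }

class MapContext extends HashMap<String, Object> implements Context { }

// The generified API from the patch: Command is parameterized on the
// concrete Context type it operates on.
interface Command<T extends Context> {
    // Per Chain convention, returning true means processing is complete;
    // false lets the remaining commands in the chain run.
    boolean execute(T context) throws Exception;
}

public class GenericCommandExample {
    public static void main(String[] args) throws Exception {
        // JDK 1.5-style anonymous implementation, matching the patch's target.
        Command<MapContext> greet = new Command<MapContext>() {
            public boolean execute(MapContext context) {
                context.put("greeting", "hello");
                return false;
            }
        };
        MapContext context = new MapContext();
        greet.execute(context);
        System.out.println(context.get("greeting")); // prints hello
    }
}
```

The type parameter lets a command declare the specific Context subtype it needs (servlet, portlet, faces) without casting inside execute().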
[jira] [Commented] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085156#comment-13085156 ] Gary D. Gregory commented on CODEC-127: --- Perl: I did all that and I get:
{noformat}
C:\svn\org\apache\commons\trunks-proper\codec>perl -MWild -ne $.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV; */*.java
syntax error at -e line 1, near *.
Execution of -e aborted due to compilation errors.
{noformat}
I also have: PERL5OPT=-MWild in my environment. Gary
[jira] [Commented] (POOL-99) Test for idle time exceeded in borrowObject
[ https://issues.apache.org/jira/browse/POOL-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085158#comment-13085158 ] Rob Eamon commented on POOL-99: --- The DB case was just an example. You're right that testOnBorrow could do a simple validation query. But why do a validation query when one can know up front that the pool object is stale? IMO, there is no reason for the pool to not at least provide the information for when the object went idle, so that the pool client can determine for itself whether or not the object is valid. The pool client developer can make the determination about what's expensive and what isn't. I understand the view that the idle notion of the pool is intended to avoid holding on to objects that are unlikely to be used, or at least not used for considerable time. But "unlikely to be used" is awfully close to "shouldn't be used". Given that the test is the same, why not leverage the idle time facilities?

Test for idle time exceeded in borrowObject

Key: POOL-99
URL: https://issues.apache.org/jira/browse/POOL-99
Project: Commons Pool
Issue Type: Improvement
Affects Versions: 1.3
Reporter: Rob Eamon
Priority: Minor
Fix For: 2.0

For GenericObjectPool, the evictor thread performs a calculation to determine if an idle object has expired. If it has, the object is destroyed. Would like borrowObject to perform the same test-and-destroy behavior. I explored using the testOnBorrow facility, but the time that the object went idle is not available. Only the pool has access to the ObjectTimestampPair object that is used to record the time that the object was placed in the pool. I explored placing a timestamp in the pooled object and can do that, but it would seem better if the pool managed that test itself.
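A hypothetical sketch of the requested behavior: the pool records when each object went idle, and borrowObject() applies the same idle-time test the evictor thread uses, discarding stale objects instead of handing them out. None of the names below come from the real GenericObjectPool API; time is passed in explicitly to keep the sketch deterministic.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical pool illustrating the POOL-99 request; not the Commons Pool API.
class IdleAwarePool<T> {
    private static final class Entry<T> {
        final T value;
        final long idleSinceMillis; // when the object was returned to the pool
        Entry(T value, long idleSinceMillis) {
            this.value = value;
            this.idleSinceMillis = idleSinceMillis;
        }
    }

    private final Deque<Entry<T>> idle = new ArrayDeque<>();
    private final long maxIdleMillis;
    private final Supplier<T> factory;
    int destroyed = 0; // exposed only so the demo can observe evictions

    IdleAwarePool(long maxIdleMillis, Supplier<T> factory) {
        this.maxIdleMillis = maxIdleMillis;
        this.factory = factory;
    }

    T borrowObject(long nowMillis) {
        Entry<T> e;
        while ((e = idle.pollFirst()) != null) {
            // The same expiry test the evictor thread performs, applied at
            // borrow time so a stale object is never handed to the client.
            if (nowMillis - e.idleSinceMillis > maxIdleMillis) {
                destroyed++; // destroy the stale instance
                continue;
            }
            return e.value;
        }
        return factory.get(); // nothing usable idle; create a fresh object
    }

    void returnObject(T obj, long nowMillis) {
        idle.addFirst(new Entry<>(obj, nowMillis));
    }
}

public class IdleAwarePoolDemo {
    public static void main(String[] args) {
        IdleAwarePool<Object> pool = new IdleAwarePool<>(1000, Object::new);
        Object a = pool.borrowObject(0);
        pool.returnObject(a, 0);
        Object b = pool.borrowObject(2000); // a sat idle 2000 ms > 1000 ms
        System.out.println(b != a);         // prints true: stale object replaced
        System.out.println(pool.destroyed); // prints 1
    }
}
```

Because the pool owns the idle-since timestamp, the client needs no validation query and no timestamp of its own, which is exactly the asymmetry the comment above points out.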
[jira] [Issue Comment Edited] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085145#comment-13085145 ] Sebb edited comment on CODEC-127 at 8/15/11 4:55 PM: - Sorry, forgot I was using a local module which handles DOS wildcards, see http://docs.activestate.com/activeperl/5.14/lib/pods/perlwin32.html#command_line_wildcard_expansion Either pass each file in separately, or create Wild.pm and use:
{code}
perl -MWild -ne $.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV; */*.java
{code}
Wild.pm only works for one level of directories.
[jira] [Commented] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085165#comment-13085165 ] Sebb commented on CODEC-127: Sorry, the closing quote was in the wrong place; it should have been before the file name params.
[jira] [Commented] (CHAIN-53) Global Update of Chain - Generics, JDK 1.5, Update Dependency Versions
[ https://issues.apache.org/jira/browse/CHAIN-53?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085184#comment-13085184 ] Matt Benson commented on CHAIN-53: -- Hello again Elijah, I have looked over the diff; here are some comments: * diffs should be attached/uploaded in JIRA, with the grant/feather radio button checked indicating your intent that the patch be licensed to the ASF (I know you sent the ICLA, but a. it wouldn't have been processed yet, and b. just humor us ;) ) * I don't see anything in the Faces-related changes to warrant upgrading to JSF 2.x. MyFaces in particular makes every attempt to continue to support JSF 1.x versions, so in the spirit of good inter-ASF cooperation, we should probably just leave the API levels of the JSF dependency wherever they stood previously. * At Commons we often repackage components when their APIs change incompatibly. The changes you have submitted are overwhelmingly backward-compatible once type erasure has been taken into account. What I particularly notice as being backward-incompatible are the {{Map}} implementations. Since most of these have gone from raw {{Map}} to {{MapString, ?}} their {{put()}} methods now have different signatures. In all cases except for {{oac.chain.web.servlet.ServletApplicationScopeMap}} these keys are required to be {{String}} instances at runtime anyway, so there is quite a minimal chance that code currently using these wouldn't recompile against these binaries. In the last case, {{null}} keys are rejected and other objects are converted to {{String}} if necessary. Once again, it seems rather unlikely that existing code would be utilizing this conversion code path. The {{Map}} concerns are the only potential point of contention I see with regard to backward compatibility. It would seem to me that [chain] is likely to sit rather high in the architecture of a given application, with little chance of multiple consumers competing at runtime. 
For this reason my personal opinion is that the incompatibilities introduced in the process of generifying the provided {{Map}} implementations are small enough to consider the component backward-compatible _enough_ and accept this patch directly onto [chain]'s trunk. I point the situation out here, however, in case other members of the community, particularly those with actual _experience_ with [chain], have conflicting opinions. Thanks for your interest!
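The compatibility point about the generified {{Map}} implementations can be made concrete with a sketch. Once a scope map is typed {{Map&lt;String, Object&gt;}}, callers that passed non-String keys to the raw {{put(Object, Object)}} no longer compile; the null-rejecting, key-converting behavior described for ServletApplicationScopeMap could look roughly like this (all names below are illustrative, not the actual [chain] code):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the backward-compatibility concern above.
public class ScopeMapExample {
    private final Map<String, Object> map = new HashMap<>();

    // Generified signature: only String keys compile against this method.
    public Object put(String key, Object value) {
        return map.put(key, value);
    }

    // Compatibility shim sketching the described conversion path:
    // reject null keys, convert any other key to String.
    public Object putObject(Object key, Object value) {
        if (key == null) {
            throw new IllegalArgumentException("null key not allowed");
        }
        return put(key.toString(), value);
    }

    public Object get(Object key) {
        return map.get(key);
    }

    public static void main(String[] args) {
        ScopeMapExample scope = new ScopeMapExample();
        scope.putObject(42, "answer");       // key converted to "42"
        System.out.println(scope.get("42")); // prints answer
    }
}
```

Since the underlying scope attributes are String-keyed at runtime anyway, only code relying on this conversion path would fail to recompile, which is the "minimal chance" argument made above.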
[jira] [Commented] (CHAIN-53) Global Update of Chain - Generics, JDK 1.5, Update Dependency Versions
[ https://issues.apache.org/jira/browse/CHAIN-53?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085189#comment-13085189 ] Elijah Zupancic commented on CHAIN-53: -- Thanks for the comments Matt.
* I will revert to the MyFaces 1.0 API.
* I could add put methods that accept Object, Object and then cast them to the K, V types.
* I will upload the diff to the bug once I have reverted the MyFaces changes.
* Do we want to update the version to 2.0? It seems like it would make sense because we are supporting a newer JDK. Or, since it is backwards-compatible, would a minor version bump be sufficient?
[jira] [Commented] (CHAIN-53) Global Update of Chain - Generics, JDK 1.5, Update Dependency Versions
[ https://issues.apache.org/jira/browse/CHAIN-53?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085234#comment-13085234 ] Matt Benson commented on CHAIN-53: -- I seem to recall that simply the upgrade to generics and hence, required Java version, justifies a major version bump. Not a big deal just at the moment, however.
[jira] [Commented] (CHAIN-53) Global Update of Chain - Generics, JDK 1.5, Update Dependency Versions
[ https://issues.apache.org/jira/browse/CHAIN-53?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085237#comment-13085237 ] Sebb commented on CHAIN-53: --- A major version bump is not required when changing the minimum Java version (though it would be sensible if making a major jump): http://commons.apache.org/releases/versioning.html
[jira] [Commented] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085242#comment-13085242 ] Sebb commented on CODEC-127: Actually, DoubleMetaphoneTest is still corrupt; fixing now.
[jira] [Commented] (MATH-621) BOBYQA is missing in optimization
[ https://issues.apache.org/jira/browse/MATH-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085243#comment-13085243 ] Gilles commented on MATH-621: - OK. Keeping INDEX_OFFSET might be more work than really useful. I'll remove it also.

BOBYQA is missing in optimization

Key: MATH-621
URL: https://issues.apache.org/jira/browse/MATH-621
Project: Commons Math
Issue Type: New Feature
Affects Versions: 3.0
Reporter: Dr. Dietmar Wolz
Fix For: 3.0
Attachments: BOBYQA.math.patch, BOBYQA.v02.math.patch, BOBYQAOptimizer0.4.zip, bobyqa.zip, bobyqa_convert.pl, bobyqaoptimizer0.4.zip, bobyqav0.3.zip
Original Estimate: 8h
Remaining Estimate: 8h

During experiments with space flight trajectory optimizations I recently observed that the direct optimization algorithm BOBYQA http://plato.asu.edu/ftp/other_software/bobyqa.zip from Mike Powell is significantly better than the simple Powell algorithm already in commons.math. It uses significantly fewer function calls and is more reliable for high-dimensional problems. You can replace CMA-ES in many more application cases by BOBYQA than by the simple Powell optimizer. I would like to contribute a Java port of the algorithm. I maintained the structure of the original FORTRAN code, so the code is fast but not very nice.

License status: Michael Powell has sent the agreement via snail mail - it hasn't arrived yet.

Progress: The attached patch relative to the trunk contains both the optimizer and the related unit tests - which are all green now.

Performance: number of function evaluations, PowellOptimizer / BOBYQA, for different test functions (taken from the unit test of BOBYQA, dimension=13 for most of the tests):
Rosen = 9350 / 1283
MinusElli = 118 / 59
Elli = 223 / 58
ElliRotated = 8626 / 1379
Cigar = 353 / 60
TwoAxes = 223 / 66
CigTab = 362 / 60
Sphere = 223 / 58
Tablet = 223 / 58
DiffPow = 421 / 928
SsDiffPow = 614 / 219
Ackley = 757 / 97
Rastrigin = 340 / 64
The number for DiffPow should be discussed with Michael Powell; I will send him the details.

Open Problems: Some checkstyle violations because of the original Fortran source:
- Original method comments were copied and don't follow the javadoc standard
- Multiple variable declarations in one line, as in the original source
- Problems related to goto conversions: gotos not convertible into loops were translated into a finite automaton (switch statement); the missing default in the switch and the fall-through from the previous case, which are usually bad style, make no sense to flag here.
[jira] [Commented] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085261#comment-13085261 ] Gary D. Gregory commented on CODEC-127: --- Arg: {noformat} C:\svn\org\apache\commons\trunks-proper\codecperl -MWild -ne $.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV; */*.java Can't open */*.java: Invalid argument. {noformat} Non-ascii characters in source files Key: CODEC-127 URL: https://issues.apache.org/jira/browse/CODEC-127 Project: Commons Codec Issue Type: Bug Reporter: Sebb Some of the test cases include characters in a native encoding (possibly UTF-8), rather than using Unicode escapes. This can cause a problem for IDEs if they don't know the encoding (e.g. cause compilation errors, which is how I found the issue), and possibly some transformations may corrupt the contents, e.g. fixing EOL. I think we should have a rule of using Unicode escapes for all such non-ascii characters. It's particularly important for non-ISO-8859-1 characters. 
Some example classes with non-ascii characters: {code} binary\Base64Test.java:96 byte[] decode = b64.decode(SGVsbG{´┐¢´┐¢´┐¢´┐¢´┐¢´┐¢}8gV29ybGQ=); language\ColognePhoneticTest.java:110 {m├Ânchengladbach, 664645214}, language\ColognePhoneticTest.java:130 String[][] data = {{bergisch-gladbach, 174845214}, {M├╝ller-L├╝denscheidt, 65752682}}; language\ColognePhoneticTest.java:137 {Meyer, M├╝ller}, language\ColognePhoneticTest.java:143 {ganz, G├ñnse}, language\DoubleMetaphoneTest.java:1222 this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, S); language\DoubleMetaphoneTest.java:1227 this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, N); language\SoundexTest.java:367 if (Character.isLetter('´┐¢')) { language\SoundexTest.java:369 Assert.assertEquals(´┐¢000, this.getSoundexEncoder().encode(´┐¢)); language\SoundexTest.java:375 Assert.assertEquals(, this.getSoundexEncoder().encode(´┐¢)); language\SoundexTest.java:387 if (Character.isLetter('´┐¢')) { language\SoundexTest.java:389 Assert.assertEquals(´┐¢000, this.getSoundexEncoder().encode(´┐¢)); language\SoundexTest.java:395 Assert.assertEquals(, this.getSoundexEncoder().encode(´┐¢)); {code} The characters are probably not correct above, because I used a crude perl script to find them: {code} perl -ne $.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV; */*.java {code} language\SoundexTest.java:367 in particular is incorrect, because it's supposed to be a single character. Now one might think that native2ascii -encoding UTF-8 would fix that, but it gives: if (Character.isLetter('\ufffd')) which is an unknown character. Similarly for binary\Base64Test.java:96. It's not all that clear what the Unicode escapes should be in these cases, but probably not the unknown character. 
[Possibly the characters got mangled at some point, or maybe they have always been wrong] The ColognePhoneticTest.java cases are less serious, as the characters are valid ISO-8859-1 (accented German), but given that the rest of the file uses unicode escapes, I think they should be changed too (but add comments to say what they are, e.g. o-umlaut, u-umlaut) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
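The convention the report asks for can be illustrated with a short sketch (hypothetical class name and illustrative strings, shown only to demonstrate the escape style, not actual codec test data):

```java
public class UnicodeEscapeExample {
    public static void main(String[] args) {
        // \u00f6 is o-umlaut and \u00fc is u-umlaut. The escapes are pure
        // ASCII, so the file compiles the same way regardless of which
        // encoding an IDE or javac assumes for the source file.
        String city = "m\u00f6nchengladbach"; // moenchengladbach, o-umlaut
        String name = "M\u00fcller";          // Mueller, u-umlaut
        System.out.println(city.charAt(1) == '\u00f6'); // true
        System.out.println(name.length());              // 6
    }
}
```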
[jira] [Commented] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085266#comment-13085266 ] Sebb commented on CODEC-127: Tried it here; works fine. Probably an error in your Wild.pm, because I see the same if I omit the -MWild option.
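The '\ufffd' that native2ascii produced in the report above is U+FFFD, the Unicode replacement character: it is what a decoder substitutes for bytes that are not valid in the assumed charset, so seeing it means the characters were already corrupted before the conversion. A minimal demonstration (hypothetical class name):

```java
import java.nio.charset.StandardCharsets;

public class ReplacementCharDemo {
    public static void main(String[] args) {
        // 0xFF is never valid in UTF-8, so String's decoding constructor
        // substitutes U+FFFD (the replacement character) instead of failing.
        byte[] malformed = { (byte) 0xFF, 'a' };
        String decoded = new String(malformed, StandardCharsets.UTF_8);
        System.out.println(Integer.toHexString(decoded.charAt(0))); // fffd
        System.out.println(decoded.charAt(1));                      // a
    }
}
```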
[jira] [Commented] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085269#comment-13085269 ] Gary D. Gregory commented on CODEC-127: --- Can you post your .pm here or email to ggregory at apache dot org?
[jira] [Updated] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb updated CODEC-127: --- Comment: was deleted (was: Sebb: I get errors when I try your perl script on Windows with the latest perl (64 bit) from ActiveState. Rather than use this space to figure out why, can you please run it again and check if we are done with this ticket? Thank you, Gary)
[jira] [Updated] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb updated CODEC-127: --- Comment: was deleted (was: Sorry, closing was in the wrong place; it should have been before the file name params)
[jira] [Updated] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb updated CODEC-127: --- Description: (edited; the body otherwise repeats the report above, with the perl one-liner's */*.java argument now rendered as .java)
[jira] [Updated] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb updated CODEC-127: --- Comment: was deleted (was: If I run the command as is, I get: {quote} Can't open perl script ne: No such file or directory {quote})
[jira] [Updated] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb updated CODEC-127: --- Comment: was deleted (was: Can you post your .pm here or email to ggregory at apache dot org? )
[jira] [Issue Comment Edited] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085110#comment-13085110 ] Sebb edited comment on CODEC-127 at 8/15/11 8:07 PM: - I now get: {code} commons-codec-generics/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:110 {"mönchengladbach", "664645214"}, commons-codec-generics/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:130 String[][] data = {{"bergisch-gladbach", "174845214"}, {"Müller-Lüdenscheidt", "65752682"}}; commons-codec-generics/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:137 {"Meyer", "Müller"}, commons-codec-generics/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:143 {"ganz", "Gänse"}, commons-codec-generics/src/test/org/apache/commons/codec/language/DoubleMetaphoneTest.java:1222 this.getDoubleMetaphone().isDoubleMetaphoneEqual("�", "S"); commons-codec-generics/src/test/org/apache/commons/codec/language/DoubleMetaphoneTest.java:1227 this.getDoubleMetaphone().isDoubleMetaphoneEqual("�", "N"); commons-codec-generics/src/test/org/apache/commons/codec/language/bm/BeiderMorseEncoderTest.java:93 String[] names = { "ácz", "átz", "Ignácz", "Ignátz", "Ignác" }; commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:47 { "Nuñez", "spanish", "EXACT" }, commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:49 { "Čapek", "czech", "EXACT" }, commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:52 { "Küçük", "turkish", "EXACT" }, commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:55 { "Ceauşescu", "romanian", "EXACT" }, commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:57 { "Αγγελόπουλος", "greek", "EXACT" }, commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:58 { "Пушкин", "cyrillic", "EXACT" }, commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:59 { "כהן", "hebrew", "EXACT" }, commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:60 { "ácz", "any", "EXACT" }, commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:61 { "átz", "any", "EXACT" } }); {code} and {code} commons-codec/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:110 {"mönchengladbach", "664645214"}, commons-codec/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:130 String[][] data = {{"bergisch-gladbach", "174845214"}, {"Müller-Lüdenscheidt", "65752682"}}; commons-codec/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:137 {"Meyer", "Müller"}, commons-codec/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:143 {"ganz", "Gänse"}, commons-codec/src/test/org/apache/commons/codec/language/DoubleMetaphoneTest.java:1227 this.getDoubleMetaphone().isDoubleMetaphoneEqual("�", "S"); commons-codec/src/test/org/apache/commons/codec/language/DoubleMetaphoneTest.java:1232 this.getDoubleMetaphone().isDoubleMetaphoneEqual("�", "N"); commons-codec/src/test/org/apache/commons/codec/language/bm/BeiderMorseEncoderTest.java:93 String[] names = { "ácz", "átz", "Ignácz", "Ignátz", "Ignác" }; commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:47 { "Nuñez", "spanish", "EXACT" }, commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:49 { "Čapek", "czech", "EXACT" }, commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:52 { "Küçük", "turkish", "EXACT" }, commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:55 { "Ceauşescu", "romanian", "EXACT" }, commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:57 { "Αγγελόπουλος", "greek", "EXACT" }, commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:58 { "Пушкин", "cyrillic", "EXACT" }, commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:59 { "כהן", "hebrew", "EXACT" }, commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:60 { "ácz", "any", "EXACT" }, commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:61 { "átz", "any", "EXACT" } }); {code} This was using an updated version of the script that uses File::Find to process directory traversal better. (Some lines shortened above by manually removing leading spaces) I think all the actual errors have now been fixed.
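The updated File::Find-based perl script itself isn't shown in the thread; as a rough illustration only, an equivalent recursive scan could be written in Java like this (class and method names are invented, not part of the codec project):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class NonAsciiScanner {
    /** Reports "file:line text" for every line containing a byte above 0x7F. */
    public static List<String> scan(Path root) throws IOException {
        List<Path> sources;
        try (Stream<Path> walk = Files.walk(root)) {
            sources = walk.filter(p -> p.toString().endsWith(".java"))
                          .collect(Collectors.toList());
        }
        List<String> hits = new ArrayList<>();
        for (Path p : sources) {
            // ISO-8859-1 maps every byte to a char, so reading never fails
            // even when the file is really UTF-8 or some other encoding.
            int lineNo = 0;
            for (String line : Files.readAllLines(p, StandardCharsets.ISO_8859_1)) {
                lineNo++;
                if (line.chars().anyMatch(c -> c > 127)) {
                    hits.add(p + ":" + lineNo + " " + line);
                }
            }
        }
        return hits;
    }
}
```

Reading as ISO-8859-1 sidesteps the MalformedInputException that a strict UTF-8 read would raise on exactly the corrupted files this scan is meant to find.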
[jira] [Updated] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb updated CODEC-127: --- Comment: was deleted (was: Typo - missing hyphen for flags)
[jira] [Updated] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb updated CODEC-127:
---
Comment: was deleted (was: Tried it here; works fine. Probably an error in your Wild.pm, because I see the same if I omit the -MWild option.)
[jira] [Updated] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb updated CODEC-127:
---
Comment: was deleted (was: Perl: I did all that and I get:
{noformat}
C:\svn\org\apache\commons\trunks-proper\codec> perl -MWild -ne $.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV; */*.java
syntax error at -e line 1, near "*."
Execution of -e aborted due to compilation errors.
{noformat}
I also have: PERL5OPT=-MWild in my environment. Gary)
[jira] [Updated] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb updated CODEC-127:
---
Comment: was deleted (was: Arg:
{noformat}
C:\svn\org\apache\commons\trunks-proper\codec> perl -MWild -ne $.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV; */*.java
Can't open */*.java: Invalid argument.
{noformat}
)
[jira] [Updated] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb updated CODEC-127:
---
Comment: was deleted (was: Sorry, forgot I was using a local module which handles DOS wildcards, see http://docs.activestate.com/activeperl/5.14/lib/pods/perlwin32.html#command_line_wildcard_expansion
Either pass each file in separately, or create Wild.pm and use:
{code}
perl -MWild -ne "$.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV;" */*.java
{code}
Wild.pm only works for one level of directories.)
[jira] [Updated] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb updated CODEC-127:
---
Comment: was deleted (was: If I run:
{noformat}
perl -n -e $.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV; */*.java
{noformat}
I get:
{noformat}
Can't open */*.java: Invalid argument.
{noformat}
)
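The wildcard failures in this thread are shell-specific (cmd.exe does not expand globs). A small Java sketch of the same scan (class name hypothetical, files assumed UTF-8) does its own directory walk, so it behaves identically on Windows and Unix:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Reports every line containing a non-ASCII character, like the perl
// one-liner in this thread, but without relying on shell wildcard expansion.
public class NonAsciiScanner {
    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(p -> p.toString().endsWith(".java"))
                 .forEach(NonAsciiScanner::scan);
        }
    }

    static void scan(Path p) {
        try {
            int lineNo = 0;
            for (String line : Files.readAllLines(p, StandardCharsets.UTF_8)) {
                lineNo++;
                if (line.chars().anyMatch(c -> c > 127)) {
                    System.out.println(p + ":" + lineNo + " " + line);
                }
            }
        } catch (IOException e) {
            // Includes MalformedInputException: the file is not valid UTF-8,
            // which is itself a symptom of the mangling this issue describes.
            System.err.println(p + ": " + e.getMessage());
        }
    }
}
```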
[jira] [Commented] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085301#comment-13085301 ] Sebb commented on CODEC-127:
I think all the files are now fixed so that the code uses Unicode escapes; the only non-ASCII characters are now in comments.
[jira] [Commented] (MATH-646) Unmodifiable views of RealVector
[ https://issues.apache.org/jira/browse/MATH-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085303#comment-13085303 ] Sébastien Brisard commented on MATH-646:
{quote} Rather than an issue of large source file, the issue is whether this class should be part of the public API. Personally I think that it shouldn't {quote}
I agree; that's the reason why I suggested we make this class private. No problem, I'll make it a nested, anonymous class within the {{unmodifiableRealVector()}} method.
{quote} I'm suspicious that it is possible to call setIndex on the supposedly unmodifiable entry. Maybe it is harmless? {quote}
I have checked that calling {{setIndex}} is indeed harmless while iterating over the vector in question. However, in my view, this method should not be visible. Thanks for your detailed review of the code. I'll have these errors corrected by the end of this week, if that's OK with you.

Unmodifiable views of RealVector

Key: MATH-646
URL: https://issues.apache.org/jira/browse/MATH-646
Project: Commons Math
Issue Type: New Feature
Affects Versions: 3.0
Reporter: Sébastien Brisard
Labels: linear, vector
Attachments: MATH-646.patch

The issue has been discussed on the [mailing list|http://mail-archives.apache.org/mod_mbox/commons-dev/201108.mbox/CAGRH7HqxUb2y1HmFt9VJ-kxsXwipk_MdO0D=rnuazmgpnot...@mail.gmail.com]. Please find attached a proposal for a new class {{UnmodifiableRealVector}}. I chose not to nest it in {{AbstractRealVector}} because it would make the corresponding file huge. Therefore, {{UnmodifiableRealVector}} is {{final}}. Maybe you'd like it to be {{private}} as well? A static method is provided in {{AbstractRealVector}} to build an {{UnmodifiableRealVector}} from any {{RealVector}}. Tests are also provided. Since iteration differs between implementations of {{RealVector}}, a test is provided for {{UnmodifiableRealVector}} built on {{ArrayRealVector}} and on {{OpenMapRealVector}}. These tests both derive from the same abstract test class. Hope everything works fine.
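The unmodifiable-view idea above can be sketched in miniature. This is a toy stand-in, not the real Commons Math API: the {{Vec}} interface and class names are invented for illustration, and the anonymous-class factory mirrors the {{unmodifiableRealVector()}} approach discussed in the comments.

```java
// A toy version of the MATH-646 idea: wrap a mutable vector in a view
// that forwards reads and rejects writes.
interface Vec {
    double get(int i);
    void set(int i, double v);
    int size();
}

final class ArrayVec implements Vec {
    private final double[] data;
    ArrayVec(double... data) { this.data = data.clone(); }
    public double get(int i) { return data[i]; }
    public void set(int i, double v) { data[i] = v; }
    public int size() { return data.length; }
}

public class UnmodifiableVecDemo {
    // Factory method, analogous in spirit to the proposed unmodifiableRealVector().
    public static Vec unmodifiable(final Vec v) {
        return new Vec() {  // anonymous class, as suggested in the comment above
            public double get(int i) { return v.get(i); }
            public void set(int i, double x) {
                throw new UnsupportedOperationException("unmodifiable view");
            }
            public int size() { return v.size(); }
        };
    }

    public static void main(String[] args) {
        Vec view = unmodifiable(new ArrayVec(1.0, 2.0, 3.0));
        System.out.println(view.get(1)); // 2.0
        try {
            view.set(0, 9.0);
        } catch (UnsupportedOperationException e) {
            System.out.println("write rejected");
        }
    }
}
```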
[jira] [Resolved] (CONFIGURATION-460) reloadStrategy does not work for files inside additional tag using DefaultConfigurationBuilder
[ https://issues.apache.org/jira/browse/CONFIGURATION-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oliver Heger resolved CONFIGURATION-460.
Resolution: Fixed
Fix Version/s: 1.7

A fix was applied in SVN revision 1157982. Thank you for reporting this.

reloadStrategy does not work for files inside additional tag using DefaultConfigurationBuilder

Key: CONFIGURATION-460
URL: https://issues.apache.org/jira/browse/CONFIGURATION-460
Project: Commons Configuration
Issue Type: Bug
Components: File reloading
Affects Versions: 1.6
Environment: Linux x86_64
Reporter: Azfar Kazmi
Assignee: Oliver Heger
Fix For: 1.7

In the configuration file that DefaultConfigurationBuilder reads to build a CombinedConfiguration, it's possible to include a configuration file inside either the override or the additional XML element. Each such file declaration allows a reloadingStrategy to be specified (see example below). It appears that the reload occurs only for the files inside override and not for the ones inside additional. Example:

{code}
<configuration>
  <header>
    <result forceReloadCheck="true">
      <expressionEngine config-class="org.apache.commons.configuration.tree.xpath.XPathExpressionEngine"/>
    </result>
  </header>
  <override>
    <properties fileName="user.properties" config-optional="true">
      <reloadingStrategy refreshDelay="100" config-class="org.apache.commons.configuration.reloading.FileChangedReloadingStrategy"/>
    </properties>
  </override>
  <additional>
    <properties fileName="application.properties">
      <reloadingStrategy refreshDelay="100" config-class="org.apache.commons.configuration.reloading.FileChangedReloadingStrategy"/>
    </properties>
  </additional>
</configuration>
{code}

In the above example, both user.properties and application.properties are supposed to reload upon change. However, as tested by the following code, only user.properties gets reloaded:

{code}
DefaultConfigurationBuilder dcb = new DefaultConfigurationBuilder("example.xml");
Configuration conf = dcb.getConfiguration();
System.out.println("user: " + conf.getBoolean("user"));
System.out.println("application: " + conf.getBoolean("application"));
System.out.println("Change files and then press <Enter> to continue...");
System.in.read();
System.out.println("user: " + conf.getBoolean("user"));
System.out.println("application: " + conf.getBoolean("application"));
{code}

Output from the above code:

{noformat}
user: true
application: true
Change files and then press <Enter> to continue...
0 [main] INFO org.apache.commons.configuration.PropertiesConfiguration - Reloading configuration. URL is file:snipped/user.properties
user: false
application: true
{noformat}
[jira] [Commented] (MATH-646) Unmodifiable views of RealVector
[ https://issues.apache.org/jira/browse/MATH-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085329#comment-13085329 ] Gilles commented on MATH-646:
There must be one public (or package-access) class in each Java source file, but you can have additional ones (without an access qualifier), not necessarily nested. Thus, in AbstractRealVector.java:

{code}
public class AbstractRealVector implements RealVector {
    // ...
    public static RealVector unmodifiableRealVector(RealVector v) {
        return new UnmodifiableRealVector(v);
    }
}

class UnmodifiableRealVector implements RealVector {
    // ...
}
{code}

This makes for slightly less cluttered code.
[jira] [Created] (JCI-67) Dubious use of mkdirs() return code
Dubious use of mkdirs() return code
---
Key: JCI-67
URL: https://issues.apache.org/jira/browse/JCI-67
Project: Commons JCI
Issue Type: Bug
Reporter: Sebb
Priority: Minor

FileRestoreStore.java uses mkdirs() as follows:

{code}
final File parent = file.getParentFile();
if (!parent.exists()) {
    if (!parent.mkdirs()) {
        throw new IOException("could not create " + parent);
    }
}
{code}

Now mkdirs() returns true *only* if the method actually created the directories; it's theoretically possible for the directory to be created in the window between the exists() and mkdirs() invocations. Also, the initial exists() call is redundant, because that's what mkdirs() does anyway (in the RI implementation, at least). I suggest the following instead:

{code}
final File parent = file.getParentFile();
if (!parent.mkdirs() && !parent.exists()) {
    throw new IOException("could not create " + parent);
}
{code}

If mkdirs() returns false, the code then checks whether the directory exists, so the exception will only be thrown if the parent really cannot be created. The same code also appears in AbstractTestCase and FilesystemAlterationMonitorTestCase.
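The suggested idiom can be wrapped up as a small compilable sketch (class and helper names are hypothetical, not from the JCI sources). Note the extra null guard for files with no parent, an assumption added here beyond the issue's snippet:

```java
import java.io.File;
import java.io.IOException;

public class MkdirsDemo {
    // Creates parent directories for 'file' without the check-then-act race
    // in the original exists()/mkdirs() sequence: mkdirs() may return false
    // because another thread or process created the directory first, so we
    // only fail if the directory still does not exist afterwards.
    public static void ensureParentDirs(File file) throws IOException {
        final File parent = file.getParentFile();
        if (parent != null && !parent.mkdirs() && !parent.exists()) {
            throw new IOException("could not create " + parent);
        }
    }

    public static void main(String[] args) throws IOException {
        File f = new File(System.getProperty("java.io.tmpdir"),
                          "jci-demo/sub/x.txt");
        ensureParentDirs(f);
        System.out.println(f.getParentFile().isDirectory()); // true
    }
}
```

The follow-up comment's variant, `!parent.isDirectory()` instead of `!parent.exists()`, is stricter: it also fails when the path exists but is a plain file.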
[jira] [Commented] (MATH-621) BOBYQA is missing in optimization
[ https://issues.apache.org/jira/browse/MATH-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085332#comment-13085332 ] Gilles commented on MATH-621:
1-based indexing issue solved in revision 1158015.

BOBYQA is missing in optimization
-
Key: MATH-621
URL: https://issues.apache.org/jira/browse/MATH-621
Project: Commons Math
Issue Type: New Feature
Affects Versions: 3.0
Reporter: Dr. Dietmar Wolz
Fix For: 3.0
Attachments: BOBYQA.math.patch, BOBYQA.v02.math.patch, BOBYQAOptimizer0.4.zip, bobyqa.zip, bobyqa_convert.pl, bobyqaoptimizer0.4.zip, bobyqav0.3.zip
Original Estimate: 8h
Remaining Estimate: 8h

During experiments with space-flight trajectory optimizations I recently observed that the direct optimization algorithm BOBYQA (http://plato.asu.edu/ftp/other_software/bobyqa.zip) from Mike Powell is significantly better than the simple Powell algorithm already in commons.math. It uses significantly fewer function calls and is more reliable for high-dimensional problems. You can replace CMA-ES by BOBYQA in many more application cases than by the simple Powell optimizer. I would like to contribute a Java port of the algorithm. I maintained the structure of the original FORTRAN code, so the code is fast but not very nice.

License status: Michael Powell has sent the agreement via snail mail; it hasn't arrived yet.

Progress: The attached patch relative to the trunk contains both the optimizer and the related unit tests, which are all green now.

Performance: Performance difference (number of function evaluations), PowellOptimizer / BOBYQA, for different test functions (taken from the unit tests of BOBYQA; dimension=13 for most of the tests):

{noformat}
Rosen       = 9350 / 1283
MinusElli   =  118 /   59
Elli        =  223 /   58
ElliRotated = 8626 / 1379
Cigar       =  353 /   60
TwoAxes     =  223 /   66
CigTab      =  362 /   60
Sphere      =  223 /   58
Tablet      =  223 /   58
DiffPow     =  421 /  928
SsDiffPow   =  614 /  219
Ackley      =  757 /   97
Rastrigin   =  340 /   64
{noformat}

The number for DiffPow should be discussed with Michael Powell; I will send him the details.

Open Problems: Some checkstyle violations because of the original Fortran source:
- Original method comments were copied and don't follow the javadoc standard
- Multiple variable declarations in one line, as in the original source
- Problems related to goto conversions: gotos not convertible into loops were translated into a finite automaton (switch statement); "no default in switch" and "fall through from previous case in switch", which usually are bad style, make no sense here.
[jira] [Commented] (MATH-621) BOBYQA is missing in optimization
[ https://issues.apache.org/jira/browse/MATH-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085334#comment-13085334 ] Gilles commented on MATH-621: - Removed testDiagonalRosen unit test in revision 1158017. BOBYQA is missing in optimization - Key: MATH-621 URL: https://issues.apache.org/jira/browse/MATH-621 Project: Commons Math Issue Type: New Feature Affects Versions: 3.0 Reporter: Dr. Dietmar Wolz Fix For: 3.0 Attachments: BOBYQA.math.patch, BOBYQA.v02.math.patch, BOBYQAOptimizer0.4.zip, bobyqa.zip, bobyqa_convert.pl, bobyqaoptimizer0.4.zip, bobyqav0.3.zip Original Estimate: 8h Remaining Estimate: 8h During experiments with space flight trajectory optimizations I recently observed, that the direct optimization algorithm BOBYQA http://plato.asu.edu/ftp/other_software/bobyqa.zip from Mike Powell is significantly better than the simple Powell algorithm already in commons.math. It uses significantly lower function calls and is more reliable for high dimensional problems. You can replace CMA-ES in many more application cases by BOBYQA than by the simple Powell optimizer. I would like to contribute a Java port of the algorithm. I maintained the structure of the original FORTRAN code, so the code is fast but not very nice. License status: Michael Powell has sent the agreement via snail mail - it hasn't arrived yet. Progress: The attached patch relative to the trunk contains both the optimizer and the related unit tests - which are all green now. Performance: Performance difference (number of function evaluations) PowellOptimizer / BOBYQA for different test functions (taken from the unit test of BOBYQA, dimension=13 for most of the tests. 
Rosen = 9350 / 1283, MinusElli = 118 / 59, Elli = 223 / 58, ElliRotated = 8626 / 1379, Cigar = 353 / 60, TwoAxes = 223 / 66, CigTab = 362 / 60, Sphere = 223 / 58, Tablet = 223 / 58, DiffPow = 421 / 928, SsDiffPow = 614 / 219, Ackley = 757 / 97, Rastrigin = 340 / 64. The number for DiffPow should be discussed with Michael Powell; I will send him the details. Open Problems: some checkstyle violations caused by the original Fortran source: - original method comments were copied and don't follow the javadoc standard - multiple variable declarations per line, as in the original source - problems related to goto conversion: gotos not convertible into loops were translated into a finite automaton (switch statement); the missing default case and the fall-through from the previous case in the switch, which are usually bad style, make no sense to flag here. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (JCI-67) Dubious use of mkdirs() return code
[ https://issues.apache.org/jira/browse/JCI-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085339#comment-13085339 ] Sebb commented on JCI-67: - Safer would be the following, as it checks that the path is actually a directory:
{code}
final File parent = file.getParentFile();
if (!parent.mkdirs() && !parent.isDirectory()) {
    throw new IOException("could not create " + parent);
}
{code}
Dubious use of mkdirs() return code --- Key: JCI-67 URL: https://issues.apache.org/jira/browse/JCI-67 Project: Commons JCI Issue Type: Bug Reporter: Sebb Priority: Minor FileRestoreStore.java uses mkdirs() as follows:
{code}
final File parent = file.getParentFile();
if (!parent.exists()) {
    if (!parent.mkdirs()) {
        throw new IOException("could not create " + parent);
    }
}
{code}
Now mkdirs() returns true *only* if the method actually created the directories; it's theoretically possible for the directory to be created in the window between the exists() and mkdirs() invocations. Also, the initial exists() call is redundant, because that's what mkdirs() does anyway (in the RI implementation, at least). I suggest the following instead:
{code}
final File parent = file.getParentFile();
if (!parent.mkdirs() && !parent.exists()) {
    throw new IOException("could not create " + parent);
}
{code}
If mkdirs() returns false, the code then checks whether the directory exists, so the exception will only be thrown if the parent really cannot be created. The same code also appears in AbstractTestCase and FilesystemAlterationMonitorTestCase.
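The race-free idiom Sebb suggests can be packaged as a small standalone helper; the class, method, and demo path below are hypothetical, chosen only to illustrate the pattern:

```java
import java.io.File;
import java.io.IOException;

public class MkdirsDemo {
    // Race-free directory creation: mkdirs() may return false because another
    // thread or process created the directory first, so only fail when the
    // path still isn't a directory afterwards.
    static void ensureParentDir(File file) throws IOException {
        final File parent = file.getParentFile();
        if (parent != null && !parent.mkdirs() && !parent.isDirectory()) {
            throw new IOException("could not create " + parent);
        }
    }

    public static void main(String[] args) throws IOException {
        File f = new File(System.getProperty("java.io.tmpdir"), "jci67-demo/sub/file.txt");
        ensureParentDir(f); // creates the missing directories
        ensureParentDir(f); // second call: mkdirs() returns false, but isDirectory() saves us
        System.out.println(f.getParentFile().isDirectory());
    }
}
```

Note that unlike the exists()-based variant, the isDirectory() check also catches the case where the path exists but is a regular file, which mkdirs() silently reports as a plain false.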
[jira] [Commented] (MATH-621) BOBYQA is missing in optimization
[ https://issues.apache.org/jira/browse/MATH-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085340#comment-13085340 ] Gilles commented on MATH-621: - Commenting out rescue (line 671) makes the testRescue test fail, as expected. So, if I also remove the test, we are fine. However, do you know whether I can also remove the whole case 190 (lines 667-697), as well as any code that references that state (e.g. lines 791-796, 846-851, 2597-2599, etc.)? BOBYQA is missing in optimization - Key: MATH-621 URL: https://issues.apache.org/jira/browse/MATH-621 Project: Commons Math Issue Type: New Feature Affects Versions: 3.0 Reporter: Dr. Dietmar Wolz Fix For: 3.0 Attachments: BOBYQA.math.patch, BOBYQA.v02.math.patch, BOBYQAOptimizer0.4.zip, bobyqa.zip, bobyqa_convert.pl, bobyqaoptimizer0.4.zip, bobyqav0.3.zip Original Estimate: 8h Remaining Estimate: 8h During experiments with space flight trajectory optimizations I recently observed that the direct optimization algorithm BOBYQA (http://plato.asu.edu/ftp/other_software/bobyqa.zip) from Mike Powell is significantly better than the simple Powell algorithm already in commons.math. It requires significantly fewer function evaluations and is more reliable for high-dimensional problems. BOBYQA can replace CMA-ES in many more application cases than the simple Powell optimizer can. I would like to contribute a Java port of the algorithm. I maintained the structure of the original FORTRAN code, so the code is fast but not very nice. License status: Michael Powell has sent the agreement via snail mail - it hasn't arrived yet. Progress: The attached patch relative to the trunk contains both the optimizer and the related unit tests - which are all green now. Performance: Performance difference (number of function evaluations) PowellOptimizer / BOBYQA for different test functions (taken from the unit tests of BOBYQA, dimension=13 for most of the tests):
Rosen = 9350 / 1283, MinusElli = 118 / 59, Elli = 223 / 58, ElliRotated = 8626 / 1379, Cigar = 353 / 60, TwoAxes = 223 / 66, CigTab = 362 / 60, Sphere = 223 / 58, Tablet = 223 / 58, DiffPow = 421 / 928, SsDiffPow = 614 / 219, Ackley = 757 / 97, Rastrigin = 340 / 64. The number for DiffPow should be discussed with Michael Powell; I will send him the details. Open Problems: some checkstyle violations caused by the original Fortran source: - original method comments were copied and don't follow the javadoc standard - multiple variable declarations per line, as in the original source - problems related to goto conversion: gotos not convertible into loops were translated into a finite automaton (switch statement); the missing default case and the fall-through from the previous case in the switch, which are usually bad style, make no sense to flag here.
[jira] [Created] (IO-280) Dubious use of mkdirs() return code
Dubious use of mkdirs() return code --- Key: IO-280 URL: https://issues.apache.org/jira/browse/IO-280 Project: Commons IO Issue Type: Bug Reporter: Sebb Priority: Minor FileUtils.openOutputStream() has the following code:
{code}
File parent = file.getParentFile();
if (parent != null && parent.exists() == false) {
    if (parent.mkdirs() == false) {
        throw new IOException("File '" + file + "' could not be created");
    }
}
{code}
Now mkdirs() returns true only if the method actually created the directories; it's theoretically possible for the directory to be created in the window between the exists() and mkdirs() invocations. [Indeed the class actually checks for this in the forceMkdir() method.] It would be safer to use:
{code}
File parent = file.getParentFile();
if (parent != null && !parent.mkdirs() && !parent.isDirectory()) {
    throw new IOException("Directory '" + parent + "' could not be created"); // note changed text
}
{code}
Similarly elsewhere in the class where mkdirs() is used.
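For comparison, NIO's Files.createDirectories (available since Java 7, after this report was filed) sidesteps the exists()/mkdirs() race entirely: it is a no-op when the directory already exists and throws only when creation genuinely fails. A minimal sketch; the demo path is made up:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CreateDirsDemo {
    public static void main(String[] args) throws IOException {
        Path parent = Paths.get(System.getProperty("java.io.tmpdir"), "io280-demo", "nested");
        // Idempotent: succeeds whether or not the directories already exist,
        // so there is no window between an existence check and the creation.
        Files.createDirectories(parent);
        Files.createDirectories(parent); // second call does nothing and throws nothing
        System.out.println(Files.isDirectory(parent));
    }
}
```

It still throws FileAlreadyExistsException if some element of the path exists as a regular file, matching the isDirectory() guard in the suggested fix.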
[jira] [Commented] (COMPRESS-132) Add support for unix dump files
[ https://issues.apache.org/jira/browse/COMPRESS-132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085525#comment-13085525 ] Stefan Bodewig commented on COMPRESS-132: - Later revisions have fixed some issues detected by findbugs, made some methods less public, and fixed javadocs, so it has changed quite a bit. My initial attempt to run your testcase resulted in Java spinning in an infinite loop; I'll investigate this further. I tried to create a dump file on my Linux box - preferably one that has the same contents as src/test/resources/bla.* in Compress' trunk source tree - but have failed so far. Cursory reading of the manual page is obviously not enough to make it work. Right now I don't know what to make of
{noformat}
stefan@birdy:~/cc$ sudo dump -v -f bla.dump test1.xml test2.xml
  DUMP: Date of this level 0 dump: Tue Aug 16 06:34:18 2011
  DUMP: Dumping /dev/sda6 (/home (dir /stefan/cc/test1.xml)) to bla.dump
  DUMP: Excluding inode 8 (journal inode) from dump
  DUMP: Excluding inode 7 (resize inode) from dump
  DUMP: Label: none
  DUMP: Writing 10 Kilobyte records
  DUMP: mapping (Pass I) [regular files]
/dev/sda6: File not found by ext2_lookup while translating .xml
{noformat}
Add support for unix dump files --- Key: COMPRESS-132 URL: https://issues.apache.org/jira/browse/COMPRESS-132 Project: Commons Compress Issue Type: New Feature Components: Archivers Reporter: Bear Giles Priority: Minor Fix For: 1.3 Attachments: dump-20110722.zip, dump.zip, test-z.dump, test.dump I'm submitting a series of patches to the ext2/3/4 dump utility and noticed that the commons-compress library doesn't have an archiver for it. It's as old as tar and fills a similar niche, but the latter has become much more widely used. Dump includes support for sparse files, extended attributes, Mac OS Finder info, SELinux labels (I think), and more. Incremental dumps can capture that files have been deleted. I should have initial support for a decoder this weekend.
I can read the directory entries and inode information (file permissions, etc.) but need a bit more work on extracting the content as an InputStream.
[jira] [Commented] (DAEMON-213) procrun log rotation support
[ https://issues.apache.org/jira/browse/DAEMON-213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085536#comment-13085536 ] viola.lu commented on DAEMON-213: - But jsvc (https://issues.apache.org/jira/browse/DAEMON-95) also supports log rotation, by catching the SIGUSR1 signal; can we try this approach on Windows procrun? procrun log rotation support --- Key: DAEMON-213 URL: https://issues.apache.org/jira/browse/DAEMON-213 Project: Commons Daemon Issue Type: Improvement Components: Procrun Affects Versions: 1.0.4, 1.0.5, 1.0.6 Environment: os: winxp Reporter: viola.lu Priority: Minor Fix For: Nightly Builds Currently, procrun doesn't support log rotation; an option should be added.
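The signal-driven scheme referenced for jsvc is usually wired up through logrotate. A hedged sketch of such a configuration; the log path and pid-file location are invented for illustration, and it assumes jsvc reopens its -outfile/-errfile on SIGUSR1 as described in DAEMON-95:

```
/var/log/myservice/daemon.log {
    weekly
    rotate 8
    compress
    postrotate
        # ask jsvc to reopen its log files after the rotation
        kill -USR1 `cat /var/run/myservice.pid`
    endscript
}
```

On Windows there is no SIGUSR1, which is why procrun would need a different mechanism (e.g. a rotation option built into the service wrapper itself).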
[jira] [Closed] (DBUTILS-79) fillStatement doesn't complain when there are too few parameters
[ https://issues.apache.org/jira/browse/DBUTILS-79?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henri Yandell closed DBUTILS-79. Resolution: Fixed Resolved in r1158109 per your patch in DBUTILS-78. fillStatement doesn't complain when there are too few parameters Key: DBUTILS-79 URL: https://issues.apache.org/jira/browse/DBUTILS-79 Project: Commons DbUtils Issue Type: Bug Affects Versions: 1.3 Reporter: William R. Speirs Fix For: 1.4 Unless I'm reading the code incorrectly, it appears that the fillStatement function does not complain if you provide too few parameters. For example, if you supply an SQL statement like: select * from blah where ? = ?; but only provide a single parameter, "test", fillStatement returns without issue. However, only the first ? is actually set. Granted, this will almost always cause an exception to be thrown by the driver, but since there is already a check for too many parameters, why not check for too few as well? (FYI: I came across this bug, and a few others in my AsyncQueryRunner implementation, while re-writing the unit tests to use Mockito.)
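The missing check described above amounts to comparing the supplied array length against the statement's declared parameter count. A minimal sketch; the class and method names are hypothetical, and in the real fillStatement the expected count would come from PreparedStatement.getParameterMetaData().getParameterCount():

```java
import java.sql.SQLException;

public class FillStatementCheck {
    // Reject both too many and too few parameters up front, instead of
    // silently leaving trailing ? placeholders unset.
    static void checkParameterCount(int stmtCount, Object[] params) throws SQLException {
        int supplied = (params == null) ? 0 : params.length;
        if (supplied != stmtCount) {
            throw new SQLException("Wrong number of parameters: expected "
                    + stmtCount + ", was given " + supplied);
        }
    }

    public static void main(String[] args) throws Exception {
        checkParameterCount(2, new Object[] { "a", "b" }); // matches "where ? = ?" - ok
        try {
            checkParameterCount(2, new Object[] { "test" }); // too few
            throw new AssertionError("expected SQLException");
        } catch (SQLException expected) {
            System.out.println("caught: " + expected.getMessage());
        }
    }
}
```

A single `supplied != stmtCount` comparison covers both directions, which is presumably why the fix landed together with the DBUTILS-78 patch.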